Research: Model-data integration in hydrology

Uncertainty quantification for mountainous watersheds

Study site: East River, Colorado, United States

This project studies the impact of uncertain factors, ranging from bedrock to canopy to climate disturbances, across diverse floodplains and hillslopes in mountainous watersheds. We leverage advanced modeling, machine learning, and decision science to learn the importance of these underlying factors, ultimately enhancing our ability to quantify the uncertainty of both water quantity and quality under future climate disturbances.

Papers are in preparation.

The East River Watershed. Site photo provided by Professor Kate Maher, Stanford University.

Surrogate modeling for process-based modeling

Study site: Savannah River Site, South Carolina, United States

Understanding the climate impact on groundwater contamination requires us to quantify the uncertainty from both subsurface and climate properties. Doing so requires fast and accurate numerical simulations. We address the computational cost of numerical models with a physics-informed neural network that combines a Neural Operator with physics-informed loss functions. We demonstrate our surrogate modeling approach at one testbed: the Savannah River Site (SRS) F-Area. The faster surrogate model helps us assess the spatiotemporal variations of groundwater contamination under uncertain climate disturbances more efficiently. The ultimate goal is to support decisions on monitoring contaminated sites.

Conference proceedings: Wang et al., Machine Learning and the Physical Sciences workshop, NeurIPS 2022

Journal paper: Meray* and Wang* et al., Computers & Geosciences, 2023
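
As a rough illustration of what a physics-informed loss looks like, the sketch below adds a PDE residual penalty to an ordinary data misfit. It is not the surrogate used in the papers above: the 1D diffusion equation u_t = D u_xx, the finite-difference residual, and the names pde_residual, physics_informed_loss and weight are all assumptions chosen to keep the example small, and a plain array stands in for the Neural Operator output.

```python
import numpy as np

def pde_residual(u, D, dx, dt):
    """Finite-difference residual of u_t - D * u_xx on interior grid points.

    u has shape (n_time, n_space). The toy PDE is an assumption of this
    sketch, not the flow-and-transport model used at the study site.
    """
    u_t = (u[1:, 1:-1] - u[:-1, 1:-1]) / dt
    u_xx = (u[:-1, 2:] - 2.0 * u[:-1, 1:-1] + u[:-1, :-2]) / dx**2
    return u_t - D * u_xx

def physics_informed_loss(u_pred, u_obs, D, dx, dt, weight=1.0):
    """Data misfit plus a weighted penalty on the PDE residual."""
    data_loss = np.mean((u_pred - u_obs) ** 2)
    physics_loss = np.mean(pde_residual(u_pred, D, dx, dt) ** 2)
    return data_loss + weight * physics_loss

# Usage: score an analytical solution of the toy PDE against itself.
x = np.linspace(0.0, np.pi, 101)
t = np.linspace(0.0, 0.1, 201)
u = np.exp(-t)[:, None] * np.sin(x)[None, :]  # solves u_t = u_xx
print(physics_informed_loss(u, u, D=1.0, dx=x[1] - x[0], dt=t[1] - t[0]))
```

In a trained surrogate, u_pred would come from the network and the residual would typically be computed with automatic differentiation; the weight balances fitting the simulations against honoring the physics.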

Statistical modeling as the alternative to process-based modeling

Study site: Jutland, Denmark

Agricultural nitrate pollutants infiltrate into the subsurface and contaminate groundwater. The redox environment in the subsurface is important for the natural removal of nitrate by denitrification. However, modeling the redox structure in 3D requires additional assumptions, and a process-based model is difficult to formulate and solve. This project combines a geophysical survey, towed transient electromagnetic (tTEM) resistivity, with redox boreholes to model the 3D redox architecture stochastically using statistical learning, geostatistics, and local inversion methods. This statistical learning framework also identifies important resistivity structures that help domain experts understand what controls the redox conditions. The highly accurate redox architecture supports better agricultural regulation decisions.

Paper: Wang et al., Hydrogeology Journal, 2023

Machine learning-based model calibration methods

Bayesian inference is commonly applied to quantify the uncertainty of hydrological variables. However, Bayesian inference usually focuses on spatial hydrological properties rather than hyperparameters or non-gridded global physical variables. This project presents a hierarchical Bayesian framework to quantify the uncertainty of both global and spatial variables. We propose a machine learning-based model calibration method that estimates the joint distribution of data and global variables directly, without introducing a statistical likelihood. We also propose a new local dimension reduction method, local principal component analysis (local PCA), to update large-scale spatial fields with local data more efficiently. With the help of machine learning and efficient dimension reduction, model calibration becomes tractable for high-resolution hydrologic models.

Paper: Wang et al., Water Resources Research, 2022
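
The idea behind local dimension reduction can be sketched as PCA restricted to a window of the field around the local data. The code below is a minimal sketch under that assumption, not the formulation in the paper: the function names local_pca and reconstruct_patch, the rectangular window, and the random ensemble are all hypothetical, and the step that updates the scores with observations is left out.

```python
import numpy as np

def local_pca(ensemble, rows, cols, n_components):
    """PCA on a rectangular window of an ensemble of 2D fields.

    ensemble: array of shape (n_realizations, ny, nx)
    rows, cols: slices selecting the window around the local data
    Returns the window mean, principal components, and per-realization scores.
    """
    patch = ensemble[:, rows, cols]
    flat = patch.reshape(patch.shape[0], -1)
    mean = flat.mean(axis=0)
    # SVD of the centered ensemble gives the principal directions.
    U, s, Vt = np.linalg.svd(flat - mean, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    components = Vt[:n_components]
    return mean, components, scores

def reconstruct_patch(mean, components, scores, shape):
    """Map (possibly updated) scores back to fields on the window."""
    return (mean + scores @ components).reshape((-1,) + shape)

# Usage with a hypothetical ensemble of 20 random 30 x 30 fields.
rng = np.random.default_rng(0)
ensemble = rng.normal(size=(20, 30, 30))
rows, cols = slice(5, 15), slice(10, 20)
mean, components, scores = local_pca(ensemble, rows, cols, n_components=5)
patch = reconstruct_patch(mean, components, scores, (10, 10))
print(scores.shape, patch.shape)
```

Working in the window means a calibration step only has to adjust a handful of scores instead of every grid cell in the full field.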

Data-knowledge-driven geological interface modeling

Study sites: Greenland and South Australia, Australia

Modeling complex geological interfaces is a common task in geosciences. Many data sources are available for geological interface modeling, including borehole data and geophysical surveys. Geological knowledge, such as the delineation from geologists, is difficult to quantify but likely adds value to geological interface modeling. To integrate all information, this project presents a data-knowledge-driven trend surface analysis method to construct stochastic geological interfaces. A Metropolis–Hastings sampling framework is designed to sample stochastic trend interfaces and quantify the uncertainty of geological interfaces. We demonstrate our method in three different test cases: modeling stochastic interfaces of Greenland subglacial topography, magmatic intrusion, and buried river valleys in Australia.

Paper: Wang et al., Computers & Geosciences, 2023
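
To make the Metropolis–Hastings step concrete, here is a minimal random-walk sampler for a toy trend surface. Everything specific in it is an assumption for illustration: the planar trend z = a + b*x + c*y, the Gaussian likelihood on borehole elevations, the Gaussian prior, and the names log_posterior and metropolis_hastings. The paper's trend model and its use of geological knowledge are richer than this.

```python
import numpy as np

def log_posterior(theta, xy, z_obs, sigma=1.0, prior_sd=10.0):
    """Log posterior of a planar trend z = a + b*x + c*y (toy assumption)."""
    a, b, c = theta
    z_pred = a + b * xy[:, 0] + c * xy[:, 1]
    log_lik = -0.5 * np.sum((z_obs - z_pred) ** 2) / sigma**2
    log_prior = -0.5 * np.sum(np.asarray(theta) ** 2) / prior_sd**2
    return log_lik + log_prior

def metropolis_hastings(log_post, theta0, n_steps, step_sd, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    samples = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step_sd * rng.normal(size=theta.size)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        samples[i] = theta
    return samples

# Usage with synthetic "borehole" picks (hypothetical data).
rng = np.random.default_rng(0)
xy = rng.uniform(-5.0, 5.0, size=(50, 2))
z_obs = 2.0 + 0.5 * xy[:, 0] - 0.3 * xy[:, 1] + 0.5 * rng.normal(size=50)
samples = metropolis_hastings(
    lambda th: log_posterior(th, xy, z_obs, sigma=0.5),
    np.zeros(3), n_steps=5000, step_sd=0.03, rng=rng,
)
print(samples[2000:].mean(axis=0))  # posterior mean after burn-in
```

Each retained sample defines one plausible trend surface, so the spread of the samples is a direct picture of the interface uncertainty.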

Active learning to iteratively label new geophysical data

Seismic interpretation plays an essential role in locating subsurface horizons and understanding geological formations. This project investigates how to use semi-supervised segmentation to improve horizon predictions, even when only a few labeled horizons are available. We also propose an active learning framework that selects the most uncertain unlabeled sections for labeling, using uncertainty estimates from deep ensembles. We believe our work helps geophysicists reduce labeling effort and achieve higher facies classification accuracy for a given amount of labeling work.

Paper: Wang et al., Geophysics, 2023
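
The selection step of such an active learning loop can be sketched as follows. This is a minimal sketch, assuming that uncertainty is scored by the entropy of the ensemble-averaged class probabilities and averaged over each section; the paper may use a different score, and the function names and array layout are hypothetical.

```python
import numpy as np

def section_uncertainty(ensemble_probs):
    """Mean predictive entropy per section.

    ensemble_probs: array of shape (n_models, n_sections, n_pixels, n_classes)
    holding each ensemble member's class probabilities.
    """
    mean_probs = ensemble_probs.mean(axis=0)
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=-1)
    return entropy.mean(axis=-1)

def select_sections_to_label(ensemble_probs, n_select):
    """Indices of the n_select most uncertain sections, most uncertain first."""
    scores = section_uncertainty(ensemble_probs)
    return np.argsort(scores)[::-1][:n_select]

# Usage: two models agree on sections 0 and 1 but disagree on section 2.
confident = np.array([0.99, 0.01])
probs = np.tile(confident, (2, 3, 4, 1))
probs[1, 2] = confident[::-1]  # second model flips its prediction on section 2
print(select_sections_to_label(probs, n_select=1))
```

The selected sections would then be labeled by an interpreter, added to the training set, and the ensemble retrained before the next round.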

Lijing Wang