Research: Model-data integration in hydrology
Uncertainty quantification for mountainous watersheds
- Study site: East River, Colorado, United States
This project studies the impacts of uncertain factors, ranging from bedrock to canopy to climate disturbances, across the diverse floodplains and hillslopes of mountainous watersheds. We use advanced modeling, machine learning, and decision science to learn the importance of the underlying factors, with the aim of better quantifying the uncertainty of both water quantity and quality in the near future.
Papers are in preparation.
Surrogate modeling for process-based modeling
- Study site: Savannah River Site, South Carolina, United States
Understanding the impact of climate on groundwater contamination requires quantifying the uncertainty from both subsurface and climate properties, which in turn requires fast and accurate numerical simulations. We address the computational cost of numerical models with a physics-informed neural network that combines a neural operator with physics-informed loss functions. We demonstrate our surrogate modeling approach at one testbed, the Savannah River Site (SRS) F-Area. The faster surrogate model helps us assess the spatiotemporal variations of groundwater contamination under uncertain climate disturbances more efficiently. The ultimate goal is to support decisions on the monitoring of contaminated sites.
Conference proceedings: Wang et al., Machine Learning and the Physical Sciences workshop, NeurIPS 2022
Journal paper is under review.
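To illustrate what a physics-informed loss function looks like, here is a minimal NumPy sketch. It is not the model used in this project: the toy equation d2u/dx2 = source, the finite-difference residual, the function name physics_informed_loss, and the weight parameter are all assumptions chosen to keep the example short. The loss adds a penalty on the PDE residual to the usual misfit at observed locations, so a prediction that fits the data but violates the physics is still penalized.

```python
import numpy as np

def physics_informed_loss(u_pred, u_obs, obs_idx, source, dx, weight=1.0):
    """Data misfit plus PDE residual for the toy equation d2u/dx2 = source."""
    # Data term: mean squared error at the observed grid nodes only.
    data_loss = np.mean((u_pred[obs_idx] - u_obs) ** 2)
    # Physics term: second-order central difference at the interior nodes.
    d2u = (u_pred[2:] - 2.0 * u_pred[1:-1] + u_pred[:-2]) / dx**2
    physics_loss = np.mean((d2u - source[1:-1]) ** 2)
    return data_loss + weight * physics_loss

# Synthetic setup (assumption): u = sin(pi x) solves d2u/dx2 = -pi^2 sin(pi x).
x = np.linspace(0.0, 1.0, 101)
dx = x[1] - x[0]
source = -np.pi**2 * np.sin(np.pi * x)
u_exact = np.sin(np.pi * x)
obs_idx = np.array([10, 50, 90])  # three hypothetical observation nodes
u_obs = u_exact[obs_idx]

# The exact solution gives a near-zero loss; a perturbed field does not.
loss_exact = physics_informed_loss(u_exact, u_obs, obs_idx, source, dx)
loss_wrong = physics_informed_loss(u_exact + 0.1 * x**2, u_obs, obs_idx, source, dx)
```

In a surrogate model the same quantity would be evaluated on the network output and minimized during training, with the residual of the governing flow and transport equations in place of the toy equation.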
Statistical modeling as the alternative to process-based modeling
- Study site: Jutland, Denmark
Agricultural nitrate pollutants infiltrate the subsurface and contaminate groundwater. The redox environment in the subsurface is important for the natural removal of nitrate by denitrification. However, modeling the redox structure in 3D requires additional assumptions, and a process-based model is difficult to formulate and solve. This project combines a geophysical survey, towed transient electromagnetic (tTEM) resistivity, with redox boreholes to model the 3D redox architecture stochastically using statistical learning, geostatistics, and local inversion methods. The statistical learning framework also identifies important resistivity structures that help domain experts understand what controls the redox conditions. The highly accurate redox architecture supports better agricultural regulation decisions.
Machine learning-based inversion methods
Bayesian inversion is commonly applied to quantify the uncertainty of hydrological variables. However, it usually focuses on spatial hydrological properties rather than hyperparameters or non-gridded global physical variables. This project presents a hierarchical Bayesian framework to quantify the uncertainty of both global and spatial variables. We propose a machine learning-based inversion method that estimates the joint distribution of data and global variables directly, without introducing a statistical likelihood. We also propose a new local dimension reduction method, local principal component analysis (local PCA), to update large-scale spatial fields with local data more efficiently. With the help of machine learning and efficient dimension reduction, the inversion becomes tractable for high-resolution hydrologic models.
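The idea of reducing dimension locally can be sketched as follows. This is only one plausible reading of local PCA, not the formulation used in this project: it assumes an ensemble of prior field realizations and a rectangular window around the local data, and the names local_pca, window, and n_components are hypothetical. PCA is computed on the window alone, so the local update works with a handful of scores instead of every grid cell of the full field.

```python
import numpy as np

def local_pca(ensemble, window, n_components):
    """PCA of an ensemble of 2D fields restricted to a local window.

    ensemble: array of shape (n_realizations, ny, nx)
    window:   (row_slice, col_slice) selecting the local region
    """
    rows, cols = window
    # Flatten each local patch into one row of the data matrix.
    local = ensemble[:, rows, cols].reshape(ensemble.shape[0], -1)
    mean = local.mean(axis=0)
    # SVD of the centered patches; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(local - mean, full_matrices=False)
    basis = vt[:n_components]
    scores = (local - mean) @ basis.T
    return mean, basis, scores

# Synthetic stand-in for prior realizations (assumption, not project data).
rng = np.random.default_rng(0)
ensemble = rng.normal(size=(50, 40, 40))
window = (slice(10, 20), slice(15, 25))  # 10 x 10 patch near the local data

mean, basis, scores = local_pca(ensemble, window, n_components=5)
# Low-dimensional approximation of each local patch, flattened to 100 cells.
reconstructed = mean + scores @ basis
```

Each 10 x 10 patch is represented here by 5 scores, and updated scores can be mapped back to the grid through the same basis.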
Data-knowledge-driven geological interface modeling
- Study sites: Greenland and South Australia, Australia
Modeling complex geological interfaces is a common task in geosciences. Many data sources are available for geological interface modeling, including borehole data and geophysical surveys. Geological knowledge, such as the delineation from geologists, is difficult to quantify but likely adds value to geological interface modeling. To integrate all information, this project presents a data-knowledge-driven trend surface analysis method to construct stochastic geological interfaces. A Metropolis–Hastings sampling framework is designed to sample stochastic trend interfaces and quantify the uncertainty of geological interfaces. We demonstrate our method in three different test cases: modeling stochastic interfaces of Greenland subglacial topography, magmatic intrusion, and buried river valleys in Australia.
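The sampling step can be illustrated with a minimal random-walk Metropolis–Hastings sketch. It is a simplified stand-in rather than the framework described above: it assumes a planar trend z = a + b*x + c*y, a Gaussian likelihood on synthetic borehole elevations, and a flat prior, and it does not include any geological knowledge term. The function names, step size, and noise level are hypothetical.

```python
import numpy as np

def log_posterior(coef, xy, z_obs, sigma):
    """Gaussian log-likelihood of a planar trend z = a + b*x + c*y (flat prior)."""
    z_pred = coef[0] + xy @ coef[1:]
    return -0.5 * np.sum((z_obs - z_pred) ** 2) / sigma**2

def metropolis_hastings(xy, z_obs, sigma, n_steps, step_size, rng):
    """Random-walk Metropolis-Hastings over the three trend coefficients."""
    coef = np.zeros(3)
    logp = log_posterior(coef, xy, z_obs, sigma)
    samples = np.empty((n_steps, 3))
    for i in range(n_steps):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = coef + step_size * rng.normal(size=3)
        logp_new = log_posterior(proposal, xy, z_obs, sigma)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < logp_new - logp:
            coef, logp = proposal, logp_new
        samples[i] = coef  # a rejected proposal repeats the current state
    return samples

# Synthetic borehole data (assumption): 30 locations on a unit square.
rng = np.random.default_rng(1)
xy = rng.uniform(0.0, 1.0, size=(30, 2))
true_coef = np.array([1.0, 2.0, -1.0])
z_obs = true_coef[0] + xy @ true_coef[1:] + 0.05 * rng.normal(size=30)

samples = metropolis_hastings(xy, z_obs, sigma=0.05, n_steps=20000,
                              step_size=0.01, rng=rng)
posterior = samples[5000:]  # discard burn-in
```

Each row of posterior defines one stochastic trend surface, and the spread of the surfaces across rows is a measure of interface uncertainty.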
Active learning to iteratively label new geophysical data
Seismic interpretation plays an essential role in locating subsurface horizons and understanding geological formations. This project investigates how to use semi-supervised segmentation to improve horizon predictions, even when only a few labeled horizons are available. We also propose an active learning framework that selects the most uncertain unlabeled sections for labeling, with uncertainty estimated using deep ensembles. We believe our work helps geophysicists reduce their labeling effort and achieve higher facies classification accuracy with the same amount of labeling work.
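The selection step of such an active learning loop can be sketched as follows. This is an illustrative assumption rather than the project's implementation: it scores each unlabeled section by the predictive entropy of the ensemble-averaged class probabilities, which is one common uncertainty measure for deep ensembles. The function name select_most_uncertain and the synthetic probabilities are hypothetical.

```python
import numpy as np

def select_most_uncertain(ensemble_probs, n_select):
    """Rank unlabeled sections by predictive entropy of the ensemble mean.

    ensemble_probs: shape (n_members, n_sections, n_pixels, n_classes)
    Returns the indices of the n_select most uncertain sections and all scores.
    """
    mean_probs = ensemble_probs.mean(axis=0)
    # Per-pixel entropy; the small constant avoids log(0).
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=-1)
    section_score = entropy.mean(axis=-1)  # average pixel entropy per section
    order = np.argsort(section_score)[::-1]  # most uncertain first
    return order[:n_select], section_score

# Synthetic predictions (assumption): 3 members, 4 sections, 5 pixels, 2 classes.
# For simplicity all members and pixels share one class-0 probability per section.
confidence = np.array([0.99, 0.9, 0.5, 0.7])
probs = np.empty((3, 4, 5, 2))
probs[..., 0] = confidence[None, :, None]
probs[..., 1] = 1.0 - probs[..., 0]

picked, scores = select_most_uncertain(probs, n_select=2)
```

In this toy example picked contains sections 2 and 3, whose class probabilities are closest to 0.5. In practice the selected sections would be labeled by an interpreter, added to the training set, and the ensemble retrained before the next round.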