Feat_emulators_module #603

Draft
wants to merge 38 commits into
base: develop

rhugman
Contributor

@rhugman rhugman commented Jul 2, 2025

Add pyemu.emulators module for surrogate modeling capabilities

...thanks Claude for the PR description...

Work in progress. I would really appreciate some external opinions on what doesn't work well, what you would like to see done differently, etc. I tried to design the classes in what seemed a sensible manner, but other sets of eyes and opinions would be appreciated!

Summary

This PR introduces a new pyemu.emulators module for building and deploying surrogate models (emulators) of computationally expensive simulations. The module includes three emulator types: Data Space Inversion (DSI), Gaussian Process Regression (GPR), and Learning-based Pattern-driven Forecast Approach (LPFA), along with a data transformation pipeline.

Key Features

🔧 Base Architecture

  • Base Emulator Class: Common interface for all emulator implementations with standardized fit(), predict(), save(), and load() methods
  • Flexible Transform Pipeline: Forward and inverse data transformation pipeline with support for log10, normal score, standard scaling, and min-max transformations
  • PEST++ Integration: Hooks into PEST++ workflows for optimization and uncertainty quantification
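
To make the common interface concrete, here is a minimal sketch of what a base class with fit(), predict(), save(), and load() could look like. It is an illustration only, not the actual pyemu.emulators.Emulator: the names SketchEmulator and LinearEmulator are hypothetical, and persisting the instance state with pickle is an assumption (the PR only lists pickle among the standard library modules it uses).

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod

import numpy as np


class SketchEmulator(ABC):
    """Hypothetical base class; NOT the real pyemu.emulators.Emulator."""

    def __init__(self):
        self.fitted = False

    @abstractmethod
    def fit(self, X, y):
        """Train the emulator and return self."""

    @abstractmethod
    def predict(self, X):
        """Return emulated outputs for the inputs X."""

    def save(self, filename):
        # Assumption: persist only the instance state, not the class itself.
        with open(filename, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def load(cls, filename):
        with open(filename, "rb") as f:
            state = pickle.load(f)
        obj = cls.__new__(cls)  # bypass __init__, then restore the state
        obj.__dict__.update(state)
        return obj


class LinearEmulator(SketchEmulator):
    """Toy subclass: least-squares linear fit with an intercept."""

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        self.fitted = True
        return self

    def predict(self, X):
        if not self.fitted:
            raise RuntimeError("call fit() before predict()")
        A = np.column_stack([X, np.ones(len(X))])
        return A @ self.coef_


# Fit on y = 2x + 1, save to a temporary file, and reload.
emu = LinearEmulator().fit(np.array([[0.0], [1.0], [2.0]]), np.array([1.0, 3.0, 5.0]))
path = os.path.join(tempfile.mkdtemp(), "emu.pkl")
emu.save(path)
restored = LinearEmulator.load(path)
prediction = restored.predict(np.array([[3.0]]))
```

Keeping save() and load() in the base class means each emulator type only has to implement fit() and predict().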

🎯 Emulator Implementations

1. Data Space Inversion (DSI)

  • Based on Sun & Durlofsky (2017) methodology
  • Uses Singular Value Decomposition (SVD) for dimensionality reduction
  • Supports energy-based truncation for computational efficiency
  • Includes Data Space Inversion Variable Control (DSIVC) for multi-objective optimization
  • Automated PEST++ template generation for history matching and optimization workflows
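
The SVD and energy-based truncation steps above can be sketched with plain NumPy. This is an illustration of the general technique, not the PR's DSI code: the function names, the row-per-realization layout, and the 0.99 default energy threshold are all assumptions.

```python
import numpy as np


def fit_truncated_svd(ensemble, energy=0.99):
    """Fit a truncated SVD to an ensemble of shape (n_realizations, n_outputs).

    Keeps the smallest number of singular components whose squared singular
    values sum to at least `energy` of the total (hypothetical helper).
    """
    mean = ensemble.mean(axis=0)
    # Scale deviations so that latent coordinates have unit sample variance.
    dev = (ensemble - mean) / np.sqrt(ensemble.shape[0] - 1)
    _, s, vt = np.linalg.svd(dev, full_matrices=False)
    cum_energy = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(cum_energy, energy)) + 1, len(s))
    return mean, s[:k], vt[:k]


def project(outputs, mean, s, vt):
    """Map outputs to the reduced latent space."""
    return (outputs - mean) @ vt.T / s


def reconstruct(latent, mean, s, vt):
    """Map latent vectors back to the output space."""
    return mean + (latent * s) @ vt


# Synthetic rank-2 ensemble: 50 realizations of 10 outputs.
rng = np.random.default_rng(0)
ensemble = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 10))

mean, s, vt = fit_truncated_svd(ensemble, energy=0.999999)
latent = project(ensemble, mean, s, vt)
recovered = reconstruct(latent, mean, s, vt)
```

Because the synthetic ensemble has rank 2, two components carry essentially all of the energy and the reconstruction matches the input to numerical precision. Real ensembles trade reconstruction accuracy against the number of retained components.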

2. Gaussian Process Regression (GPR)

  • Scikit-learn based implementation with multiple kernel support
  • Uncertainty quantification through prediction standard deviations
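
For reference, this is how prediction standard deviations are obtained from scikit-learn directly. The sketch uses GaussianProcessRegressor on its own rather than the PR's GPR wrapper; the kernel choice and the synthetic sine data are assumptions made for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Synthetic training data (assumption): a noiseless sine curve.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 10.0, size=(30, 1))
y_train = np.sin(X_train).ravel()

# One possible kernel: scaled RBF plus a small noise term.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-4)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(X_train, y_train)

# return_std=True yields a standard deviation for every prediction.
X_new = np.linspace(0.0, 10.0, 5).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)
```

Swapping the kernel (for example to Matern or RationalQuadratic from the same kernels module) changes the smoothness assumptions without changing the rest of the workflow.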

3. Learning-based Pattern-driven Forecast Approach (LPFA)

  • Neural network-based emulator using scikit-learn MLPRegressor
  • Principal Component Analysis (PCA) for dimensionality reduction
  • Row-wise scaling for time-series data
  • Optional noise modeling for residual uncertainty
  • Early stopping and regularization support
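
The combination of PCA, a neural network, early stopping, and regularization listed above can be sketched with scikit-learn alone. This does not use the PR's LPFA class; the layer sizes, the number of components, and the synthetic data are assumptions, and the row-wise scaling and noise-model steps are left out.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

# Synthetic data (assumption): inputs and outputs driven by 3 latent factors.
rng = np.random.default_rng(1)
n_real, n_in, n_out = 200, 40, 20
factors = rng.normal(size=(n_real, 3))
X = factors @ rng.normal(size=(3, n_in))
Y = np.tanh(factors) @ rng.normal(size=(3, n_out))

# Reduce both inputs and outputs to a few principal components.
pca_in = PCA(n_components=3).fit(X)
pca_out = PCA(n_components=3).fit(Y)
Z_in = pca_in.transform(X)
Z_out = pca_out.transform(Y)

# alpha is the L2 regularization strength; early_stopping holds out
# validation_fraction of the training data to decide when to stop.
mlp = MLPRegressor(
    hidden_layer_sizes=(32, 32),
    alpha=1e-3,
    early_stopping=True,
    validation_fraction=0.1,
    n_iter_no_change=20,
    max_iter=500,
    random_state=0,
)
mlp.fit(Z_in, Z_out)

# Predict in the reduced space, then map back to the full output space.
Y_pred = pca_out.inverse_transform(mlp.predict(Z_in[:5]))
```

Training in the reduced space keeps the network small, which matters when the ensemble has only a few hundred realizations.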

🔄 Data Transformation Pipeline

  • AutobotsAssemble: Main transformation coordinator
  • Multiple Transformers: Log10, normal score, standard scaling, min-max scaling
  • Row-wise Scaling: Specialized for time-series and grouped data
  • Reversible Operations: Full inverse transformation support
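
A reversible pipeline applies its transforms in order and undoes them in reverse order. The sketch below shows that idea with two of the listed transforms. The class names are hypothetical and this is not the AutobotsAssemble implementation.

```python
import numpy as np


class Log10Transform:
    """Hypothetical log10 transform; assumes strictly positive data."""

    def fit(self, x):
        return self

    def transform(self, x):
        return np.log10(x)

    def inverse_transform(self, x):
        return 10.0 ** x


class StandardScaleTransform:
    """Hypothetical column-wise standard scaling."""

    def fit(self, x):
        self.mean_ = x.mean(axis=0)
        std = x.std(axis=0)
        self.std_ = np.where(std == 0.0, 1.0, std)  # avoid division by zero
        return self

    def transform(self, x):
        return (x - self.mean_) / self.std_

    def inverse_transform(self, x):
        return x * self.std_ + self.mean_


class TransformPipeline:
    """Applies steps in order; inverts them in reverse order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def fit_transform(self, x):
        for step in self.steps:
            x = step.fit(x).transform(x)
        return x

    def inverse_transform(self, x):
        for step in reversed(self.steps):
            x = step.inverse_transform(x)
        return x


rng = np.random.default_rng(0)
data = rng.lognormal(size=(20, 3))

pipeline = TransformPipeline([Log10Transform(), StandardScaleTransform()])
transformed = pipeline.fit_transform(data)
round_trip = pipeline.inverse_transform(transformed)
```

Each step is fitted on the output of the previous one, so the scaler here learns the mean and standard deviation of the log-transformed values rather than the raw ones.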

Technical Implementation

Core Classes

# Base class
pyemu.emulators.Emulator

# Emulator implementations  
pyemu.emulators.DSI
pyemu.emulators.GPR
pyemu.emulators.LPFA

# Transformation framework
pyemu.emulators.transformers.AutobotsAssemble

PEST++ Integration

  • Automatic template folder generation
  • PyWorker helper functions for DSI and GPR

Example Usage

import pyemu
from pyemu.emulators import DSI, GPR, LPFA

# DSI Emulator
dsi = DSI(data=observation_ensemble, transforms=[
    {'type': 'normal_score', 'quadratic_extrapolation': True}
])
dsi.fit()
pst_dsi = dsi.prepare_pestpp("template_dir")

# GPR Emulator  
gpr = GPR(data=training_data, input_names=inputs, output_names=outputs)
gpr.fit()
gpr.prepare_pestpp("pest_dir", "case_name")

# LPFA Emulator
lpfa = LPFA(data=data, input_names=inputs, groups=groups, 
            fit_groups=fit_groups, output_names=forecasts)
lpfa.fit(epochs=200)
predictions = lpfa.predict(new_data)

Testing

  • Test suite in emulator_tests.py
  • Tests for all three emulator types with various transformation combinations
  • Integration tests with PEST++ workflows
  • Verification against known analytical solutions (ZDT1 benchmark)
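
For readers unfamiliar with the benchmark, ZDT1 is a standard two-objective test problem with decision variables in [0, 1]. A minimal sketch of the function follows; how the PR's tests actually wire it into an emulator is not shown here, and the function name zdt1 is a choice made for this example.

```python
import numpy as np


def zdt1(x):
    """Evaluate the two ZDT1 objectives for a decision vector x in [0, 1]^n."""
    x = np.asarray(x, dtype=float)
    f1 = x[0]
    g = 1.0 + 9.0 * np.sum(x[1:]) / (len(x) - 1)
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return f1, f2


# On the Pareto front all variables after the first are zero, so g = 1
# and f2 = 1 - sqrt(f1).
x_front = np.zeros(30)
x_front[0] = 0.25
f1_front, f2_front = zdt1(x_front)
```

The known front, f2 = 1 - sqrt(f1), gives an analytical target that an emulator-driven optimization can be checked against.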

Breaking Changes

  • Legacy GPR helper functions are still supported.
  • Legacy DSI helper functions are broken.

Dependencies

  • numpy, pandas (existing pyemu dependencies)
  • scikit-learn (for GPR and LPFA implementations)
  • Standard library modules: os, shutil, pickle, inspect
