# Parameter Estimation

Parameter estimation using parmest requires a Pyomo model, experimental data that defines multiple scenarios, and a list of parameter names (thetas) to estimate. parmest uses PySP [PyomoBookII] to solve a two-stage stochastic programming problem, where the experimental data is used to create a scenario tree. The objective function must be written in PySP form: the Pyomo Expression for the first-stage cost (named “FirstStageCost”) is set to zero, and the Pyomo Expression for the second-stage cost (named “SecondStageCost”) is defined as the deviation between the model and the observations (typically the sum of squared deviations between model values and observed values).
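As a rough illustration of the PySP form described above, the estimation problem can be written as follows. This sketch assumes equally weighted scenarios \(s \in S\) and a sum-of-squares second-stage cost; the symbols \(y_{s,i}\) (observed value \(i\) in scenario \(s\)) and \(\hat{y}_{s,i}(\theta)\) (the corresponding model value) are notation introduced here for illustration and are not parmest names.

\[
\min_{\theta} \;\; \underbrace{0}_{\text{FirstStageCost}} \; + \; \frac{1}{|S|} \sum_{s \in S} \underbrace{\sum_{i} \left( y_{s,i} - \hat{y}_{s,i}(\theta) \right)^{2}}_{\text{SecondStageCost for scenario } s}
\]

The thetas play the role of first-stage variables, shared across all scenarios, while each scenario contributes its own second-stage cost.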

If the Pyomo model is not written as a two-stage stochastic programming problem in this form, the user can supply a custom function to use as the second-stage cost, and parmest will modify the Pyomo model to match the specifications required by PySP. The PySP callback function, which returns a populated and initialized model for each scenario, is also defined within parmest.

To use parmest, the user creates an `Estimator` object, which includes the following methods:

| Method | Description |
| --- | --- |
| `theta_est` | Parameter estimation using all scenarios in the data |
| `theta_est_bootstrap` | Parameter estimation using bootstrap resampling of the data |
| `theta_est_leaveNout` | Parameter estimation where N data points are left out of each sample |
| `objective_at_theta` | Objective value for each theta |
| `confidence_region_test` | Confidence region test to determine if theta values are within a rectangular, multivariate normal, or Gaussian kernel density distribution for a range of alpha values |
| `likelihood_ratio_test` | Likelihood ratio test to identify theta values within a confidence region using the \(\chi^2\) distribution |
| `leaveNout_bootstrap_test` | Leave-N-out bootstrap test to compare theta values where N data points are left out to a bootstrap analysis using the remaining data; results indicate if theta is within a confidence region determined by the bootstrap analysis |
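To make the two resampling ideas behind `theta_est_bootstrap` and `theta_est_leaveNout` concrete, the following standard-library sketch builds bootstrap samples (scenarios drawn with replacement) and leave-N-out samples (every subset with N scenarios removed). It is an illustration of the sampling concepts only, not parmest's implementation; the helper names `bootstrap_samples` and `leave_n_out_samples` are hypothetical.

```python
import itertools
import random

# Hypothetical scenario indices; in practice these would index the data.
scenario_ids = [0, 1, 2, 3]


def bootstrap_samples(ids, n_samples, seed=None):
    """Draw n_samples resamples of ids, each the same size, with replacement."""
    rng = random.Random(seed)
    return [rng.choices(ids, k=len(ids)) for _ in range(n_samples)]


def leave_n_out_samples(ids, n):
    """Return every sample obtained by leaving exactly n of the ids out."""
    return [
        [i for i in ids if i not in left_out]
        for left_out in itertools.combinations(ids, n)
    ]


boot = bootstrap_samples(scenario_ids, n_samples=3, seed=0)
lno = leave_n_out_samples(scenario_ids, n=1)

print(boot)  # three resamples of length 4, possibly with repeated ids
print(lno)   # four samples, each missing one scenario
```

Each sample would then be used for a separate parameter estimation, which gives a distribution of theta values rather than a single point estimate.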

Additional functions are available in parmest to group data, plot results, and fit distributions to theta values.

| Function | Description |
| --- | --- |
| `group_data` | Group data by scenario |
| `pairwise_plot` | Plot pairwise relationship for theta values, and optionally alpha-level confidence intervals and objective value contours |
| `grouped_boxplot` | Plot a grouped boxplot to compare two datasets |
| `grouped_violinplot` | Plot a grouped violinplot to compare two datasets |
| `fit_rect_dist` | Fit an alpha-level rectangular distribution to theta values |
| `fit_mvn_dist` | Fit a multivariate normal distribution to theta values |
| `fit_kde_dist` | Fit a Gaussian kernel-density distribution to theta values |

An `Estimator` object can be created using the following code. A description of each argument is listed below. Examples are provided in the Examples section.

```
>>> import pyomo.contrib.parmest.parmest as parmest
>>> pest = parmest.Estimator(model_function, data, theta_names, objective_function)
```

Optionally, solver options can be supplied, e.g.,

```
>>> solver_options = {"max_iter": 6000}
>>> pest = parmest.Estimator(model_function, data, theta_names, objective_function, solver_options)
```

## Model function

The first argument is a function that uses data for a single scenario to return a populated and initialized Pyomo model for that scenario. Parameters that the user would like to estimate must be defined as variables (Pyomo Var). The variables can be fixed, since parmest unfixes the variables that will be estimated. The model does not have to be written specifically for parmest, because parmest can modify the objective for PySP; see Objective function below.

## Data

The second argument is the data which will be used to populate the Pyomo model. Supported data formats include:

- **Pandas DataFrame** where each row is a separate scenario and column names refer to observed quantities. Pandas DataFrames are easily stored and read in from CSV, Excel, or databases, or created directly in Python.
- **List of dictionaries** where each entry in the list is a separate scenario and the keys (or nested keys) refer to observed quantities. Dictionaries are often preferred over DataFrames when using static and time series data. Dictionaries are easily stored and read in from JSON or YAML files, or created directly in Python.
- **List of JSON file names** where each entry in the list contains a JSON file name for a separate scenario. This format is recommended when using large datasets in parallel computing.

The data must be compatible with the model function that returns a populated and initialized Pyomo model for a single scenario. Data can include multiple entries per variable (time series and/or duplicate sensors). This information can be included in custom objective functions, see Objective function below.
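The sketch below illustrates two of the formats using only the standard library: a list of dictionaries, and the same scenarios written out as one JSON file each. The keys (`experiment`, `hour`, `y`), the values, and the file names are made up for illustration; the keys a real dataset needs depend on what the model function expects.

```python
import json
import tempfile
from pathlib import Path

# List of dictionaries: one entry per scenario (keys and values are hypothetical).
# Each scenario holds a static value and a short time series.
scenarios = [
    {"experiment": 1, "hour": [1, 2, 3], "y": [8.3, 10.3, 19.0]},
    {"experiment": 2, "hour": [1, 2, 3], "y": [8.1, 10.9, 18.7]},
]

# List of JSON file names: write each scenario to its own file.
out_dir = Path(tempfile.mkdtemp())
file_names = []
for i, scenario in enumerate(scenarios):
    path = out_dir / f"scenario_{i}.json"
    path.write_text(json.dumps(scenario))
    file_names.append(str(path))

# Reading a file back recovers the dictionary for that scenario.
with open(file_names[0]) as f:
    reloaded = json.load(f)

print(file_names)
print(reloaded)
```

Either `scenarios` or `file_names` could then serve as the data argument, provided the model function knows how to read the corresponding entry.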

## Theta names

The third argument is a list of variable names that the user wants to estimate. The list contains strings with Var names from the Pyomo model.

## Objective function

The fourth argument is optional and defines the optimization objective function to use in parameter estimation. If no objective function is specified, the Pyomo model is used “as is” and should define “FirstStageCost” and “SecondStageCost” expressions, which are used to build an objective for PySP. If the Pyomo model is not written as a two-stage stochastic programming problem in this form, or if the user wants to use an objective that differs from the one in the original model, a custom objective function can be defined for parameter estimation. The objective function takes the model and the data as arguments and returns a Pyomo expression, which is used to define “SecondStageCost”. The objective function can also be used to customize the data points and weights that are used in parameter estimation.
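As a sketch of what a custom objective function might look like, the function below returns a weighted sum of squared deviations between a model response and several observations of the same quantity. The attribute `response` and the keys `y` and `weights` are hypothetical. To keep the example runnable without Pyomo, a plain stand-in object replaces the model; with a real Pyomo model, `model.response` would be a Pyomo Var or Expression and the same arithmetic would produce a Pyomo expression.

```python
from types import SimpleNamespace


def weighted_sse(model, data):
    """Weighted sum of squared deviations for one scenario.

    Assumes `model.response` is the modeled quantity and `data` holds the
    observations under "y" and optional per-point weights under "weights".
    """
    observations = data["y"]
    weights = data.get("weights", [1.0] * len(observations))
    return sum(
        w * (obs - model.response) ** 2
        for w, obs in zip(weights, observations)
    )


# Stand-in for a Pyomo model, used only to exercise the function here.
stand_in_model = SimpleNamespace(response=9.5)

unweighted = weighted_sse(stand_in_model, {"y": [9.0, 10.0]})
weighted = weighted_sse(stand_in_model, {"y": [9.0, 10.0], "weights": [2.0, 0.0]})

print(unweighted)
print(weighted)
```

Setting a weight to zero drops that observation from the cost, which is one way to exclude individual data points without editing the dataset.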