MCMC¶

The pyRSD package wraps the emcee package to provide functionality for running MCMC chains to sample the posterior distributions of the model parameters.

Command-line Options¶

MCMC chains are run by passing the mcmc sub-command to the rsdfit executable. The calling sequence for the mcmc command is

$ rsdfit mcmc -h
usage: rsdfit [-h] [--version] {mcmc,nlopt,restart,analyze} ... 

From more help on each of the subcommands, type:
rsdfit mcmc -h
rsdfit nlopt -h
rsdfit restart -h
rsdfit analyze -h mcmc
       [-h] [-m MODEL] [-p PARAMS] [--silent] -w WALKERS -i ITERATIONS
       [-n NCHAINS] -o FOLDER [--debug] [--no-save-model]

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        file name holding the model path
  -p PARAMS, --params PARAMS
                        file name holding the driver, theory, and data
                        parameters
  --silent              silence the standard output to the console
  -w WALKERS            number of emcee walkers to run the MCMC chain
                        (required)
  -i ITERATIONS         number of steps to run in the MCMC chain (required)
  -n NCHAINS, --nchains NCHAINS
                        number of MCMC chains to run concurrently
  -o FOLDER, --output FOLDER
                        the folder where the results will be written
                        (required)
  --debug               whether to print more info about the mpi4py.Pool
                        object
  --no-save-model       do not save the model instance

The main options are the parameter file, passed by the -p option, the directory to save results, passed by the -o option, and the name of a model to load, passed by the -m file. In addition, there are some MCMC-specific options:

-w, —walkers

The number of emcee walkers to use. These walkers are responsible for exploring the relevant parameter space simulataneously.

Note

The number of walkers must be at least twice the number of model parameters, and generally, the more walkers used the better. However, a trade-off exists since more walkers can become computationally infeasible when model evaluation is slow. In the case of the default model with 13 parameters, we recommend using 30-40 walkers.
-i, —iterations

The number of iterations to run in the MCMC chain.

Note

Generally, the models in the pyRSD package require >1000 iterations to achieve convergence, depending on how the MCMC sampler is initialized.
–nchains

If the rsdfit is executed in parallel with multiple processes using MPI, it is possible to run multiple chains with MCMC concurrently with this option. This is ideal for testing the convergence of the sampler, as the chains will be compared statistically to determine if they have converged to a similar point in parameter space.

Initializing the MCMC Chains¶

The method used to initialize either the MCMC chains can be configured using the driver.init_from parameter. The allowed values of this parameter when using the MCMC solver are:

fiducial :

Initialize the parameters in a small ball around the fiducial parameter values, specified in the parameter file via the fiducial keyword for each free parameter
prior :

Initialize the free parameters by drawing a value from the prior probability specified for each parameter, which will be either a uniform or normal distribution
result :

Initialize the free parameters in a small ball around the best-fit parameters loaded from a previous result. In this case, the driver.start_from parameter should give the name of a .npz, which can be loaded into either a pyRSD.rsdfit.results.LBFGSResults or pyRSD.rsdfit.results.EmceeResults object.

Testing for Convergence¶

The convergence of the MCMC chain being run will be tested if the driver.test_convergence parameter is set to True. This test is performed using the Gelman and Rubin diagnostic, and the tolerance level for the test is specified via the driver.epsilon parameter. The convergence for MCMC chains is performed as

Remove the first half of the current chains
Calculate the within chain and between chain variances
Estimate the variance from the within chain and between chain variance
Calculate the potential scale reduction parameter and compare to epsilon

Note

Convergence testing is disabled for MCMC chains by default, as it is often easier to run a pre-defined number of iterations (passed by the -i flag), which will be specific to the user’s problem at hand.

Recommended Practices¶

The emcee package recommends initializing its MCMC sampler in a small ball around what the user believes to be the area in parameter space close to the maximum probability. As such, we recommend either of the following initialization methods for best results:

1. Run the NLOPT solver, starting from a fiducial set of values, and then inititialize the MCMC solver with the result of that NLOPT run.

2. Initialize the MCMC solver with values drawn from the parameters’ prior distribution and run a set of burn-in iterations (typically ~1000 or so). Then, initialize a second MCMC chain from the best-fit result of that burn-in period.

For more recommended practices regarding the emcee package, please see the FAQ on the emcee documentation here.

Finally, it is also useful for convergence reasons to run multiple chains at once in parallel. Often running two chains, independently initialized, with half the number of iterations is useful to asses if the chains have truly found the best-fit parameters.