Analyzing MCMC Results

The rsdfit executable includes an analyze sub-command that runs a basic analysis of the MCMC chains in a given directory: it removes burn-in steps, writes out best-fit parameter files, and automatically generates plots of the results.

The calling sequence is:

$ rsdfit analyze -h
usage: rsdfit [-h] [--version] {mcmc,nlopt,restart,analyze} ... 

From more help on each of the subcommands, type:
rsdfit mcmc -h
rsdfit nlopt -h
rsdfit restart -h
rsdfit analyze -h analyze
       [-h] [--minimal] [--bins BINS] [--show-mean] [--noplot] [--noplot-2d]
       [--show-fiducial] [--burnin BURNIN] [--contours-only] [--all]
       [--ext {pdf,png,eps}] [--fontsize FONTSIZE] [--ticksize TICKSIZE]
       [--line-width LINE_WIDTH] [--decimal DECIMAL] [--ticknumber TICKNUMBER]
       [--thin THIN] [--rescale-errors] [--extra OPTIONAL_PLOT_FILE]
       files [files ...]

positional arguments:
  files                 files to analyze: either a file(s), or a complete
                        directory name

optional arguments:
  -h, --help            show this help message and exit
  --minimal             use this flag to avoid computing the posterior
                        distribution (default: False)
  --bins BINS           number of bins in the histograms used to derive
                        posterior probabilities. Decrease this number for
                        smoother plots at the expense of masking details.
                        (default: 20)
  --show-mean           remove the mean likelihood from the 1D posterior plots
                        (default: False)
  --noplot              do not produce any plot, simply compute the posterior
                        (default: True)
  --noplot-2d           produce only the 1d posterior plot (default: True)
  --show-fiducial       don't include fiducial lines on 1D posterior plots
                        (default: False)
  --burnin BURNIN, -b BURNIN
                        the fraction of samples to consider burnin (default:
                        None)
  --contours-only       do not fill the contours on the 2d plot (default:
                        False)
  --all                 output every subplot and data in separate files
                        (default: False)
  --ext {pdf,png,eps}   change the extension for the output file (default:
                        pdf)
  --fontsize FONTSIZE   the desired fontsize of output fonts (default: 16)
  --ticksize TICKSIZE   the tick size on the plots (default: 14)
  --line-width LINE_WIDTH
                        the linewidth of 1d plots (default: 4)
  --decimal DECIMAL     number of decimal places on ticks (default: 3)
  --ticknumber TICKNUMBER
                        number of ticks on each axis (default: 3)
  --thin THIN           the thinning factor to use (default: 1)
  --rescale-errors      whether to rescale errors (default: False)
  --extra OPTIONAL_PLOT_FILE
                        extra file to customize the output plots. You can
                        actually set all the possible options in this file,
                        including line-width, ticknumber, ticksize, etc... You
                        can specify four fields, `info.redefine` (dict with
                        keys set to the previous variable, and the value set
                        to a numerical computation that should replace this
                        variable), `info.to_change` (dict with keys set to the
                        old variable name, and value set to the new variable
                        name), `info.to_plot` (list of variables with new
                        names to plot), and `info.new_scales` (dict with keys
                        set to the new variable names, and values set to the
                        number by which it should be multiplied in the graph).
                        For instance, .. code::
                        analyze.to_plot=['name1','name2','newname3',...] analy
                        ze.new_scales={'name1':number1,'name2':number2,...}
                        (default: )

The user can specify a results directory as the only positional argument, or one or more EmceeResults file names as the positional arguments. In the case of a directory, all valid .npz files in that directory will be analyzed.
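As a rough illustration of the two calling styles described above, the following sketch builds the argument list for `rsdfit analyze` from either a directory or a list of result files. The helper name `build_analyze_command` is hypothetical and not part of pyRSD; only flags shown in the help output (`--burnin`, `--ext`) are used.

```python
import shlex


def build_analyze_command(paths, burnin=None, ext="pdf"):
    """Return the argv list for an `rsdfit analyze` call.

    `paths` is either a single results directory or one or more
    result file names (hypothetical helper, for illustration only).
    """
    if not paths:
        raise ValueError("at least one directory or file is required")
    cmd = ["rsdfit", "analyze"]
    if burnin is not None:
        # fraction of initial samples to treat as burn-in
        cmd += ["--burnin", str(burnin)]
    cmd += ["--ext", ext]
    # positional arguments go last
    cmd += list(paths)
    return cmd


# Analyze every valid .npz file in a directory
print(shlex.join(build_analyze_command(["results_dir"], burnin=0.3, ext="png")))
# Analyze two specific result files
print(shlex.join(build_analyze_command(["chain_a.npz", "chain_b.npz"])))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where rsdfit is installed.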

The steps performed by the analyze sub-command are:

1. The code first checks the convergence of all parameters in the result files (both free and constrained) using the Gelman-Rubin criterion and prints the result. This check works best when multiple independent results files are provided on the command line.

2. Next, the code automatically trims the burn-in steps from the chains, removing iterations that are too far from the maximum probability. Alternatively, the user can specify the fraction of initial samples to treat as burn-in via the -b, --burnin flag.

3. After the burn-in steps are removed, the chains are combined into a single MCMC chain, which is written to info/combined_result.npz. Summary files describing the best-fit parameters are also saved to the info directory.

4. Several plots are generated, based on the options specified by the user on the command line. These figures are saved to the plots directory. The possible plots include figures of the 1D histograms and triangle plots showing the 2D correlations between parameters.
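The convergence check in step 1 can be illustrated with the textbook Gelman-Rubin statistic. The sketch below is not pyRSD's implementation, which may differ in detail. It assumes the chains are stored as a NumPy array of shape (n_chains, n_steps, n_params), and the function name `gelman_rubin` is chosen here for illustration.

```python
import numpy as np


def gelman_rubin(chains):
    """Return the Gelman-Rubin R-hat statistic for each parameter.

    chains : array of shape (n_chains, n_steps, n_params)
        (assumed layout; independent chains along the first axis)
    """
    chains = np.asarray(chains, dtype=float)
    n_chains, n_steps, _ = chains.shape
    if n_chains < 2:
        raise ValueError("need at least two chains to compare")

    chain_means = chains.mean(axis=1)         # (n_chains, n_params)
    chain_vars = chains.var(axis=1, ddof=1)   # (n_chains, n_params)

    # W: mean of the within-chain variances
    W = chain_vars.mean(axis=0)
    # B: between-chain variance of the chain means, scaled by chain length
    B = n_steps * chain_means.var(axis=0, ddof=1)

    # pooled estimate of the posterior variance
    var_hat = (n_steps - 1) / n_steps * W + B / n_steps
    return np.sqrt(var_hat / W)


rng = np.random.default_rng(0)
# four chains sampling the same distribution: R-hat should be close to 1
mixed = rng.normal(size=(4, 2000, 3))
# four chains stuck around different means: R-hat should be well above 1
offsets = 5.0 * np.arange(4)[:, None, None]
stuck = mixed + offsets

print(gelman_rubin(mixed))
print(gelman_rubin(stuck))
```

Values of R-hat near 1 indicate that the chains agree with each other. This is why the check is most informative when several independent results files are supplied.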

The user can specify groupings of parameters to plot by setting the analyze.to_plot_1d and analyze.to_plot_2d parameters in a file and passing the name of that file via the --extra command line option. For example, the file may include:

analyze.to_plot_2d = {'biases': ['b1_cA', 'b1_cB', 'b1_sA', 'b1_sB'], \
                      'fractions' : ['fs', 'fsB', 'fcB'], \
                      'cosmo' : ['f', 'sigma8_z', 'fsigma8', 'alpha_par', 'alpha_perp', 'b1sigma8', 'alpha', 'epsilon'], \
                      'sigmas' : ['sigma_c', 'sigma_s', 'sigma_sA', 'sigma_sB'], \
                      'nuisance' : ['Nsat_mult', 'f1h_sBsB', 'f1h_cBs', 'gamma_b1sB', 'gamma_b1sA']}

In this case, 2D triangle plots comparing each of these parameter groupings will be generated and saved in the plots directory.

Note

See the documentation of AnalysisDriver below for the accepted parameters that can be specified in the extra parameter file passed to the rsdfit analyze command.

API

class pyRSD.rsdfit.analysis.driver.AnalysisDriver(**kwargs)

A class to serve as the driver for analyzing MCMC chains

__init__(**kwargs)

Initialize the object by passing key/value pairs

Parameters:

files : list of str

a single directory or a series of files to analyze

minimal : bool, optional (False)

if True, only write the covmat and bestfit, without computing the posterior or making plots.

bins : int, optional (20)

number of bins in the histograms used to derive the posterior probabilities

mean_likelihood : bool, optional (True)

show the mean likelihood on the 1D posterior plots

plot : bool, optional (True)

if False, do not make any plots, simply compute the posterior

plot_2d : bool, optional (True)

if False, do not produce the 2D posterior plots

contours_only : bool, optional (False)

if True, do not fill the contours on the 2d plots

subplot : bool, optional (False)

if True, output every subplot and data in separate files

extension : {‘pdf’, ‘png’, ‘eps’}, optional (pdf)

the extension to use for output plots

fontsize : int, optional (16)

the fontsize to use on the plots

ticksize : int, optional (14)

the ticksize to use on the plots

line_width : int, optional (4)

the line-width of 1D plots

decimal : int, optional (3)

the number of decimal places on ticks

ticknumber : int, optional (3)

the number of ticks on each axis

optional_plot_file : str, optional (“”)

extra file to customize the output plots

tex_names : dict, optional ({})

dict holding a latex name to use for each parameter

to_plot_1d : list, optional ([])

list of parameters to plot 1D posteriors of

to_plot_2d : dict, optional ({})

dict holding groups of parameters to make 2D plots of

scales : dict, optional ({})

dict holding the rescaling factors that the posterior will be divided by

save_output : bool, optional (True)

if False, do not save any output or make new directories

show_fiducial : bool, optional (True)

whether to show the fiducial values as vertical lines on the 1D posterior plots

fiducial : dict, optional ({})

a dictionary holding fiducial values to use, which will override the original fiducial values

burnin : float, optional (None)

the fraction of samples to consider burnin
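To make the roles of the burnin and bins parameters (and the --thin command line option) concrete, here is a minimal NumPy sketch of the corresponding post-processing: drop a leading fraction of each chain, thin what remains, flatten to a single chain, and histogram one parameter. It is a sketch under stated assumptions, not the AnalysisDriver code: the chain layout (n_walkers, n_steps, n_params) and the function names `trim_and_flatten` and `posterior_1d` are illustrative only.

```python
import numpy as np


def trim_and_flatten(chain, burnin=0.0, thin=1):
    """Discard the first `burnin` fraction of steps, keep every
    `thin`-th remaining step, and merge all walkers into one chain.

    chain : array of shape (n_walkers, n_steps, n_params) (assumed layout)
    """
    chain = np.asarray(chain)
    n_steps, n_params = chain.shape[1], chain.shape[2]
    start = int(burnin * n_steps)       # index of first step to keep
    kept = chain[:, start::thin, :]
    return kept.reshape(-1, n_params)   # (n_samples, n_params)


def posterior_1d(samples, bins=20):
    """Estimate a normalized 1D posterior from samples by histogramming."""
    density, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density


rng = np.random.default_rng(1)
chain = rng.normal(size=(8, 1000, 2))   # toy chain: 8 walkers, 2 parameters

flat = trim_and_flatten(chain, burnin=0.3, thin=2)
centers, density = posterior_1d(flat[:, 0], bins=20)
print(flat.shape, centers.shape)
```

Fewer bins give a smoother curve but can hide structure in the posterior, which is the trade-off noted for the --bins option.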