Analyzing MCMC Results
The rsdfit executable includes an analyze sub-command that runs a basic
analysis of the MCMC chains in a given directory: it removes burn-in steps,
automatically generates plots of the results, and writes out best-fit
parameter files.
The calling sequence is:
$ rsdfit analyze -h
usage: rsdfit [-h] [--version] {mcmc,nlopt,restart,analyze} ...
From more help on each of the subcommands, type:
rsdfit mcmc -h
rsdfit nlopt -h
rsdfit restart -h
rsdfit analyze -h analyze
[-h] [--minimal] [--bins BINS] [--show-mean] [--noplot] [--noplot-2d]
[--show-fiducial] [--burnin BURNIN] [--contours-only] [--all]
[--ext {pdf,png,eps}] [--fontsize FONTSIZE] [--ticksize TICKSIZE]
[--line-width LINE_WIDTH] [--decimal DECIMAL] [--ticknumber TICKNUMBER]
[--thin THIN] [--rescale-errors] [--extra OPTIONAL_PLOT_FILE]
files [files ...]
positional arguments:
files files to analyze: either a file(s), or a complete
directory name
optional arguments:
-h, --help show this help message and exit
--minimal use this flag to avoid computing the posterior
distribution (default: False)
--bins BINS number of bins in the histograms used to derive
posterior probabilities. Decrease this number for
smoother plots at the expense of masking details.
(default: 20)
--show-mean remove the mean likelihood from the 1D posterior plots
(default: False)
--noplot do not produce any plot, simply compute the posterior
(default: True)
--noplot-2d produce only the 1d posterior plot (default: True)
--show-fiducial don't include fiducial lines on 1D posterior plots
(default: False)
--burnin BURNIN, -b BURNIN
the fraction of samples to consider burnin (default:
None)
--contours-only do not fill the contours on the 2d plot (default:
False)
--all output every subplot and data in separate files
(default: False)
--ext {pdf,png,eps} change the extension for the output file (default:
pdf)
--fontsize FONTSIZE the desired fontsize of output fonts (default: 16)
--ticksize TICKSIZE the tick size on the plots (default: 14)
--line-width LINE_WIDTH
the linewidth of 1d plots (default: 4)
--decimal DECIMAL number of decimal places on ticks (default: 3)
--ticknumber TICKNUMBER
number of ticks on each axis (default: 3)
--thin THIN the thinning factor to use (default: 1)
--rescale-errors whether to rescale errors (default: False)
--extra OPTIONAL_PLOT_FILE
extra file to customize the output plots. You can
actually set all the possible options in this file,
including line-width, ticknumber, ticksize, etc... You
can specify four fields, `info.redefine` (dict with
keys set to the previous variable, and the value set
to a numerical computation that should replace this
variable), `info.to_change` (dict with keys set to the
old variable name, and value set to the new variable
name), `info.to_plot` (list of variables with new
names to plot), and `info.new_scales` (dict with keys
set to the new variable names, and values set to the
number by which it should be multiplied in the graph).
For instance, .. code::
analyze.to_plot=['name1','name2','newname3',...] analy
ze.new_scales={'name1':number1,'name2':number2,...}
(default: )
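Several of the flags above are negative switches (for example --noplot) whose help text reports "default: True". A minimal sketch of why that can happen, assuming the flags are defined with argparse's store_false action and a dest matching the AnalysisDriver keyword (plot, plot_2d). This wiring is an assumption made for illustration, not the actual rsdfit source, and build_parser is a hypothetical helper:

```python
import argparse


def build_parser():
    """Hypothetical subset of the `analyze` options, for illustration only."""
    parser = argparse.ArgumentParser(
        prog="rsdfit analyze",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("files", nargs="+",
                        help="files or a directory to analyze")
    # store_false: the stored value stays True until the flag is given,
    # so the help formatter prints "(default: True)" for these switches.
    parser.add_argument("--noplot", dest="plot", action="store_false",
                        help="do not produce any plot")
    parser.add_argument("--noplot-2d", dest="plot_2d", action="store_false",
                        help="produce only the 1d posterior plot")
    parser.add_argument("-b", "--burnin", type=float, default=None,
                        help="the fraction of samples to consider burnin")
    parser.add_argument("--bins", type=int, default=20,
                        help="number of bins in the histograms")
    return parser


# "my_results_dir" is a placeholder directory name.
args = build_parser().parse_args(["--noplot-2d", "-b", "0.3", "my_results_dir"])
print(args.plot, args.plot_2d, args.burnin, args.bins)
```

Read this way, "default: True" describes the stored value (plots are made) rather than the state of the switch itself.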
The user can specify a results directory as the only positional argument, or
one or more EmceeResults file names as the positional arguments. In the case
of a directory, all valid .npz files in that directory will be analyzed.
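The handling of the positional arguments described above can be sketched as follows. The helper resolve_result_files is hypothetical, and it treats any file ending in .npz as valid, which is a simplification: the real code may also check the file contents.

```python
import os
import tempfile


def resolve_result_files(paths):
    """Expand a single directory into its .npz files; otherwise
    return the given paths unchanged (hypothetical helper)."""
    if len(paths) == 1 and os.path.isdir(paths[0]):
        directory = paths[0]
        return sorted(
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if name.endswith(".npz")
        )
    return list(paths)


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("chain_0.npz", "chain_1.npz", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in resolve_result_files([tmp])]

print(found)  # the .txt file is skipped
```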
The steps performed by the analyze sub-command are:
1. First, the code checks the convergence of all parameters in the result files (both free and constrained) using the Gelman-Rubin criterion and prints the result. The best results are achieved when multiple independent results files are provided on the command line.
2. Next, the code automatically trims the burn-in steps from the chains,
removing iterations that are too far away from the maximum probability.
Alternatively, the user can specify the fraction of initial samples to
consider burn-in via the -b, --burnin flag.
3. After the burn-in steps are removed, a single combined MCMC chain is
created and written to the info/combined_result.npz path. Additionally,
summary files about the best-fit parameters are saved to the info directory.
4. Several plots are generated, based on the options specified by the user
on the command line. These figures are saved to the plots directory.
The possible plots include 1D histograms and triangle plots showing the 2D
correlations between parameters.
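The first three steps can be illustrated with a small pure-Python sketch for a single parameter. It uses the textbook Gelman-Rubin potential scale reduction factor and the fraction-based burn-in variant only. The function names and the synthetic chains are assumptions made for illustration, and the statistic the code actually prints may be defined slightly differently (for example as R - 1).

```python
import random
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor R for one parameter.

    chains: list of equal-length lists of samples, one per independent chain.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between_over_n = variance(chain_means)         # B / n: variance of chain means
    var_hat = (n - 1) / n * within + between_over_n
    return sqrt(var_hat / within)


def trim_and_combine(chains, burnin=0.3, thin=1):
    """Drop the first `burnin` fraction of each chain, keep every
    `thin`-th sample, and concatenate the rest into a single chain."""
    combined = []
    for chain in chains:
        start = int(burnin * len(chain))
        combined.extend(chain[start::thin])
    return combined


rng = random.Random(0)
chain_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
chain_b = [rng.gauss(0.0, 1.0) for _ in range(2000)]
chain_c = [rng.gauss(3.0, 1.0) for _ in range(2000)]   # deliberately offset

r_converged = gelman_rubin([chain_a, chain_b])   # close to 1
r_offset = gelman_rubin([chain_a, chain_c])      # well above 1

combined = trim_and_combine([chain_a, chain_b], burnin=0.3, thin=2)
print(round(r_converged, 3), round(r_offset, 3), len(combined))
```

Chains that sample the same distribution give R close to 1, while chains that disagree give R well above 1, which is why the check needs several independent results files to be informative.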
The user can specify groupings of parameters to plot by setting the
analyze.to_plot_1d and analyze.to_plot_2d parameters in a file and passing
the name of that file via the --extra command line option.
For example, the file may include:
analyze.to_plot_2d = {'biases': ['b1_cA', 'b1_cB', 'b1_sA', 'b1_sB'], \
'fractions' : ['fs', 'fsB', 'fcB'], \
'cosmo' : ['f', 'sigma8_z', 'fsigma8', 'alpha_par', 'alpha_perp', 'b1sigma8', 'alpha', 'epsilon'], \
'sigmas' : ['sigma_c', 'sigma_s', 'sigma_sA', 'sigma_sB'], \
'nuisance' : ['Nsat_mult', 'f1h_sBsB', 'f1h_cBs', 'gamma_b1sB', 'gamma_b1sA']}
In this case, 2D triangle plots comparing the parameters in each of these
groupings will be generated and saved in the plots directory.
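Because the extra file uses plain Python assignment syntax on an analyze object, one way to picture how such a file could be read is to execute it against an empty namespace and collect the attributes that were set. This is only a sketch under that assumption: load_extra_options is a hypothetical helper, the 1D parameter selection is illustrative, and rsdfit may parse the file differently.

```python
from types import SimpleNamespace

# Contents of a hypothetical extra file; the parameter names are taken
# from the example groupings above.
EXTRA_FILE_TEXT = """
analyze.to_plot_1d = ['f', 'sigma8_z', 'fsigma8']
analyze.to_plot_2d = {'cosmo': ['f', 'sigma8_z', 'fsigma8'],
                      'fractions': ['fs', 'fsB', 'fcB']}
"""


def load_extra_options(text):
    """Run the file text with an empty `analyze` namespace and return
    the attributes it set as a dict (assumed mechanism, not rsdfit's)."""
    analyze = SimpleNamespace()
    exec(text, {"analyze": analyze})
    return vars(analyze)


options = load_extra_options(EXTRA_FILE_TEXT)
for group, names in options["to_plot_2d"].items():
    print(group, "->", names)
```

Since a file like this is executed as code, it should only come from a trusted source.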
Note
See the documentation of AnalysisDriver below for the accepted parameters
that can be specified in the extra parameter file passed to the
rsdfit analyze command.
API
class pyRSD.rsdfit.analysis.driver.AnalysisDriver(**kwargs)
A class to serve as the driver for analyzing MCMC chains
__init__(**kwargs)
Initialize the object by passing key/value pairs
Parameters:
files : list of str
list of a directory or series of files to analyze
minimal : bool, optional (False)
if True, only write the covmat and bestfit, without computing the posterior or making plots.
bins : int, optional (20)
number of bins in the histograms used to derive posterior probabilities
mean_likelihood : bool, optional (True)
show the mean likelihood on the 1D posterior plots
plot : bool, optional (True)
if False, do not make any plots, simply compute the posterior
plot_2d : bool, optional (True)
if False, do not produce the 2D posterior plots
contours_only : bool, optional (False)
if True, do not fill the contours on the 2d plots
subplot : bool, optional (False)
if True, output every subplot and data in separate files
extension : {‘pdf’, ‘png’, ‘eps’}, optional (pdf)
the extension to use for output plots
fontsize : int, optional (16)
the fontsize to use on the plots
ticksize : int, optional (14)
the ticksize to use on the plots
line_width : int, optional (4)
the line-width of 1D plots
decimal : int, optional (3)
the number of decimal places on ticks
ticknumber : int, optional (3)
the number of ticks on each axis
optional_plot_file : str, optional (“”)
extra file to customize the output plots
tex_names : dict, optional ({})
dict holding a latex name to use for each parameter
to_plot_1d : list, optional ([])
list of parameters to plot 1D posteriors of
to_plot_2d : dict, optional ({})
dict holding groups of parameters to make 2D plots of
scales : dict, optional ({})
dict holding the rescaling factors that the posterior will be divided by
save_output : bool, optional (True)
if False, do not save any output or make new directories
show_fiducial : bool, optional (True)
whether to show the fiducial values as vertical lines on the 1D posterior plots
fiducial : dict, optional ({})
a dictionary holding fiducial values to use, which will override the original fiducial values
burnin : float, optional (None)
the fraction of samples to consider burnin
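Since the class is initialized purely from key/value pairs, a small helper can catch misspelled keywords before they reach the driver. The sketch below collects the defaults documented above into a dictionary. The helper build_driver_kwargs and the directory name are hypothetical, and the final constructor call is left as a comment because it requires pyRSD to be installed.

```python
import copy

# Defaults as listed in the parameter documentation above.
DOCUMENTED_DEFAULTS = {
    "minimal": False, "bins": 20, "mean_likelihood": True,
    "plot": True, "plot_2d": True, "contours_only": False,
    "subplot": False, "extension": "pdf", "fontsize": 16,
    "ticksize": 14, "line_width": 4, "decimal": 3, "ticknumber": 3,
    "optional_plot_file": "", "tex_names": {}, "to_plot_1d": [],
    "to_plot_2d": {}, "scales": {}, "save_output": True,
    "show_fiducial": True, "fiducial": {}, "burnin": None,
}


def build_driver_kwargs(files, **overrides):
    """Merge user overrides into the documented defaults, rejecting
    any keyword that is not documented (hypothetical helper)."""
    unknown = sorted(set(overrides) - set(DOCUMENTED_DEFAULTS))
    if unknown:
        raise TypeError("unknown analysis option(s): " + ", ".join(unknown))
    kwargs = copy.deepcopy(DOCUMENTED_DEFAULTS)
    kwargs.update(overrides)
    kwargs["files"] = list(files)
    return kwargs


# "my_results_dir" is a placeholder for a directory of result files.
kwargs = build_driver_kwargs(["my_results_dir"], burnin=0.3, extension="png")
print(kwargs["burnin"], kwargs["extension"], kwargs["bins"])

# With pyRSD installed, the documented signature suggests:
#   from pyRSD.rsdfit.analysis.driver import AnalysisDriver
#   driver = AnalysisDriver(**kwargs)
```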