MCMC

The main result of running rsdfit mcmc ... is a *.npz file saved to the output directory for each MCMC chain that was run. These results can be loaded from file using the pyRSD.rsdfit.results.EmceeResults class.

The EmceeResults stores the entire MCMC chain for each free parameter and can return statistics based on this chain for each parameter, i.e., the median, \(1\sigma\) and \(2\sigma\) errors, etc. This object also computes the corresponding MCMC chain for the constrained parameter values and can return information for each constrained parameter.

The median parameter values, as well as the 68% and 95% confidence intervals can be quickly displayed by printing the EmceeResults object. For example,

In [1]: from pyRSD.rsdfit.results import EmceeResults

In [2]: results = EmceeResults.from_npz('mcmc_result.npz')

# print out a summary of the parameters, with mean values and 68% and 95% intervals
In [3]: print(results)
Free parameters [ median (+/-68%) (+/-95%) ]
_______________
<Parameter Nsat_mult:      2.423 (+0.2082 -0.1809) (+0.4361 -0.3432)>
<Parameter alpha_par:      1.009 (+0.00682 -0.007681) (+0.01353 -0.01379)>
<Parameter alpha_perp:     1.005 (+0.004241 -0.004049) (+0.008281 -0.008406)>
<Parameter b1_cA:          1.999 (+0.05749 -0.05801) (+0.1126 -0.1265)>
<Parameter f:              0.867 (+0.02897 -0.02502) (+0.05642 -0.05073)>
<Parameter f1h_sBsB:       3.569 (+0.5521 -0.6841) (+1.224 -1.396)>
<Parameter fs:             0.1434 (+0.007553 -0.008046) (+0.01554 -0.01544)>
<Parameter fsB:            0.4662 (+0.08146 -0.0792) (+0.1685 -0.1606)>
<Parameter gamma_b1sA:     1.31 (+0.1001 -0.1032) (+0.2119 -0.2205)>
<Parameter gamma_b1sB:     2.343 (+0.1661 -0.181) (+0.3225 -0.3636)>
<Parameter sigma8_z:       0.5411 (+0.01224 -0.01362) (+0.02611 -0.02452)>
<Parameter sigma_c:        0.9297 (+0.06205 -0.06473) (+0.1126 -0.1177)>
<Parameter sigma_sA:       3.443 (+0.2779 -0.2702) (+0.5436 -0.5378)>

Constrained parameters [ median (+/-68%) (+/-95%) ]
______________________
<Parameter NsBsB:          5.178e+04 (+1.721e+04 -1.36e+04) (+4.16e+04 -2.44e+04)>
<Parameter b1_sA:          2.617 (+0.169 -0.1761) (+0.3037 -0.3725)>
<Parameter sigma_sB:       5.71 (+0.3067 -0.2659) (+0.6624 -0.4928)>
<Parameter NcBs:           2.261e+04 (+2900 -2096) (+6130 -4001)>
<Parameter b1_sB:          4.686 (+0.2679 -0.3235) (+0.5596 -0.672)>
<Parameter b1_cB:          3.175 (+0.1504 -0.1736) (+0.2942 -0.3522)>
<Parameter fcB:            0.122 (+0.01336 -0.0149) (+0.02856 -0.02761)>
<Parameter b1_c:           2.141 (+0.04721 -0.04684) (+0.08834 -0.08337)>
<Parameter b1:             2.344 (+0.05369 -0.04764) (+0.1051 -0.1053)>
<Parameter fsigma8:        0.4689 (+0.009673 -0.009076) (+0.01866 -0.01848)>
<Parameter b1_s:           3.575 (+0.1907 -0.2037) (+0.3854 -0.4186)>
<Parameter alpha:          1.006 (+0.004107 -0.004331) (+0.007706 -0.0083)>
<Parameter epsilon:        0.00125 (+0.002298 -0.002462) (+0.004696 -0.004652)>
<Parameter b1sigma8:       1.268 (+0.007575 -0.008006) (+0.01531 -0.01495)>
<Parameter F_AP:           1.004 (+0.006927 -0.007386) (+0.01419 -0.01393)>

The EmceeParameter Object

The EmceeResults object provides access to the parameters via a dictionary-like behavior. When accessing parameters using the name as the key, a EmceeParameter object is returned. This object has several useful attributes holding information about the parameter:

  1. flat_trace : the flattened MCMC chain for this parameter
  2. median : the median of the trace
  3. mean: the average value of the trace
  4. one_sigma : tuple of the lower and upper \(1\sigma\) errors
  5. two_sigma : tuple of the lower and upper \(2\sigma\) errors
  6. three_sigma : tuple of the lower and upper \(3\sigma\) errors
  7. stderr : the average of the upper and lower \(1\sigma\) error

For example,

In [4]: f = results['f']

In [5]: print(f.median, f.one_sigma, f.two_sigma)
0.8669514683620745 [-0.02501616327798617, 0.02896606398246815] [-0.05073347536029271, 0.05641800142862763]

In [6]: sigma8 = results['sigma8_z']

In [7]: print(sigma8.median, sigma8.one_sigma, sigma8.two_sigma)
0.5410678224052163 [-0.013623734433715451, 0.01224434253786777] [-0.02452235949056536, 0.026109037129819712]

# access to constrained parameters too
In [8]: fs8 = results['fsigma8']

In [9]: print(fs8)
<Parameter fsigma8:        0.4689 (+0.009673 -0.009076) (+0.01866 -0.01848)>

Specifying the Burn-in Period

The user can specify a “burn-in” period for a given results object, by changing the burnin attribute of the EmceeResults object. The value of this attribute specifies the number of steps to ignore, starting from the beginning of the MCMC chain. The ignored steps are not included when computing any statistics from the parameter traces.

The attributes of the EmceeParameter object automatically take into account the value of the burnin attribute. Thus, the user just needs to set the burnin and the parameter values will automatically adjust accordingly.

For example,

In [10]: results.burnin = 0 # ignore 0 steps

In [11]: print(results['fsigma8'].median)
0.4689141809940338

In [12]: results.burnin = 500 # ignore the first 500 steps

In [13]: print(results['fsigma8'].median) # slight change in value
0.468679279088974

Ideally, if the chain has converged for a given parameter, the user should see little change to the parameter’s value when adjusting the burnin period.

The Best-fit Values

The user can quickly access the best-fit parameter vector using the values attribute, which returns the median value of each free parameter. Similarly, the constrained_values attribute returns the median value of each constrained parameter.

Alternatively, the user can access the parameter vector that maximizes the log probability by accessing the max_lnprob_values and max_lnprob_constrained_values attributes.