Analysis of Markov-Chains

The class TraceAnalysis provides some convenient methods to carry out an analysis of the Markov-Chains resulting from an MCMC “fit”.

Examples for using the class are given in the associated tutorial: Analyze Markov-Chains using TraceAnalysis

class PyAstronomy.funcFit.TraceAnalysis(resource, db='pickle')

Support to analyze MCMC chains.

This class provides a number of plotting methods. Note that you still need to call *show()* from pylab to see the result.

Parameters
resourcestring or pymc database object

If string, it assumed to be the filename of the Markov Chain file. Otherwise, it is supposed to be a pymc database object. If the filename is of the form “*.emcee”, it is assumed to be a trace produced by emcee.

Attributes
burnint

Applies a burn-in. All iterations earlier than burn will be neglected.

thinint

Applies thinning to each chain. Retains every k th sample, where k is an integer value.

Methods

availableParameters()

Returns list of available parameter names.

availableTraces()

Returns a list of available PyMC Trace objects

correlationMatrix([toScreen, method, ...])

Calculates the correlation or covariance matrix.

correlationTable([parsList, coeff, noPrint])

Calculate and show parameter correlations

hpd(parm[, trace, cred])

Calculates highest probability density interval (HPD, minimum width BCI).

mean(parm)

Calculate mean.

median(parm)

Calculate median.

numberOfWalkers()

Get number of walkers in emcee chain.

parameterSet([prescription])

Find parameter values for a particular prescription.

pearsonr(parm1, parm2)

Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.

plotCorr([parsList])

Produces correlation plots.

plotCorrEnh([parsList])

Produces enhanced correlation plots.

plotDeviance([parsList])

Plots value of deviance over parameter values encountered during sampling.

plotHist([parsList])

Plots distributions for a number of traces.

plotTrace(parm[, fmt])

Plots the trace.

plotTraceHist(parm)

Plots trace and histogram (distribution).

quantiles(parm[, qlist])

Quantiles for given trace.

selectWalkers(ws)

Select walkers for emcee chains.

selectedWalkers()

Get the list of selected walkers.

setBurn(burn)

Change value of "post burn-in".

setThin(thin)

Change value of "post thinning".

setToState(model[, state, verbose])

Set the parameter values to a certain state.

show()

Call show() from matplotlib to bring graphs to screen.

spearmanr(parm1, parm2)

Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.

state()

Returns dictionary containing basic information on the sampling process.

std(parm)

Calculate standard deviation.

plotHists

availableParameters()

Returns list of available parameter names.

availableTraces()

Returns a list of available PyMC Trace objects

correlationMatrix(toScreen=True, method='pearson', parList=None, covariance=False)

Calculates the correlation or covariance matrix.

Parameters
parListlist of strings, optional

The list of parameters used in the calculation. If not given, all available parameters will be used.

toScreenboolean, optional

If True, the result will be printed to stdout

methodstring, {“pearson”, “spearman”}

The correlation coefficient to be used.

covarianceboolean, optional

If True, the covariance will be returned instead of the correlation. The default is False.

Returns
Parlistlist

Parameter names in the order used in the calculation.

Correlation matrix2d array

The correlation matrix

lineslist of strings

Formatted version of the correlation matrix in the form of a list of strings.

correlationTable(parsList=None, coeff='pearson', noPrint=False)

Calculate and show parameter correlations

Parameters
parsListlist of strings, optional

A list of parameters for which to calculate the correlation. If None, all available parameters will be used.

coeffstring, {“pearson”, “spearman”}, optional

The coefficient to be used. By default, Pearson’s correlation coefficient will be used.

noPrintboolean, optional

If True, the table output will be suppressed.

Returns
Coefficientsdictionary

Maps each tuple of two parameter names to the associated correlation coefficient.

hpd(parm, trace=None, cred=0.95)

Calculates highest probability density interval (HPD, minimum width BCI).

Parameters
parmstring

Name of parameter

credfloat, optional

Credibility level. Defaults to 0.95, i.e., the 95% HPD will be calculated.

tracearray, optional

If a trace is given, it will be used in the calculation instead of the trace for parm stored in the class. Note that the parm will be ignored in this case!

Returns
HPDtuple

The lower and upper bound of the credibility interval.

mean(parm)

Calculate mean.

Parameters
parmstring

Name of parameter.

Returns
The meanfloat
median(parm)

Calculate median.

Parameters
parmstring

Name of parameter.

Returns
The medianfloat
numberOfWalkers()

Get number of walkers in emcee chain.

Returns
nWalkersint

Number of available walkers

parameterSet(prescription='lowestDev')

Find parameter values for a particular prescription.

Parameters
prescriptionstring, {“lowestDev”, “mean”, “median”}

Which parameter set to find. If ‘lowestDev’ is used, the method will return the parameter set pertaining to the lowest deviance solution. If ‘mean’ or ‘median’ are specified, the mean or median parameter values determined from the Markov Chains are returned.

Returns
Parameter setdictionary

A dictionary mapping all parameter names to the value derived using the specified prescription.

Lowest deviance indexint

The index of the lowest deviance solution. Only returned if the prescription is ‘lowestDev’

pearsonr(parm1, parm2)

Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.

Parameters
parm1, parm2string

The names of the two parameters used in the evaluation.

Returns
Pearson correlation coefficientfloat
p-valuefloat

Notes

Uses SciPy’s scipy.stats.pearsonr to evaluate.

The SciPy documentation of scipy.stats.pearsonr:

The Pearson correlation coefficient measures the linear relationship between two data sets. Strictly speaking, Pearson’s correlation requires that each data set be normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases. The p-value roughly indicates the probability of an uncorrelated system producing data sets that have a Pearson correlation at least as extreme as the one computed from these data sets. The p-values are not entirely reliable but are probably reasonable for data sets larger than 500 or so.

plotCorr(parsList=None, **plotArgs)

Produces correlation plots.

Parameters
parsListlist of string, optional

If not given, all available traces are used. Otherwise a list of at least two parameters has to be specified.

plotArgsdict, optional

Keyword arguments handed to plot procedure of pylab.

plotCorrEnh(parsList=None, **plotArgs)

Produces enhanced correlation plots.

Parameters
parsListlist of string, optional

If not given, all available traces are used. Otherwise a list of at least two parameters has to be specified.

plotArgsdict, optional

Keyword arguments handed to plot procedures of pylab. The following keywords are available: contour,bins,cmap,origin,interpolation,colors

plotDeviance(parsList=None)

Plots value of deviance over parameter values encountered during sampling.

Parameters
parsListstring or list of strings, optional,

Refers to a parameter name or a list of parameter names. If None, all available parameters are plotted.

plotHist(parsList=None)

Plots distributions for a number of traces.

Parameters
parsListstring or list of strings, optional,

Refers to a parameter name or a list of parameter names. If None, all available parameters are plotted.

plotTrace(parm, fmt='b-')

Plots the trace.

Parameters
parmstring

The variable name.

fmtstring, optional

The matplotlib format string used to plot the trace. Default is ‘b-‘.

plotTraceHist(parm)

Plots trace and histogram (distribution).

Parameters
parmstring

The variable name.

quantiles(parm, qlist=None)

Quantiles for given trace.

Parameters
parmstring

Name of parameter

qlistlist of floats (0-100), optional

Specifies which quantiles shall be calculated. The default is 2.5, 25, 50, 75, and 97.5 percent.

Returns
Quantilesdictionary

For each quantile (in percent) the corresponding value.

selectWalkers(ws)

Select walkers for emcee chains.

Parameters
wslist or array of integers

The walker to be considered in the analysis. Counting starts at zero.

selectedWalkers()

Get the list of selected walkers.

Parameters
walkersarray

The selected walkers.

setBurn(burn)

Change value of “post burn-in”.

In the case of an emcee trace, the “post burn-in” is applied to the trace of all walkers.

Parameters
burnint

The number of samples to be neglected.

setThin(thin)

Change value of “post thinning”.

Parameters
thinint

Applies thinning to each chain. Retains every k th sample, where k is an integer value.

Notes

Use the “post thinning” to thin out your chains.

setToState(model, state='best', verbose=True)

Set the parameter values to a certain state.

Parameters
model - fitting object

The fitting model object whose parameters will be updated.

state{“best”, “mean”}, optional
“best”Set parameters to the “best fit” state as measured

by deviance. This is the default.

“mean” : Set parameters to mean value of trace.

verbosebool, optional

If False, no output about what is done will be generated (default is True).

show()

Call show() from matplotlib to bring graphs to screen.

spearmanr(parm1, parm2)

Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.

Parameters
parm1, parm2string

The names of the two parameters used in the evaluation.

Returns
Spearman rank-order correlation coefficientfloat
p-valuefloat

Notes

Uses SciPy’s scipy.stats.spearmanr to evaluate.

The SciPy documentation of scipy.stats.spearmanr:

The Spearman correlation is a nonparametric measure of the monotonicity of the relationship between two data sets. Unlike the Pearson correlation, the Spearman correlation does not assume that both data sets are normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact monotonic relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases. The p-value roughly indicates the probability of an uncorrelated system producing data sets that have a Spearman correlation at least as extreme as the one computed from these data sets. The p-values are not entirely reliable but are probably reasonable for data sets larger than 500 or so.

state()

Returns dictionary containing basic information on the sampling process.

std(parm)

Calculate standard deviation.

Parameters
parmstring

Name of parameter.

Returns
The standard deviationfloat
PyAstronomy.funcFit.hpd(trace, cred)

Estimate the highest probability density interval.

This function determines the shortest, continuous interval containing the specified fraction (cred) of steps of the Markov chain. Note that multi-modal distribution may require further scrutiny.

Parameters
tracearray

The steps of the Markov chain.

credfloat

The probability mass to be included in the interval (between 0 and 1).

Returns
start, endfloat

The start and end points of the interval.

PyAstronomy.funcFit.quantiles(trace, qs)

Get quantiles for trace.

Parameters
tracearray

The steps of the Markov chain.

qslist or array

The quantiles in percent.

Returns
Quantilesdictionary

For each quantile, the corresponding value.

PyAstronomy.funcFit.modeGrenander(trace, k, p)

Estimate the mode using estimator by Grenander 1965

Evaluate the estimator, where x represents the trace sorted in ascending order,

\[M^*_{p,k} = \frac{ \frac{1}{2} \sum_{i=1}^{n-k} (x_{i+k} + x_k)/(x_{i+k} - x_k)^p }{\sum_{i=1}^{n-k} 1/(x_{i+k} - x_k)^p }\]

given by Grenander 1965 (“Some Direct Estimates of the Mode.” Ann. Math. Statist. 36 (1) 131 - 138).

Parameters
kint

Index offset (should be >2*p for asymptotic normality)

pfloat

Parameter of estimator (1 < p < k)

Returns
modefloat

Estimate of mode