Analysis of Markov-Chains¶

The class TraceAnalysis provides some convenient methods to carry out an analysis of the Markov-Chains resulting from an MCMC “fit”.

Examples for using the class are given in the associated tutorial: Analyze Markov-Chains using TraceAnalysis

class PyAstronomy.funcFit.TraceAnalysis(resource, db='pickle')¶

Support to analyze MCMC chains.

This class provides a number of plotting methods. Note that you still need to call *show()* from pylab to see the result.

Parameters

resourcestring or pymc database object: If string, it assumed to be the filename of the Markov Chain file. Otherwise, it is supposed to be a pymc database object. If the filename is of the form “*.emcee”, it is assumed to be a trace produced by emcee.

Attributes

burnint: Applies a burn-in. All iterations earlier than burn will be neglected.
thinint: Applies thinning to each chain. Retains every k th sample, where k is an integer value.

Methods

`availableParameters`()	Returns list of available parameter names.
`availableTraces`()	Returns a list of available PyMC Trace objects
`correlationMatrix`([toScreen, method, ...])	Calculates the correlation or covariance matrix.
`correlationTable`([parsList, coeff, noPrint])	Calculate and show parameter correlations
`hpd`(parm[, trace, cred])	Calculates highest probability density interval (HPD, minimum width BCI).
`mean`(parm)	Calculate mean.
`median`(parm)	Calculate median.
`numberOfWalkers`()	Get number of walkers in emcee chain.
`parameterSet`([prescription])	Find parameter values for a particular prescription.
`pearsonr`(parm1, parm2)	Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.
`plotCorr`([parsList])	Produces correlation plots.
`plotCorrEnh`([parsList])	Produces enhanced correlation plots.
`plotDeviance`([parsList])	Plots value of deviance over parameter values encountered during sampling.
`plotHist`([parsList])	Plots distributions for a number of traces.
`plotTrace`(parm[, fmt])	Plots the trace.
`plotTraceHist`(parm)	Plots trace and histogram (distribution).
`quantiles`(parm[, qlist])	Quantiles for given trace.
`selectWalkers`(ws)	Select walkers for emcee chains.
`selectedWalkers`()	Get the list of selected walkers.
`setBurn`(burn)	Change value of "post burn-in".
`setThin`(thin)	Change value of "post thinning".
`setToState`(model[, state, verbose])	Set the parameter values to a certain state.
`show`()	Call show() from matplotlib to bring graphs to screen.
`spearmanr`(parm1, parm2)	Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.
`state`()	Returns dictionary containing basic information on the sampling process.
`std`(parm)	Calculate standard deviation.

plotHists

availableParameters()¶: Returns list of available parameter names.

availableTraces()¶: Returns a list of available PyMC Trace objects

correlationMatrix(toScreen=True, method='pearson', parList=None, covariance=False)¶

Calculates the correlation or covariance matrix.

Parameters

parListlist of strings, optional: The list of parameters used in the calculation. If not given, all available parameters will be used.
toScreenboolean, optional: If True, the result will be printed to stdout
methodstring, {“pearson”, “spearman”}: The correlation coefficient to be used.
covarianceboolean, optional: If True, the covariance will be returned instead of the correlation. The default is False.

Returns

Parlistlist: Parameter names in the order used in the calculation.
Correlation matrix2d array: The correlation matrix
lineslist of strings: Formatted version of the correlation matrix in the form of a list of strings.

correlationTable(parsList=None, coeff='pearson', noPrint=False)¶

Calculate and show parameter correlations

Parameters

parsListlist of strings, optional: A list of parameters for which to calculate the correlation. If None, all available parameters will be used.
coeffstring, {“pearson”, “spearman”}, optional: The coefficient to be used. By default, Pearson’s correlation coefficient will be used.
noPrintboolean, optional: If True, the table output will be suppressed.

Returns

Coefficientsdictionary: Maps each tuple of two parameter names to the associated correlation coefficient.

hpd(parm, trace=None, cred=0.95)¶

Calculates highest probability density interval (HPD, minimum width BCI).

Parameters

parmstring: Name of parameter
credfloat, optional: Credibility level. Defaults to 0.95, i.e., the 95% HPD will be calculated.
tracearray, optional: If a trace is given, it will be used in the calculation instead of the trace for parm stored in the class. Note that the parm will be ignored in this case!

Returns

HPDtuple: The lower and upper bound of the credibility interval.

mean(parm)¶

Calculate mean.

Parameters

parmstring: Name of parameter.

Returns

The meanfloat

median(parm)¶

Calculate median.

Parameters

parmstring: Name of parameter.

Returns

The medianfloat

numberOfWalkers()¶

Get number of walkers in emcee chain.

Returns

nWalkersint: Number of available walkers

parameterSet(prescription='lowestDev')¶

Find parameter values for a particular prescription.

Parameters

prescriptionstring, {“lowestDev”, “mean”, “median”}: Which parameter set to find. If ‘lowestDev’ is used, the method will return the parameter set pertaining to the lowest deviance solution. If ‘mean’ or ‘median’ are specified, the mean or median parameter values determined from the Markov Chains are returned.

Returns

Parameter setdictionary: A dictionary mapping all parameter names to the value derived using the specified prescription.
Lowest deviance indexint: The index of the lowest deviance solution. Only returned if the prescription is ‘lowestDev’

pearsonr(parm1, parm2)¶

Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.

Parameters

parm1, parm2string: The names of the two parameters used in the evaluation.

Returns

Pearson correlation coefficientfloat
p-valuefloat

Notes

Uses SciPy’s scipy.stats.pearsonr to evaluate.

The SciPy documentation of scipy.stats.pearsonr:

The Pearson correlation coefficient measures the linear relationship between two data sets. Strictly speaking, Pearson’s correlation requires that each data set be normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases. The p-value roughly indicates the probability of an uncorrelated system producing data sets that have a Pearson correlation at least as extreme as the one computed from these data sets. The p-values are not entirely reliable but are probably reasonable for data sets larger than 500 or so.

plotCorr(parsList=None, **plotArgs)¶

Produces correlation plots.

Parameters

parsListlist of string, optional: If not given, all available traces are used. Otherwise a list of at least two parameters has to be specified.
plotArgsdict, optional: Keyword arguments handed to plot procedure of pylab.

plotCorrEnh(parsList=None, **plotArgs)¶

Produces enhanced correlation plots.

Parameters

parsListlist of string, optional: If not given, all available traces are used. Otherwise a list of at least two parameters has to be specified.
plotArgsdict, optional: Keyword arguments handed to plot procedures of pylab. The following keywords are available: contour,bins,cmap,origin,interpolation,colors

plotDeviance(parsList=None)¶

Plots value of deviance over parameter values encountered during sampling.

Parameters

parsListstring or list of strings, optional,: Refers to a parameter name or a list of parameter names. If None, all available parameters are plotted.

plotHist(parsList=None)¶

Plots distributions for a number of traces.

Parameters

parsListstring or list of strings, optional,: Refers to a parameter name or a list of parameter names. If None, all available parameters are plotted.

plotTrace(parm, fmt='b-')¶

Plots the trace.

Parameters

parmstring: The variable name.
fmtstring, optional: The matplotlib format string used to plot the trace. Default is ‘b-‘.

plotTraceHist(parm)¶

Plots trace and histogram (distribution).

Parameters

parmstring: The variable name.

quantiles(parm, qlist=None)¶

Quantiles for given trace.

Parameters

parmstring: Name of parameter
qlistlist of floats (0-100), optional: Specifies which quantiles shall be calculated. The default is 2.5, 25, 50, 75, and 97.5 percent.

Returns

Quantilesdictionary: For each quantile (in percent) the corresponding value.

selectWalkers(ws)¶

Select walkers for emcee chains.

Parameters

wslist or array of integers: The walker to be considered in the analysis. Counting starts at zero.

selectedWalkers()¶

Get the list of selected walkers.

Parameters

walkersarray: The selected walkers.

setBurn(burn)¶

Change value of “post burn-in”.

In the case of an emcee trace, the “post burn-in” is applied to the trace of all walkers.

Parameters

burnint: The number of samples to be neglected.

setThin(thin)¶

Change value of “post thinning”.

Parameters

thinint: Applies thinning to each chain. Retains every k th sample, where k is an integer value.

Notes

Use the “post thinning” to thin out your chains.

setToState(model, state='best', verbose=True)¶

Set the parameter values to a certain state.

Parameters

model - fitting object

The fitting model object whose parameters will be updated.

state{“best”, “mean”}, optional

“best”Set parameters to the “best fit” state as measured: by deviance. This is the default.

“mean” : Set parameters to mean value of trace.

verbosebool, optional

If False, no output about what is done will be generated (default is True).

show()¶: Call show() from matplotlib to bring graphs to screen.

spearmanr(parm1, parm2)¶

Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.

Parameters

parm1, parm2string: The names of the two parameters used in the evaluation.

Returns

Spearman rank-order correlation coefficientfloat
p-valuefloat

Notes

Uses SciPy’s scipy.stats.spearmanr to evaluate.

The SciPy documentation of scipy.stats.spearmanr:

The Spearman correlation is a nonparametric measure of the monotonicity of the relationship between two data sets. Unlike the Pearson correlation, the Spearman correlation does not assume that both data sets are normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact monotonic relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases. The p-value roughly indicates the probability of an uncorrelated system producing data sets that have a Spearman correlation at least as extreme as the one computed from these data sets. The p-values are not entirely reliable but are probably reasonable for data sets larger than 500 or so.

state()¶: Returns dictionary containing basic information on the sampling process.

std(parm)¶

Calculate standard deviation.

Parameters

parmstring: Name of parameter.

Returns

The standard deviationfloat

PyAstronomy.funcFit.hpd(trace, cred)¶

Estimate the highest probability density interval.

This function determines the shortest, continuous interval containing the specified fraction (cred) of steps of the Markov chain. Note that multi-modal distribution may require further scrutiny.

Parameters

tracearray: The steps of the Markov chain.
credfloat: The probability mass to be included in the interval (between 0 and 1).

Returns

start, endfloat: The start and end points of the interval.

PyAstronomy.funcFit.quantiles(trace, qs)¶

Get quantiles for trace.

Parameters

tracearray: The steps of the Markov chain.
qslist or array: The quantiles in percent.

Returns

Quantilesdictionary: For each quantile, the corresponding value.

PyAstronomy.funcFit.modeGrenander(trace, k, p)¶

Estimate the mode using estimator by Grenander 1965

Evaluate the estimator, where x represents the trace sorted in ascending order,

\[M^*_{p,k} = \frac{ \frac{1}{2} \sum_{i=1}^{n-k} (x_{i+k} + x_k)/(x_{i+k} - x_k)^p }{\sum_{i=1}^{n-k} 1/(x_{i+k} - x_k)^p }\]

given by Grenander 1965 (“Some Direct Estimates of the Mode.” Ann. Math. Statist. 36 (1) 131 - 138).

Parameters

kint: Index offset (should be >2*p for asymptotic normality)
pfloat: Parameter of estimator (1 < p < k)

Returns

modefloat: Estimate of mode

Analysis of Markov-Chains¶

Table of Contents

Previous topic

Next topic

This Page