Analysis of Markov-Chains¶
The class TraceAnalysis provides some convenient methods to carry out an analysis of the Markov-Chains resulting from an MCMC “fit”.
Examples for using the class are given in the associated tutorial: Analyze Markov-Chains using TraceAnalysis
- class PyAstronomy.funcFit.TraceAnalysis(resource, db='pickle')¶
Support to analyze MCMC chains.
This class provides a number of plotting methods. Note that you still need to call *show()* from pylab to see the result.
- Parameters
- resourcestring or pymc database object
If string, it assumed to be the filename of the Markov Chain file. Otherwise, it is supposed to be a pymc database object. If the filename is of the form “*.emcee”, it is assumed to be a trace produced by emcee.
- Attributes
- burnint
Applies a burn-in. All iterations earlier than burn will be neglected.
- thinint
Applies thinning to each chain. Retains every k th sample, where k is an integer value.
Methods
Returns list of available parameter names.
Returns a list of available PyMC Trace objects
correlationMatrix
([toScreen, method, ...])Calculates the correlation or covariance matrix.
correlationTable
([parsList, coeff, noPrint])Calculate and show parameter correlations
hpd
(parm[, trace, cred])Calculates highest probability density interval (HPD, minimum width BCI).
mean
(parm)Calculate mean.
median
(parm)Calculate median.
Get number of walkers in emcee chain.
parameterSet
([prescription])Find parameter values for a particular prescription.
pearsonr
(parm1, parm2)Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.
plotCorr
([parsList])Produces correlation plots.
plotCorrEnh
([parsList])Produces enhanced correlation plots.
plotDeviance
([parsList])Plots value of deviance over parameter values encountered during sampling.
plotHist
([parsList])Plots distributions for a number of traces.
plotTrace
(parm[, fmt])Plots the trace.
plotTraceHist
(parm)Plots trace and histogram (distribution).
quantiles
(parm[, qlist])Quantiles for given trace.
selectWalkers
(ws)Select walkers for emcee chains.
Get the list of selected walkers.
setBurn
(burn)Change value of "post burn-in".
setThin
(thin)Change value of "post thinning".
setToState
(model[, state, verbose])Set the parameter values to a certain state.
show
()Call show() from matplotlib to bring graphs to screen.
spearmanr
(parm1, parm2)Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.
state
()Returns dictionary containing basic information on the sampling process.
std
(parm)Calculate standard deviation.
plotHists
- availableParameters()¶
Returns list of available parameter names.
- availableTraces()¶
Returns a list of available PyMC Trace objects
- correlationMatrix(toScreen=True, method='pearson', parList=None, covariance=False)¶
Calculates the correlation or covariance matrix.
- Parameters
- parListlist of strings, optional
The list of parameters used in the calculation. If not given, all available parameters will be used.
- toScreenboolean, optional
If True, the result will be printed to stdout
- methodstring, {“pearson”, “spearman”}
The correlation coefficient to be used.
- covarianceboolean, optional
If True, the covariance will be returned instead of the correlation. The default is False.
- Returns
- Parlistlist
Parameter names in the order used in the calculation.
- Correlation matrix2d array
The correlation matrix
- lineslist of strings
Formatted version of the correlation matrix in the form of a list of strings.
- correlationTable(parsList=None, coeff='pearson', noPrint=False)¶
Calculate and show parameter correlations
- Parameters
- parsListlist of strings, optional
A list of parameters for which to calculate the correlation. If None, all available parameters will be used.
- coeffstring, {“pearson”, “spearman”}, optional
The coefficient to be used. By default, Pearson’s correlation coefficient will be used.
- noPrintboolean, optional
If True, the table output will be suppressed.
- Returns
- Coefficientsdictionary
Maps each tuple of two parameter names to the associated correlation coefficient.
- hpd(parm, trace=None, cred=0.95)¶
Calculates highest probability density interval (HPD, minimum width BCI).
- Parameters
- parmstring
Name of parameter
- credfloat, optional
Credibility level. Defaults to 0.95, i.e., the 95% HPD will be calculated.
- tracearray, optional
If a trace is given, it will be used in the calculation instead of the trace for parm stored in the class. Note that the parm will be ignored in this case!
- Returns
- HPDtuple
The lower and upper bound of the credibility interval.
- mean(parm)¶
Calculate mean.
- Parameters
- parmstring
Name of parameter.
- Returns
- The meanfloat
- median(parm)¶
Calculate median.
- Parameters
- parmstring
Name of parameter.
- Returns
- The medianfloat
- numberOfWalkers()¶
Get number of walkers in emcee chain.
- Returns
- nWalkersint
Number of available walkers
- parameterSet(prescription='lowestDev')¶
Find parameter values for a particular prescription.
- Parameters
- prescriptionstring, {“lowestDev”, “mean”, “median”}
Which parameter set to find. If ‘lowestDev’ is used, the method will return the parameter set pertaining to the lowest deviance solution. If ‘mean’ or ‘median’ are specified, the mean or median parameter values determined from the Markov Chains are returned.
- Returns
- Parameter setdictionary
A dictionary mapping all parameter names to the value derived using the specified prescription.
- Lowest deviance indexint
The index of the lowest deviance solution. Only returned if the prescription is ‘lowestDev’
- pearsonr(parm1, parm2)¶
Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.
- Parameters
- parm1, parm2string
The names of the two parameters used in the evaluation.
- Returns
- Pearson correlation coefficientfloat
- p-valuefloat
Notes
Uses SciPy’s scipy.stats.pearsonr to evaluate.
The SciPy documentation of scipy.stats.pearsonr:
The Pearson correlation coefficient measures the linear relationship between two data sets. Strictly speaking, Pearson’s correlation requires that each data set be normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases. The p-value roughly indicates the probability of an uncorrelated system producing data sets that have a Pearson correlation at least as extreme as the one computed from these data sets. The p-values are not entirely reliable but are probably reasonable for data sets larger than 500 or so.
- plotCorr(parsList=None, **plotArgs)¶
Produces correlation plots.
- Parameters
- parsListlist of string, optional
If not given, all available traces are used. Otherwise a list of at least two parameters has to be specified.
- plotArgsdict, optional
Keyword arguments handed to plot procedure of pylab.
- plotCorrEnh(parsList=None, **plotArgs)¶
Produces enhanced correlation plots.
- Parameters
- parsListlist of string, optional
If not given, all available traces are used. Otherwise a list of at least two parameters has to be specified.
- plotArgsdict, optional
Keyword arguments handed to plot procedures of pylab. The following keywords are available: contour,bins,cmap,origin,interpolation,colors
- plotDeviance(parsList=None)¶
Plots value of deviance over parameter values encountered during sampling.
- Parameters
- parsListstring or list of strings, optional,
Refers to a parameter name or a list of parameter names. If None, all available parameters are plotted.
- plotHist(parsList=None)¶
Plots distributions for a number of traces.
- Parameters
- parsListstring or list of strings, optional,
Refers to a parameter name or a list of parameter names. If None, all available parameters are plotted.
- plotTrace(parm, fmt='b-')¶
Plots the trace.
- Parameters
- parmstring
The variable name.
- fmtstring, optional
The matplotlib format string used to plot the trace. Default is ‘b-‘.
- plotTraceHist(parm)¶
Plots trace and histogram (distribution).
- Parameters
- parmstring
The variable name.
- quantiles(parm, qlist=None)¶
Quantiles for given trace.
- Parameters
- parmstring
Name of parameter
- qlistlist of floats (0-100), optional
Specifies which quantiles shall be calculated. The default is 2.5, 25, 50, 75, and 97.5 percent.
- Returns
- Quantilesdictionary
For each quantile (in percent) the corresponding value.
- selectWalkers(ws)¶
Select walkers for emcee chains.
- Parameters
- wslist or array of integers
The walker to be considered in the analysis. Counting starts at zero.
- selectedWalkers()¶
Get the list of selected walkers.
- Parameters
- walkersarray
The selected walkers.
- setBurn(burn)¶
Change value of “post burn-in”.
In the case of an emcee trace, the “post burn-in” is applied to the trace of all walkers.
- Parameters
- burnint
The number of samples to be neglected.
- setThin(thin)¶
Change value of “post thinning”.
- Parameters
- thinint
Applies thinning to each chain. Retains every k th sample, where k is an integer value.
Notes
Use the “post thinning” to thin out your chains.
- setToState(model, state='best', verbose=True)¶
Set the parameter values to a certain state.
- Parameters
- model - fitting object
The fitting model object whose parameters will be updated.
- state{“best”, “mean”}, optional
- “best”Set parameters to the “best fit” state as measured
by deviance. This is the default.
“mean” : Set parameters to mean value of trace.
- verbosebool, optional
If False, no output about what is done will be generated (default is True).
- show()¶
Call show() from matplotlib to bring graphs to screen.
- spearmanr(parm1, parm2)¶
Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.
- Parameters
- parm1, parm2string
The names of the two parameters used in the evaluation.
- Returns
- Spearman rank-order correlation coefficientfloat
- p-valuefloat
Notes
Uses SciPy’s scipy.stats.spearmanr to evaluate.
The SciPy documentation of scipy.stats.spearmanr:
The Spearman correlation is a nonparametric measure of the monotonicity of the relationship between two data sets. Unlike the Pearson correlation, the Spearman correlation does not assume that both data sets are normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact monotonic relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases. The p-value roughly indicates the probability of an uncorrelated system producing data sets that have a Spearman correlation at least as extreme as the one computed from these data sets. The p-values are not entirely reliable but are probably reasonable for data sets larger than 500 or so.
- state()¶
Returns dictionary containing basic information on the sampling process.
- std(parm)¶
Calculate standard deviation.
- Parameters
- parmstring
Name of parameter.
- Returns
- The standard deviationfloat
- PyAstronomy.funcFit.hpd(trace, cred)¶
Estimate the highest probability density interval.
This function determines the shortest, continuous interval containing the specified fraction (cred) of steps of the Markov chain. Note that multi-modal distribution may require further scrutiny.
- Parameters
- tracearray
The steps of the Markov chain.
- credfloat
The probability mass to be included in the interval (between 0 and 1).
- Returns
- start, endfloat
The start and end points of the interval.
- PyAstronomy.funcFit.quantiles(trace, qs)¶
Get quantiles for trace.
- Parameters
- tracearray
The steps of the Markov chain.
- qslist or array
The quantiles in percent.
- Returns
- Quantilesdictionary
For each quantile, the corresponding value.
- PyAstronomy.funcFit.modeGrenander(trace, k, p)¶
Estimate the mode using estimator by Grenander 1965
Evaluate the estimator, where x represents the trace sorted in ascending order,
\[M^*_{p,k} = \frac{ \frac{1}{2} \sum_{i=1}^{n-k} (x_{i+k} + x_k)/(x_{i+k} - x_k)^p }{\sum_{i=1}^{n-k} 1/(x_{i+k} - x_k)^p }\]given by Grenander 1965 (“Some Direct Estimates of the Mode.” Ann. Math. Statist. 36 (1) 131 - 138).
- Parameters
- kint
Index offset (should be >2*p for asymptotic normality)
- pfloat
Parameter of estimator (1 < p < k)
- Returns
- modefloat
Estimate of mode