Analyze Markov-Chains using TraceAnalysis

After having using pymc, e.g., via PyAstronomy’s wrapper fitMCMC and having created a Markov Chain, the information needs to be extracted from that Chain. TraceAnalysis is a helper class designed to help you doing this.

This collection of examples demonstrates how to create a Markov Chain and carry out a basic analysis.

Create a Markov-Chain to work with

The following example demonstrates how to create the Markov-Chain, which will later be analyzed.

from __future__ import print_function, division
# Import some required modules
import numpy as np
import matplotlib.pylab as plt
# ... and now the funcFit package
from PyAstronomy import funcFit as fuf

# Starting from with Voigt profile
vp = fuf.Voigt1d()
# Set some values to create a model
vp["A"] = -0.4
vp["al"] = 0.7
vp["mu"] = 5500.
vp["ad"] = 0.3
vp["off"] = 1.0

x = np.linspace(5490., 5510., 200)
# Create our data with some noise
yerr = np.ones(len(x))*0.01
y = vp.evaluate(x) + np.random.normal(0.0, 0.01, len(x))

# Say, we have a guess of the parameters, which is, however,
# not entirely correct
vp["A"] = -0.376
vp["al"] = 0.9
vp["mu"] = 5499.7
vp["ad"] = 0.4
vp["off"] = 1.0

# Plot the data and our guess
plt.errorbar(x, y, yerr=yerr, fmt='b.-')
plt.plot(x, vp.evaluate(x), 'r--')
plt.show()

# Thaw the parameters, which we wish to vary
# during the sampling
vp.thaw(["A", "al", "mu", "ad"])

# Use current parameters as starting point for the sampling
X0 = vp.freeParameters()
print("Starting point for sampling: ", X0)

priors = {"al":fuf.FuFPrior("limuniform", lower=0.0, upper=100.), \
"ad":fuf.FuFPrior("limuniform", lower=0.0, upper=100.)}

# Note that the filename should end in .emcee. Substitute this filename
# in the following examples.
vp.fitEMCEE(x, y, yerr=yerr, sampleArgs={"iters":2500}, \
dbfile="mcmcTA.emcee", priors=priors)

Investigate convergence behavior using the “deviance”

This example demonstrates how to investigate whether the chain has converged using the deviance and setting an appropriate burn-in.

from PyAstronomy import funcFit as fuf

# Create an instance of TraceAnalysis
# telling it which file to use
ta = fuf.TraceAnalysis("mcmcTA.emcee")

# Have a look at the deviance to check if and when
# the chains reached equilibrium.
ta.plotTrace("deviance")
ta.show()

# Say, we are sure that after 500 iterations, the chain
# reached equilibrium. We use this as the burn-in phase
ta.setBurn(500)

# Have a second look at the deviance, this time considering
# the burn-in. Note that the first 500 iterations are not
# removed from the chain. They are just not considered any
# more.
ta.plotTrace("deviance")
ta.show()

Accessing the traces

Here, we show how to access the traces and calculate mean, median, standard deviation, and Highest Probability Density (HPD) credibility intervals.

from __future__ import print_function, division
from PyAstronomy import funcFit as fuf

# Create an instance of TraceAnalysis
# telling it which file to use
ta = fuf.TraceAnalysis("mcmcTA.emcee")

# Use the burn-in from the previous example
ta.setBurn(500)

# See which model parameters have been sampled
print("Available parameters: ", ta.availableParameters())

# Access the traces of these parameters
print("Trace for A: ", ta["A"])

# Calculate mean, median, standard deviation, and
# credibility interval for the available parameters
for p in ta.availableParameters():
  hpd = ta.hpd(p, cred=0.95)
  print("Parameter %5s, mean = % g, median = % g, std = % g, 95%% HPD = % g - % g" \
        % (p, ta.mean(p), ta.median(p), ta.std(p), hpd[0], hpd[1]))

Analyze parameter correlations

Parameter correlations are important in many analysis. This example shows how such analyses can be studied using the Markov-Chain.

from __future__ import print_function, division
from PyAstronomy import funcFit as fuf

# Create an instance of TraceAnalysis
# telling it which file to use
ta = fuf.TraceAnalysis("mcmcTA.emcee")

# Use the burn-in from the previous example
ta.setBurn(500)

# Have a look at the parameter correlations
ta.correlationTable()

# Calculate Pearson's and Spearman's r-coefficients
print("Pearson: ", ta.pearsonr("ad", "al"))
print("Spearman: ", ta.spearmanr("ad", "al"))

# Show a plot of the correlation
# Note that the plotCorrEnh method can also
# be used, which is useful in the case of long
# chains.
ta.plotCorr(parsList=["ad", "al"])
# ta.plotCorrEnh(parsList=["ad", "al"])
ta.show()

Obtain lowest-deviance, mean, and median set of parameters

This example demonstrates how to obtain certain parameter sets and use them with a model.

from __future__ import print_function, division
from PyAstronomy import funcFit as fuf
import matplotlib.pylab as plt
import numpy as np

# Create an instance of TraceAnalysis
# telling it which file to use
ta = fuf.TraceAnalysis("mcmcTA.emcee")

# Use the burn-in from the previous example
ta.setBurn(500)

# Find sets of parameters
# First, the lowest deviance set
lds, index = ta.parameterSet(prescription="lowestDev")
print("Lowest deviance set: ", lds)
print("  at chain index: ", index)
means = ta.parameterSet(prescription="mean")
print("Set of mean values: ", means)
medians = ta.parameterSet(prescription="median")
print("Set of median values: ", means)

# Create Voigt model and plot the models belonging
# to the lowest deviance, mean, and median parameter
# set.
vp = fuf.Voigt1d()
# Generate the model wavelength axis
x = np.linspace(5490., 5510., 200)
# Calculate and plot the models
vp.assignValues(lds)
plt.plot(x, vp.evaluate(x), 'b.-')
vp.assignValues(means)
plt.plot(x, vp.evaluate(x), 'r.-')
vp.assignValues(medians)
plt.plot(x, vp.evaluate(x), 'g.-')
plt.show()

Other useful visualization utilities

This example shows a couple of other useful routines to visualize and analyze the Markov-Chains.

from PyAstronomy import funcFit as fuf

# Create an instance of TraceAnalysis
# telling it which file to use
ta = fuf.TraceAnalysis("mcmcTA.emcee")

# Use the burn-in from the previous example
ta.setBurn(500)

# Investigate a trace
ta.plotTrace("mu")
ta.show()
# and its distribution.
ta.plotHist("mu")
ta.show()
# Combine trace and distribution
ta.plotTraceHist("mu")
ta.show()
# Plot correlations
ta.plotCorr(parsList=["mu", "ad", "al"])
ta.show()