Examples of Beta Sigma Noise Estimation

In the following, a number of examples demonstrate aspects of the \(\beta\sigma\) procedure. Please see the corresponding paper for further discussion.

Noise estimates with different orders of approximations (N)

The required order of approximation (N) depends on the level of variation in the data (or, equivalently, the sampling cadence in relation to the variability time scale of the signal). Comparing estimates obtained from various orders of approximation provides a useful cross-check for the plausibility of the result. In the example below, it is clearly seen that the zeroth order yields a too high estimate for the amplitude of noise because the signal varies too quickly.

from __future__ import print_function, division
import numpy as np
import matplotlib.pylab as plt
from PyAstronomy import pyasl


def g(t):
    """
    Function determining the behavior of the data.
    """
    return 1.3 - 0.003*t + 1.3*np.sin(t/5.) * np.exp(-t/100.)


# Number of data points
nd = 200

ti = np.arange(float(nd))
gi = g(ti)

mdiff = np.max(np.abs(gi[1:] - gi[0:-1]))
print("""Maximum absolute difference between consecutive
    values of g(t): """, mdiff)

# Standard deviation of noise
istd = 0.02

print("Input standard deviation: ", istd)
print("Number of 'data points': ", nd)
print()

# Add Gaussian noise to data
yi = gi + np.random.normal(0.0, istd, nd)

# Create class instance for equidistant sampling
bseq = pyasl.BSEqSamp()

# Specify jump parameter (j) for construction of beta sample
j = 1

# Order of approximation to use
Ns = [0, 1, 2, 3]

# Use to store noise estimates
smads, dsmads = [], []

# Loop over orders of approximation between 0 and 3
for N in Ns:

    # Get estimates of standard deviation based on robust (MAD-based) estimator
    smad, dsmad = bseq.betaSigma(yi, N, j, returnMAD=True)
    print("Order of approximation (N): ", N)

    print("    Size of beta sample: ", len(bseq.betaSample))
    print("    Robust estimate of noise std: %6.3f +/- %6.3f" % (smad, dsmad))
    # Save result
    smads.append(smad)
    dsmads.append(dsmad)

# Plot g(t) and the synthetic data
plt.subplot(2, 1, 1)
plt.title("Data (top) and noise estimates (bottom)")
plt.plot(ti, gi, 'b.-', label="$g(t_i)$")
plt.errorbar(ti, yi, yerr=np.ones(nd)*istd, fmt='r+', label="$y_i$")
plt.legend()
plt.subplot(2, 1, 2)
plt.title("N=0 is insufficient")
plt.errorbar(Ns, smads, yerr=dsmads, fmt='k+', label="Noise estimates")
plt.plot([min(Ns)-0.5, max(Ns)+0.5], [istd]*2, 'k--', label="Input value")
plt.legend()
plt.xlabel("Order of approximation (N)")
plt.ylabel("Noise STD")
plt.tight_layout()
plt.show()

The effect of outliers: A case for robust estimation

Outliers are a common nuisance in working with real data. They can strongly affect the \(\beta\sigma\) noise estimate even in otherwise very benign data sets. However, the effect of outliers on the noise estimate is much less significant, if robust estimation is used. In many cases, the gain in precision from using a more efficient estimator is insignificant, because the main uncertainties arise from deviations from the assumption of Gaussian noise.

from __future__ import print_function, division
import numpy as np
from PyAstronomy import pyasl
import matplotlib.pylab as plt

# Number of data points
nd = 200

# Input standard deviation
istd = 0.2

# Create some synthetic data (Gaussian noise) with
# input standard deviation.
y = np.random.normal(1.0, istd, nd)

# Introduce some outliers
# Number of outliers
no = 15
for _ in range(no):
    # Choose an index (could occur more than once)
    index = np.random.randint(0, high=nd)
    # Add point from normal distribution with
    # STD of 50
    y[index] = np.random.normal(1.0, 50.0)

# Create class instance for equidistant sampling
bseq = pyasl.BSEqSamp()

# Specify order of approximation (N) and jump parameter (j) for
# construction of beta sample
N = 0
j = 1

print("Order of approximation: ", N)
print("Jump parameter: ", j)
print()

# Get estimates of standard deviation based on MV estimator and ...
smv, dsmv = bseq.betaSigma(y, N, j, returnMAD=False)
# ... robust (MAD-based) estimator
smad, dsmad = bseq.betaSigma(y, N, j, returnMAD=True)

print("Input standard deviation: ", istd)
print("Number of 'data points': ", nd)
print("Size of beta sample: ", len(bseq.betaSample))
print()
print("Minimum-variance estimate: %6.3f +/- %6.3f" % (smv, dsmv))
print("Robust estimate: %6.3f +/- %6.3f" % (smad, dsmad))

plt.subplot(2, 1, 1)
plt.title("Synthetic data")
plt.plot(y, 'bp')
plt.subplot(2, 1, 2)
plt.title("Histogram of $\\beta$ sample")
plt.hist(bseq.betaSample, 30)
plt.show()

Equidistant vs. arbitrary sampling

The classes BSEqSamp() and BSArbSamp() treat the cases of equidistant and arbitrary sampling of the signal explicitly. Equidistant sampling is technically more simple to treat and is often a good approximation even if the actual sampling is not equidistant. This is true when there is a “not too complicated” transformation relating the actual and an equidistant sampling axes. The following example demonstrates a case, where equidistant sampling can be assumed but a higher order of approximation is required.

from __future__ import print_function, division
import numpy as np
import matplotlib.pylab as plt
from PyAstronomy import pyasl


def g(t):
    """
    Function determining the behavior of the data.
    """
    return 1.3 - 10.0*t


# Number of data points
nd = 30

# Creating non-equidistant samping axis (ti)
te = np.arange(float(nd))
ti = (te**3) / float(nd**2)

# Get values of g(t)
gi = g(ti)

# Standard deviation of noise
istd = 0.3

# Add Gaussian noise to data
yi = gi + np.random.normal(0.0, istd, nd)

print("Input standard deviation: ", istd)
print("Number of 'data points': ", nd)
print()

# Create class instance for equidistant sampling
bseq = pyasl.BSEqSamp()
# Create class instance for arbitrary sampling
bsar = pyasl.BSArbSamp()

# Get estimates assung equidistant and arbitrary sampling
# using N = 1 and j = 1. From the definition of g(t), N = 1
# will be sufficient for the case of arbitrary sampling, but
# not necessarily for (assumed) equidistant sampling.
smv_es, dsmv_es = bseq.betaSigma(yi, 1, 1)
smv_as, dsmv_as = bsar.betaSigma(ti, yi, 1, 1)

print("Estimates for N=1 and j=1")
print("    Equidistant sampling: %5.3f +/- %5.3f" % (smv_es, dsmv_es))
print("    Arbitrary sampling: %5.3f +/- %5.3f" % (smv_as, dsmv_as))
print()

# Get estimates for N=2 and 3 assuming equidistant sampling
smv_es2, dsmv_es2 = bseq.betaSigma(yi, 2, 1)
smv_es3, dsmv_es3 = bseq.betaSigma(yi, 3, 1)

print("Estimates for N=2 and 3 based on equidistant sampling")
print("    N = 2: %5.3f +/- %5.3f" % (smv_es2, dsmv_es2))
print("    N = 3: %5.3f +/- %5.3f" % (smv_es3, dsmv_es3))

plt.subplot(2, 1, 1)
plt.title("Data with true sampling")
plt.plot(ti, gi, 'b-')
plt.errorbar(ti, yi, yerr=np.ones(nd)*istd, fmt='b+')
plt.subplot(2, 1, 2)
plt.title("Same data assuming equidistant sampling")
plt.plot(te, gi, 'r-')
plt.errorbar(te, yi, yerr=np.ones(nd)*istd, fmt='r+')
plt.tight_layout()
plt.show()