Cross-correlation

PyAstronomy.pyasl.crosscorrRV(w, f, tw, tf, rvmin, rvmax, drv, mode='doppler', skipedge=0, edgeTapering=None, weights=None, meanwvl=None)

Cross-correlate a spectrum with a template.

The algorithm implemented here works as follows: For each RV shift to be considered, the wavelength axis of the template is shifted, either linearly or using a proper Doppler shift depending on the mode. The shifted template is then linearly interpolated at the wavelength points of the observation (spectrum) to calculate the cross-correlation function.

If N data points \(f_i\) at wavelengths \(w_i\) are given, the cross correlation function \(CC(v_j)\), depending on the velocity \(v_j\) and optional weights \(\alpha_i\) is calculated as:

\[CC(v_j) = \sum_{i=1}^N \alpha_i \times \left(f_i \times t(w_i-\Delta_{i,j}) \right)\]

If the mode is lin, the shift is implemented as \(\Delta_{i,j} = \bar{w} (v_j/c)\), where \(\bar{w}\) is the mean wavelength (i.e., the mean of the input w unless meanwvl is specified). For any velocity, the shift is a constant across the entire wavelength axis. If the mode is doppler, the shift is implemented as \(\Delta_{i,j} = w_i (v_j/c)\), which depends both on the wavelength and the velocity. The evaluation of the shifted template \(t(w_i-\Delta_{i,j})\) is carried out using linear interpolation.

If no weights are specified, \(\alpha_i = 1\) is adopted. If uncertainties, \(\sigma_i\), for the \(f_i\) are known, a useful choice for the weights is \(\alpha_i = \sigma_i^{-2}\), because this makes maximizing the CCF equivalent to maximizing the likelihood (see, e.g., Lookwood et al. 2014, ApJ 783).

Parameters
warray

The wavelength axis of the observation.

farray

The flux axis of the observation.

twarray

The wavelength axis of the template.

tfarray

The flux axis of the template.

rvminfloat

Minimum radial velocity for which to calculate the cross-correlation function [km/s].

rvmaxfloat

Maximum radial velocity for which to calculate the cross-correlation function [km/s].

drvfloat

The width of the radial-velocity steps to be applied in the calculation of the cross-correlation function [km/s].

modestring, {lin, doppler}, optional

The mode determines how the wavelength axis will be modified to represent a RV shift. If “lin” is specified, a mean wavelength shift will be calculated based on the mean wavelength of the observation. The wavelength axis will then be shifted by that amount. If “doppler” is specified (the default), the wavelength axis will properly be Doppler-shifted.

skipedgeint, optional

If larger zero, the specified number of bins will be skipped from the begin and end of the observation. This may be useful if the template does not provide sufficient coverage of the observation.

edgeTaperingfloat or tuple of two floats

If not None, the method will “taper off” the edges of the observed spectrum by multiplying with a sine function. If a float number is specified, this will define the width (in wavelength units) to be used for tapering on both sides. If different tapering widths shall be used, a tuple with two (positive) numbers must be given, specifying the width to be used on the low- and high wavelength end. If a nonzero ‘skipedge’ is given, it will be applied first. Edge tapering can help to avoid edge effects (see, e.g., Gullberg and Lindegren 2002, A&A 390).

weightsarray of floats, optional

If given, it specifies weights for individual data points (given by f). A possible choice for weights is 1/sigma_i**2 where sigma_i is the uncertainty of the i-th data point.

meanwvlfloat, optional

If given, this value will be adopted for the mean wavelength \(\bar{w}\) to calculate shifts in ‘lin’ mode instead of the mean of the input array w.

Returns
dRVarray

The RV axis of the cross-correlation function. The radial velocity refer to a shift of the template, i.e., positive values indicate that the template has been red-shifted and negative numbers indicate a blue-shift of the template. The numbers are given in km/s.

CCarray

The cross-correlation function.

See also

An algorithm for finding the extreme points by parabolic approximation (quadExtreme()).

Example: Cross-correlation with a Gaussian

from __future__ import print_function, division
from PyAstronomy import pyasl
import numpy as np
import matplotlib.pylab as plt

# Create the template
tw = np.linspace(5000, 5010, 1000)
tf = np.exp(-(tw-5004.0)**2/(2.*0.1**2))

# Create data, which are not that well sampled
dw = np.linspace(5000, 5010, 200)
df = np.exp(-(dw-5004.17)**2/(2.*0.1**2))

# Plot template and data
plt.title("Template (blue) and data (red)")
plt.plot(tw, tf, 'b.-')
plt.plot(dw, df, 'r.-')
plt.show()

# Carry out the cross-correlation.
# The RV-range is -30 - +30 km/s in steps of 0.6 km/s.
# The first and last 20 points of the data are skipped.
rv, cc = pyasl.crosscorrRV(dw, df, tw, tf, -30., 30., 30./50., skipedge=20)

# Find the index of maximum cross-correlation function
maxind = np.argmax(cc)

print("Cross-correlation function is maximized at dRV = ", rv[maxind], " km/s")
if rv[maxind] > 0.0:
    print("  A red-shift with respect to the template")
else:
    print("  A blue-shift with respect to the template")

plt.plot(rv, cc, 'bp-')
plt.plot(rv[maxind], cc[maxind], 'ro')
plt.show()

Example: Cross-correlation including weights

from __future__ import print_function, division
from PyAstronomy import pyasl
import numpy as np
import matplotlib.pylab as plt

# Create the template
tw = np.linspace(5000, 5010, 1000)
tf = np.exp(-(tw-5004.0)**2/(2.*0.1**2))

# Create data, which are not that well sampled
dw = np.linspace(5000, 5010, 200)
df = np.exp(-(dw-5004.17)**2/(2.*0.1**2))
df += 0.2*np.exp(-(dw-5004.67)**2/(2.*0.1**2))
# Define errors, use large errors for strong peak
err = np.ones_like(dw) * 0.01
err[(dw > 5004) & (dw < 5004.4)] *= 40

# Plot template and data
plt.subplot(2,1,1)
plt.title("Template (blue) and data (red)")
plt.plot(tw, tf, 'b.-', label="Template")
plt.errorbar(dw, df, yerr=err, fmt='r+-', label="Date")
plt.legend()

# Carry out the cross-correlation.
# The RV-range is -30 - +100 km/s in steps of 0.6 km/s.
# The first and last 50 points of the data are skipped.
rv, cc = pyasl.crosscorrRV(dw, df, tw, tf, -30., 100., 1., skipedge=50)

# Include weights (1/err**2)
rvw, ccw = pyasl.crosscorrRV(dw, df, tw, tf, -30., 100., 1., skipedge=50, \
    weights=1/err**2)

plt.subplot(2,1,2)
plt.plot(rv, cc/max(cc), 'b-', label="normalized unweighted CCF")
plt.plot(rvw, ccw/max(ccw), 'r.-', label="normalized weighted CCF")
plt.legend()
plt.show()