{ "cells": [ { "cell_type": "markdown", "id": "73506d0e", "metadata": {}, "source": [ "SYSREM, Principle Component Analysis, and eigenvectors\n", "===================================================" ] }, { "cell_type": "markdown", "id": "e94df4f3", "metadata": {}, "source": [ "A comparison between the results of SYSREM, a PCA, and the eigenvectors of a small artificial data matrix, using equal uncertainties for all data points. In this case, all of these are basically equivalent." ] }, { "cell_type": "code", "execution_count": 1, "id": "e3d340d1", "metadata": {}, "outputs": [], "source": [ "from PyAstronomy import pyasl\n", "import numpy as np\n", "from sklearn.decomposition import PCA\n", "np.set_printoptions(precision=4)" ] }, { "cell_type": "code", "execution_count": 2, "id": "de91d740", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Observations\n", "[[ 1. 2. 3. 4.]\n", " [ 2. 5. 16. 41.]\n", " [ 3. 8. 29. 78.]\n", " [ 4. 11. 42. 115.]\n", " [ 5. 14. 55. 152.]\n", " [ 6. 17. 68. 189.]\n", " [ 7. 20. 81. 226.]]\n" ] } ], "source": [ "# n observations (e.g., light curves) with m data points each\n", "n = 4\n", "m = 7\n", "\n", "# Some arbitrary observations (with observations is COLUMNS)\n", "obs = np.zeros( (m,n) )\n", "for i in range(0, n):\n", " for j in range(m):\n", " obs[j,i] = j*i**3+j*i**2+(j+i+1)\n", "# Equal error for all data points\n", "sigs = np.ones_like(obs)\n", "\n", "print(\"Observations\")\n", "print(obs)" ] }, { "cell_type": "markdown", "id": "a3c7d8f3", "metadata": {}, "source": [ "*PCA*" ] }, { "cell_type": "code", "execution_count": 3, "id": "5df110db", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PCA analysis with sklearn\n", "PCA components:\n", "[[ 0.0074 0.1106 0.2137 0.3168 0.4199 0.523 0.6261]\n", " [-0.6813 -0.523 -0.3646 -0.2062 -0.0478 0.1106 0.2689]\n", " [ 0.7132 -0.4009 -0.4557 -0.098 -0.0708 -0.0164 0.3286]\n", " [ 0.0125 -0.3426 0.009 0.1088 0.8224 -0.3686 -0.2414]]\n", "N features = 7\n", "N samples = 4\n", "Means = [ 2.5 16. 29.5 43. 56.5 70. 83.5]\n" ] } ], "source": [ "print(\"PCA analysis with sklearn\")\n", "pca = PCA()\n", "# Use transpose to arrange observations along rows\n", "res = pca.fit(obs.T)\n", "print(f\"PCA components:\")\n", "print(pca.components_)\n", "print(f\"N features = {pca.n_features_in_}\")\n", "print(f\"N samples = {pca.n_samples_}\")\n", "print(f\"Means = {pca.mean_}\")" ] }, { "cell_type": "markdown", "id": "8d4479cd", "metadata": {}, "source": [ "*Eigenvalues and -vectors (of covarinace matrix)*" ] }, { "cell_type": "code", "execution_count": 4, "id": "cf0fa560", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Centering data matrix\n", "Eigenvalues = [2.5680e+04 5.2070e-01 6.0519e-13 6.0519e-13 4.1886e-13 5.3808e-14\n", " 2.8087e-14]\n", "Eigenvalue and corresponding eigenvector\n", " 25680.1 [0.0074 0.1106 0.2137 0.3168 0.4199 0.523 0.6261]\n", " 0.520696 [-0.6813 -0.523 -0.3646 -0.2062 -0.0478 0.1106 0.2689]\n" ] } ], "source": [ "print(\"Centering data matrix\")\n", "obscentered = obs.copy()\n", "for i in range(obs.shape[0]):\n", " obscentered[i,::] -= np.mean(obscentered[i,::])\n", "\n", "# Covariance matrix\n", "covm = np.matmul(obscentered, obscentered.T) / (n-1)\n", "V, W = np.linalg.eig(covm)\n", "V = np.abs(V)\n", "print(f\"Eigenvalues = {np.array(sorted(V, reverse=True))}\")\n", "indi = np.argsort(V)\n", "print(\"Eigenvalue and corresponding eigenvector\")\n", "for i in range(2):\n", " print(\" %g \" % V[indi[-1-i]], np.real(W[::,indi[-1-i]]))" ] }, { "cell_type": "markdown", "id": "1fd8282a", "metadata": {}, "source": [ "*SYSREM*\n", "\n", "The unit length vectors 'a' of the SYSREM model may be compared with the PCA components and the eigenvectors pertaining to the largest two eigenvalues. They are equal except for, potentially, a sign. " ] }, { "cell_type": "code", "execution_count": 5, "id": "622cd6b5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First SYSREM iteration\n", "unit vector a = [0.0074 0.1106 0.2137 0.3168 0.4199 0.523 0.6261]\n", "last_ac_iterations: 1\n", "Second SYSREM iteration\n", "unit vector a = [-0.6813 -0.523 -0.3646 -0.2062 -0.0478 0.1106 0.2689]\n", "last_ac_iterations: 0\n" ] } ], "source": [ "# Instatiate SYSREM object. Apply centering across fetaures but not along observations \n", "sr = pyasl.SysRem(obs, sigs, ms_obs=False, ms_feat=True)\n", "print(\"First SYSREM iteration\")\n", "r1, a1, c1 = sr.iterate()\n", "print(\"unit vector a = \", a1/np.linalg.norm(a1))\n", "print(\"last_ac_iterations: \", sr.last_ac_iterations)\n", "print(\"Second SYSREM iteration\")\n", "r2, a2, c2 = sr.iterate()\n", "print(\"unit vector a = \", a2/np.linalg.norm(a2))\n", "print(\"last_ac_iterations: \", sr.last_ac_iterations)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }