Outlier Map of Classical and Robust Principal Component Analysis
Keywords:
Outliers, Classical Principal Component Analysis (CPCA), Robust PCA (ROBPCA)
Abstract
Classical Principal Component Analysis (CPCA) is widely used for dimensionality reduction, but it is highly sensitive to outliers, which distort the covariance estimate and make the resulting principal components unreliable. To address this, Robust PCA (ROBPCA) combines robust covariance estimation with projection pursuit to limit the influence of outliers. Although CPCA and ROBPCA are most often applied to high-dimensional data, ROBPCA is equally effective in low-dimensional settings, particularly when the number of observations is large. This research illustrates the benefit of ROBPCA over CPCA by analyzing a large-scale gene expression dataset with 22 features and 47,231 observations, and demonstrates its efficiency in identifying and classifying outliers using outlier maps. The findings show that CPCA misidentifies outliers, which leads to inflated variance structures and poor principal component estimates, whereas ROBPCA successfully isolates outliers, preserving data integrity and enhancing interpretability. The study thus emphasizes that ROBPCA improves the reliability of the analysis and offers a dependable method for identifying outliers.
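As a rough illustration of what an outlier map plots, the sketch below computes the two quantities usually shown on its axes, the score distance (distance within the PCA subspace) and the orthogonal distance (distance to the subspace), for a classical PCA fit. It is a minimal sketch under stated assumptions, not the procedure used in this study: the function names `pca_distances` and `outlier_map_labels` are hypothetical, the chi-square cutoff is replaced by a Wilson-Hilferty approximation, and the orthogonal-distance cutoff uses the median and MAD as a simple stand-in for a robust location and scale estimate. A robust version would replace the sample mean and covariance with robust estimates.

```python
import numpy as np
from statistics import NormalDist


def pca_distances(X, k):
    """Score and orthogonal distances of each row of X for a classical k-component PCA."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center with the (non-robust) sample mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    lam, P = eigvals[order], eigvecs[:, order]
    T = Xc @ P                                   # scores in the PCA subspace
    sd = np.sqrt(np.sum(T**2 / lam, axis=1))     # score distance
    od = np.linalg.norm(Xc - T @ P.T, axis=1)    # orthogonal distance
    return sd, od


def outlier_map_labels(sd, od, k, p=0.975):
    """Assign each observation to one of the four regions of the outlier map."""
    z = NormalDist().inv_cdf(p)
    # Assumption: Wilson-Hilferty approximation to the chi-square quantile with k d.o.f.
    sd_cut = np.sqrt(k * (1 - 2 / (9 * k) + z * np.sqrt(2 / (9 * k))) ** 3)
    # Assumption: od^(2/3) is roughly normal; median and MAD stand in for robust estimates.
    od23 = od ** (2.0 / 3.0)
    med = np.median(od23)
    mad = 1.4826 * np.median(np.abs(od23 - med))
    od_cut = (med + mad * z) ** 1.5
    labels = np.full(sd.shape, "regular", dtype=object)
    labels[(sd > sd_cut) & (od <= od_cut)] = "good leverage"
    labels[(sd <= sd_cut) & (od > od_cut)] = "orthogonal outlier"
    labels[(sd > sd_cut) & (od > od_cut)] = "bad leverage"
    return labels, sd_cut, od_cut


if __name__ == "__main__":
    # Synthetic data (not the gene expression dataset): points near a plane plus one
    # observation placed far off the plane.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 0.1])
    X = np.vstack([X, [0.0, 0.0, 5.0]])
    sd, od = pca_distances(X, k=2)
    labels, sd_cut, od_cut = outlier_map_labels(sd, od, k=2)
    print(labels[-1], round(float(sd_cut), 3))
```

Plotting `od` against `sd` with the two cutoffs drawn as lines gives the outlier map. The sketch uses the sample mean and covariance, so a large fraction of outliers could shift the fitted subspace and mask them, which is the weakness of CPCA that the abstract describes.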