Comparative Analysis of Principal Component Regression (PCR) and Partial Least Squares Regression (PLS) on Air Quality Data Using R

This project compares Principal Component Regression (PCR) and Partial Least Squares
Regression (PLS) for predicting benzene (C6H6) concentrations using the Air Quality
dataset from the UCI Machine Learning Repository. The primary goal is to handle
multicollinearity among the predictors through dimensionality reduction and thereby
improve predictive accuracy. PLS performed better, with an RMSE of 0.972 and an
R-squared of 0.974, compared with an RMSE of 1.572 and an R-squared of 0.933 for PCR.
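
A minimal sketch of how such a PCR versus PLS comparison might be set up in R is shown
here. It is not the project's actual code: it assumes the `pls` package, uses simulated
collinear predictors (the hypothetical columns `sensor1` to `sensor5`) in place of the
UCI data, and fixes the number of components at an arbitrary value, so its output will
not match the figures reported above.

```r
# Sketch only: simulated data stands in for the Air Quality sensor readings.
library(pls)  # assumed package; the source does not name the one it used

set.seed(42)
n <- 300
latent <- rnorm(n)

# Five strongly correlated predictors driven by one latent signal
# (hypothetical stand-ins for collinear sensor channels).
X <- sapply(1:5, function(j) latent + rnorm(n, sd = 0.3))
colnames(X) <- paste0("sensor", 1:5)
dat <- data.frame(y = 2 * latent + rnorm(n, sd = 0.5), X)

# 80/20 train/test split.
train_idx <- sample(n, size = 0.8 * n)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit both models on standardized predictors with cross-validation.
pcr_fit <- pcr(y ~ ., data = train, scale = TRUE, validation = "CV")
pls_fit <- plsr(y ~ ., data = train, scale = TRUE, validation = "CV")

# Held-out RMSE and R-squared for a given number of components.
evaluate <- function(fit, newdata, ncomp) {
  pred <- drop(predict(fit, newdata = newdata, ncomp = ncomp))
  obs  <- newdata$y
  c(RMSE = sqrt(mean((obs - pred)^2)),
    R2   = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2))
}

ncomp <- 2  # arbitrary illustrative choice, not a tuned value
results <- rbind(PCR = evaluate(pcr_fit, test, ncomp),
                 PLS = evaluate(pls_fit, test, ncomp))
print(round(results, 3))
```

In practice the number of components would normally be chosen from the cross-validated
error curve (for example by inspecting `RMSEP(pcr_fit)` and `RMSEP(pls_fit)`) rather
than fixed in advance.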

Tools/Skills

R, Principal Component Analysis, PLS, Regression Models, Data Cleaning