This project predicts the onset of diabetes using logistic regression on the Pima Indians Diabetes Database from Kaggle. The dataset includes variables such as glucose concentration, blood pressure, BMI, and age. The model achieved an AUC of 0.8396 and an overall accuracy of 78.39%, and it helped identify key risk factors associated with diabetes.
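As a rough illustration of the two metrics reported for the diabetes model, the sketch below computes AUC and accuracy from predicted probabilities in plain Python. The labels, scores, and the 0.5 threshold are made-up assumptions for demonstration, not the project's data or settings.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases classified correctly at the given probability threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical outcomes (1 = diabetic) and predicted probabilities.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

print(f"AUC: {auc(labels, scores):.4f}")
print(f"Accuracy: {accuracy(labels, scores):.4f}")
```

AUC does not depend on a classification threshold, while accuracy does, which is why the two numbers are usually reported together.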
This project compares Ridge and Lasso regression for predicting college graduation rates using the College dataset. Ridge regression slightly outperformed Lasso in predictive accuracy, while Lasso produced more interpretable results by performing feature selection.
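The difference in interpretability between the two methods can be seen in a textbook simplification: when the predictors are orthonormal, each penalty has a closed-form effect on the ordinary least squares coefficients. The sketch below assumes that simplified setting, and the coefficients and penalty value are hypothetical, not taken from the College dataset analysis.

```python
import math


def ridge_shrink(b, lam):
    # Ridge with penalty (lam / 2) * sum(b^2) on a 1/2 * RSS loss:
    # every coefficient is scaled toward zero but never reaches it.
    return b / (1.0 + lam)


def lasso_shrink(b, lam):
    # Lasso with penalty lam * sum(|b|) on a 1/2 * RSS loss:
    # soft-thresholding, so small coefficients become exactly zero.
    magnitude = max(abs(b) - lam, 0.0)
    return math.copysign(magnitude, b) if magnitude > 0 else 0.0


# Hypothetical OLS coefficients and penalty strength.
ols_coefs = [3.0, -0.4, 0.2, -1.5]
lam = 0.5

ridge = [ridge_shrink(b, lam) for b in ols_coefs]
lasso = [lasso_shrink(b, lam) for b in ols_coefs]

print("ridge:", [round(b, 3) for b in ridge])
print("lasso:", lasso)
```

Ridge keeps all four predictors with smaller weights, while Lasso drops the two weakest ones entirely, which is the feature selection behavior described above.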
This project compares Principal Component Regression (PCR) and Partial Least Squares Regression (PLS) for predicting benzene (C6H6) concentrations using the Air Quality Dataset from the UCI Machine Learning Repository. The primary goal is to address multicollinearity through dimensionality reduction and thereby improve predictive accuracy. PLS performed better, with an RMSE of 0.972 and an R-squared of 0.974, compared with an RMSE of 1.572 and an R-squared of 0.933 for PCR.
Tools/Skills
R, Principal Component Analysis, PLS, Regression Models, Data Cleaning
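For readers unfamiliar with the two metrics used in the PCR and PLS comparison, here is a minimal Python sketch of how RMSE and R-squared are computed from predictions. The project itself was done in R, and the values below are invented for illustration, not benzene measurements from the dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error, in the target's units."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Share of the target's variance explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(f"RMSE: {rmse(actual, predicted):.3f}")
print(f"R-squared: {r_squared(actual, predicted):.3f}")
```

A lower RMSE and a higher R-squared both indicate a better fit, which is the basis for ranking PLS above PCR here.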
The A/B Testing Project evaluates whether a new design variant outperforms the existing one. Using user interaction data, it compares completion rates, time spent on each step, error rates, and abandonment rates. After data preparation and exploration, statistical tests run with Python libraries such as scipy and statsmodels determine whether the proposed design changes lead to meaningful improvements in user engagement and overall user experience.
Tools/Skills
Python, A/B Testing, Hypothesis Testing, Data Visualization, Exploratory Data Analysis
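One common way to test a difference in completion rates between two designs is a two-proportion z-test. The project description does not say which tests were used, so the sketch below is only an assumed example, written with the standard library; statsmodels provides `proportions_ztest` for the same kind of test. The counts are hypothetical.

```python
import math


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in proportions, using the pooled standard error."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical completions out of users exposed to each design.
z, p_value = two_proportion_ztest(success_a=400, n_a=1000, success_b=450, n_b=1000)

print(f"z = {z:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The difference in completion rates is significant at the 5% level.")
else:
    print("No significant difference detected at the 5% level.")
```

Metrics such as time spent on a step are continuous rather than binary, so they would call for a different test, for example a t-test or a rank-based alternative.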
This comprehensive data analysis project focuses on the FIFA World Cup 2022, using data sourced from HiCounselor.com. It applies advanced SQL techniques, including complex joins, subqueries, Common Table Expressions (CTEs), and stored procedures, to analyze player and team performance, match outcomes, and key tournament insights.
This project focuses on analyzing the top 50 charts of all Spanish-speaking countries (except Cuba) for a specific week. The analysis includes data collection from Spotify, data storage and management using SQL, detailed data analysis through SQL queries in Python, and visualization using Tableau.
Tools/Skills
MySQL, Python, Tableau, Data Collection, Data Analysis
This project provides an in-depth analysis of the electric vehicle (EV) landscape in Washington State using Tableau. It covers data from the Washington State Department of Licensing, detailing the distribution, growth, and characteristics of Battery Electric Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs).
This Streamlit app predicts monthly rent prices for apartments across various states in the U.S. The project involved data cleaning, exploratory data analysis, and applying machine learning models to identify the best predictors. The app allows users to input apartment features and compare rental price distributions and averages across states.
Tools/Skills
Python, Streamlit, Machine Learning, Data Cleaning, Data Analysis
Cryptocurrency Explorer is an application for real-time tracking, historical analysis, and predictive modeling of cryptocurrency prices. Built with Streamlit, it offers a user-friendly experience for casual enthusiasts and data professionals alike. Users can track real-time prices, analyze historical trends, and forecast future prices with an ARIMA model.
Tools/Skills
Streamlit, Python, Data Visualization, ARIMA, Time Series Forecasting, CoinGecko API