Leonardo Alvarino

Data Analyst

South Jordan, UT

I build KPI dashboards, data pipelines, and ML models that turn messy data into clear decisions.

SQL Python R Power BI Tableau PySpark ETL Pipelines Snowflake Databricks Machine Learning

Projects

Machine Learning & Healthcare

Early Sepsis Detection in the ICU Using Machine Learning

Built an end-to-end ML pipeline on 790,000+ ICU patient records that detects sepsis a median of 23 hours before clinical diagnosis. Achieved an AUC of 0.885 using XGBoost, with SHAP values for explainability.

Python XGBoost SHAP Pandas
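The two headline metrics can be illustrated with a small standard-library sketch on toy data. The function names `roc_auc` and `median_lead_time`, and the assumption that alert and diagnosis times are hours since ICU admission keyed by patient ID, are hypothetical and not taken from the project; in practice `sklearn.metrics.roc_auc_score` would normally compute the AUC.

```python
from statistics import median

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def median_lead_time(alert_hours, diagnosis_hours):
    """Median hours from first model alert to clinical diagnosis, over patients who have both.

    Negative values would mean the alert fired after diagnosis."""
    leads = [diagnosis_hours[pid] - alert_hours[pid]
             for pid in diagnosis_hours if pid in alert_hours]
    return median(leads)

# Toy inputs (made up): hypothetical patient IDs with hours since ICU admission.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.3, 0.6, 0.7])
lead = median_lead_time({"a": 10, "b": 40, "c": 5}, {"a": 30, "b": 50, "c": 35})
```

The pairwise form is quadratic in the number of patients, so it only suits small examples; it does, however, make the meaning of AUC explicit.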

Data Engineering

Healthcare Adverse Events Pipeline

Built an end-to-end ETL pipeline that extracts drug adverse event reports from the FDA's OpenFDA API, transforms the nested JSON into three structured tables, and loads them into Snowflake using idempotent MERGE statements. Processed 51,265 reports from a 16-day window and identified fentanyl and alcohol as the drugs with the highest death rates among the reports submitted to the FDA.

Python Snowflake OpenFDA API pandas ETL Pipelines
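A minimal sketch of the transform and load steps, assuming a trimmed record shaped like an OpenFDA drug event result (`safetyreportid`, `receivedate`, `seriousnessdeath`, and the nested `patient.drug` and `patient.reaction` lists). The three-table split (reports, drugs, reactions), the column names, the `reports_stage` staging table, and the toy drug names are illustrative assumptions; the project's actual schema is not described here. The in-memory `merge_upsert` shows why a MERGE load is idempotent: matched keys are updated and new keys inserted, so re-running a load does not duplicate rows.

```python
from collections import defaultdict

# Hypothetical, trimmed record shaped like one OpenFDA drug event result (toy values).
SAMPLE = {
    "safetyreportid": "1001",
    "receivedate": "20240105",
    "seriousnessdeath": "1",
    "patient": {
        "drug": [{"medicinalproduct": "DRUG A"}, {"medicinalproduct": "DRUG B"}],
        "reaction": [{"reactionmeddrapt": "Respiratory depression"}],
    },
}

def flatten(report):
    """Split one nested report into rows for three tables: reports, drugs, reactions."""
    rid = report["safetyreportid"]
    patient = report.get("patient", {})
    report_row = {"report_id": rid,
                  "received": report.get("receivedate"),
                  "death": report.get("seriousnessdeath") == "1"}
    drug_rows = [{"report_id": rid, "drug": d.get("medicinalproduct")}
                 for d in patient.get("drug", [])]
    reaction_rows = [{"report_id": rid, "reaction": r.get("reactionmeddrapt")}
                     for r in patient.get("reaction", [])]
    return report_row, drug_rows, reaction_rows

def merge_upsert(table, rows, key):
    """In-memory analogue of MERGE: overwrite rows whose key exists, insert the rest."""
    for row in rows:
        table[tuple(row[c] for c in key)] = row
    return table

def death_rate_by_drug(report_rows, drug_rows):
    """Share of reports mentioning each drug that record a death."""
    died = {r["report_id"]: r["death"] for r in report_rows}
    totals, deaths = defaultdict(int), defaultdict(int)
    for d in drug_rows:
        totals[d["drug"]] += 1
        deaths[d["drug"]] += died[d["report_id"]]
    return {drug: deaths[drug] / totals[drug] for drug in totals}

# Snowflake-style MERGE for the reports table; the staging table name is hypothetical.
MERGE_REPORTS_SQL = """
MERGE INTO reports t
USING reports_stage s
  ON t.report_id = s.report_id
WHEN MATCHED THEN UPDATE SET t.received = s.received, t.death = s.death
WHEN NOT MATCHED THEN INSERT (report_id, received, death)
  VALUES (s.report_id, s.received, s.death);
"""

report_row, drug_rows, reaction_rows = flatten(SAMPLE)
reports, drugs = {}, {}
for _ in range(2):  # loading the same batch twice must not create duplicates
    merge_upsert(reports, [report_row], ["report_id"])
    merge_upsert(drugs, drug_rows, ["report_id", "drug"])
rates = death_rate_by_drug(reports.values(), drugs.values())
```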

Data Wrangling & ML

Predicting Tomorrow's Closing Price of the S&P 500

Built a Random Forest model to predict the next-day direction of the S&P 500 using Yahoo Finance and FRED API data, with features engineered across 5 time horizons. Backtested a model-guided dollar-cost averaging (DCA) strategy against passive DCA; both returned about 15% on $27.5K invested.

Python Scikit-Learn yfinance API FRED API
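A sketch of the dollar-cost averaging arithmetic behind such a backtest. `dca_backtest` values all purchased shares at the final price; the prices are made up, and the "model-guided" schedule below, which shifts money between periods while keeping the total invested equal, is a hypothetical stand-in for however the project's model actually guided purchases.

```python
def dca_backtest(prices, contributions):
    """Invest contributions[i] dollars at prices[i]; value all shares at the final price.

    Returns (total_invested, final_value, total_return)."""
    shares = 0.0
    invested = 0.0
    for price, amount in zip(prices, contributions):
        shares += amount / price
        invested += amount
    final_value = shares * prices[-1]
    return invested, final_value, final_value / invested - 1

prices = [100.0, 80.0, 120.0, 110.0]  # made-up closing prices
passive = dca_backtest(prices, [500.0, 500.0, 500.0, 500.0])
# Hypothetical "model-guided" schedule: same $2,000 total, shifted toward some periods.
guided = dca_backtest(prices, [250.0, 750.0, 250.0, 750.0])
```

Holding the total invested equal across both strategies keeps the comparison about timing rather than amount.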

Data Wrangling & Visualization

Analyzing Value Retention in the BMW Used Car Market

Researched which BMW I should buy next, balancing fun and savings, by analyzing two Kaggle datasets on used car sales. Used linear regression to model depreciation across engine sizes and SUV models, which identified the 2 Series (3.0L) and the X4 as the best bets for value retention.

R ggplot2 dplyr Linear Regression
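A log-linear fit is one common way to turn a depreciation regression into a single comparable number: regress log price on age and read exp(slope) as the fraction of value kept per year. The project's analysis was done in R (where the equivalent fit is `lm(log(price) ~ age)`), and its exact model specification is not stated here, so this Python sketch, its log-price assumption, and the toy listings are illustrative only.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def annual_retention(ages, prices):
    """Fit log(price) on age; exp(slope) is the share of value kept per extra year."""
    slope, _ = fit_line(ages, [math.log(p) for p in prices])
    return math.exp(slope)

# Toy listings per model (made-up numbers), compared by yearly value retention.
ages = [1, 2, 3, 4, 5]
listings = {
    "Model A": (ages, [40000 * 0.85 ** a for a in ages]),
    "Model B": (ages, [50000 * 0.80 ** a for a in ages]),
}
ranking = sorted(listings, key=lambda m: annual_retention(*listings[m]), reverse=True)
```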

Data Wrangling & ML

Market Stress Detection

Built a market stress early-warning system using S&P 500 data. Engineered risk features (rolling volatility, cumulative returns, and drawdowns), then trained a Logistic Regression model to estimate the probability of market stress. Found that the estimated risk spikes days before price drops, showing that markets can look calm while becoming fragile.

Python Scikit-Learn Pandas Matplotlib
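The three risk features and the logistic scoring step can be sketched with the standard library as below. The window lengths, the feature definitions (simple returns, sample standard deviation, drawdown from the running peak), and the `weights`/`bias` values are illustrative assumptions; in the project the coefficients would come from fitting scikit-learn's `LogisticRegression`, whose `predict_proba` returns the same sigmoid-based probabilities.

```python
import math
from statistics import stdev

def daily_returns(closes):
    """Simple daily returns from a list of closing prices."""
    return [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]

def rolling_volatility(returns, window):
    """Sample std of the last `window` returns (window >= 2); None until enough data."""
    return [stdev(returns[i - window + 1:i + 1]) if i >= window - 1 else None
            for i in range(len(returns))]

def cumulative_return(closes, window):
    """Return over the trailing `window` days; None until enough data."""
    return [closes[i] / closes[i - window] - 1 if i >= window else None
            for i in range(len(closes))]

def drawdown(closes):
    """Fraction below the running peak: 0 at a new high, negative otherwise."""
    peak, out = closes[0], []
    for c in closes:
        peak = max(peak, c)
        out.append(c / peak - 1)
    return out

def stress_probability(features, weights, bias):
    """Logistic model: sigmoid of a weighted sum of risk features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1 / (1 + math.exp(-z))
```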

Machine Learning

Predicting House Age Using Machine Learning

Trained a Random Forest classifier to predict whether a house was built before or after 1980 using 48 features from a 22,900-row dataset. Achieved 92.3% accuracy, with living area, number of bathrooms, and number of stories as the top predictors.

Python Scikit-Learn Pandas Plotly
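The binary target, the accuracy metric, and the ranking of predictors can be sketched as follows. Treating a house built in 1980 as "after", the feature names, and the importance scores are assumptions for illustration; with scikit-learn, pairing `RandomForestClassifier.feature_importances_` with the column names would produce the kind of dictionary that `top_features` ranks.

```python
def label_pre_1980(year_built):
    """Binary target: 1 if built before 1980, else 0 (1980 itself counts as 'after', an assumption)."""
    return 1 if year_built < 1980 else 0

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def top_features(importances, k=3):
    """Names of the k features with the largest importance scores."""
    return sorted(importances, key=importances.get, reverse=True)[:k]

years = [1955, 1979, 1980, 2004]
y_true = [label_pre_1980(y) for y in years]
y_pred = [1, 1, 1, 0]  # made-up model predictions
# Hypothetical feature names and importance scores.
importances = {"living_area": 0.30, "num_baths": 0.20, "stories": 0.15, "garage": 0.05}
```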

Ask the FDA Adverse Events Data

Powered by the Healthcare Adverse Events Pipeline: ask anything about the drug safety reports stored in Snowflake. The AI writes the SQL, queries the database live, and explains the results in plain English.
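A minimal sketch of the question-to-SQL loop, with Python's built-in `sqlite3` standing in for Snowflake and a hard-coded `generate_sql` standing in for the language-model call. Both substitutions, the `drugs` table, and the read-only keyword check are assumptions for illustration, not how the app is actually built. A keyword filter is a coarse guard; running the app under a read-only database role is the sturdier safeguard.

```python
import re
import sqlite3

FORBIDDEN = re.compile(r"(?i)\b(insert|update|delete|merge|drop|alter|create|truncate)\b")

def is_read_only(sql):
    """Accept a single SELECT (or WITH ... SELECT) statement; reject anything else."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:  # more than one statement
        return False
    return re.match(r"(?i)(select|with)\b", stripped) is not None and not FORBIDDEN.search(stripped)

def generate_sql(question):
    """Hypothetical stand-in for the LLM call that turns a question into SQL."""
    return "SELECT drug, COUNT(*) AS n FROM drugs GROUP BY drug ORDER BY n DESC LIMIT 1"

def answer(question, conn):
    """Generate SQL for the question, check it is read-only, then run it."""
    sql = generate_sql(question)
    if not is_read_only(sql):
        raise ValueError("refusing to run non-SELECT SQL")
    return conn.execute(sql).fetchall()

# Toy in-memory table standing in for the Snowflake drugs table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drugs (report_id TEXT, drug TEXT)")
conn.executemany("INSERT INTO drugs VALUES (?, ?)",
                 [("1", "DRUG A"), ("2", "DRUG A"), ("3", "DRUG B")])
rows = answer("Which drug appears in the most reports?", conn)
```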
