Software Engineer and Digital Transformation Specialist building machine learning solutions applied to real business problems. Based in Quito, Ecuador.
Project 01 – Classification
ML model to predict customer churn in the telecom sector. Compares Logistic Regression, Random Forest and XGBoost with SMOTE balancing. Identifies that month-to-month contracts have a 42.7% churn rate vs 2.8% for two-year contracts.
Project 02 – NLP
NLP pipeline to classify 27,948 app reviews as Positive, Negative or Neutral. TF-IDF vectorization with bigrams captures negations like "not good". Logistic Regression achieved F1 of 0.87 on a 3-class imbalanced dataset.
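To illustrate why bigrams matter here, the sketch below extracts word n-grams with the standard library only. It is not the project's code: scikit-learn's `TfidfVectorizer` exposes the same idea through its `ngram_range` parameter, and the `word_ngrams` helper and the sample sentence are assumptions made for the example.

```python
import re

def word_ngrams(text, ngram_range=(1, 2)):
    """Return all word n-grams for every n in the inclusive range."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    lowest, highest = ngram_range
    grams = []
    for n in range(lowest, highest + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

# Hypothetical review text, not taken from the dataset.
features = word_ngrams("This app is not good")
unigrams_only = word_ngrams("This app is not good", ngram_range=(1, 1))
```

With unigrams alone, "not" and "good" are separate features and the negation is lost. With bigrams, "not good" becomes a feature of its own that the classifier can weight negatively.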
Project 03 – Unsupervised ML
K-Means clustering on 76,310 U.S. companies with multi-agency federal enforcement records. Identifies 4 risk profiles from systemic violators to wage theft patterns. Applied log transformation + RobustScaler to handle extreme financial outliers.
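The preprocessing step can be sketched as follows. This is a standard-library approximation of a log transform followed by median/IQR scaling (the statistic that scikit-learn's `RobustScaler` uses by default), not the project's actual code. The `log_robust_scale` helper and the penalty amounts are hypothetical.

```python
import math
import statistics

def log_robust_scale(values):
    """Apply log1p, then centre on the median and divide by the IQR."""
    logged = [math.log1p(v) for v in values]
    median = statistics.median(logged)
    q1, _, q3 = statistics.quantiles(logged, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # avoid dividing by zero on constant data
    return [(x - median) / iqr for x in logged]

# Hypothetical penalty amounts with one extreme outlier.
penalties = [0, 500, 1_200, 3_000, 2_500_000]
scaled = log_robust_scale(penalties)
```

The log compresses the long right tail, and the median and IQR are barely affected by the outlier. As a result, the extreme value stays within a few units of the rest instead of dominating the Euclidean distances that K-Means relies on.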
Project 04 – Imbalanced Classification
Fraud detection on 284,807 real transactions with extreme class imbalance (0.17% fraud). Dual strategy: SMOTE inside pipeline + class_weight balancing. Random Forest detects 80 of 98 real frauds with only 11 false alarms.
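The detection counts translate into the usual metrics as shown below. This is plain arithmetic on the three figures quoted above, assuming the 98 frauds are the positives in the evaluation set. The variable names are illustrative.

```python
true_positives = 80    # frauds detected
actual_frauds = 98     # frauds present in the evaluation set (assumed)
false_positives = 11   # legitimate transactions flagged as fraud

false_negatives = actual_frauds - true_positives
recall = true_positives / actual_frauds
precision = true_positives / (true_positives + false_positives)
f1 = 2 * precision * recall / (precision + recall)

print(f"missed frauds: {false_negatives}")
print(f"recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
```

With 0.17% fraud, a model that always predicts "legitimate" would be about 99.8% accurate while catching nothing, which is why precision and recall on the fraud class are the figures worth reporting.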
Project 05 – Medical Classification
ML model to classify patients into 3 orthopaedic diagnoses (Disc Hernia, Normal, Spondylolisthesis) from 6 biomechanical measurements obtained from spinal X-rays. Spondylolisthesis detected with F1 of 0.98. Includes interactive HTML report.
Project 06 – E-Commerce Analytics
End-to-end analysis of 805,549 real transactions from a UK online retailer. Covers business EDA, RFM segmentation (K-Means k=4), sales forecasting with Gradient Boosting, and an item-based recommendation system using cosine similarity.
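A minimal sketch of item-based recommendation with cosine similarity, using only the standard library. The purchase data, item names and helper functions are hypothetical, and the project's actual implementation may differ (for example, by working on a sparse matrix).

```python
import math

# Hypothetical purchase history: customer -> {item: quantity}.
purchases = {
    "c1": {"mug": 2, "teapot": 1},
    "c2": {"mug": 1, "teapot": 1, "candle": 1},
    "c3": {"candle": 3},
    "c4": {"mug": 1, "teapot": 2},
}

def item_vectors(purchases):
    """Pivot customer baskets into one sparse vector per item."""
    vectors = {}
    for customer, basket in purchases.items():
        for item, quantity in basket.items():
            vectors.setdefault(item, {})[customer] = quantity
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(item, vectors, top_n=2):
    """Rank the other items by similarity to the given item."""
    scores = [
        (other, cosine(vectors[item], vector))
        for other, vector in vectors.items()
        if other != item
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

vectors = item_vectors(purchases)
recommendations = most_similar("mug", vectors)
```

Items bought by the same customers in similar proportions end up with similar vectors, so "teapot" ranks well above "candle" as a recommendation for "mug" in this toy data.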
Project 07 – D2C Analytics
Full analysis of a D2C skincare brand across 6 relational tables. Part A: custom Profitability Score combining margin, discounts and return rate. Part B: CLV by acquisition channel, cohort retention analysis and return breakdown. Two interactive HTML reports.
Project 08 – Sports Prediction
ML model trained on 184 historical World Cup matches (1930–2022) to predict all 72 group stage fixtures for 2026. Logistic Regression with Bayesian smoothing and FIFA ranking features. Includes interactive HTML report with all group predictions and probability bars.
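One common form of Bayesian smoothing shrinks each team's observed win rate toward a prior, so that teams with very few World Cup matches do not get extreme feature values. The description does not say which formulation the project uses. The sketch below assumes a simple pseudo-count version, and the function name, prior rate and prior strength are all hypothetical.

```python
def smoothed_win_rate(wins, matches, prior_rate, prior_strength):
    """Blend the observed win rate with a prior worth `prior_strength` matches."""
    return (wins + prior_rate * prior_strength) / (matches + prior_strength)

# Hypothetical prior: a 35% win rate, weighted like 10 matches.
debutant = smoothed_win_rate(wins=1, matches=1, prior_rate=0.35, prior_strength=10)
veteran = smoothed_win_rate(wins=70, matches=100, prior_rate=0.35, prior_strength=10)
no_history = smoothed_win_rate(wins=0, matches=0, prior_rate=0.35, prior_strength=10)
```

A team with one win in one match stays close to the prior rather than being scored at 100%, while a team with a long record keeps a rate close to what it actually achieved.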
Project 09 – Geospatial Analytics
Geospatial clustering of 1,432 volcanoes worldwide using K-Means. The algorithm rediscovered the 4 major volcanic belts (Ring of Fire, Andes, East African Rift, Asian arc) without any geographic labels. Includes interactive Plotly world map with hover details for every volcano.
Project 10 – Predictive Health
Two models, two stories: a height predictor that couldn't beat an 1886 statistical formula (Galton's midparent rule), and a health risk classifier that achieved 78% accuracy using family disease history and parental age as the strongest predictors. Includes blood group inheritance patterns following real Mendelian ABO rules.
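For reference, the baseline can be sketched as follows. It assumes Galton's usual formulation: mothers' heights are multiplied by 1.08, the midparent height is the average of the two, and the child's deviation from the population mean is about two thirds of the midparent's deviation. The exact variant used in the project is not stated, and the function names and population mean below are hypothetical.

```python
FEMALE_SCALE = 1.08       # Galton's factor for putting mothers on the male scale
REGRESSION_SLOPE = 2 / 3  # Galton's estimate of regression toward the mean

def midparent_height(father, mother):
    """Average of the father's height and the rescaled mother's height."""
    return (father + FEMALE_SCALE * mother) / 2

def predict_adult_height(father, mother, population_mean):
    """Predict a child's adult height (male scale) from the midparent height."""
    midparent = midparent_height(father, mother)
    return population_mean + REGRESSION_SLOPE * (midparent - population_mean)

# Heights in inches; the population mean of 68.0 is an assumed value.
midparent = midparent_height(father=70.0, mother=65.0)
prediction = predict_adult_height(father=70.0, mother=65.0, population_mean=68.0)
```

The prediction always lands between the population mean and the midparent height. That is the regression effect, and it leaves little signal for a more complex model to add when the parents' heights are the main input.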
Machine Learning
NLP
Data
Visualization
Engineering