Data Scientist & AI/ML Enthusiast
Building end-to-end Data and ML systems that are powerful, explainable, and production-ready. From fleet telematics to healthcare AI and finance, I turn raw data into decisions that matter.
// who I am
I'm a Data Scientist with 2.5 years of industry experience building end-to-end ML pipelines that actually ship. At LocoNav, I worked across computer vision, NLP, time-series forecasting, and graph-based optimization — always with a focus on production-quality, low-latency systems.
Currently pursuing my MS in Computer Science at the University at Buffalo, I'm deepening my expertise in Generative AI, Multimodal RAG, and Explainable AI. I believe the best models are ones you can trust — and explain.
I'm actively seeking roles in Data Science, ML Engineering, AI Engineering, and Data Engineering where I can build impactful, interpretable systems at scale.
// where I've been
// what I've built
End-to-end systems — from messy data to deployed, explainable intelligence.
End-to-end ML pipeline using XGBoost & Logistic Regression on 10k+ customers. Achieved ROC-AUC of 0.99 and 90.5% recall. Deployed an interactive Streamlit app with SHAP global importance & waterfall plots for full model interpretability.
Analyzed the 100k+ row UCI diabetes dataset with SQL profiling. Engineered 44 features and validated 11 risk factors via chi-square & Welch's t-tests. Auto-generated clinical decision reports with SHAP waterfall charts, deployed on cloud.
Full NLP pipeline processing 24k+ Amazon reviews with VADER + DistilBERT sentiment analysis and LDA topic modeling. Modular spaCy preprocessing, parquet schema, 4-page interactive Plotly dashboard with time-series trend analysis.
Multimodal RAG pipeline on MIMIC-CXR (250k+ chest X-ray pairs). BioMedCLIP (ViT-B/16) cross-modal embeddings, FAISS retrieval, dual-layer generation. <1s query latency via memory-constrained architecture. Evaluated with CheXbert, BLEU, ROUGE-L, BERTScore.
Full MLOps pipeline — daily yFinance ingestion into S3, ETL via Apache Airflow across 4 DAGs. RF, XGBoost & LSTM models served via Dockerized REST API. Evidently AI + KS-tests for data & prediction drift detection per ticker, with CI/CD retraining triggers via GitHub Actions.
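For readers unfamiliar with KS-based drift checks, here is a minimal sketch of the idea, not the project's actual code. It computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs) in plain Python. The names `ks_statistic` and `has_drifted` and the 0.3 threshold are hypothetical, chosen only for illustration; a real pipeline would more likely rely on a library test with a p-value.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The gap between two step-function CDFs can only change at observed values,
    # so checking every observed value is enough to find the maximum.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def has_drifted(reference, current, threshold=0.3):
    """Flag drift when the KS statistic exceeds a threshold (hypothetical value)."""
    return ks_statistic(reference, current) > threshold
```

Run per ticker, a check like this would compare a reference window of a feature or prediction against the most recent window and raise a retraining trigger when it returns `True`.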
Multi-model forecasting system for 10 tickers (ARIMA, Prophet, PyTorch LSTM) with 18% RMSE improvement over naive baseline on 10 years of OHLCV data. Auto daily retraining via GitHub Actions + AWS Lambda + EventBridge, artifacts in S3, Docker images in ECR, live Plotly/Streamlit dashboard with 95% CI bands.
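As a rough illustration of how an "RMSE improvement over a naive baseline" figure can be computed, the sketch below assumes the naive baseline is a last-value (persistence) forecast; the project description does not say which baseline was used, so treat that as an assumption. The function names are hypothetical.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def naive_forecast(series):
    """Persistence baseline (assumed): each step is predicted by the previous value."""
    return series[:-1]


def improvement_over_naive(series, model_predictions):
    """Fractional RMSE reduction of the model relative to the persistence baseline.

    `model_predictions` must align with series[1:], i.e. one prediction per step
    after the first observation.
    """
    actual = series[1:]
    baseline_rmse = rmse(actual, naive_forecast(series))
    model_rmse = rmse(actual, model_predictions)
    return (baseline_rmse - model_rmse) / baseline_rmse
```

A return value of 0.18 would correspond to an 18% RMSE improvement; a negative value would mean the model is worse than simply repeating the last observed price.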
Fine-tuned a 7B LLM (LLaMA-3/Mistral) on 2k+ curated medical Q&A examples using QLoRA via HuggingFace TRL + PEFT on a free GPU. Quantified performance delta vs GPT-4o baseline with ROUGE & BLEU. LoRA adapter served via FastAPI endpoint backed by vLLM for efficient inference.
NLP analytics pipeline over 2k+ job descriptions using BERTopic + TF-IDF vectorization and NER for skill extraction. Identified top 10 in-demand skills stored in PostgreSQL. Interactive Plotly dashboard for skill frequency, salary-to-skill correlation & keyword trends. LLM-powered (OpenAI) resume fit scoring module.
// my toolkit
Tools I use daily — and ones I'm actively sharpening. ★ marks high-demand market skills.
// my edge
Not just tools: real strengths that show up in every project.
I don't just build black boxes. I build models you can explain to a doctor, an exec, or a regulator using SHAP, LIME, and TreeExplainer.
From raw data ingestion to cloud deployment, I own the full stack. ETL, feature engineering, model training, validation, and production monitoring.
Built real-time inference pipelines with <1s latency at scale. I understand the gap between a notebook model and a production system.
Hypothesis testing, chi-square, Welch's t-tests, cross-validation: I validate results statistically, not just by vibes or leaderboard scores.
Fleet telematics, clinical healthcare, finance | I've applied ML across radically different domains, rapidly adapting techniques to each context.
LangChain, LangGraph, FAISS, multimodal RAG | I build GenAI systems that are grounded, efficient, and actually useful in real applications.
Real-world CV at scale | drowsiness detection, facial landmark tracking on video streams, and multimodal image-text alignment pipelines.
Experienced in Agile workflows — planning, standups, retrospectives. I bridge the gap between data teams and engineering/product stakeholders.
I don't just train models — I monitor them in production. Drift detection with Evidently AI, KS-tests, and CI/CD retraining pipelines via GitHub Actions + Airflow keep models reliable long after deployment.
From instruction-tuning a 7B LLM with QLoRA on consumer hardware to serving LoRA adapters via vLLM + FastAPI — I bridge research-grade fine-tuning and production-ready inference.
// fuel for the mind
In God we trust. All others must bring data.
// Let's build together
Whether it's a role, a collaboration, or just a great chat about data, AI, or ML, my inbox is open.
I'm actively looking for Data Scientist, Data Engineer, AI Engineer, and ML Engineer roles. If you're building something interesting with data and AI, let's talk.