Harshith Bhattaram

// who I am

The Data Behind the Developer

harshith.py

# Who I am in Python 🐍
class Harshith:
    def __init__(self):
        self.role = "Data Scientist"
        self.education = "MS CS @ UB"
        self.experience = 2.5  # years
        self.location = "NY, USA"
        self.domains = [
            "Fleet Telematics",
            "Healthcare AI",
            "Finance ML",
        ]
        self.superpower = "XAI Pipelines"

    def passion(self):
        return "AI that explains itself"

From Data to Decisions

I'm a Data Scientist with 2.5 years of industry experience building end-to-end ML pipelines that actually ship. At LocoNav, I worked across computer vision, NLP, time-series forecasting, and graph-based optimization — always with a focus on production-quality, low-latency systems.

Currently pursuing my MS in Computer Science at the University at Buffalo, I'm deepening my expertise in Generative AI, Multimodal RAG, and Explainable AI. I believe the best models are ones you can trust — and explain.

I'm actively seeking roles in Data Science, ML Engineering, AI Engineering, and Data Engineering where I can build impactful, interpretable systems at scale.

🎓 MS CS @ UB (Feb 2026) · 📍 NY, USA · ✉️ harshithbhattaram@gmail.com · 🚀 Open to Work

// where I've been

Battle-Tested in Production

Software Engineer — Data Scientist

LocoNav Fleet Management Solutions · Gurugram, India

Jun 2022 – Jun 2024

Built a driver drowsiness & intoxication classifier on 10k+ video frames — achieving 87% accuracy with OpenCV + TensorFlow.
Reduced false-negative drowsiness detections by 25% via facial landmark tracking and yawn-angle alertness modeling.
Cut model inference latency by 20% by optimizing the inference pipeline, enabling real-time, low-latency in-app alerts.
Developed GBM + LSTM ETA prediction models on 50k+ trips, improving forecast accuracy by 15%.
Reduced average trip duration by 12% via graph-based route optimization with real-time traffic data.

OpenCV · TensorFlow · LSTM · GBM · GPS Telematics · Graph Optimization · Computer Vision · Real-time ML
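The route-optimization bullet describes graph search over live traffic. As a rough illustration only, here is a standard Dijkstra shortest-path search in which each road's free-flow travel time is scaled by a congestion multiplier. The `fastest_route` helper, the toy road graph, and the multiplier scheme are hypothetical, not LocoNav's production logic.

```python
import heapq


def fastest_route(graph, traffic, source, target):
    """Dijkstra's algorithm over travel times scaled by live congestion factors.

    graph:   {node: [(neighbor, free_flow_minutes), ...]}
    traffic: {(node, neighbor): congestion_multiplier}; unlisted roads use 1.0
    Returns (total_minutes, path), or (inf, []) if target is unreachable.
    """
    best = {source: 0.0}   # best known travel time to each node
    previous = {}          # predecessor of each node on its best path
    queue = [(0.0, source)]
    visited = set()
    while queue:
        minutes, node = heapq.heappop(queue)
        if node in visited:  # stale queue entry; node already settled
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, base_minutes in graph.get(node, []):
            # Scale the free-flow time by the current congestion factor.
            cost = minutes + base_minutes * traffic.get((node, neighbor), 1.0)
            if cost < best.get(neighbor, float("inf")):
                best[neighbor] = cost
                previous[neighbor] = node
                heapq.heappush(queue, (cost, neighbor))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


roads = {
    "depot": [("A", 10), ("B", 15)],
    "A": [("customer", 10)],
    "B": [("customer", 8)],
}
print(fastest_route(roads, {}, "depot", "customer"))  # (20.0, ['depot', 'A', 'customer'])
print(fastest_route(roads, {("A", "customer"): 2.0}, "depot", "customer"))  # (23.0, ['depot', 'B', 'customer'])
```

Because only the edge weights change, the same search can simply be rerun whenever fresh traffic readings arrive.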

Data Science Intern

LocoNav Fleet Management Solutions · Gurugram, India

Jan 2022 – May 2022

Built a predictive maintenance pipeline from time-series telemetry data, handling class imbalance via SMOTE.
Achieved 85% precision in component failure prediction using ensemble classifiers with engineered degradation features.
Reduced unplanned vehicle downtime by 18% and deployed an automated ML alert pipeline into the fleet operations dashboard.

Time-Series · SMOTE · Ensemble Classifiers · Feature Engineering · MLOps · FastAPI
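SMOTE tackles class imbalance by synthesizing new minority-class examples instead of duplicating existing ones. The from-scratch sketch below shows the core interpolation step using only the standard library; the `smote` helper and the toy failure features are hypothetical. A real pipeline would more likely use imbalanced-learn's `SMOTE` and apply it to the training split only, so synthetic points never leak into evaluation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (the rare class, e.g. failures).
    Each new point lies on the segment between a randomly chosen minority sample
    and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours of `base` within the minority class (Euclidean).
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical degradation features (e.g. vibration, temperature) for rare failures.
failures = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
augmented = failures + smote(failures, n_synthetic=6, k=2)
print(len(augmented))  # 10
```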

// what I've built

Proof of Work

End-to-end systems — from messy data to deployed, explainable intelligence.

Credit Card Churn Prediction + XAI Dashboard

End-to-end ML pipeline using XGBoost & Logistic Regression on 10k+ customers. Achieved a ROC-AUC of 0.99 and 90.5% recall. Deployed an interactive Streamlit app with SHAP global importance & waterfall plots for full model interpretability.

XGBoost · SHAP · Streamlit · Scikit-Learn · EDA · XAI
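ROC-AUC and recall are the churn model's headline metrics. A minimal sketch of what they measure, assuming binary labels (1 = churned) and model probabilities: ROC-AUC is the probability that a randomly chosen churner is scored above a randomly chosen non-churner, and recall is the share of churners caught at a given threshold. The helpers and data are illustrative; `sklearn.metrics.roc_auc_score` and `recall_score` are the usual library routes.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Equivalent to the normalized Mann-Whitney U statistic; ties count as half.
    This O(P * N) loop is for clarity; libraries sort once instead.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def recall(labels, scores, threshold=0.5):
    """Share of actual churners (label 1) whose score reaches the threshold."""
    flagged = [y for y, s in zip(labels, scores) if y == 1 and s >= threshold]
    return len(flagged) / sum(1 for y in labels if y == 1)


# Hypothetical churn labels and model probabilities for six customers.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
print(roc_auc(labels, scores), recall(labels, scores))
```

Note that ROC-AUC is threshold-free while recall depends on the chosen cut-off, which is why the two are usually reported together.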

Healthcare Patient Readmission Risk Modeling

Analyzed the 100k+ row UCI diabetes dataset with SQL profiling. Engineered 44 features and validated 11 risk factors via chi-square & Welch's t-tests. Auto-generated clinical decision reports with SHAP waterfall charts, deployed to the cloud.

XGBoost · SQL · SHAP · Hypothesis Testing · AWS · Clinical AI
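Welch's t-test compares two group means without assuming equal variances. A standard-library sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; the `welch_t_test` helper and the length-of-stay numbers are hypothetical. In practice `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic along with a p-value.

```python
import math
import statistics


def welch_t_test(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, Welch's version does not assume the two groups have
    equal variances, which suits comparisons like readmitted vs. not readmitted.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors: sample variance (n - 1 denominator) divided by n.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


# Hypothetical lengths of stay (days) for two patient groups.
readmitted = [6, 7, 5, 8, 9]
not_readmitted = [3, 4, 2, 5, 4, 3, 4, 3]
t, df = welch_t_test(readmitted, not_readmitted)
print(round(t, 2), round(df, 2))
```

The degrees of freedom always fall between min(n_a, n_b) - 1 and n_a + n_b - 2, shrinking toward the lower bound as the variances diverge.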

NLP Sentiment & Topic Modeling Platform

Full NLP pipeline processing 24k+ Amazon reviews with VADER + DistilBERT sentiment analysis and LDA topic modeling. Includes modular spaCy preprocessing, a Parquet data schema, and a 4-page interactive Plotly dashboard with time-series trend analysis.

DistilBERT · VADER · LDA · spaCy · Plotly · Streamlit
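VADER scores text by summing lexicon valences and then squashing the sum into a compound score between -1 and 1. The sketch below mirrors only that final normalization (x / sqrt(x² + 15), as in the reference vaderSentiment implementation) and the ±0.05 labeling thresholds its documentation recommends. The raw valence sums are made-up inputs and the helper names are hypothetical; a real pipeline would simply call `SentimentIntensityAnalyzer().polarity_scores(text)`.

```python
import math


def vader_compound(valence_sum, alpha=15.0):
    """Squash a summed lexicon valence into (-1, 1) the way VADER's compound score does."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


def sentiment_label(compound, threshold=0.05):
    """Map a compound score to a label using VADER's conventional +/-0.05 cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical summed valences for three reviews.
for raw in (2.0, 0.0, -3.1):
    score = vader_compound(raw)
    print(f"{raw:+.1f} -> {score:+.3f} ({sentiment_label(score)})")
```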

Clinical Radiology Report Generation via Multimodal RAG

Multimodal RAG pipeline on MIMIC-CXR (250k+ chest X-ray image-report pairs). Uses BioMedCLIP (ViT-B/16) cross-modal embeddings, FAISS retrieval, and dual-layer generation, reaching sub-second (<1s) query latency with a memory-constrained architecture. Evaluated with CheXbert, BLEU, ROUGE-L, and BERTScore.

BioMedCLIP · FAISS · RAG · ViT · BERTScore · MIMIC-CXR
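FAISS's flat inner-product index (`IndexFlatIP`) performs exact search, and on L2-normalized embeddings the inner product equals cosine similarity. The pure-Python sketch below does the same exhaustive scoring over toy 3-dimensional vectors; the report IDs and vectors are hypothetical stand-ins for BioMedCLIP embeddings, and the sketch ignores the memory-constrained design described above.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k(query, index, k=2):
    """Exact (brute-force) inner-product search over unit-normalized embeddings.

    index: list of (doc_id, embedding) pairs whose embeddings are already normalized.
    Returns the k (similarity, doc_id) pairs with the highest cosine similarity.
    """
    q = normalize(query)
    scored = [(sum(a * b for a, b in zip(q, emb)), doc_id) for doc_id, emb in index]
    return sorted(scored, reverse=True)[:k]


# Toy 3-d vectors standing in for report embeddings (real ones are far larger).
index = [
    ("report_a", normalize([1.0, 0.0, 0.0])),
    ("report_b", normalize([0.9, 0.1, 0.0])),
    ("report_c", normalize([0.0, 0.0, 1.0])),
]
for similarity, doc_id in top_k([1.0, 0.0, 0.1], index, k=2):
    print(doc_id, round(similarity, 3))
```

The retrieved reports would then be passed as grounding context to the generation step.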

MLOps

MLOps Stock Price Forecasting Pipeline with Drift Monitoring

Full MLOps pipeline: daily yFinance ingestion into S3, with ETL orchestrated by Apache Airflow across 4 DAGs. RF, XGBoost & LSTM models are served through a Dockerized REST API. Evidently AI and KS-tests detect data and prediction drift per ticker, with CI/CD retraining triggered via GitHub Actions.

Apache Airflow · Evidently AI · XGBoost · LSTM · Docker · AWS S3 · GitHub Actions · CI/CD
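The KS test compares a feature's distribution in a reference window with a recent window. Here is a minimal sketch that computes the two-sample statistic directly and applies the standard large-sample critical value c(α)·sqrt((n + m) / (n·m)); the function names, the α level, and the data are illustrative assumptions. Production code would more likely call `scipy.stats.ks_2samp` or rely on Evidently AI's built-in drift tests.

```python
import bisect
import math


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical gap between
    the empirical CDFs of a reference window and a current window."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    for x in ref + cur:
        # Empirical CDF at x = share of observations less than or equal to x.
        gap = abs(bisect.bisect_right(ref, x) / len(ref) - bisect.bisect_right(cur, x) / len(cur))
        d = max(d, gap)
    return d


def has_drifted(reference, current, alpha=0.05):
    """Flag drift when D exceeds the large-sample critical value at level alpha."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return ks_statistic(reference, current) > c_alpha * math.sqrt((n + m) / (n * m))


# Hypothetical feature values: a reference window and a shifted current window.
reference_window = [0.01 * i for i in range(-25, 25)]
shifted_window = [0.01 * i + 0.2 for i in range(-25, 25)]
print(has_drifted(reference_window, reference_window))  # False
print(has_drifted(reference_window, shifted_window))
```

Running this per ticker and per feature gives a simple boolean signal that a scheduler could use to trigger retraining.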

Automated Time-Series Stock Forecasting with Cloud Deployment

Multi-model forecasting system for 10 tickers (ARIMA, Prophet, PyTorch LSTM), achieving an 18% RMSE improvement over a naive baseline on 10 years of OHLCV data. Automated daily retraining via GitHub Actions + AWS Lambda + EventBridge, with artifacts stored in S3, Docker images in ECR, and a live Plotly/Streamlit dashboard showing 95% confidence-interval bands.

ARIMA · Prophet · PyTorch LSTM · AWS Lambda · EventBridge · ECR · Streamlit · Plotly
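An RMSE improvement over a naive baseline compares the model's error against a persistence forecast. A minimal sketch, assuming the baseline is a one-step naive forecast (each day predicted as the previous actual close); the helper names and prices are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def pct_improvement_over_naive(last_observed, actual, forecast):
    """Percentage RMSE reduction of `forecast` versus a one-step naive forecast.

    The naive (persistence) forecast predicts each step as the previous actual value,
    starting from `last_observed`, the final value before the test window.
    """
    naive = [last_observed] + list(actual[:-1])
    baseline = rmse(actual, naive)
    return 100.0 * (baseline - rmse(actual, forecast)) / baseline


# Hypothetical closing prices for a 3-day test window.
actual = [102.0, 101.0, 104.0]
model_forecast = [101.5, 101.5, 103.0]
print(round(pct_improvement_over_naive(100.0, actual, model_forecast), 1))  # 67.3
```

Persistence is a deliberately strong baseline for prices, so beating it at all is a meaningful sanity check.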

GenAI

QLoRA Fine-Tuning Pipeline for Medical Q&A with FastAPI Serving

Fine-tuned a 7B LLM (LLaMA-3/Mistral) on 2k+ curated medical Q&A examples using QLoRA via HuggingFace TRL + PEFT on a free-tier GPU. Quantified the performance delta against a GPT-4o baseline with ROUGE & BLEU. The LoRA adapter is served through a FastAPI endpoint backed by vLLM for efficient inference.

QLoRA · LLaMA-3 · HuggingFace TRL · PEFT · vLLM · FastAPI · ROUGE · BLEU
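LoRA freezes the base weight matrix W and trains only a low-rank update B·A, scaled by alpha / r. The sketch below shows that adapted forward pass and the share of a layer's parameters a rank-r adapter trains; it omits QLoRA's 4-bit NF4 quantization of the frozen weights, and the matrices and layer size are illustrative. In PEFT, the rank and scaling correspond to `LoraConfig`'s `r` and `lora_alpha` settings.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """LoRA layer output h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only the low-rank factors A (r x d_in) and
    B (d_out x r) are trained. B starts at zero, so training begins from the base model.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [h + (alpha / r) * u for h, u in zip(base, update)]


def trainable_fraction(d_in, d_out, r):
    """Fraction of a dense layer's parameter count that a rank-r adapter trains."""
    return r * (d_in + d_out) / (d_in * d_out)


# Tiny example: with B = 0 the adapter is a no-op, matching the frozen layer.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]
B = [[0.0], [0.0]]
print(lora_forward(W, A, B, [1.0, 1.0], alpha=2, r=1))  # [3.0, 7.0]
# A hypothetical 4096 x 4096 projection with r = 16:
print(trainable_fraction(4096, 4096, 16))  # 0.0078125
```

Serving then only needs the small A and B matrices on top of the shared base weights, which is what makes adapter-based serving lightweight.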

NLP

NLP-Driven Job Market Intelligence Platform

NLP analytics pipeline over 2k+ job descriptions using BERTopic, TF-IDF vectorization, and NER for skill extraction. Identified the top 10 in-demand skills and stored them in PostgreSQL. Includes an interactive Plotly dashboard for skill frequency, salary-to-skill correlation & keyword trends, plus an LLM-powered (OpenAI) resume-fit scoring module.

BERTopic · TF-IDF · NER · PostgreSQL · OpenAI API · Plotly · LLM Scoring
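A minimal sketch of the skill-ranking step, with one loud assumption: a case-insensitive dictionary lookup stands in for the NER model, and skills are ranked by how many postings mention them. The vocabulary, postings, and helper names are hypothetical.

```python
import re
from collections import Counter


def extract_skills(text, vocabulary):
    """Return the vocabulary skills mentioned in one job description.

    A case-insensitive dictionary lookup stands in for the NER model here; the
    lookarounds stop "sql" from matching inside "postgresql" and keep "c++" usable.
    """
    lowered = text.lower()
    return {
        skill
        for skill in vocabulary
        if re.search(r"(?<!\w)" + re.escape(skill.lower()) + r"(?!\w)", lowered)
    }


def top_skills(postings, vocabulary, n=10):
    """Rank skills by how many postings mention them (document frequency)."""
    counts = Counter()
    for text in postings:
        counts.update(extract_skills(text, vocabulary))
    return counts.most_common(n)


postings = [
    "Data Scientist: Python, SQL and machine learning required.",
    "ML Engineer with Python, Docker and AWS experience.",
    "Analyst role: SQL, Tableau; Python a plus.",
]
vocabulary = ["Python", "SQL", "Docker", "AWS", "Tableau", "machine learning", "Spark"]
print(top_skills(postings, vocabulary, n=2))  # [('Python', 3), ('SQL', 2)]
```

Counting each skill at most once per posting keeps a single keyword-stuffed listing from dominating the ranking.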

// my toolkit

The Full Arsenal

Tools I use daily — and ones I'm actively sharpening. ★ marks high-demand market skills.

Languages & Core Tools

Python · SQL · R · Bash · PostgreSQL · Git · Docker · ★ Spark · PySpark

ML & Data Science

Scikit-Learn · XGBoost · Random Forest · ARIMA · SMOTE · PCA · SHAP · LIME · Hypothesis Testing · ★ MLflow · ★ DVC

Deep Learning & NLP

TensorFlow · PyTorch · Keras · HuggingFace · DistilBERT · BioMedCLIP · OpenCV · spaCy · LDA · TF-IDF · NER · BERTopic

Generative AI & LLMs

LangChain · LangGraph · RAG · FAISS · QLoRA · Fine-tuning · OpenAI API · Gemini · Groq · Llama · VLMs · vLLM · PEFT · ★ TRL · ★ Prompt Engineering

Cloud, MLOps & Deployment

AWS S3 · AWS Lambda · AWS RDS · Azure · GCP · FastAPI · Streamlit · Docker · Render · ★ Airflow · ★ Snowflake · ★ dbt · ★ CI/CD · AWS ECR · AWS EventBridge

Visualization & Data Handling

Tableau · Plotly · Matplotlib · Seaborn · Pandas · NumPy · EDA · Feature Engineering · Parquet · ★ Kafka

MLOps & Data Engineering

★ Apache Airflow · ★ Evidently AI · ★ GitHub Actions · AWS ECR · AWS EventBridge · CI/CD Pipelines · ★ Drift Monitoring · vLLM · PEFT / TRL · yFinance · Prophet

// my edge

Things I'm Good At

Not just tools: real strengths that show up in every project.

Explainable AI

I don't build black boxes. I build models you can explain to a doctor, an exec, or a regulator, using SHAP (including TreeExplainer) and LIME.

End-to-End ML Pipelines

From raw data ingestion to cloud deployment, I own the full stack: ETL, feature engineering, model training, validation, and production monitoring.

Low-Latency Production Systems

Built real-time inference pipelines with <1s latency at scale. I understand the gap between a notebook model and a production system.

Statistical Rigor

Hypothesis testing, chi-square tests, Welch's t-tests, cross-validation: I validate results statistically, not just by vibes or leaderboard scores.

Multi-Domain Adaptability

Fleet telematics, clinical healthcare, finance: I've applied ML across radically different domains, rapidly adapting techniques to each context.

Generative AI & RAG

LangChain, LangGraph, FAISS, multimodal RAG: I build GenAI systems that are grounded, efficient, and actually useful in real applications.

Computer Vision

Real-world CV at scale: drowsiness detection, facial landmark tracking on video streams, and multimodal image-text alignment pipelines.

Cross-Functional Collaboration

Experienced in Agile workflows — planning, standups, retrospectives. I bridge the gap between data teams and engineering/product stakeholders.

MLOps & Drift-Aware Systems

I don't just train models — I monitor them in production. Drift detection with Evidently AI, KS-tests, and CI/CD retraining pipelines via GitHub Actions + Airflow keep models reliable long after deployment.

LLM Fine-Tuning & Serving

From instruction-tuning a 7B LLM with QLoRA on consumer hardware to serving LoRA adapters via vLLM + FastAPI — I bridge research-grade fine-tuning and production-ready inference.

Words That Drive the Work

Open a Conversation

Let's Connect