Open to Opportunities

Harshith Bhattaram

Data Scientist & AI/ML Enthusiast

Building end-to-end Data and ML systems that are powerful, explainable, and production-ready. From fleet telematics to healthcare AI and finance, I turn raw data into decisions that matter.

2.5+ Yrs Experience · 6+ Projects · 3+ Domains

// who I am

The Data Behind the Developer

harshith.py
# Who I am in Python 🐍
class Harshith:
  def __init__(self):
    self.role = "Data Scientist"
    self.education = "MS CS @ UB"
    self.experience = 2.5 # years
    self.location = "NY, USA"
    self.domains = [
      "Fleet Telematics",
      "Healthcare AI",
      "Finance ML"
    ]
    self.superpower = "XAI Pipelines"

  def passion(self):
    return "AI that explains itself"

From Data to Decisions

I'm a Data Scientist with 2.5 years of industry experience building end-to-end ML pipelines that actually ship. At LocoNav, I worked across computer vision, NLP, time-series forecasting, and graph-based optimization — always with a focus on production-quality, low-latency systems.

Currently pursuing my MS in Computer Science at the University at Buffalo, I'm deepening my expertise in Generative AI, Multimodal RAG, and Explainable AI. I believe the best models are ones you can trust — and explain.

I'm actively seeking roles in Data Science, ML Engineering, AI Engineering, and Data Engineering where I can build impactful, interpretable systems at scale.

🎓 MS CS @ UB, Feb 2026 · 📍 NY, USA · ✉️ harshithbhattaram@gmail.com · 🚀 Open to Work

// where I've been

Battle-Tested in Production

Software Engineer — Data Scientist
LocoNav Fleet Management Solutions · Gurugram, India
Jun 2022 – Jun 2024
  • Built a driver drowsiness & intoxication classifier on 10k+ video frames — achieving 87% accuracy with OpenCV + TensorFlow.
  • Reduced false negative drowsiness detections by 25% via facial landmark tracking and yawn-angle alertness modeling.
  • Cut model inference latency by 20% by optimizing the pipeline that triggers real-time in-app alerts.
  • Developed GBM + LSTM ETA prediction models on 50k+ trips, improving forecast accuracy by 15%.
  • Reduced average trip duration by 12% via graph-based route optimization with real-time traffic data.
OpenCV · TensorFlow · LSTM · GBM · GPS Telematics · Graph Optimization · Computer Vision · Real-time ML
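
To illustrate the graph-based route optimization mentioned above: one common approach is a shortest-path search over a road graph whose edge weights are current travel times. The sketch below uses Dijkstra's algorithm. The road graph, node names, and travel times are hypothetical, and nothing here is taken from the production system, whose exact algorithm is not stated.

```python
import heapq


def shortest_route(graph, source, target):
    """Return (total_minutes, path) for the fastest route, or (inf, []) if unreachable.

    graph maps node -> {neighbor: travel_minutes}; all values used here are hypothetical.
    """
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, minutes in graph.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


# Hypothetical road graph; weights are travel minutes under current traffic.
roads = {
    "depot": {"A": 10.0, "B": 4.0},
    "A": {"customer": 3.0},
    "B": {"A": 2.0, "customer": 12.0},
}
total, route = shortest_route(roads, "depot", "customer")
print(total, route)
```

Refreshing the edge weights from a live traffic feed and re-running the search is one simple way to keep routes current; the feed itself is outside the scope of this sketch.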
Data Science Intern
LocoNav Fleet Management Solutions · Gurugram, India
Jan 2022 – May 2022
  • Built a predictive maintenance pipeline from time-series telemetry data, handling class imbalance via SMOTE.
  • Achieved 85% precision in component failure prediction using ensemble classifiers with engineered degradation features.
  • Reduced unplanned vehicle downtime by 18% and deployed an automated ML alert pipeline into the fleet operations dashboard.
Time-Series · SMOTE · Ensemble Classifiers · Feature Engineering · MLOps · FastAPI

// what I've built

Proof of Work

End-to-end systems — from messy data to deployed, explainable intelligence.

Credit Card Churn Prediction + XAI Dashboard

End-to-end ML pipeline using XGBoost & Logistic Regression on 10k+ customers. Achieved ROC-AUC of 0.99 and 90.5% recall. Deployed an interactive Streamlit app with SHAP global importance & waterfall plots for full model interpretability.

XGBoost · SHAP · Streamlit · Scikit-Learn · EDA · XAI
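
For readers less familiar with the two metrics quoted above, here is a minimal sketch of how recall and ROC-AUC are computed from labels and model scores. The labels, scores, and the 0.5 decision threshold are hypothetical toy values, not data from the churn project.

```python
def recall(y_true, y_pred):
    """Share of actual positives (churners) that the model caught."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_pos / (true_pos + false_neg)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = churned) and predicted churn probabilities.
labels = [1, 0, 1, 0, 0, 1]
churn_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
predictions = [1 if s >= 0.5 else 0 for s in churn_scores]  # assumed 0.5 threshold
print(recall(labels, predictions), roc_auc(labels, churn_scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for a toy example; on real data, `sklearn.metrics.roc_auc_score` and `recall_score` are the usual choice.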

Healthcare Patient Readmission Risk Modeling

Analyzed the 100k+ row UCI diabetes dataset with SQL profiling. Engineered 44 features and validated 11 risk factors via chi-square and Welch's t-tests. Auto-generated clinical decision reports with SHAP waterfall charts, deployed in the cloud.

XGBoost · SQL · SHAP · Hypothesis Testing · AWS · Clinical AI
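
As an illustration of the Welch's t-test mentioned above, the sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom for two groups with unequal variances. The length-of-stay numbers are hypothetical, not values from the UCI dataset, and the p-value step is left out because it needs a t-distribution CDF.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t_statistic, degrees_of_freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean, using the unbiased sample variance.
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical lengths of stay (days) for readmitted vs. non-readmitted patients.
readmitted = [5, 7, 8, 6, 9, 7]
not_readmitted = [3, 4, 5, 4, 3, 5, 4, 4]
t_stat, dof = welch_t(readmitted, not_readmitted)
print(f"t = {t_stat:.2f}, dof = {dof:.1f}")
```

In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and also returns the p-value.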

NLP Sentiment & Topic Modeling Platform

Full NLP pipeline processing 24k+ Amazon reviews with VADER + DistilBERT sentiment analysis and LDA topic modeling. Modular spaCy preprocessing, parquet schema, 4-page interactive Plotly dashboard with time-series trend analysis.

DistilBERT · VADER · LDA · spaCy · Plotly · Streamlit

Clinical Radiology Report Generation via Multimodal RAG

Multimodal RAG pipeline on MIMIC-CXR (250k+ chest X-ray pairs). BioMedCLIP (ViT-B/16) cross-modal embeddings, FAISS retrieval, dual-layer generation. <1s query latency via memory-constrained architecture. Evaluated with CheXbert, BLEU, ROUGE-L, BERTScore.

BioMedCLIP · FAISS · RAG · ViT · BERTScore · MIMIC-CXR
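
The retrieval step of a pipeline like the one above boils down to nearest-neighbor search over embeddings. The sketch below shows the idea with a brute-force cosine-similarity search in plain Python. It stands in for FAISS rather than using it, and the report IDs and 3-dimensional vectors are hypothetical placeholders for real image and report embeddings.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k(query, corpus, k=2):
    """Return the k (report_id, similarity) pairs most similar to the query."""
    q = normalize(query)
    scored = [
        (report_id, sum(a * b for a, b in zip(q, normalize(embedding))))
        for report_id, embedding in corpus.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy embeddings; real encoder outputs are much higher-dimensional.
report_embeddings = {
    "report_1": [0.9, 0.1, 0.0],
    "report_2": [0.0, 1.0, 0.0],
    "report_3": [0.7, 0.7, 0.1],
}
image_embedding = [1.0, 0.2, 0.0]
hits = top_k(image_embedding, report_embeddings, k=2)
print(hits)
```

A vector index such as FAISS does the same ranking without scanning every vector in Python, which matters once the corpus reaches hundreds of thousands of reports.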
MLOps

MLOps Stock Price Forecasting Pipeline with Drift Monitoring

Full MLOps pipeline — daily yFinance ingestion into S3, ETL via Apache Airflow across 4 DAGs. RF, XGBoost & LSTM models served via Dockerized REST API. Evidently AI + KS-tests for data & prediction drift detection per ticker, with CI/CD retraining triggers via GitHub Actions.

Apache Airflow · Evidently AI · XGBoost · LSTM · Docker · AWS S3 · GitHub Actions · CI/CD
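
A minimal sketch of the kind of KS-test drift check described above: it compares the empirical distribution of a feature in the training window against a recent window and flags drift when the gap is large. The return values and the 0.5 threshold are hypothetical, and this is not the project's actual monitoring code.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The largest gap between two step functions occurs at one of the data points.
    for value in ref + cur:
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_cur = bisect_right(cur, value) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def has_drifted(reference, current, threshold=0.5):
    """Flag drift when the KS statistic exceeds a hypothetical threshold."""
    return ks_statistic(reference, current) > threshold


# Hypothetical daily returns for one ticker: training window vs. recent window.
training_returns = [-0.02, -0.01, 0.00, 0.01, 0.02, 0.01, -0.01, 0.00]
recent_returns = [0.03, 0.04, 0.05, 0.03, 0.06, 0.04, 0.05, 0.02]
print(ks_statistic(training_returns, recent_returns))
print(has_drifted(training_returns, recent_returns))
```

A fixed threshold keeps the example short; `scipy.stats.ks_2samp` returns a p-value as well, which is usually a better trigger for retraining than a hand-picked cutoff.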

Automated Time-Series Stock Forecasting with Cloud Deployment

Multi-model forecasting system for 10 tickers (ARIMA, Prophet, PyTorch LSTM) with 18% RMSE improvement over naive baseline on 10 years of OHLCV data. Auto daily retraining via GitHub Actions + AWS Lambda + EventBridge, artifacts in S3, Docker images in ECR, live Plotly/Streamlit dashboard with 95% CI bands.

ARIMA · Prophet · PyTorch LSTM · AWS Lambda · EventBridge · ECR · Streamlit · Plotly
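
To make "RMSE improvement over a naive baseline" concrete, the sketch below compares a model's forecasts against a previous-value forecast. It assumes the naive baseline is "tomorrow's close equals today's close", which the description above does not specify, and the prices and predictions are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def improvement_over_naive(prices, model_forecast):
    """Percent RMSE reduction of a model versus a previous-value (naive) forecast.

    model_forecast[i] is the model's prediction for prices[i + 1].
    """
    actual = prices[1:]
    naive_forecast = prices[:-1]  # assumed baseline: next close = current close
    naive_rmse = rmse(actual, naive_forecast)
    model_rmse = rmse(actual, model_forecast)
    return 100.0 * (naive_rmse - model_rmse) / naive_rmse


# Hypothetical closing prices and model predictions for days 2 to 5.
closes = [100.0, 102.0, 101.0, 104.0, 103.0]
predictions = [101.0, 101.5, 103.0, 103.5]
print(f"{improvement_over_naive(closes, predictions):.1f}% lower RMSE than naive")
```

Reporting the gain relative to a naive forecast, rather than raw RMSE, makes results comparable across tickers with very different price levels.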
GenAI

QLoRA Fine-Tuning Pipeline for Medical Q&A with FastAPI Serving

Fine-tuned a 7B LLM (LLaMA-3/Mistral) on 2k+ curated medical Q&A examples using QLoRA via HuggingFace TRL + PEFT on a free GPU. Quantified performance delta vs GPT-4o baseline with ROUGE & BLEU. LoRA adapter served via FastAPI endpoint backed by vLLM for efficient inference.

QLoRA · LLaMA-3 · HuggingFace TRL · PEFT · vLLM · FastAPI · ROUGE · BLEU
NLP

NLP-Driven Job Market Intelligence Platform

NLP analytics pipeline over 2k+ job descriptions using BERTopic + TF-IDF vectorization and NER for skill extraction. Identified top 10 in-demand skills stored in PostgreSQL. Interactive Plotly dashboard for skill frequency, salary-to-skill correlation & keyword trends. LLM-powered (OpenAI) resume fit scoring module.

BERTopic · TF-IDF · NER · PostgreSQL · OpenAI API · Plotly · LLM Scoring

// my toolkit

The Full Arsenal

Tools I use daily, and ones I'm actively sharpening. ★ marks high-demand market skills.

Languages & Core Tools
Python · SQL · R · Bash · PostgreSQL · Git · Docker · ★ Spark · PySpark
ML & Data Science
Scikit-Learn · XGBoost · Random Forest · ARIMA · SMOTE · PCA · SHAP · LIME · Hypothesis Testing · ★ MLflow · ★ DVC
Deep Learning & NLP
TensorFlow · PyTorch · Keras · HuggingFace · DistilBERT · BioMedCLIP · OpenCV · spaCy · LDA · TF-IDF · NER · BERTopic
Generative AI & LLMs
LangChain · LangGraph · RAG · FAISS · QLoRA · Fine-tuning · OpenAI API · Gemini · Groq · Llama · VLMs · vLLM · PEFT · ★ TRL · ★ Prompt Engineering
Cloud, MLOps & Deployment
AWS S3 · AWS Lambda · AWS RDS · Azure · GCP · FastAPI · Streamlit · Docker · Render · ★ Airflow · ★ Snowflake · ★ dbt · ★ CI/CD · AWS ECR · AWS EventBridge
Visualization & Data Handling
Tableau · Plotly · Matplotlib · Seaborn · Pandas · NumPy · EDA · Feature Engineering · Parquet · ★ Kafka
MLOps & Data Engineering
★ Apache Airflow · ★ Evidently AI · ★ GitHub Actions · AWS ECR · AWS EventBridge · CI/CD Pipelines · ★ Drift Monitoring · vLLM · PEFT / TRL · yFinance · Prophet

// my edge

Things I'm Good At

Not just tools, real strengths that show up in every project.

Explainable AI

I don't just build black boxes. I build models you can explain to a doctor, an exec, or a regulator using SHAP, LIME, and TreeExplainer.

End-to-End ML Pipelines

From raw data ingestion to cloud deployment, I own the full stack. ETL, feature engineering, model training, validation, and production monitoring.

Low-Latency Production Systems

Built real-time inference pipelines with <1s latency at scale. I understand the gap between a notebook model and a production system.

Statistical Rigor

Hypothesis testing, chi-square, Welch's t-tests, cross-validation: I validate results statistically, not just by vibes or leaderboard scores.

Multi-Domain Adaptability

Fleet telematics, clinical healthcare, finance: I've applied ML across radically different domains, adapting techniques to context rapidly.

Generative AI & RAG

LangChain, LangGraph, FAISS, multimodal RAG: I build GenAI systems that are grounded, efficient, and actually useful in real applications.

Computer Vision

Real-world CV at scale: drowsiness detection, facial landmark tracking on video streams, and multimodal image-text alignment pipelines.

Cross-Functional Collaboration

Experienced in Agile workflows — planning, standups, retrospectives. I bridge the gap between data teams and engineering/product stakeholders.

MLOps & Drift-Aware Systems

I don't just train models — I monitor them in production. Drift detection with Evidently AI, KS-tests, and CI/CD retraining pipelines via GitHub Actions + Airflow keep models reliable long after deployment.

LLM Fine-Tuning & Serving

From instruction-tuning a 7B LLM with QLoRA on consumer hardware to serving LoRA adapters via vLLM + FastAPI — I bridge research-grade fine-tuning and production-ready inference.

// fuel for the mind

Words That Drive the Work

"In God we trust. All others must bring data."

— W. Edwards Deming

// Let's build together

Open a Conversation

Whether it's a role, a collaboration, or just a great chat about data, AI, or ML, my inbox is open.

Let's Connect

I'm actively looking for Data Scientist, Data Engineer, AI Engineer and ML Engineer roles. If you're building something interesting with data and AI, let's talk.