Jiten Bhalavat

AI / ML Engineer

I start by understanding what the user actually needs, then study the problem deeply and build solutions that solve it end to end. I work on AI systems like RAG pipelines and LLM fine-tuning, with a focus on building agents that can reason and take actions.

I’m also exploring post-training and scalable inference, while building voice AI and fine-tuning open-source models to compete with larger systems.

Currently: MS in Applied Machine Learning at University of Maryland, College Park
Previously: AI Engineer Intern at Plutomen Technologies · Research Assistant at CHARUSAT

My Best Past Work

Plutomen Technologies · Gujarat, India
AI Engineer Intern
Sept 2023 – April 2024

Built RAG pipelines for document intelligence with ~89% data extraction accuracy. Used LLM-as-a-judge evaluation to cut hallucinations and improve relevance, and shipped low-latency (<200ms) inference APIs.
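
As a rough illustration of the judge step, here is a minimal sketch assuming an OpenAI-compatible client; the model name, rubric, and 1-5 scale are placeholders, not the production setup:

  # Minimal LLM-as-a-judge sketch (illustrative, not the production pipeline).
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  JUDGE_PROMPT = (
      "You are grading a RAG answer.\n"
      "Question: {question}\nRetrieved context: {context}\nAnswer: {answer}\n"
      "Score 1-5 for faithfulness to the context "
      "(1 = hallucinated, 5 = fully grounded). Reply with the number only."
  )

  def judge_faithfulness(question: str, context: str, answer: str) -> int:
      resp = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder judge model
          temperature=0,        # deterministic grading
          messages=[{"role": "user", "content": JUDGE_PROMPT.format(
              question=question, context=context, answer=answer)}],
      )
      return int(resp.choices[0].message.content.strip())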

89% data extraction accuracy · <200ms API latency
View profile
Nxon · Gujarat, India
ML Engineer Intern
May 2022 – June 2022

Built transformer models with PEFT (LoRA), cutting trainable parameters by 99% and speeding up training by 60% on multi-GPU setups, while improving multilingual code intelligence with rigorous evaluation (CodeBLEU) and scalable pipelines.
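
To show the shape of the PEFT setup, here is a hedged sketch of attaching LoRA adapters with HuggingFace peft; CodeT5 stands in for the base model, and the rank and target modules are assumptions, not the exact internship config:

  # LoRA sketch with HuggingFace PEFT; model and hyperparameters are illustrative.
  from transformers import AutoModelForSeq2SeqLM
  from peft import LoraConfig, get_peft_model

  base = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

  lora = LoraConfig(
      r=8,                        # low adapter rank keeps trainable weights tiny
      lora_alpha=16,
      lora_dropout=0.05,
      target_modules=["q", "v"],  # attention projections in T5-style blocks
  )

  model = get_peft_model(base, lora)
  model.print_trainable_parameters()  # reports the ~99% trainable-parameter cut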

10 languages supported · LoRA fine-tuning approach
View profile
View Full Timeline

Projects

InterviewAI — Speech-to-Speech Mock Interview Platform

Featured

AI-powered mock interview platform combining real-time speech-to-speech AI with expert-led sessions. Uses ElevenLabs for voice synthesis, GCP for backend, and Firebase for auth — giving job seekers AI-driven practice and expert feedback in one place.

  • Voice pipeline: Real-time Speech-to-Speech using ElevenLabs for natural conversational AI interactions (a minimal turn is sketched after this list)
  • Cloud backend: Google Cloud Platform for scalable hosting and ML inference serving
  • Auth & DB: Firebase for authentication and real-time data synchronization
  • Frontend: TypeScript with responsive UI for a seamless interview experience
  • AI evaluation: LLM-based feedback on answer quality, clarity, and communication structure
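
A skeleton of one interview turn, assuming the ElevenLabs Python SDK for synthesis; transcribe and evaluate_answer are hypothetical stubs standing in for the STT and LLM-feedback services, not InterviewAI's actual code:

  # One interview turn (illustrative skeleton, not InterviewAI's actual code).
  from elevenlabs.client import ElevenLabs

  tts = ElevenLabs()  # reads ELEVENLABS_API_KEY from the environment

  def transcribe(audio: bytes) -> str:
      # Hypothetical STT stub; the real pipeline calls a speech-to-text service.
      return "candidate answer transcript"

  def evaluate_answer(question: str, answer: str) -> dict:
      # Hypothetical LLM-judge stub; the real pipeline scores answer quality,
      # clarity, and communication structure with an LLM.
      return {"score": 4, "spoken_reply": "Good structure; add a concrete example."}

  def interview_turn(question: str, candidate_audio: bytes) -> dict:
      answer_text = transcribe(candidate_audio)
      feedback = evaluate_answer(question, answer_text)
      reply_audio = tts.text_to_speech.convert(
          voice_id="YOUR_VOICE_ID",           # placeholder interviewer voice
          text=feedback["spoken_reply"],
          model_id="eleven_multilingual_v2",
      )
      return {"transcript": answer_text, "feedback": feedback, "audio": reply_audio}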
ElevenLabs · GCP · Firebase · TypeScript · Voice AI · LLM
Pipeline: Question → Speech Input → Transcribe → LLM Evaluate → Voice Reply → Feedback

Fine-tuning Llama 3.1 8B with GRPO

New

Fine-tuned Llama 3.1 8B for math reasoning using GRPO reinforcement learning + LoRA, achieving 78.5% GSM8K accuracy. Enabled efficient consumer-GPU training via 4-bit quantization without quality loss.

  • Training method: GRPO (Group Relative Policy Optimization) for RL fine-tuning with verifiable math rewards (a training sketch follows this list)
  • Parameter efficiency: LoRA adapters for lightweight fine-tuning on a 4-bit quantized base model
  • Reward design: Custom reward function for math problem verification using symbolic equivalence checking
  • Evaluation: GSM8K benchmark achieving 78.5% accuracy — competitive with much larger models
  • Hardware: Consumer GPU training via 4-bit QLoRA quantization — no expensive A100 needed
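
A compressed sketch of this setup using TRL's GRPOTrainer; the dataset mapping, reward heuristic, and hyperparameters are illustrative, not the project's exact code:

  # GRPO + LoRA on a 4-bit base model; all hyperparameters are illustrative.
  from datasets import load_dataset
  from peft import LoraConfig
  from sympy import simplify, sympify
  from transformers import AutoModelForCausalLM, BitsAndBytesConfig
  from trl import GRPOConfig, GRPOTrainer

  # GSM8K: keep the question as the prompt, the final number as the reference.
  ds = load_dataset("openai/gsm8k", "main", split="train")
  ds = ds.map(lambda x: {"prompt": x["question"],
                         "answer": x["answer"].split("####")[-1].strip()})

  def math_reward(completions, answer, **kwargs):
      # Verifiable reward: 1.0 if the model's final answer is symbolically
      # equal to the reference, else 0.0.
      rewards = []
      for completion, ref in zip(completions, answer):
          try:
              pred = completion.split("####")[-1].strip()
              rewards.append(float(simplify(sympify(pred) - sympify(ref)) == 0))
          except Exception:
              rewards.append(0.0)
      return rewards

  model = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-3.1-8B-Instruct",
      quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # consumer-GPU fit
  )

  trainer = GRPOTrainer(
      model=model,
      reward_funcs=math_reward,
      args=GRPOConfig(output_dir="llama31-grpo-math"),
      train_dataset=ds,
      peft_config=LoraConfig(r=16, lora_alpha=32,
                             target_modules=["q_proj", "v_proj"]),
  )
  trainer.train()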
Python · PyTorch · HuggingFace · LoRA · GRPO · CUDA
Pipeline: Llama 3.1 8B → 4-bit Quantize → LoRA Adapters → GRPO Training → Reward Verify → 78.5% GSM8K
View All Projects

Latest Blog Posts

Understanding LLM Quantization: Why FP32, FP16, BF16 and INT8 Matter for Modern AI Systems

Towards AI · LLMs & Quantization

How floating-point and INT8 formats shape model quality, speed, and memory when you deploy and scale modern LLM systems.

Think You Know RAG? Take This 16-Point Checklist and Prove It Wrong

Medium · RAG & LLMs

A comprehensive 16-point checklist to evaluate and improve your RAG pipelines. Most people get at least 5 of these wrong.

Proven Techniques to Accurately Parse Your PDFs

Medium · RAG & Document AI

PDF parsing is a bottleneck for most RAG systems. These techniques improve extraction accuracy and downstream retrieval quality.

Read All Posts

Get in Touch