SimRAG Reproduction Study
A reproduction study of the SimRAG paper: similarity-based RAG with two-stage fine-tuning on consumer hardware, analyzing model capacity limitations and retriever-generator coupling.
Research Approach
A reproduction study of the SimRAG paper exploring similarity-based Retrieval-Augmented Generation (RAG) techniques. A modular implementation was built to understand RAG fundamentals, fine-tuning concepts, and practical ML engineering workflows.
Documents → Embeddings → Vector DB → Stage 1 Fine-tuning → QA Generation → Stage 2 Fine-tuning → Evaluation

Focused on learning through implementation rather than purely theoretical understanding.
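The pipeline above can be sketched as a sequence of stages threaded over one artifact. Everything here is a toy stand-in for illustration, not the repository's actual API:

```python
# Sketch of the end-to-end pipeline described above.
# All stage functions are hypothetical stand-ins, not the repo's actual API.

def run_pipeline(documents, stages):
    """Thread an artifact through an ordered list of (name, fn) stages."""
    artifact = documents
    log = []
    for name, fn in stages:
        artifact = fn(artifact)
        log.append(name)
    return artifact, log

# Toy stand-ins so the flow is runnable end to end.
stages = [
    ("embed",       lambda docs: [(d, hash(d) % 97) for d in docs]),
    ("index",       lambda pairs: dict(pairs)),
    ("stage1_ft",   lambda idx: idx),          # instruction-following fine-tune
    ("generate_qa", lambda idx: [(d, f"Q about {d}?") for d in idx]),
    ("stage2_ft",   lambda qa: qa),            # domain-adaptation fine-tune
    ("evaluate",    lambda qa: {"n_qa_pairs": len(qa)}),
]

result, executed = run_pipeline(["doc_a", "doc_b"], stages)
```

Keeping each stage behind a uniform callable is what makes the implementation modular: stages can be swapped or mocked independently.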
Key Features
Provider-Agnostic Interface
Supports both local (Ollama) and cloud (Purdue GenAI) LLMs with automatic provider selection.
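A minimal sketch of such an interface, assuming a common abstract base class and first-available selection (class and method names are illustrative, not the repo's actual API):

```python
# Hypothetical provider-agnostic LLM interface with automatic selection.
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def available(self) -> bool: ...
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class OllamaProvider(LLMProvider):
    def __init__(self, reachable: bool = False):
        self._reachable = reachable          # stand-in for a real health check
    def available(self) -> bool:
        return self._reachable
    def generate(self, prompt: str) -> str:
        return f"[ollama] {prompt}"

class CloudProvider(LLMProvider):
    def __init__(self, api_key=None):
        self._api_key = api_key
    def available(self) -> bool:
        return self._api_key is not None
    def generate(self, prompt: str) -> str:
        return f"[cloud] {prompt}"

def select_provider(providers):
    """Return the first available provider, preferring earlier entries (e.g. local first)."""
    for p in providers:
        if p.available():
            return p
    raise RuntimeError("no LLM provider available")

# Local Ollama unreachable, so selection falls through to the cloud provider.
llm = select_provider([OllamaProvider(reachable=False), CloudProvider(api_key="k")])
```

Ordering the list local-first keeps inference free and private when Ollama is running, with the cloud endpoint as fallback.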
RAG Implementation
Sentence Transformers for embeddings, Qdrant vector storage, and context-aware question answering with source citations.
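The retrieve-then-answer step with source citations can be sketched as follows, using tiny hand-made vectors in place of Sentence Transformers embeddings and Qdrant search (all names and data are illustrative):

```python
# Toy retrieval + prompt assembly with source citations.
# Hand-made 2-d vectors stand in for real embeddings and Qdrant.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, top_k=2):
    """index: list of (source_id, vector, text); top_k by cosine similarity."""
    scored = sorted(index, key=lambda e: cosine(query_vec, e[1]), reverse=True)
    return scored[:top_k]

def build_prompt(question, hits):
    context = "\n".join(f"[{src}] {text}" for src, _, text in hits)
    return f"Context:\n{context}\n\nQuestion: {question}\nCite sources by [id]."

index = [
    ("doc1", [1.0, 0.0], "QLoRA quantizes the base model to 4-bit."),
    ("doc2", [0.0, 1.0], "Qdrant stores vectors for similarity search."),
]
hits = retrieve([0.9, 0.1], index, top_k=1)
prompt = build_prompt("What does QLoRA do?", hits)
```

Tagging each context chunk with its source id is what lets the generator cite sources in its answer.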
Two-Stage Fine-Tuning
QLoRA fine-tuning: Stage 1 for instruction following, Stage 2 for domain adaptation with synthetic QA pairs.
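The two stages can share one QLoRA recipe and differ only in dataset and duration. A hedged sketch of the kind of configuration involved (hyperparameter values are typical defaults, not necessarily those used here):

```python
# Illustrative QLoRA settings for both stages; values are common defaults,
# not necessarily the ones used in this reproduction.
def qlora_config(stage: int) -> dict:
    base = {
        "load_in_4bit": True,            # quantize the frozen base weights
        "lora_r": 16,                    # adapter rank
        "lora_alpha": 32,
        "lora_dropout": 0.05,
        "target_modules": ["q_proj", "v_proj"],
    }
    # Stage 1: general instruction data; Stage 2: synthetic domain QA pairs.
    base["dataset"] = "general_instructions" if stage == 1 else "domain_qa_pairs"
    base["epochs"] = 3 if stage == 1 else 1
    return base

cfg1, cfg2 = qlora_config(1), qlora_config(2)
```

Freezing the 4-bit base model and training only small low-rank adapters is what makes both stages feasible in 10GB of VRAM.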
Test Suite
Mocked external dependencies (LLM providers, vector store) enable reproducible testing and validation.
Technical Details
Workflow
Two-stage fine-tuning process: instruction following, then domain adaptation with synthetic QA pairs.
Setup & Document Ingestion
Load documents, chunk text, generate embeddings, store in Qdrant.
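The chunking step can be sketched as fixed-size character windows with overlap, a common default before embedding (window sizes here are illustrative):

```python
# Illustrative chunker: fixed-size character windows with overlap so that
# sentences split at a boundary still appear whole in one chunk.
def chunk_text(text: str, size: int = 20, overlap: int = 5):
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("a" * 50, size=20, overlap=5)
```

Each chunk is then embedded and upserted into Qdrant along with its source metadata.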
Stage 1: Instruction Following
QLoRA fine-tuning on general instructions (~4-6 hours).
Generate QA Pairs
Create domain-specific training data from documents.
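One common shape for this step: prompt the model once per chunk, then parse a `Q:`/`A:` formatted response into a training pair. The prompt wording and `fake_llm` stand-in below are hypothetical:

```python
# Sketch of synthetic QA generation: prompt per chunk, parse "Q:/A:" output.
# fake_llm is a stand-in for a real model call.
def qa_prompt(chunk: str) -> str:
    return (
        "Write one question answerable from the passage, then its answer.\n"
        f"Passage: {chunk}\nFormat:\nQ: <question>\nA: <answer>"
    )

def parse_qa(raw: str):
    q = a = None
    for line in raw.splitlines():
        if line.startswith("Q:"):
            q = line[2:].strip()
        elif line.startswith("A:"):
            a = line[2:].strip()
    return (q, a) if q and a else None   # drop malformed generations

def fake_llm(prompt: str) -> str:
    return "Q: What is stored in Qdrant?\nA: Document embeddings."

pair = parse_qa(fake_llm(qa_prompt("Qdrant stores document embeddings.")))
```

Returning `None` for malformed generations lets the pipeline filter low-quality synthetic pairs before Stage 2 training.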
Stage 2: Domain Adaptation
QLoRA fine-tuning on domain QA dataset (~30 minutes).
Testing & Comparison
Compare baseline RAG vs fine-tuned RAG performance.
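The comparison reduces to per-metric relative change between the two configurations; a minimal sketch (metric values below are hypothetical placeholders, not the study's results):

```python
# Per-metric percent change, fine-tuned relative to baseline.
# Metric names and values are hypothetical placeholders.
def relative_change(baseline: dict, finetuned: dict) -> dict:
    return {
        k: round(100 * (finetuned[k] - baseline[k]) / baseline[k], 1)
        for k in baseline
    }

baseline  = {"answer_quality": 0.80, "response_time_s": 2.0}
finetuned = {"answer_quality": 0.78, "response_time_s": 3.0}
delta = relative_change(baseline, finetuned)
```

Reporting signed percent changes per metric makes regressions (e.g. slower responses) as visible as gains.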
Implementation
Successfully trained and tested the model on personal hardware, demonstrating practical ML engineering skills.
Built an experiment framework for research reproducibility with logging, result tracking, and automated testing.
Impact & Results
Developed a working understanding of RAG fundamentals and fine-tuning concepts through hands-on implementation
Key Achievements
Designed provider-agnostic interface supporting both local (Ollama) and cloud (Purdue GenAI) LLMs with automatic provider selection
Built RAG system with Sentence Transformers for embeddings, Qdrant vector storage, and context-aware question answering
Implemented both synchronous and asynchronous API calls for flexible integration patterns
Trained and tested the model on personal hardware (RTX 3080, 10GB VRAM). Results: context relevance unchanged (0.316), answer quality decreased 0.1-1.9%, response time increased 52-53%. These findings are attributed to model capacity limitations (1.5B parameters vs. the original paper's 8B/27B) and the absence of retriever fine-tuning
Created test suite with mocked external dependencies for reproducible testing
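The synchronous and asynchronous call paths mentioned above can be sketched as one blocking function plus an async wrapper (names are illustrative; `asyncio.to_thread` keeps the event loop free during the blocking call):

```python
# Sketch of sync and async call paths over one provider.
# generate_sync stands in for a real blocking API call.
import asyncio

def generate_sync(prompt: str) -> str:
    return f"answer({prompt})"            # stand-in for a blocking API call

async def generate_async(prompt: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(generate_sync, prompt)

async def batch(prompts):
    # Fan out concurrent requests and gather results in order.
    return await asyncio.gather(*(generate_async(p) for p in prompts))

answers = asyncio.run(batch(["q1", "q2"]))
```

The sync path suits scripts and tests; the async path lets batch evaluation overlap many provider calls.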
Technical Highlights
- Modular RAG implementation
- Provider-agnostic LLM interface
- QLoRA fine-tuning on consumer hardware
- Vector storage with Qdrant
- Experiment framework for reproducibility
- Test suite with mocked dependencies