AI Research Assistant
A RAG pipeline that ingests research papers and answers questions with grounded, citation-enforced responses — built on FastAPI, Qdrant, BGE embeddings, and Gemini.
1024-dim
BGE-large semantic embeddings
Citation-first
enforced [Source > Page] grounding
Layout-aware
structure-preserving PDF parsing
[ role ] Sole engineer · Applied AI research
Project specs
Tech Stack
R&D Focus
Retrieval-Augmented Generation & NLP
Complexity
A retrieval-augmented research assistant that turns uploaded papers into a queryable knowledge base. It answers questions using only retrieved context and enforces strict inline citations, so every claim is traceable back to a section and page.
Problem
General-purpose LLMs hallucinate and rarely cite sources, which makes them untrustworthy for academic work. The goal was a system that grounds every answer in the user’s own documents, preserves the structure of research papers (sections, sub-sections, page numbers), and forces verifiable citations.
Approach
- › Layout-aware ingestion: Uploaded PDFs are parsed with pdfplumber(layout=True), chunked by paragraph, and tagged with structural metadata: H1 section, H2 sub-section, and page number.
- › Semantic indexing: Each chunk is embedded into a 1024-dimensional vector with BAAI/bge-large-en-v1.5 (via LangChain HuggingFace) and upserted into a Qdrant research_papers collection using dot-product distance.
- › Retrieval: A user question is embedded with the same model; the retriever runs a vector search to pull the top-3 most relevant chunks and formats them with their source metadata.
- › Grounded generation: A system prompt instructs Gemini 1.5 Flash to act as an elite academic assistant, use only the provided context, and emit strict inline citations in the form [Source: H1 > H2 | Page: X].
- › API surface: FastAPI exposes /api/upload (index a PDF), /api/documents (list indexed docs), and /api/ask (answer a question).
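The retrieval and grounded-generation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the `Chunk` class, the helper names, the toy 3-dimensional vectors (standing in for 1024-dimensional BGE embeddings), and the prompt wording are all hypothetical. A real deployment would delegate the search to Qdrant rather than sorting in memory.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    h1: str
    h2: str
    page: int
    vector: list[float]  # toy stand-in for a 1024-dim BGE embedding


def dot(a: list[float], b: list[float]) -> float:
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec: list[float], chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Return the k chunks with the highest dot-product score."""
    return sorted(chunks, key=lambda c: dot(query_vec, c.vector), reverse=True)[:k]


def format_context(chunks: list[Chunk]) -> str:
    """Prefix each chunk with the citation tag the model is asked to reuse."""
    return "\n\n".join(
        f"[Source: {c.h1} > {c.h2} | Page: {c.page}]\n{c.text}" for c in chunks
    )


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    # Hypothetical wording; the project's actual system prompt is not shown.
    return (
        "Answer using only the context below. Cite every claim inline as "
        "[Source: H1 > H2 | Page: X].\n\n"
        f"Context:\n{format_context(chunks)}\n\nQuestion: {question}"
    )


# Placeholder corpus: texts and vectors are invented for illustration.
corpus = [
    Chunk("Paragraph about training.", "Method", "Training", 4, [0.9, 0.1, 0.0]),
    Chunk("Paragraph about benchmarks.", "Experiments", "Benchmarks", 7, [0.1, 0.9, 0.0]),
    Chunk("Paragraph about architecture.", "Method", "Architecture", 3, [0.7, 0.2, 0.1]),
    Chunk("Paragraph about future work.", "Conclusion", "Future Work", 12, [0.0, 0.1, 0.9]),
]

hits = top_k([1.0, 0.0, 0.0], corpus)
prompt = build_prompt("How is the model trained?", hits)
print(prompt)
```

Tagging each chunk in the context with the exact citation string gives the model something to copy rather than reconstruct, which is one plausible way to keep the emitted citations aligned with the stored metadata.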
Outcome
- › End-to-end RAG flow: upload → parse → embed → store → retrieve → cite.
- › Answers are constrained to the retrieved context and carry section/page citations, which limits hallucination and makes each claim checkable against its source.
- › A clean Next.js chat + upload UI backed by typed FastAPI endpoints.
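The project is described as enforcing citations through the system prompt. As an illustrative extra, and purely an assumption rather than something the source says exists, a post-hoc check could parse the `[Source: H1 > H2 | Page: X]` tags out of an answer and compare them with the chunks that were actually retrieved. The function names and the sample answer below are hypothetical.

```python
import re

# Matches tags of the form [Source: H1 > H2 | Page: X].
CITATION = re.compile(
    r"\[Source: (?P<h1>[^>\]]+?) > (?P<h2>[^|\]]+?) \| Page: (?P<page>\d+)\]"
)


def extract_citations(answer: str) -> set[tuple[str, str, int]]:
    """Collect every (H1, H2, page) triple cited in the answer."""
    return {
        (m["h1"].strip(), m["h2"].strip(), int(m["page"]))
        for m in CITATION.finditer(answer)
    }


def check_answer(answer: str, retrieved: set[tuple[str, str, int]]) -> dict:
    """Report whether the answer cites anything, and which citations
    do not correspond to a retrieved chunk."""
    cited = extract_citations(answer)
    return {
        "has_citation": bool(cited),
        "unknown": sorted(cited - retrieved),
    }


# Invented example: one citation matches a retrieved chunk, one does not.
retrieved = {("Method", "Training", 4)}
answer = (
    "The model is trained in two stages [Source: Method > Training | Page: 4]. "
    "An ablation confirms this [Source: Results > Ablation | Page: 9]."
)
report = check_answer(answer, retrieved)
print(report)
```

A non-empty `unknown` list would flag an answer that cites a section or page outside the retrieved context.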
Stack notes
- › Frontend: Next.js 15 (React) with useFileUpload and useChat hooks.
- › Backend: FastAPI with a modular pipeline: parser.py (structure-aware chunking), embedder.py (BGE embeddings), database.py (Qdrant), and retriever.py (semantic search).
- › Models: BAAI/bge-large-en-v1.5 for embeddings, Google Gemini 1.5 Flash for generation.
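The structure-aware chunking attributed to parser.py might look roughly like the sketch below. It assumes page text has already been extracted (for example with pdfplumber) and arrives as `(page_number, text)` pairs. The heading heuristic (numbered lines such as "2 Method" and "2.1 Training") is an assumption made for illustration; the source does not say how headings are detected, and the names `Chunk` and `chunk_pages` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed heading patterns: "2 Method" is an H1, "2.1 Training" is an H2.
H2_RE = re.compile(r"^\d{1,2}\.\d{1,2}\s+(\S.*)$")
H1_RE = re.compile(r"^\d{1,2}\.?\s+(\S.*)$")


@dataclass
class Chunk:
    text: str
    h1: str
    h2: str
    page: int


def chunk_pages(pages: list[tuple[int, str]]) -> list[Chunk]:
    """Split page text into paragraph chunks tagged with the most recent
    H1 section, H2 sub-section, and page number."""
    chunks: list[Chunk] = []
    h1, h2 = "", ""
    for page_no, text in pages:
        # Paragraphs are separated by blank lines.
        for block in re.split(r"\n\s*\n", text):
            lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
            if not lines:
                continue
            para = " ".join(lines)
            is_single_line = len(lines) == 1
            if is_single_line and (m := H2_RE.match(para)):
                h2 = m.group(1)
            elif is_single_line and (m := H1_RE.match(para)):
                h1, h2 = m.group(1), ""  # a new section resets the sub-section
            else:
                chunks.append(Chunk(para, h1, h2, page_no))
    return chunks


# Invented two-page sample document.
pages = [
    (1, "1 Introduction\n\nFirst paragraph of the\nintroduction.\n\n"
        "2 Method\n\n2.1 Training\n\nTraining details."),
    (2, "Continued training details.\n\n2.2 Evaluation\n\nEvaluation setup."),
]
chunks = chunk_pages(pages)
for c in chunks:
    print(c)
```

Heading state carries across page boundaries, so a paragraph that continues on the next page keeps its section and sub-section while picking up the new page number.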