ai rag python nextjs vector-search

AI Research Assistant

A RAG pipeline that ingests research papers and answers questions with grounded, citation-enforced responses — built on FastAPI, Qdrant, BGE embeddings, and Gemini.

1024-dim: BGE-large semantic embeddings

Citation-first: enforced [Source > Page] grounding

Layout-aware: structure-preserving PDF parsing

[ role ] Sole engineer · Applied AI research

Project specs

Tech Stack

Next.js 15, FastAPI, Qdrant, Gemini 1.5, LangChain, BGE

R&D Focus

Retrieval-Augmented Generation & NLP

Complexity

A retrieval-augmented research assistant that turns uploaded papers into a queryable knowledge base. It answers questions using only retrieved context and enforces strict inline citations, so every claim is traceable back to a section and page.

Problem

General-purpose LLMs can hallucinate and rarely cite sources, which makes them hard to trust for academic work. The goal was a system that grounds every answer in the user’s own documents, preserves the structure of research papers (sections, sub-sections, page numbers), and forces verifiable citations.

Approach

  • Layout-aware ingestion: Uploaded PDFs are parsed with pdfplumber (layout=True), chunked by paragraph, and tagged with structural metadata: H1 section, H2 sub-section, and page number.
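  • Semantic indexing: Each chunk is embedded into a 1024-dimensional vector with BAAI/bge-large-en-v1.5 (via LangChain HuggingFace) and upserted into a Qdrant research_papers collection configured with the dot-product distance metric. Dot product ranks results the same way as cosine similarity only when the embeddings are L2-normalized.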
  • Retrieval: A user question is embedded with the same model; the retriever runs a vector search to pull the top-3 most relevant chunks and formats them with their source metadata.
  • Grounded generation: A system prompt instructs Gemini 1.5 Flash to act as an elite academic assistant, use only the provided context, and emit strict inline citations in the form [Source: H1 > H2 | Page: X].
  • API surface: FastAPI exposes /api/upload (index a PDF), /api/documents (list indexed docs), and /api/ask (answer a question).
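
The retrieval and prompt-building steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the `Chunk` class, the helper names, the prompt wording, and the toy 3-dimensional vectors are all assumptions made for the example. The real pipeline uses 1024-dimensional BGE embeddings and delegates the search to Qdrant.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk record: paragraph text plus structural metadata."""
    text: str
    h1: str               # top-level section title
    h2: str               # sub-section title
    page: int
    vector: list[float]   # toy embedding (the real model emits 1024 floats)


def dot(a: list[float], b: list[float]) -> float:
    """Dot-product score between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec: list[float], chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Return the k chunks with the highest dot-product score, best first."""
    ranked = sorted(chunks, key=lambda c: dot(query_vec, c.vector), reverse=True)
    return ranked[:k]


def format_context(chunks: list[Chunk]) -> str:
    """Prefix each chunk with the citation tag the model is asked to reuse."""
    return "\n\n".join(
        f"[Source: {c.h1} > {c.h2} | Page: {c.page}]\n{c.text}" for c in chunks
    )


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble an illustrative grounded prompt (wording is an assumption)."""
    return (
        "Answer using only the context below. Cite every claim inline as "
        "[Source: H1 > H2 | Page: X].\n\n"
        f"Context:\n{format_context(chunks)}\n\n"
        f"Question: {question}"
    )


# Toy corpus with hand-written vectors, for illustration only.
CHUNKS = [
    Chunk("We train with AdamW.", "Methods", "Training", 3, [0.9, 0.1, 0.0]),
    Chunk("Prior work uses BM25.", "Related Work", "Retrieval", 5, [0.1, 0.9, 0.0]),
    Chunk("Training takes 12 hours.", "Results", "Cost", 7, [0.5, 0.5, 0.0]),
    Chunk("We thank our reviewers.", "Acknowledgements", "Thanks", 9, [0.0, 0.0, 1.0]),
]

hits = top_k([1.0, 0.0, 0.0], CHUNKS, k=3)
prompt = build_prompt("Which optimizer is used?", hits)
```

Formatting each retrieved chunk with the same tag the model is asked to emit means the model can copy a citation rather than compose one, which is one plausible reason to keep the context format and the citation format identical.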

Outcome

  • End-to-end RAG flow: upload → parse → embed → store → retrieve → cite.
  • Answers are constrained to the retrieved context and are prompted to carry section/page citations, which makes unsupported claims easier to spot and reduces hallucination.
  • A clean Next.js chat + upload UI backed by typed FastAPI endpoints.
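
Because citations follow a fixed format, an answer can in principle be checked against the chunks that were actually retrieved. The sketch below shows one way to do that. The write-up does not say whether the project runs such a check, so the function names and the regular expression are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

import re

# Matches tags of the form [Source: H1 > H2 | Page: X].
# Assumes section titles contain no ']', '>' or '|' characters.
CITATION_RE = re.compile(
    r"\[Source:\s*([^\]>|]+?)\s*>\s*([^\]>|]+?)\s*\|\s*Page:\s*(\d+)\]"
)


def extract_citations(answer: str) -> list[tuple[str, str, int]]:
    """Return every (h1, h2, page) citation found in the answer, in order."""
    return [(h1, h2, int(page)) for h1, h2, page in CITATION_RE.findall(answer)]


def unsupported_citations(
    answer: str, retrieved: list[tuple[str, str, int]]
) -> list[tuple[str, str, int]]:
    """Return citations that do not match any retrieved chunk's metadata."""
    allowed = set(retrieved)
    return [c for c in extract_citations(answer) if c not in allowed]


answer = (
    "The model is trained with AdamW [Source: Methods > Training | Page: 3] "
    "and beats the baseline [Source: Results > Ablation | Page: 12]."
)
retrieved = [("Methods", "Training", 3)]

cited = extract_citations(answer)
bad = unsupported_citations(answer, retrieved)
```

Here `bad` contains the second citation, since no retrieved chunk came from that section and page. A check like this only confirms that a cited location was retrieved, not that the cited text supports the claim.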

Stack notes

  • Frontend: Next.js 15 (React) with useFileUpload and useChat hooks.
  • Backend: FastAPI with a modular pipeline — parser.py (structure-aware chunking), embedder.py (BGE embeddings), database.py (Qdrant), and retriever.py (semantic search).
  • Models: BAAI/bge-large-en-v1.5 for embeddings, Google Gemini 1.5 Flash for generation.
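
As a rough picture of what structure-aware chunking in parser.py might involve, the sketch below splits per-page text into paragraphs and tags each one with the most recent H1 and H2 headings and its page number. The heading heuristic (short, single-line blocks numbered like "1 Title" or "1.1 Title") is an assumption made for this example. The write-up does not describe how headings are detected, and the function name `chunk_pages` is hypothetical.

```python
from __future__ import annotations

import re

# Assumed heading patterns: "2 Methods" for H1, "2.1 Training" for H2.
H1_RE = re.compile(r"^\d+\s+\S")
H2_RE = re.compile(r"^\d+\.\d+\s+\S")


def chunk_pages(pages: list[str]) -> list[dict]:
    """Split page texts into paragraph chunks tagged with h1, h2 and page."""
    chunks: list[dict] = []
    h1, h2 = "", ""
    for page_no, text in enumerate(pages, start=1):
        # Paragraphs are assumed to be separated by blank lines.
        for block in re.split(r"\n\s*\n", text):
            block = block.strip()
            if not block:
                continue
            para = " ".join(block.split())  # collapse wrapped lines
            # Only short single-line blocks are considered as headings.
            is_short_line = "\n" not in block and len(block) <= 80
            if is_short_line and H2_RE.match(para):
                h2 = para
            elif is_short_line and H1_RE.match(para):
                h1, h2 = para, ""  # a new section resets the sub-section
            else:
                chunks.append({"text": para, "h1": h1, "h2": h2, "page": page_no})
    return chunks


pages = [
    "1 Introduction\n\nRAG grounds answers in\nretrieved text.\n\n"
    "1.1 Background\n\nLLMs can hallucinate.",
    "Citations help readers verify claims.\n\n2 Methods\n\nWe chunk by paragraph.",
]
chunks = chunk_pages(pages)
```

Headings carry over across page breaks, so the first paragraph on page 2 is still tagged with the section and sub-section that were open at the end of page 1.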