SmartRAG Assistant

This project lets you ask questions about a PDF document using a hybrid retrieval-augmented generation (RAG) pipeline. It combines semantic search (FAISS) with keyword search (BM25) to retrieve the most relevant passages, then passes them to a Google Gemini LLM so that its answers are grounded in the document's content.


Features

  • Upload and read PDF documents
  • Chunk and preprocess text for better retrieval
  • Semantic embeddings with HuggingFaceBgeEmbeddings
  • Keyword search using BM25
  • Hybrid retrieval with EnsembleRetriever
  • Answer questions using Google Gemini LLM
  • Interactive web interface via Gradio
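To illustrate what the BM25 keyword search contributes, here is a minimal, dependency-free sketch of textbook Okapi BM25 scoring over text chunks. It is not the `rank_bm25` implementation the project installs: the class name `SimpleBM25`, the regex tokenizer, and the parameters `k1=1.5` and `b=0.75` are assumptions for illustration, and `rank_bm25`'s exact IDF handling may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


class SimpleBM25:
    """Hypothetical textbook Okapi BM25 over a fixed list of chunks."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        if not docs:
            raise ValueError("need at least one document")
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.term_freqs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: in how many chunks each term appears.
        df = Counter(term for d in self.docs for term in set(d))
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.term_freqs[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) and length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 3) -> list[int]:
        # Indices of the best-scoring chunks, dropping chunks with no matching terms.
        scored = sorted(((self.score(query, i), i) for i in range(len(self.docs))),
                        key=lambda pair: pair[0], reverse=True)
        return [i for s, i in scored[:k] if s > 0]


chunks = [
    "FAISS builds a vector index for semantic search.",
    "BM25 ranks chunks by keyword overlap with the query.",
    "Gradio provides the web interface.",
]
bm25 = SimpleBM25(chunks)
print(bm25.top_k("keyword ranking with BM25", k=2))  # → [1]
```

Only the second chunk shares literal terms with the query, which is exactly the kind of match that keyword search catches and an embedding model may rank lower.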

Environment Setup

Install required packages:

pip install -q langchain langchain-community langchain-google-genai langchain-text-splitters faiss-cpu pypdf2 sentence_transformers gradio rank_bm25

Set your Google API key:

import os
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"
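If you would rather not hard-code the key in the notebook, a minimal alternative is to read it from the environment and prompt for it only when it is missing. The helper name `ensure_google_api_key` is hypothetical; it relies only on the standard library and assumes, as the snippet above implies, that the Gemini integration picks up `GOOGLE_API_KEY` from the environment.

```python
import os
from getpass import getpass


def ensure_google_api_key(var: str = "GOOGLE_API_KEY") -> str:
    """Return the key from the environment, prompting for it only if it is unset."""
    key = os.environ.get(var)
    if not key:
        # getpass masks the input, so the key is never saved in the notebook source.
        key = getpass(f"Enter {var}: ")
        os.environ[var] = key
    return key
```

Call `ensure_google_api_key()` once before initializing the Gemini LLM.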

Usage

The workflow runs through these steps in order:

  • Upload a PDF
  • Read the PDF text
  • Split the text into chunks
  • Initialize the embeddings
  • Set up the retrievers
  • Initialize the Google Gemini LLM
  • Create a prompt template
  • Build the RAG pipeline
  • Ask questions
  • Launch the Gradio web interface to ask questions interactively
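The chunking and prompt-template steps can be sketched without any dependencies. This is a simplified illustration, not the project's code: the project uses `langchain-text-splitters` and LangChain prompt templates, whose exact settings are not given here, so the fixed-size splitter, the `chunk_size=500`/`overlap=50` defaults, and the wording of `PROMPT_TEMPLATE` are all assumptions.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size character windows; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical template: retrieved chunks fill {context}, the user's query fills {question}.
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the answer is not in the context, say you don't know.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


print(split_text("abcdefghij", chunk_size=4, overlap=1))  # → ['abcd', 'defg', 'ghij']
print(build_prompt("What is BM25?", ["BM25 is a keyword ranking function."]))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side; the resulting prompt string is what the RAG pipeline would send to Gemini.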

Requirements

  • Python 3.9+
  • Google API key with access to Gemini models
  • Packages listed in Environment Setup

Notes

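  • Make sure your PDF contains selectable text rather than scanned images; PyPDF2 extracts embedded text only and does not perform OCR.
  • The ensemble retriever combines semantic and keyword-based search, so relevant chunks are found whether a question paraphrases the document or repeats its exact terms.
  • Designed to run in Google Colab, but can be adapted to run locally.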
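The ensemble retriever note can be made concrete with weighted Reciprocal Rank Fusion, which, as far as we know, is the fusion method LangChain documents for `EnsembleRetriever` (each document scores `weight / (rank + c)` per list, with a default `c` of 60). The sketch below is a standalone illustration over hypothetical chunk IDs, not LangChain's code; the equal weights are an assumption, since the project's weights are not stated.

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs with weighted Reciprocal Rank Fusion."""
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked items contribute more; c damps the gap between ranks.
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores, key=scores.get, reverse=True)


semantic_hits = ["c2", "c1", "c3"]  # e.g. order returned by FAISS
keyword_hits = ["c1", "c4", "c2"]   # e.g. order returned by BM25
print(weighted_rrf([semantic_hits, keyword_hits], weights=[0.5, 0.5]))
# → ['c1', 'c2', 'c4', 'c3']
```

Because RRF uses only ranks, it needs no calibration between FAISS distances and BM25 scores, and a chunk that both retrievers rank highly (here `c1`) rises to the top.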