SmartRAG Assistant

This project lets you ask questions about a text-based PDF document using a hybrid retrieval-augmented generation (RAG) pipeline. It combines semantic search over a FAISS index with BM25 keyword search, then passes the retrieved passages to a Google Gemini LLM so that answers are grounded in the document.

Features

Upload and read PDF documents
Chunk and preprocess text for better retrieval
Semantic embeddings with HuggingFaceBgeEmbeddings
Keyword search using BM25
Hybrid retrieval with EnsembleRetriever
Answer questions using Google Gemini LLM
Interactive web interface via Gradio
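
To make the keyword-search feature concrete, here is a minimal sketch of BM25 scoring in plain Python. It is a textbook Okapi BM25 variant with whitespace tokenization, written for illustration only. It is not the implementation in the rank_bm25 package, and the function name `bm25_scores` and the default values of `k1` and `b` are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with a textbook Okapi BM25 formula.

    `bm25_scores` is a hypothetical helper, not part of rank_bm25 or LangChain.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            n_t = doc_freq[term]
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            # Longer-than-average documents are penalized, controlled by b.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores
```

A chunk that shares no terms with the query scores 0.0, which is why keyword search alone misses paraphrases and is paired with semantic search here.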

Environment Setup

Install required packages:

pip install -q langchain langchain-community langchain-google-genai langchain-text-splitters faiss-cpu pypdf2 sentence_transformers gradio rank_bm25

Set your Google API key:

import os
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"
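
Pasting the key into a notebook cell makes it easy to leak when the notebook is shared. As an alternative, the sketch below prompts for the key only when `GOOGLE_API_KEY` is not already set. The helper name `ensure_google_api_key` is hypothetical, and the sketch assumes an interactive session where `getpass` can prompt.

```python
import os
from getpass import getpass


def ensure_google_api_key(env=os.environ, prompt=getpass):
    """Return the API key, asking for it only if it is not already set.

    `ensure_google_api_key` is a hypothetical helper name, not a library function.
    """
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        # getpass hides the typed value, so the key does not appear in cell output.
        key = prompt("Enter your Google API key: ").strip()
    if not key:
        raise ValueError("GOOGLE_API_KEY is empty")
    env["GOOGLE_API_KEY"] = key
    return key
```

Calling `ensure_google_api_key()` once before creating the LLM leaves the key in the environment without storing it in the notebook source.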

Usage

Upload a PDF
Read the PDF text
Split text into chunks
Initialize embeddings
Set up retrievers
Initialize Google Gemini LLM
Create prompt template
Build RAG pipeline
Ask questions
Launch the interactive web interface
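
Two of the steps above, splitting text into chunks and creating the prompt template, can be sketched without any libraries. This is a simplified illustration, not the project's code: the project installs langchain-text-splitters for chunking, and the function names, chunk sizes, and prompt wording below are all assumptions.

```python
def split_into_chunks(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical prompt wording; the project's actual template may differ.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say so.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(question, retrieved_chunks):
    """Fill the template with the retrieved chunks and the user's question."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

The overlap repeats the tail of each chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.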

Requirements

Python 3.9+
Google API key with access to Gemini models
Packages listed in Environment Setup

Notes

Ensure your PDF contains text (not scanned images) for proper extraction.
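The ensemble retriever combines semantic and keyword-based search, so it can retrieve passages that match the question's meaning as well as passages that contain its exact terms.
Designed to run in Google Colab, but can be adapted to run locally.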
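
To illustrate how results from the two retrievers can be merged, here is a minimal sketch of weighted Reciprocal Rank Fusion, the approach LangChain's documentation describes for EnsembleRetriever. It is a standalone illustration, not the library's code. The function name `reciprocal_rank_fusion`, the constant `c=60`, and the document IDs are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, weights=None, c=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document earns weight / (rank + c) from every list it appears in.
    `reciprocal_rank_fusion` is a hypothetical helper, not a LangChain function.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("need one weight per ranked list")
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (rank + c)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


semantic_hits = ["a", "b", "c"]  # e.g. chunk IDs from the FAISS retriever
keyword_hits = ["b", "d", "a"]   # e.g. chunk IDs from the BM25 retriever
print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))  # ['b', 'a', 'd', 'c']
```

Chunks found by both retrievers rise to the top, while a chunk found by only one still makes the list. Adjusting `weights` shifts the balance between semantic and keyword evidence.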
