
MohamedASAK/SmartRAG-Assistant


SmartRAG Assistant

This project lets you ask questions about any PDF document through a hybrid retrieval-augmented generation (RAG) pipeline. It combines semantic search (FAISS) with keyword search (BM25) to retrieve relevant passages, then passes them to a Google Gemini LLM to generate answers grounded in the document's content.
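To illustrate how two ranked result lists can be merged, here is a minimal standard-library sketch of weighted Reciprocal Rank Fusion (RRF), the fusion method LangChain's EnsembleRetriever is understood to use. It is not the project's code: `weighted_rrf`, the chunk ids, the equal weights, and the constant `c=60` are illustrative assumptions.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse best-first ranked lists of chunk ids with weighted Reciprocal Rank Fusion."""
    if len(rankings) != len(weights):
        raise ValueError("expected one weight per ranking")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Rank is 1-based; each list adds weight / (rank + c) to a chunk's score
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (rank + c)
    # Highest fused score first; ties broken by id so the order is deterministic
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


semantic_hits = ["chunk_3", "chunk_1", "chunk_7"]  # e.g. from FAISS
keyword_hits = ["chunk_1", "chunk_9", "chunk_3"]   # e.g. from BM25
print(weighted_rrf([semantic_hits, keyword_hits], weights=[0.5, 0.5]))
# ['chunk_1', 'chunk_3', 'chunk_9', 'chunk_7']
```

Because each list contributes `weight / (rank + c)`, chunks that both retrievers rank highly rise to the top; raising one weight makes that retriever's ranking count for more.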


Features

  • Upload and read PDF documents
  • Chunk and preprocess text for better retrieval
  • Semantic embeddings with HuggingFaceBgeEmbeddings
  • Keyword search using BM25
  • Hybrid retrieval (semantic + keyword) with LangChain's EnsembleRetriever
  • Answer questions using a Google Gemini LLM
  • Interactive web interface via Gradio
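For intuition about the keyword side, the sketch below scores chunks with Okapi BM25 in plain Python. The project itself relies on the `rank_bm25` package; `SimpleBM25`, the tokenizer, and the parameters `k1=1.5`, `b=0.75` are hypothetical choices here, and the smoothed IDF used may differ slightly from that package's formula.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/digits; deliberately simple
    return re.findall(r"[a-z0-9]+", text.lower())


class SimpleBM25:
    """Minimal Okapi BM25 scorer over a list of text chunks (illustrative only)."""

    def __init__(self, chunks, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(chunk) for chunk in chunks]
        self.term_freqs = [Counter(doc) for doc in self.docs]
        self.avgdl = sum(len(doc) for doc in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: in how many chunks each term appears
        df = Counter(term for doc in self.docs for term in set(doc))
        # Smoothed IDF; the "+ 1" inside the log keeps it positive
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query, index):
        tf = self.term_freqs[index]
        length_norm = 1 - self.b + self.b * len(self.docs[index]) / self.avgdl
        total = 0.0
        for term in tokenize(query):
            freq = tf.get(term, 0)
            if freq:
                total += self.idf[term] * freq * (self.k1 + 1) / (freq + self.k1 * length_norm)
        return total

    def top_k(self, query, k=3):
        # Sort chunk indices by descending score, breaking ties by index
        ranked = sorted(range(len(self.docs)), key=lambda i: (-self.score(query, i), i))
        return ranked[:k]


chunks = [
    "FAISS stores dense vectors for similarity search.",
    "BM25 ranks chunks by keyword overlap with the query.",
    "Gemini generates the final answer from retrieved context.",
]
bm25 = SimpleBM25(chunks)
print(bm25.top_k("How does BM25 rank keyword matches?", k=1))
# [1]
```

Only exact token matches count ("rank" does not match "ranks"), which is why BM25 complements, rather than replaces, embedding-based search.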

Environment Setup

Install required packages:

pip install -q langchain langchain-community langchain-google-genai langchain-text-splitters faiss-cpu pypdf2 sentence_transformers gradio rank_bm25

Set your Google API key:

import os
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"
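Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you run the code interactively (e.g. in Colab): read `GOOGLE_API_KEY` from the environment and prompt for it with the standard-library `getpass` only when it is missing. The helper name `ensure_google_api_key` is hypothetical.

```python
import os
from getpass import getpass


def ensure_google_api_key(var_name="GOOGLE_API_KEY"):
    """Return the API key from the environment, prompting only if it is unset."""
    key = os.environ.get(var_name)
    if not key:
        # getpass hides what you type, so the key is not echoed into the output
        key = getpass(f"Enter {var_name}: ")
        os.environ[var_name] = key
    return key
```

Call `ensure_google_api_key()` once before initializing the Gemini model; later cells then find the key in the environment.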

Usage

  • Upload a PDF
  • Read the PDF text
  • Split text into chunks
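  • Initialize embeddings (HuggingFaceBgeEmbeddings)
  • Set up retrievers (FAISS for semantic search, BM25 for keyword search, combined with EnsembleRetriever)
  • Initialize the Google Gemini LLM
  • Create a prompt template
  • Build the RAG pipeline
  • Ask questions
  • Launch the interactive web interface (Gradio)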
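Two of the steps above, splitting text into chunks and filling a prompt template with retrieved chunks, can be sketched without any external service. This is a minimal illustration, not the project's implementation (the project installs `langchain-text-splitters` for chunking): `split_text`, `build_prompt`, the chunk sizes, and the prompt wording are all assumptions.

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the answer is not in the context, say that you don't know.

Context:
{context}

Question: {question}
Answer:"""


def split_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def build_prompt(question, retrieved_chunks):
    """Insert the retrieved chunks and the question into the prompt template."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighbouring chunks, so either one can still be retrieved.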

Requirements

  • Python 3.9+
  • Google API key with access to Gemini models
  • Packages listed in Environment Setup

Notes

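  • Make sure your PDF contains selectable text rather than scanned images; the text extraction step does not perform OCR, so image-only pages yield little or no text.
  • The ensemble retriever combines semantic and keyword-based search, so chunks can be found either by the meaning of a question or by its exact terms (names, codes, rare words).
  • Designed to run in Google Colab, but can be adapted to run locally.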
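A quick way to spot scanned pages before indexing is to extract text page by page and flag pages that come back (nearly) empty. The sketch below assumes PyPDF2 3.x (`PdfReader`, `page.extract_text()`); the helper names and the 20-character threshold are hypothetical.

```python
def extract_page_texts(pdf_path):
    """Return the extracted text of every page (requires PyPDF2)."""
    from PyPDF2 import PdfReader  # imported here so the helper below works without it

    reader = PdfReader(pdf_path)
    # extract_text() typically returns little or nothing for image-only pages
    return [page.extract_text() or "" for page in reader.pages]


def find_textless_pages(page_texts, min_chars=20):
    """Return 1-based numbers of pages with fewer than min_chars of extracted text."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


print(find_textless_pages(["A page with a full paragraph of text.", "", "  \n "]))
# [2, 3]
```

In practice you would call `find_textless_pages(extract_page_texts("your_file.pdf"))` (file name hypothetical); any pages it reports likely need OCR before they can be indexed.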
