NeoSearch

NeoSearch is an AI-powered semantic search engine that offers lightning-fast, privacy-focused querying of large text documents and images. It allows users to upload documents, generate embeddings offline, and retrieve answers in real time, all within a local environment.

Powered by FAISS and the google/siglip-so400m-patch14-384 model, NeoSearch blends keyword relevance (BM25) with semantic understanding for highly accurate results across both text and images. The application features a modern React frontend and a FastAPI backend for optimal performance and responsiveness.
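
As a rough illustration of the hybrid ranking described above, the sketch below blends FAISS inner-product scores over precomputed chunk embeddings with BM25 keyword scores. The blending weight, function names, and score normalization are assumptions for illustration, not the repository's actual code.

import numpy as np
import faiss
from rank_bm25 import BM25Okapi

def build_indexes(chunk_texts, chunk_embeddings):
    # chunk_embeddings: (n_chunks, dim) float32, L2-normalized so inner product == cosine
    index = faiss.IndexFlatIP(chunk_embeddings.shape[1])
    index.add(chunk_embeddings)
    bm25 = BM25Okapi([t.lower().split() for t in chunk_texts])
    return index, bm25

def hybrid_search(query_text, query_embedding, index, bm25, top_k=5, alpha=0.7):
    # Semantic scores for every stored chunk (query_embedding: shape (1, dim), float32).
    sem_scores, ids = index.search(query_embedding, index.ntotal)
    semantic = np.zeros(index.ntotal, dtype=np.float32)
    semantic[ids[0]] = sem_scores[0]
    # Keyword scores from BM25, scaled to [0, 1] so the two signals are comparable.
    keyword = np.array(bm25.get_scores(query_text.lower().split()), dtype=np.float32)
    if keyword.max() > 0:
        keyword = keyword / keyword.max()
    blended = alpha * semantic + (1 - alpha) * keyword
    return np.argsort(blended)[::-1][:top_k]   # indices of the highest-ranked chunks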

Features

  • Local AI Document Search: Fully local, offline-first search engine with no external API calls for privacy.
  • Semantic + Keyword Ranking: Combines FAISS vector search with BM25-based re-ranking for hybrid accuracy.
  • Real-Time Querying: Get answers from large documents in under a second.
  • Persistent Embeddings: Avoid re-indexing on repeated document uploads.
  • AI-Powered Responses: Utilizes Qwen2-1.5B-Instruct to generate tailored responses based on retrieved document chunks and user queries.
  • Single Chat UI: Intuitive React interface with "Fetched data" buttons below each response that display source documents with PDF names and page numbers.
  • Image Search Capability: Semantically search and retrieve relevant images from documents with an image toggle button.
  • Image Source Information: Shows which PDF and page number each retrieved image is from, similar to text chunks.
  • Asynchronous Processing: Embedding and chunking tasks run in parallel using asyncio and ProcessPoolExecutor (see the sketch after this list).
  • FastAPI Backend: Clean, efficient, and async-capable backend for rapid query processing.
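
To make the asynchronous ingestion path concrete, here is a minimal sketch of running CPU-heavy PDF chunking in a ProcessPoolExecutor from an asyncio event loop, using PyMuPDF and NLTK as listed in the tech stack. Function names and the chunk size are illustrative assumptions; the repository's actual task layout may differ.

import asyncio
from concurrent.futures import ProcessPoolExecutor

import fitz                              # PyMuPDF
from nltk.tokenize import sent_tokenize  # requires a one-time nltk.download("punkt")

def chunk_pdf(path, sentences_per_chunk=5):
    # Runs in a worker process: extract text page by page and group sentences into chunks.
    chunks = []
    with fitz.open(path) as doc:
        for page_number, page in enumerate(doc, start=1):
            sentences = sent_tokenize(page.get_text())
            for i in range(0, len(sentences), sentences_per_chunk):
                chunks.append({
                    "text": " ".join(sentences[i:i + sentences_per_chunk]),
                    "page": page_number,
                    "source": path,
                })
    return chunks

async def ingest(paths):
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Chunk every uploaded document in parallel without blocking the event loop.
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, chunk_pdf, p) for p in paths)
        )
    return [chunk for per_doc in results for chunk in per_doc]

# Example: chunks = asyncio.run(ingest(["report.pdf", "notes.pdf"]))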

Screenshots: Chat and Data views

Tech Stack

  • Frontend: React.js, Axios
  • Backend: FastAPI, Uvicorn, PostgreSQL, FAISS
  • AI Models:
    • google/siglip-so400m-patch14-384 (Multimodal embedding model for both text and images)
    • Qwen2-1.5B-Instruct (Response generation)
  • Search Engine: FAISS (Semantic) + BM25 (Keyword)
  • Parallelism: asyncio, ProcessPoolExecutor
  • Vector Storage: FAISS (in-memory or persisted locally as .npy files; see the persistence sketch after this list)
  • Database: PostgreSQL (storing chat windows, documents, and text chunks)
  • Text Processing: PyMuPDF, NLTK
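
A minimal sketch of the local persistence scheme above: chunk embeddings are cached as a .npy file and the FAISS index is rebuilt from them on startup, so repeated uploads of already-indexed documents skip re-embedding. The file path and helper names are assumptions for illustration, not the repository's actual layout.

import os
import numpy as np
import faiss

EMB_PATH = "embeddings/doc_chunks.npy"   # assumed cache location

def save_embeddings(embeddings):
    os.makedirs(os.path.dirname(EMB_PATH), exist_ok=True)
    np.save(EMB_PATH, embeddings.astype(np.float32))

def load_or_build_index(embed_fn, chunk_texts):
    # Reuse cached embeddings when present; otherwise embed once and persist.
    if os.path.exists(EMB_PATH):
        embeddings = np.load(EMB_PATH)
    else:
        embeddings = np.asarray(embed_fn(chunk_texts), dtype=np.float32)  # e.g. SigLIP text features
        save_embeddings(embeddings)
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)
    return index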

Setup Instructions

  1. Clone the Repository
git clone https://github.com/Mayankrai449/AI_Search_Engine.git
cd AI_Search_Engine
  2. Frontend Setup
cd frontend
npm install
  3. Backend Setup
cd ../backend
pip install -r requirements.txt
  4. Run the Application
  • Start Backend:
cd /app
uvicorn main:app --reload
  • Start Frontend:
cd ../frontend
npm start
  5. Access NeoSearch

Open your browser at http://localhost:3000 to begin querying your documents!
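
The backend can also be queried over HTTP once it is running. The /query route and payload fields in this sketch are hypothetical placeholders; check the FastAPI routes in main.py for the actual API.

# Illustrative only: the endpoint name and payload shape are assumptions, not documented routes.
import requests

resp = requests.post(
    "http://localhost:8000/query",            # uvicorn's default port
    json={"question": "What does chapter 3 conclude?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())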

Usage

  1. Upload Document: Add one or more plain-text or PDF files.
  2. Query in Chat Window: Ask your questions using natural language.
  3. View Source Context: Click "Fetched data" below responses to see the exact document chunks, source PDF names, and page numbers used to generate the answer.
  4. Search Images: Use the image toggle button to fetch relevant images when available.
  5. View Image Sources: See which PDF and page number each retrieved image comes from.
  6. Re-query Without Reprocessing: Upload once, reuse embeddings.

Thank you for using NeoSearch! Feel free to reach out with any questions or issues — [email protected]. Happy querying! 🧠📄🖼️
