NeoSearch

NeoSearch is an AI-powered semantic search engine that offers lightning-fast, privacy-focused querying of large text documents and images. It allows users to upload documents, generate embeddings offline, and retrieve answers in real time, all within a local environment.

Powered by FAISS and the google/siglip-so400m-patch14-384 model, NeoSearch blends keyword relevance (BM25) with semantic understanding for highly accurate results across both text and images. The application features a modern React frontend and a FastAPI backend for optimal performance and responsiveness.
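
As a rough illustration of the hybrid ranking described above, the sketch below blends FAISS inner-product scores over precomputed chunk embeddings with BM25 keyword scores. The blending weight, function names, and score normalization are assumptions for illustration, not the repository's actual code.

import numpy as np
import faiss
from rank_bm25 import BM25Okapi

def build_indexes(chunk_texts, chunk_embeddings):
    # chunk_embeddings: (n_chunks, dim) float32, L2-normalized so inner product == cosine
    index = faiss.IndexFlatIP(chunk_embeddings.shape[1])
    index.add(chunk_embeddings)
    bm25 = BM25Okapi([t.lower().split() for t in chunk_texts])
    return index, bm25

def hybrid_search(query_text, query_embedding, index, bm25, top_k=5, alpha=0.7):
    # Semantic scores for every stored chunk (query_embedding: shape (1, dim), float32).
    sem_scores, ids = index.search(query_embedding, index.ntotal)
    semantic = np.zeros(index.ntotal, dtype=np.float32)
    semantic[ids[0]] = sem_scores[0]
    # Keyword scores from BM25, scaled to [0, 1] so the two signals are comparable.
    keyword = np.array(bm25.get_scores(query_text.lower().split()), dtype=np.float32)
    if keyword.max() > 0:
        keyword = keyword / keyword.max()
    blended = alpha * semantic + (1 - alpha) * keyword
    return np.argsort(blended)[::-1][:top_k]   # indices of the highest-ranked chunks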

Features

  • Local AI Document Search: Fully local, offline-first search engine with no external API calls for privacy.
  • Semantic + Keyword Ranking: Combines FAISS vector search with BM25-based re-ranking for hybrid accuracy.
  • Real-Time Querying: Get answers from large documents in under a second.
  • Persistent Embeddings: Avoid re-indexing on repeated document uploads.
  • AI-Powered Responses: Utilizes Qwen2-1.5B-Instruct to generate tailored responses based on retrieved document chunks and user queries.
  • Single Chat UI: Intuitive React interface with "Fetched data" buttons below each response that display source documents with PDF names and page numbers.
  • Image Search Capability: Semantically search and retrieve relevant images from documents with an image toggle button.
  • Image Source Information: Shows which PDF and page number each retrieved image is from, similar to text chunks.
  • Asynchronous Processing: Embedding and chunking tasks run in parallel using asyncio and ProcessPoolExecutor (see the sketch after this list).
  • FastAPI Backend: Clean, efficient, and async-capable backend for rapid query processing.
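
To make the asynchronous ingestion path concrete, here is a minimal sketch of running CPU-heavy PDF chunking in a ProcessPoolExecutor from an asyncio event loop, using PyMuPDF and NLTK as listed in the tech stack. Function names and the chunk size are illustrative assumptions; the repository's actual task layout may differ.

import asyncio
from concurrent.futures import ProcessPoolExecutor

import fitz                              # PyMuPDF
from nltk.tokenize import sent_tokenize  # requires a one-time nltk.download("punkt")

def chunk_pdf(path, sentences_per_chunk=5):
    # Runs in a worker process: extract text page by page and group sentences into chunks.
    chunks = []
    with fitz.open(path) as doc:
        for page_number, page in enumerate(doc, start=1):
            sentences = sent_tokenize(page.get_text())
            for i in range(0, len(sentences), sentences_per_chunk):
                chunks.append({
                    "text": " ".join(sentences[i:i + sentences_per_chunk]),
                    "page": page_number,
                    "source": path,
                })
    return chunks

async def ingest(paths):
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Chunk every uploaded document in parallel without blocking the event loop.
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, chunk_pdf, p) for p in paths)
        )
    return [chunk for per_doc in results for chunk in per_doc]

# Example: chunks = asyncio.run(ingest(["report.pdf", "notes.pdf"]))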

Screenshots: Chat and Data views

Tech Stack

  • Frontend: React.js, Axios
  • Backend: FastAPI, Uvicorn, PostgreSQL, FAISS
  • AI Models:
    • google/siglip-so400m-patch14-384 (Multimodal embedding model for both text and images)
    • Qwen2-1.5B-Instruct (Response generation)
  • Search Engine: FAISS (Semantic) + BM25 (Keyword)
  • Parallelism: asyncio, ProcessPoolExecutor
  • Vector Storage: FAISS (in-memory or persisted locally as .npy files; see the persistence sketch after this list)
  • Database: PostgreSQL (storing chat windows, documents, and text chunks)
  • Text Processing: PyMuPDF, NLTK
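
A minimal sketch of the local persistence scheme above: chunk embeddings are cached as a .npy file and the FAISS index is rebuilt from them on startup, so repeated uploads of already-indexed documents skip re-embedding. The file path and helper names are assumptions for illustration, not the repository's actual layout.

import os
import numpy as np
import faiss

EMB_PATH = "embeddings/doc_chunks.npy"   # assumed cache location

def save_embeddings(embeddings):
    os.makedirs(os.path.dirname(EMB_PATH), exist_ok=True)
    np.save(EMB_PATH, embeddings.astype(np.float32))

def load_or_build_index(embed_fn, chunk_texts):
    # Reuse cached embeddings when present; otherwise embed once and persist.
    if os.path.exists(EMB_PATH):
        embeddings = np.load(EMB_PATH)
    else:
        embeddings = np.asarray(embed_fn(chunk_texts), dtype=np.float32)  # e.g. SigLIP text features
        save_embeddings(embeddings)
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)
    return index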

Setup Instructions

  1. Clone the Repository
git clone https://github.com/Mayankrai449/AI_Search_Engine.git
cd AI_Search_Engine
  2. Frontend Setup
cd frontend
npm install
  3. Backend Setup
cd ../backend
pip install -r requirements.txt
  4. Run the Application
  • Start Backend:
cd /app
uvicorn main:app --reload
  • Start Frontend:
cd ../frontend
npm start
  5. Access NeoSearch

Open your browser at http://localhost:3000 to begin querying your documents!
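
The backend can also be queried over HTTP once it is running. The /query route and payload fields in this sketch are hypothetical placeholders; check the FastAPI routes in main.py for the actual API.

# Illustrative only: the endpoint name and payload shape are assumptions, not documented routes.
import requests

resp = requests.post(
    "http://localhost:8000/query",            # uvicorn's default port
    json={"question": "What does chapter 3 conclude?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())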

Usage

  1. Upload Document: Add one or more plain-text or PDF files.
  2. Query in Chat Window: Ask your questions using natural language.
  3. View Source Context: Click "Fetched data" below responses to see the exact document chunks, source PDF names, and page numbers used to generate the answer.
  4. Search Images: Use the image toggle button to fetch relevant images when available.
  5. View Image Sources: See which PDF and page number each retrieved image comes from.
  6. Re-query Without Reprocessing: Upload once, reuse embeddings.

Thank you for using NeoSearch! Feel free to reach out with any questions or issues — [email protected]. Happy querying! 🧠📄🖼️
