NeoSearch is an AI-powered semantic search engine that offers lightning-fast, privacy-focused querying of large text documents and images. It allows users to upload documents, generate embeddings offline, and retrieve answers in real-time, all within a local environment.
Powered by FAISS and the google/siglip-so400m-patch14-384 model, NeoSearch blends keyword relevance (BM25) with semantic understanding for highly accurate results across both text and images. The application features a modern React frontend and a FastAPI backend for optimal performance and responsiveness.
- Local AI Document Search: Fully local, offline-first search engine with no external API calls for privacy.
- Semantic + Keyword Ranking: Combines FAISS vector search with BM25-based re-ranking for hybrid accuracy.
- Real-Time Querying: Retrieve answers from large documents, typically in under a second.
- Persistent Embeddings: Avoid re-indexing on repeated document uploads.
- AI-Powered Responses: Utilizes Qwen2-1.5B-Instruct to generate tailored responses based on retrieved document chunks and user queries.
- Single Chat UI: Intuitive React interface with "Fetched data" buttons below each response that display source documents with PDF names and page numbers.
- Image Search Capability: Semantically search and retrieve relevant images from documents with an image toggle button.
- Image Source Information: Shows which PDF and page number each retrieved image is from, similar to text chunks.
- Asynchronous Processing: Embedding and chunking tasks run in parallel using asyncio and ProcessPoolExecutor.
- FastAPI Backend: Clean, efficient, and async-capable backend for rapid query processing.
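The hybrid ranking described above boils down to score fusion: semantic (FAISS) scores and keyword (BM25) scores are put on a common scale and blended. The following is a minimal sketch of that pattern, not NeoSearch's actual code; the min-max normalization and the `alpha` weight are illustrative assumptions:

```python
from typing import Sequence


def minmax(scores: Sequence[float]) -> list[float]:
    """Scale scores to [0, 1] so the two ranking signals are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(vector_scores: Sequence[float],
                bm25_scores: Sequence[float],
                alpha: float = 0.5) -> list[int]:
    """Blend semantic and keyword scores for the same candidate documents.

    alpha weights the semantic signal, (1 - alpha) the BM25 signal.
    Returns document indices sorted best-first.
    """
    v = minmax(vector_scores)
    b = minmax(bm25_scores)
    fused = [alpha * vs + (1 - alpha) * bs for vs, bs in zip(v, b)]
    return sorted(range(len(fused)), key=fused.__getitem__, reverse=True)


# Example: document 2 scores well on both signals, so it ranks first.
order = hybrid_rank([0.2, 0.9, 0.95], [1.0, 3.0, 8.0])  # → [2, 1, 0]
```

Weighted-sum fusion keeps the re-ranking step cheap: both score lists are already computed, so blending adds only a linear pass over the candidates.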
- Frontend: React.js, Axios
- Backend: FastAPI, Uvicorn, PostgreSQL, FAISS
- AI Models: google/siglip-so400m-patch14-384 (multimodal embedding model for both text and images), Qwen2-1.5B-Instruct (response generation)
- Search Engine: FAISS (Semantic) + BM25 (Keyword)
- Parallelism: asyncio, ProcessPoolExecutor
- Vector Storage: FAISS (in-memory or persisted locally as .npy files)
- Database: PostgreSQL (storing chat windows, documents, and text chunks)
- Text Processing: Pymupdf, NLTK
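Before embedding, documents are split into chunks. As a sketch of that step (NeoSearch uses PyMuPDF and NLTK; here a simple regex splitter stands in for NLTK's sentence tokenizer, and the chunk size and overlap values are illustrative assumptions):

```python
import re


def split_sentences(text: str) -> list[str]:
    # Stand-in for nltk.sent_tokenize: split after ., !, or ? plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_text(text: str,
               sentences_per_chunk: int = 3,
               overlap: int = 1) -> list[str]:
    """Group sentences into overlapping chunks so context spans chunk borders."""
    sents = split_sentences(text)
    step = max(1, sentences_per_chunk - overlap)
    chunks = []
    for start in range(0, len(sents), step):
        chunk = " ".join(sents[start:start + sentences_per_chunk])
        if chunk:
            chunks.append(chunk)
        if start + sentences_per_chunk >= len(sents):
            break
    return chunks
```

Overlapping chunks trade a little index size for better recall: a sentence near a chunk boundary still appears with its neighbors in at least one chunk.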
- Clone the Repository

```bash
git clone https://github.com/Mayankrai449/AI_Search_Engine.git
cd AI_Search_Engine
```
- Frontend Setup

```bash
cd frontend
npm install
```
- Backend Setup

```bash
cd ../backend
pip install -r requirements.txt
```
- Run the Application
- Start Backend:

```bash
cd /app
uvicorn main:app --reload
```
- Start Frontend:

```bash
cd ../frontend
npm start
```
- Access NeoSearch
Open your browser at http://localhost:3000 to begin querying your documents!
- Upload Document: Add one or more plain-text or PDF files.
- Query in Chat Window: Ask your questions using natural language.
- View Source Context: Click "Fetched data" below responses to see the exact document chunks, source PDF names, and page numbers used to generate the answer.
- Search Images: Use the image toggle button to fetch relevant images when available.
- View Image Sources: See which PDF and page number each retrieved image comes from.
- Re-query Without Reprocessing: Upload once, reuse embeddings.
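The "upload once, reuse embeddings" flow relies on persisting vectors as `.npy` files. A minimal sketch of that caching pattern follows; the file name, `embed_fn` callback, and brute-force cosine lookup are illustrative stand-ins (NeoSearch loads the saved array back into a FAISS index instead):

```python
from pathlib import Path

import numpy as np


def load_or_embed(path: str, embed_fn, chunks: list[str]) -> np.ndarray:
    """Reuse cached embeddings when present; otherwise compute and cache them."""
    p = Path(path)
    if p.exists():
        return np.load(p)  # repeat upload: skip re-embedding entirely
    emb = embed_fn(chunks)
    np.save(p, emb)        # persist for the next session
    return emb


def nearest(query_vec: np.ndarray, embeddings: np.ndarray) -> int:
    """Brute-force cosine lookup, standing in for a FAISS index search."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    return int(np.argmax(emb @ q))
```

Because the cache key is just a file path, re-uploading the same document costs one `np.load` instead of a full embedding pass over every chunk.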
Thank you for using NeoSearch! Feel free to reach out with any questions or issues — [email protected]. Happy querying! 🧠📄🖼️