hoangsonww/RAG-LangChain-AI-System

RAG AI Portfolio Support Platform: Product And Operations Handbook

A comprehensive agentic RAG platform for portfolio intelligence, evidence-backed chat, and API-enriched responses.

This repository ships a complete application stack:

  • frontend (React + Vite + MUI) for chat, strategy controls, sessions, and traceability.
  • rag-app (Flask + Socket.IO + LangChain) for retrieval, orchestration, and response generation, with reranking support.
  • backend (Express + MongoDB) for structured portfolio data APIs used by tool chaining.
  • Deployment and operations assets for Docker, Kubernetes, progressive delivery, and Terraform.

RAG System Diagram


Table Of Contents

  1. Platform Overview
  2. Core Capabilities
  3. Technology Stack
  4. Architecture Overview
  5. Repository Layout
  6. Runtime Contracts
  7. End-To-End Data Lifecycle
  8. Quick Start
  9. Configuration And Secrets
  10. API Surface
  11. Operations Toolkit
  12. Deployment And Infrastructure
  13. Production Governance And Release Decision Model
  14. Testing And Quality Gates
  15. Security And Production Notes
  16. Further Reading & Resources
  17. Documentation Index

Platform Overview

The platform is designed around a single product goal: deliver high-confidence assistant responses grounded in retrieved documents and structured backend evidence.

graph LR
    U[End User] --> FE[Frontend UI - React + Socket.IO]
    FE --> RAG[RAG API - Flask + Chat Service]
    RAG --> RET[Retrieval Layer - Chroma + BM25 + Reranker]
    RAG --> ORCH[Agentic Orchestrator]
    ORCH --> BE[Backend API - Express]
    BE --> DB[(MongoDB)]
    RAG --> RESP[Source-backed Response + Trace]
    RESP --> FE

Core Capabilities

  • Multi-strategy retrieval:
    • semantic
    • hybrid
    • multi_query
    • decomposed
  • Hybrid retrieval stack:
    • Chroma vector retrieval
    • BM25 lexical retrieval
    • optional cross-encoder reranking
  • Agentic backend tool chaining:
    • team profile + insights
    • investment profile + insights
    • sector profile
    • consultations
    • scrape simulation
  • OpenAI-compatible endpoint:
    • POST /api/chat/completions
  • Real-time frontend UX:
    • streaming chunks over Socket.IO
    • REST fallback
    • session create/load/delete
    • source cards + tool trace panel
  • Production controls:
    • request IDs (X-Request-ID)
    • optional gateway auth
    • in-memory rate limiting for /api/*
    • liveness/readiness/health endpoints
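The in-memory rate limiting listed under production controls can be pictured with a small sliding-window limiter. This is a minimal sketch and not the project's implementation: the class name, constructor parameters, and the injectable clock are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per `window_seconds` for each client key.

    Hypothetical sketch; names and parameters are not taken from the repository.
    """

    def __init__(self, limit: int, window_seconds: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Because the state lives in a per-process dictionary, each replica would enforce its own limit independently.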

Technology Stack

Languages And Formats

Python, TypeScript, JavaScript, Bash, HCL, YAML, Markdown

RAG, AI, And Python Runtime

Flask, Flask CORS, Flask SocketIO, Gunicorn, Eventlet, LangChain, LangChain Community, LangChain Core, LangChain Ollama, LangChain OpenAI, LangChain HuggingFace, Ollama, ChromaDB, FAISS, Sentence Transformers, Transformers, PyTorch, Rank BM25, Pydantic, Pydantic Settings, Requests, AIOHTTP, Tenacity, Redis Py, Tiktoken, Loguru, PyPDF, python-docx, python-pptx, OpenPyXL, BeautifulSoup4, lxml, Pyngrok

Backend API Stack

Node.js, Express, Mongoose, Swagger JSDoc, Swagger UI, Archiver, Dotenv, Faker, Nodemon, ts-node, Prettier

Frontend Stack

React, React DOM, Material UI, Emotion, Axios, React Markdown, React Syntax Highlighter, Socket.IO Client, UUID, Vite, NGINX

Data, Infra, And Operations

MongoDB, Redis, OpenAPI, Docker, Docker Compose, Kubernetes, Kustomize, Argo Rollouts, Terraform, AWS, Amazon EKS, Amazon ECR, Amazon VPC, AWS KMS, OCI, Oracle OKE, Oracle VCN

Quality And Developer Tooling

Pytest, Pytest Asyncio, Black, Type Checking

graph LR
  subgraph App
    FE[React + Vite + MUI]
    RAG[Flask + LangChain + Ollama]
    BE[Express + Mongoose]
  end
  subgraph Data
    C[ChromaDB + BM25 + FAISS]
    M[MongoDB]
    R[Redis]
  end
  subgraph Platform
    D[Docker + Compose]
    K[Kubernetes + Kustomize + Argo Rollouts]
    T[Terraform AWS/OCI]
  end
  FE --> RAG
  RAG --> C
  RAG --> BE
  BE --> M
  RAG -. optional .-> R
  D --> K
  T --> K

Architecture Overview

High-Level Service Topology

graph TB
  subgraph Client
    Browser[Browser]
  end

  subgraph App
    FE[frontend - Vite/NGINX]
    RAG[rag-app - Flask + Socket.IO]
    BE[backend - Express]
  end

  subgraph Data
    Mongo[(MongoDB)]
    Chroma[(Chroma Persist Dir)]
    Uploads[(Uploads)]
    Logs[(Logs)]
  end

  Browser --> FE
  FE --> RAG
  RAG --> BE
  BE --> Mongo
  RAG --> Chroma
  RAG --> Uploads
  RAG --> Logs

Request Lifecycle (REST Chat)

sequenceDiagram
    autonumber
    participant User
    participant FE as Frontend
    participant RAG as RAG API
    participant ENG as RAG Engine
    participant ORCH as Agentic Orchestrator
    participant BE as Backend API

    User->>FE: Submit query + strategy
    FE->>RAG: POST /api/chat
    RAG->>ENG: retrieve_documents(strategy)
    ENG->>ORCH: plan + execute tool calls
    ORCH->>BE: /api/team, /api/investments, ...
    BE-->>ORCH: JSON payloads
    ORCH-->>ENG: api_data + api_chain_trace
    ENG-->>RAG: response + sources + metadata
    RAG-->>FE: success payload
    FE-->>User: rendered answer + citations + trace
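
The first hop of the sequence above (frontend to `POST /api/chat`) could be exercised from a script roughly as follows. This is a sketch under assumptions: the JSON field names `query`, `strategy`, and `session_id` are guesses, since the request schema is not shown here; only the path, the port, and the strategy names come from this README.

```python
import json
import urllib.request
from typing import Optional

RAG_BASE_URL = "http://localhost:5000"  # documented RAG API port
STRATEGIES = {"semantic", "hybrid", "multi_query", "decomposed"}


def build_chat_request(
    query: str, strategy: str = "hybrid", session_id: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/chat request."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Field names below are assumptions; check openapi.yaml for the real schema.
    payload = {"query": query, "strategy": strategy}
    if session_id is not None:
        payload["session_id"] = session_id
    return urllib.request.Request(
        f"{RAG_BASE_URL}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the stack running, `urllib.request.urlopen(build_chat_request("Summarize the portfolio"))` would send the request; keeping construction separate from sending makes the payload easy to inspect first.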

Retrieval Strategy Routing

flowchart TD
    Q[Incoming Query] --> S{Strategy}
    S -->|semantic| A[Vector Retriever]
    S -->|hybrid| B[Ensemble Retriever - Vector + BM25]
    S -->|multi_query| C[Generate alternatives - then hybrid retrieval]
    S -->|decomposed| D[Decompose query - then hybrid retrieval]

    A --> RR{Reranking enabled?}
    B --> RR
    C --> RR
    D --> RR

    RR -->|yes| X[Cross-Encoder Rerank]
    RR -->|no| Y[Use raw retrieval order]
    X --> G[LLM Response Generation]
    Y --> G
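
The routing in the flowchart can be sketched in plain Python. Reciprocal rank fusion (RRF) stands in here for the vector + BM25 ensemble step; the README does not say which fusion method the project uses, so treat RRF, the `Retriever` type, and the `expand` callback as assumptions. The sketch also fuses all sub-query rankings in one pass, which simplifies the "then hybrid retrieval" step shown in the diagram.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence

Retriever = Callable[[str], List[str]]  # query -> ranked document IDs (hypothetical type)


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def retrieve(
    query: str,
    strategy: str,
    vector: Retriever,
    bm25: Retriever,
    expand: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Dispatch on the strategy names used in the flowchart."""
    if strategy == "semantic":
        return vector(query)
    if strategy == "hybrid":
        return reciprocal_rank_fusion([vector(query), bm25(query)])
    if strategy in ("multi_query", "decomposed"):
        # `expand` would produce alternative phrasings or sub-questions.
        queries = expand(query) if expand else [query]
        rankings = []
        for sub_query in queries:
            rankings.append(vector(sub_query))
            rankings.append(bm25(sub_query))
        return reciprocal_rank_fusion(rankings)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

The returned ordering is what an optional reranker would then reorder before generation.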

Progressive Delivery Modes

graph LR
    A[Rolling Overlay] --> A1[deploy/k8s/overlays/aws]
    A --> A2[deploy/k8s/overlays/oci]

    B[Canary Overlay] --> B1[deploy/k8s/overlays/aws-canary]
    B --> B2[deploy/k8s/overlays/oci-canary]

    C[Blue-Green Overlay] --> C1[deploy/k8s/overlays/aws-bluegreen]
    C --> C2[deploy/k8s/overlays/oci-bluegreen]

Repository Layout

.
├── backend/                    # Express + MongoDB API service
├── frontend/                   # React/Vite chat application
├── rag_system/                 # Flask RAG app (API, engine, services, storage)
├── scripts/                    # Unified local/dev/build/test/deploy wrappers
├── deploy/                     # K8s overlays, rollout scripts, runbooks
├── infra/terraform/            # AWS/OCI infrastructure definitions
├── tests/                      # Python tests
├── run.py                      # Canonical local Python entrypoint
├── Dockerfile                  # Root production RAG container definition
├── Dockerfile.rag              # RAG image variant used by compose/deploy docs
├── docker-compose.yml          # Local full-stack compose environment
├── openapi.yaml                # Unified API contract (RAG + backend)
├── QUICKSTART.md               # End-to-end operator quickstart
└── ARCHITECTURE.md             # Deep technical architecture

Runtime Contracts

Service Ports

| Service | Port | Purpose |
|---|---|---|
| frontend | 3000 | Browser UI |
| rag-app | 5000 | RAG API + Socket.IO |
| backend | 3456 | Portfolio data API + Swagger docs |
| mongodb | 27017 | Backend persistence |
| redis | 6379 | Optional infra cache service |

Component Responsibilities

| Layer | Primary Responsibility |
|---|---|
| frontend | User interaction, streaming UX, sessions, trace/citation rendering |
| rag-app/api | Request handling, auth/rate-limit hooks, health endpoints |
| rag-app/services | Session/cache management, query flow orchestration |
| rag-app/engine | Retrieval + rerank + prompt construction + response generation |
| rag-app/clients | Backend API tool client wrappers |
| backend | Structured domain data APIs for agentic enrichment |

End-To-End Data Lifecycle

Ingestion, Retrieval, Enrichment, And Delivery

flowchart TD
  SourceDocs[backend/documents + uploaded files] --> Parse[Document parsing - TXT/PDF/DOCX/MD]
  Parse --> Chunk[Chunking + metadata]
  Chunk --> Index[Vector index - Chroma - + BM25 corpus]
  Query[User query] --> Strategy[Retrieval strategy selection]
  Strategy --> Retrieve[Semantic/Hybrid/Multi-query/Decomposed retrieval]
  Retrieve --> Rerank[Cross-encoder reranking]
  Rerank --> Evidence[Top evidence bundle]
  Evidence --> Agent[Agentic orchestrator]
  Agent --> Tools[Backend API tool chain]
  Tools --> Compose[Prompt composition + context fusion]
  Compose --> LLM[LLM response generation]
  LLM --> Output[Response + citations + tool trace]
  Output --> Session[Session store + response cache]
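
The "Chunking + metadata" stage can be illustrated with a fixed-size character chunker that honours a size and an overlap, in the spirit of the CHUNK_SIZE and CHUNK_OVERLAP settings. This is a simplified sketch: the project may well use a LangChain text splitter instead, and the metadata keys `source` and `start` are assumptions.

```python
from typing import Dict, List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int, source: str) -> List[Dict]:
    """Split text into overlapping windows and attach minimal metadata."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[Dict] = []
    for start in range(0, len(text), step):
        chunks.append({"text": text[start:start + chunk_size], "source": source, "start": start})
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

Each chunk's metadata is what later allows a citation to point back to its source document.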

Runtime State Matrix

| State | Current Placement | Durability | Scale Consideration |
|---|---|---|---|
| Session history | In-memory (rag_system/storage/session_store.py) | process-local | externalize for multi-replica consistency |
| Response cache | In-memory LRU (rag_system/storage/response_cache.py) | process-local TTL | externalize for shared cache hit rate |
| Rate limiting | In-memory sliding window (rag_system/storage/rate_limiter.py) | process-local | move to distributed limiter for global enforcement |
| Vector data | chroma_db filesystem/PV | persisted on mounted volume | requires shared/managed vector strategy for horizontal scale |
| Upload artifacts | uploads filesystem/PV | persisted on mounted volume | requires shared object storage for stateless scaling |
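To make the "in-memory LRU with TTL" row concrete, here is a minimal sketch of such a cache. It is not the code in rag_system/storage/response_cache.py; the class name, method names, and the injectable clock are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Any, Hashable, Optional


class TTLLRUCache:
    """Bounded cache that evicts the least recently used entry and expires old ones."""

    def __init__(self, max_size: int, ttl_seconds: float, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable for deterministic tests
        self._items: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            del self._items[key]  # expired: treat as a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = (self._clock(), value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used entry
```

Since the dictionary lives inside one process, two replicas would each warm their own cache, which is the hit-rate concern noted in the table.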

Quick Start

For full operator-level guidance, use QUICKSTART.md.

Option 1: Unified Script CLI (recommended)

scripts/system.sh setup
scripts/system.sh dev-up --setup
scripts/system.sh health
scripts/system.sh smoke
scripts/system.sh dev-down

Option 2: Docker Compose

docker compose up -d
docker compose ps

Endpoints:

  • Frontend: http://localhost:3000
  • RAG API: http://localhost:5000
  • Backend docs: http://localhost:3456/docs

Stop:

docker compose down

Option 3: Manual Local (3 terminals)

Backend:

cd backend
cp .env.example .env  # first time only
npm install
npm run dev

RAG API (repo root):

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python run.py

Frontend:

cd frontend
npm install
npm run dev

Configuration And Secrets

RAG Runtime (rag_system/config.py)

Key runtime inputs:

  • API linkage: API_BASE_URL, API_TOKEN, API_TIMEOUT_SECONDS
  • Gateway auth: ENABLE_GATEWAY_AUTH, API_GATEWAY_TOKEN
  • Retrieval controls: TOP_K, CHUNK_SIZE, CHUNK_OVERLAP, ENABLE_RERANKING, ENABLE_HYBRID_SEARCH
  • CORS and upload constraints: CORS_ORIGINS, MAX_CONTENT_LENGTH_MB, ALLOWED_UPLOAD_EXTENSIONS
  • Session/cache/rate controls: MAX_SESSION_MESSAGES, RESPONSE_CACHE_SIZE, RATE_LIMIT_REQUESTS_PER_MINUTE
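As an illustration of how a few of these inputs might be loaded and validated, the sketch below reads them from the environment into a frozen dataclass. It uses only the standard library rather than Pydantic Settings, and every default value is a placeholder assumption, not the value defined in rag_system/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(raw: str) -> bool:
    """Interpret common truthy strings used in environment variables."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RagSettings:
    api_base_url: str
    top_k: int
    chunk_size: int
    chunk_overlap: int
    enable_reranking: bool
    enable_gateway_auth: bool
    api_gateway_token: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "RagSettings":
        # All defaults below are placeholders chosen for this sketch.
        settings = cls(
            api_base_url=env.get("API_BASE_URL", "http://localhost:3456"),
            top_k=int(env.get("TOP_K", "5")),
            chunk_size=int(env.get("CHUNK_SIZE", "1000")),
            chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
            enable_reranking=_as_bool(env.get("ENABLE_RERANKING", "false")),
            enable_gateway_auth=_as_bool(env.get("ENABLE_GATEWAY_AUTH", "false")),
            api_gateway_token=env.get("API_GATEWAY_TOKEN", ""),
        )
        if settings.chunk_overlap >= settings.chunk_size:
            raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
        if settings.enable_gateway_auth and not settings.api_gateway_token:
            raise ValueError("API_GATEWAY_TOKEN is required when ENABLE_GATEWAY_AUTH is on")
        return settings
```

Failing fast on inconsistent values at startup tends to be easier to debug than a misconfigured retriever at request time.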

Backend Runtime (backend/.env)

Variables (each falls back to a default when unset):

  • MONGO_URI (defaults to mongodb://localhost:27017/rag_db if unset in current code)
  • PORT (default 3456)

Template file:

  • backend/.env.example

Frontend Runtime (Vite)

Optional variables:

  • VITE_API_BASE_URL
  • VITE_SOCKET_URL
  • VITE_API_GATEWAY_TOKEN

Chat Interface Screenshot

Production Security Baseline

  • Never commit live secrets to git.
  • Use cloud secret manager integration for Kubernetes deployments.
  • Rotate gateway/API tokens at each release window.
  • Enforce TLS termination at ingress/load balancer.

API Surface

RAG API (port 5000)

  • Health and contract:
    • GET /health
    • GET /livez
    • GET /readyz
    • GET /openapi.json
  • Chat:
    • POST /api/chat
    • POST /api/chat/completions
  • Session lifecycle:
    • POST /api/session
    • GET /api/session/<session_id>
    • DELETE /api/session/<session_id>
    • GET /api/sessions
  • Knowledge and metadata:
    • POST /api/upload
    • GET /api/strategies
    • GET /api/system/info
    • GET /api/tools

Backend API (port 3456)

  • Auth bootstrap:
    • GET /auth/token
  • Protected domain routes:
    • GET /ping
    • GET /api/documents/download
    • GET /api/team
    • GET /api/team/insights
    • GET /api/investments
    • GET /api/investments/insights
    • GET /api/sectors
    • GET /api/consultations
    • GET /api/scrape
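The agentic tool chaining that consumes these routes can be sketched as a plan-then-execute loop that returns both the collected data and a trace, echoing the `api_data + api_chain_trace` pair in the request lifecycle diagram. The keyword-based planner is a deliberate simplification and an assumption; the real orchestrator's planning logic, function names, and trace format are not described here.

```python
from typing import Any, Callable, Dict, List, Tuple

# Keyword -> backend routes, using paths from the Backend API list.
TOOL_ROUTES: Dict[str, List[str]] = {
    "team": ["/api/team", "/api/team/insights"],
    "investment": ["/api/investments", "/api/investments/insights"],
    "sector": ["/api/sectors"],
    "consultation": ["/api/consultations"],
}


def plan_tool_calls(query: str) -> List[str]:
    """Pick backend routes whose keyword appears in the query (naive planner)."""
    lowered = query.lower()
    routes: List[str] = []
    for keyword, paths in TOOL_ROUTES.items():
        if keyword in lowered:
            routes.extend(paths)
    return routes


def run_tool_chain(
    query: str, fetch: Callable[[str], Any]
) -> Tuple[Dict[str, Any], List[Dict[str, str]]]:
    """Call each planned route via `fetch`, recording successes and failures."""
    api_data: Dict[str, Any] = {}
    trace: List[Dict[str, str]] = []
    for route in plan_tool_calls(query):
        try:
            api_data[route] = fetch(route)
            trace.append({"route": route, "status": "ok"})
        except Exception as exc:  # one failing tool should not abort the chain
            trace.append({"route": route, "status": "error", "detail": str(exc)})
    return api_data, trace
```

Passing `fetch` in as a callable keeps the sketch independent of any HTTP client and of the bearer token obtained from `/auth/token`.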

Unified OpenAPI contract: openapi.yaml (repository root; covers both the RAG API and the backend API).


Operations Toolkit

Root Scripts

Primary operator entrypoint:

scripts/system.sh help

Mapped workflows:

  • setup: scripts/system.sh setup
  • local lifecycle: dev-up, dev-down, dev-status, dev-logs
  • quality gates: build, test, health, smoke
  • docker lifecycle: docker-up, docker-down, docker-logs
  • deployment wrappers: deploy, deploy-smoke

Day-2 Operations Flow

flowchart LR
    A[Code/Config Change] --> B[scripts/system.sh test]
    B --> C[scripts/system.sh health]
    C --> D[Build/Push Images]
    D --> E[rollout.sh apply]
    E --> F[rollout.sh status]
    F --> G[smoke-test.sh]
    G --> H{Pass?}
    H -->|Yes| I[promote]
    H -->|No| J[abort / rollback]
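
The flow above amounts to running a sequence of gates and choosing promote or abort at the end. A minimal driver might look like the sketch below. It assumes each script signals failure through a non-zero exit code, omits the image build/push step because no command for it is given here, and uses the canary/aws pair and the example URL purely as placeholders.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands come from this README; strategy, cloud, and URL are example values.
PIPELINE: List[List[str]] = [
    ["scripts/system.sh", "test"],
    ["scripts/system.sh", "health"],
    ["deploy/scripts/rollout.sh", "canary", "aws", "apply"],
    ["deploy/scripts/rollout.sh", "canary", "aws", "status"],
    ["deploy/scripts/smoke-test.sh", "https://rag.example.com"],
]


def _run(cmd: List[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def release_decision(
    steps: Sequence[Sequence[str]], run: Callable[[List[str]], int] = _run
) -> str:
    """Return "promote" if every step exits 0, otherwise "abort" at the first failure."""
    for step in steps:
        if run(list(step)) != 0:
            return "abort"
    return "promote"
```

Stopping at the first failing step means a broken test run never reaches the rollout commands.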

Deployment And Infrastructure

Kubernetes + Progressive Delivery

  • Base manifests: deploy/k8s/base
  • Rolling overlays: deploy/k8s/overlays/aws, deploy/k8s/overlays/oci
  • Canary overlays: deploy/k8s/overlays/aws-canary, deploy/k8s/overlays/oci-canary
  • Blue-green overlays: deploy/k8s/overlays/aws-bluegreen, deploy/k8s/overlays/oci-bluegreen

Rollout helper:

deploy/scripts/rollout.sh <strategy> <cloud> <action> [service]

Examples:

deploy/scripts/rollout.sh rolling aws apply
deploy/scripts/rollout.sh canary aws status
deploy/scripts/rollout.sh bluegreen oci promote all
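The strategy and cloud arguments line up with the overlay directories listed above, which the following sketch makes explicit. The overlay paths and the argument order are taken from this README; whether rollout.sh resolves overlays this way internally is an assumption, and the helper names are hypothetical.

```python
from typing import List, Optional

# Suffix appended to the cloud name for each strategy's overlay directory.
STRATEGY_SUFFIX = {"rolling": "", "canary": "-canary", "bluegreen": "-bluegreen"}
CLOUDS = {"aws", "oci"}


def overlay_path(strategy: str, cloud: str) -> str:
    """Return the Kustomize overlay directory for a strategy/cloud pair."""
    if strategy not in STRATEGY_SUFFIX:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if cloud not in CLOUDS:
        raise ValueError(f"unknown cloud: {cloud!r}")
    return f"deploy/k8s/overlays/{cloud}{STRATEGY_SUFFIX[strategy]}"


def rollout_command(
    strategy: str, cloud: str, action: str, service: Optional[str] = None
) -> List[str]:
    """Build the argv for deploy/scripts/rollout.sh <strategy> <cloud> <action> [service]."""
    overlay_path(strategy, cloud)  # reuse the validation; the path itself is not passed on
    cmd = ["deploy/scripts/rollout.sh", strategy, cloud, action]
    if service:
        cmd.append(service)
    return cmd
```

Validating the pair before shelling out catches typos such as `blue-green` early.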

Live smoke validation:

deploy/scripts/smoke-test.sh https://rag.example.com
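The contents of smoke-test.sh are not shown here, but a comparable check can be sketched by probing the documented health endpoints. This assumes `/health`, `/livez`, and `/readyz` are reachable under the given base URL and answer with HTTP 200 when healthy; both points are assumptions about the deployment.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

PROBE_PATHS = ("/health", "/livez", "/readyz")  # RAG API health endpoints


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for a GET, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0


def probe(base_url: str, fetch_status: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each probe path to True when it returned HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch_status(base + path) == 200 for path in PROBE_PATHS}
```

A caller could treat `all(probe("https://rag.example.com").values())` as the pass/fail signal.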

Terraform

  • AWS stack: infra/terraform/aws
    • EKS + VPC + ECR + optional canary node group
  • OCI stack: infra/terraform/oci
    • OKE + VCN + optional canary node pool

graph TD
    TF[Terraform Apply] --> CLUSTER[EKS / OKE Cluster]
    TF --> REGISTRY[ECR / OCIR]
    REGISTRY --> IMAGES[backend, rag-app, frontend images]
    IMAGES --> K8S[Overlay apply via rollout.sh]
    K8S --> LIVE[Ingress endpoint]
    LIVE --> SMOKE[smoke-test.sh]

Production Governance And Release Decision Model

flowchart TD
  Change[Code/Config/Image Change] --> Gate1[Static checks + tests]
  Gate1 --> Gate2[Build + image publication]
  Gate2 --> Gate3[Secrets/config validation]
  Gate3 --> Apply[Apply rollout strategy]
  Apply --> Observe[Observe probes + metrics + logs]
  Observe --> Smoke[Run smoke tests]
  Smoke --> Decision{Release healthy?}
  Decision -->|Yes| Promote[Promote rollout]
  Decision -->|No| Abort[Abort and rollback]
  Promote --> Post[Post-deploy verification + report]
  Abort --> PostMortem[Incident analysis + corrective action]

Release strategies supported:

  • Rolling (deploy/k8s/overlays/aws, deploy/k8s/overlays/oci)
  • Canary (deploy/k8s/overlays/aws-canary, deploy/k8s/overlays/oci-canary)
  • Blue-green (deploy/k8s/overlays/aws-bluegreen, deploy/k8s/overlays/oci-bluegreen)

Primary release controls:

  • deploy/scripts/rollout.sh
  • deploy/scripts/smoke-test.sh
  • scripts/system.sh test|health|smoke

Testing And Quality Gates

scripts/system.sh test is the unified quality gate for local and CI use. It runs the Python test suite, the backend TypeScript build, and the frontend type check and production build.

Unified Gate

scripts/system.sh test

What it runs:

  • Python tests (pytest -q)
  • backend TypeScript build (npm run build)
  • frontend typecheck (npm run typecheck)
  • frontend production build (npm run build)

Additional Checks

scripts/system.sh health
scripts/system.sh smoke

Security And Production Notes

  • Backend bearer auth defaults to a demo/static token (see the /auth/token route and middleware logic); treat it as non-production auth until it is replaced by a real identity integration.
  • RAG gateway auth is optional and controlled by ENABLE_GATEWAY_AUTH + API_GATEWAY_TOKEN.
  • Current rate limiting and session/cache stores are in-memory and process-local.
  • Enable hardened ingress, secret management, and centralized telemetry before multi-tenant production rollout.

Further Reading & Resources

If you want to learn more about the concepts and technologies used in this project, as well as essential AI and RAG principles, check out the following resources:


Documentation Index

About

🧠 A production-grade, agentic RAG platform for portfolio intelligence, combining LangChain, Chroma/FAISS, Hugging Face embeddings, and Ollama with dynamic entity extraction, backend API tool-chaining, and a real-time interactive assistant across deploy-ready frontend, backend, and infrastructure stacks.
