A comprehensive agentic RAG platform for portfolio intelligence, evidence-backed chat, and API-enriched responses.
This repository ships a complete application stack:
- `frontend` (React + Vite + MUI) for chat, strategy controls, sessions, and traceability.
- `rag-app` (Flask + Socket.IO + LangChain) for retrieval, orchestration, and response generation, with reranking support.
- `backend` (Express + MongoDB) for structured portfolio data APIs used by tool chaining.
- Deployment and operations assets for Docker, Kubernetes, progressive delivery, and Terraform.
- Platform Overview
- Core Capabilities
- Technology Stack
- Architecture Overview
- Repository Layout
- Runtime Contracts
- End-To-End Data Lifecycle
- Quick Start
- Configuration And Secrets
- API Surface
- Operations Toolkit
- Deployment And Infrastructure
- Production Governance And Release Decision Model
- Testing And Quality Gates
- Security And Production Notes
- Further Reading & Resources
- Documentation Index
The platform is designed around a single product goal: deliver high-confidence assistant responses grounded in retrieved documents and structured backend evidence.
```mermaid
graph LR
U[End User] --> FE[Frontend UI - React + Socket.IO]
FE --> RAG[RAG API - Flask + Chat Service]
RAG --> RET[Retrieval Layer - Chroma + BM25 + Reranker]
RAG --> ORCH[Agentic Orchestrator]
ORCH --> BE[Backend API - Express]
BE --> DB[(MongoDB)]
RAG --> RESP[Source-backed Response + Trace]
RESP --> FE
```
- Multi-strategy retrieval: `semantic`, `hybrid`, `multi_query`, `decomposed`
- Hybrid retrieval stack:
  - Chroma vector retrieval
  - BM25 lexical retrieval
  - optional cross-encoder reranking
- Agentic backend tool chaining:
  - team profile + insights
  - investment profile + insights
  - sector profile
  - consultations
  - scrape simulation
- OpenAI-compatible endpoint: `POST /api/chat/completions`
- Real-time frontend UX:
  - streaming chunks over Socket.IO
  - REST fallback
  - session create/load/delete
  - source cards + tool trace panel
- Production controls:
  - request IDs (`X-Request-ID`)
  - optional gateway auth
  - in-memory rate limiting for `/api/*`
  - liveness/readiness/health endpoints
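As a quick illustration of the OpenAI-compatible endpoint, the sketch below calls it with a plain-stdlib client. The payload follows the standard chat-completions shape (`model`, `messages`, `stream`); the Bearer token header for gateway auth and the `model` default are assumptions, not verified against this repo's handler.

```python
# Minimal sketch of calling the OpenAI-compatible endpoint.
# Payload fields follow the standard chat-completions convention;
# the Bearer header scheme is an assumption about gateway auth.
import json
import urllib.request

def build_completion_payload(messages, model="default", stream=False):
    """Assemble an OpenAI-style chat-completions request body."""
    return {"model": model, "messages": messages, "stream": stream}

def ask(base_url, question, token=None):
    payload = build_completion_payload(
        [{"role": "user", "content": question}]
    )
    req = urllib.request.Request(
        f"{base_url}/api/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if token:  # optional gateway auth (assumed Bearer scheme)
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(ask("http://localhost:5000", "Summarize the portfolio."))
```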
```mermaid
graph LR
subgraph App
FE[React + Vite + MUI]
RAG[Flask + LangChain + Ollama]
BE[Express + Mongoose]
end
subgraph Data
C[ChromaDB + BM25 + FAISS]
M[MongoDB]
R[Redis]
end
subgraph Platform
D[Docker + Compose]
K[Kubernetes + Kustomize + Argo Rollouts]
T[Terraform AWS/OCI]
end
FE --> RAG
RAG --> C
RAG --> BE
BE --> M
RAG -. optional .-> R
D --> K
T --> K
```
```mermaid
graph TB
subgraph Client
Browser[Browser]
end
subgraph App
FE[frontend - Vite/NGINX]
RAG[rag-app - Flask + Socket.IO]
BE[backend - Express]
end
subgraph Data
Mongo[(MongoDB)]
Chroma[(Chroma Persist Dir)]
Uploads[(Uploads)]
Logs[(Logs)]
end
Browser --> FE
FE --> RAG
RAG --> BE
BE --> Mongo
RAG --> Chroma
RAG --> Uploads
RAG --> Logs
```
```mermaid
sequenceDiagram
autonumber
participant User
participant FE as Frontend
participant RAG as RAG API
participant ENG as RAG Engine
participant ORCH as Agentic Orchestrator
participant BE as Backend API
User->>FE: Submit query + strategy
FE->>RAG: POST /api/chat
RAG->>ENG: retrieve_documents(strategy)
ENG->>ORCH: plan + execute tool calls
ORCH->>BE: /api/team, /api/investments, ...
BE-->>ORCH: JSON payloads
ORCH-->>ENG: api_data + api_chain_trace
ENG-->>RAG: response + sources + metadata
RAG-->>FE: success payload
FE-->>User: rendered answer + citations + trace
```
```mermaid
flowchart TD
Q[Incoming Query] --> S{Strategy}
S -->|semantic| A[Vector Retriever]
S -->|hybrid| B[Ensemble Retriever - Vector + BM25]
S -->|multi_query| C[Generate alternatives - then hybrid retrieval]
S -->|decomposed| D[Decompose query - then hybrid retrieval]
A --> RR{Reranking enabled?}
B --> RR
C --> RR
D --> RR
RR -->|yes| X[Cross-Encoder Rerank]
RR -->|no| Y[Use raw retrieval order]
X --> G[LLM Response Generation]
Y --> G
```
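The strategy branching above can be sketched as a dispatcher. Retriever internals are stubbed and all function names here are illustrative, not the actual `rag_system` API; only the control flow mirrors the diagram.

```python
# Illustrative dispatcher for the retrieval strategies above.
# Retriever callables are injected; names are a sketch, not the
# real rag_system interfaces.

def retrieve(query, strategy, vector_search, bm25_search,
             llm_expand=None, rerank=None, top_k=5):
    if strategy == "semantic":
        docs = vector_search(query, top_k)
    elif strategy == "hybrid":
        # naive ensemble: merge vector and lexical hits, dedupe, keep order
        docs = list(dict.fromkeys(vector_search(query, top_k) + bm25_search(query, top_k)))
    elif strategy in ("multi_query", "decomposed"):
        # expand/decompose the query via the LLM, then hybrid-retrieve each variant
        variants = llm_expand(query) if llm_expand else [query]
        docs = []
        for q in variants:
            docs.extend(vector_search(q, top_k) + bm25_search(q, top_k))
        docs = list(dict.fromkeys(docs))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # optional cross-encoder reranking; otherwise keep raw retrieval order
    return rerank(query, docs)[:top_k] if rerank else docs[:top_k]
```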
```mermaid
graph LR
A[Rolling Overlay] --> A1[deploy/k8s/overlays/aws]
A --> A2[deploy/k8s/overlays/oci]
B[Canary Overlay] --> B1[deploy/k8s/overlays/aws-canary]
B --> B2[deploy/k8s/overlays/oci-canary]
C[Blue-Green Overlay] --> C1[deploy/k8s/overlays/aws-bluegreen]
C --> C2[deploy/k8s/overlays/oci-bluegreen]
```
```text
.
├── backend/          # Express + MongoDB API service
├── frontend/         # React/Vite chat application
├── rag_system/       # Flask RAG app (API, engine, services, storage)
├── scripts/          # Unified local/dev/build/test/deploy wrappers
├── deploy/           # K8s overlays, rollout scripts, runbooks
├── infra/terraform/  # AWS/OCI infrastructure definitions
├── tests/            # Python tests
├── run.py            # Canonical local Python entrypoint
├── Dockerfile        # Root production RAG container definition
├── Dockerfile.rag    # RAG image variant used by compose/deploy docs
├── docker-compose.yml # Local full-stack compose environment
├── openapi.yaml      # Unified API contract (RAG + backend)
├── QUICKSTART.md     # End-to-end operator quickstart
└── ARCHITECTURE.md   # Deep technical architecture
```
| Service | Port | Purpose |
|---|---|---|
| `frontend` | 3000 | Browser UI |
| `rag-app` | 5000 | RAG API + Socket.IO |
| `backend` | 3456 | Portfolio data API + Swagger docs |
| `mongodb` | 27017 | Backend persistence |
| `redis` | 6379 | Optional infra cache service |
| Layer | Primary Responsibility |
|---|---|
| `frontend` | User interaction, streaming UX, sessions, trace/citation rendering |
| `rag-app/api` | Request handling, auth/rate-limit hooks, health endpoints |
| `rag-app/services` | Session/cache management, query flow orchestration |
| `rag-app/engine` | Retrieval + rerank + prompt construction + response generation |
| `rag-app/clients` | Backend API tool client wrappers |
| `backend` | Structured domain data APIs for agentic enrichment |
```mermaid
flowchart TD
SourceDocs[backend/documents + uploaded files] --> Parse[Document parsing - TXT/PDF/DOCX/MD]
Parse --> Chunk[Chunking + metadata]
Chunk --> Index[Vector index - Chroma - + BM25 corpus]
Query[User query] --> Strategy[Retrieval strategy selection]
Strategy --> Retrieve[Semantic/Hybrid/Multi-query/Decomposed retrieval]
Retrieve --> Rerank[Cross-encoder reranking]
Rerank --> Evidence[Top evidence bundle]
Evidence --> Agent[Agentic orchestrator]
Agent --> Tools[Backend API tool chain]
Tools --> Compose[Prompt composition + context fusion]
Compose --> LLM[LLM response generation]
LLM --> Output[Response + citations + tool trace]
Output --> Session[Session store + response cache]
```
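The chunking step in the lifecycle above can be sketched as fixed-size windows with overlap, mirroring the `CHUNK_SIZE` / `CHUNK_OVERLAP` settings. This is a simplification for illustration, not the actual `rag_system` splitter.

```python
# Sketch of overlap chunking: each window starts CHUNK_SIZE - CHUNK_OVERLAP
# characters after the previous one, so neighbors share an overlap region.
# Illustrative only; the real splitter may be token- or sentence-aware.

def chunk_text(text, chunk_size=500, chunk_overlap=50):
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            # carry minimal metadata alongside each chunk
            chunks.append({"text": piece, "start": start})
        if start + chunk_size >= len(text):
            break
    return chunks
```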
| State | Current Placement | Durability | Scale Consideration |
|---|---|---|---|
| Session history | In-memory (`rag_system/storage/session_store.py`) | Process-local | Externalize for multi-replica consistency |
| Response cache | In-memory LRU (`rag_system/storage/response_cache.py`) | Process-local TTL | Externalize for shared cache hit rate |
| Rate limiting | In-memory sliding window (`rag_system/storage/rate_limiter.py`) | Process-local | Move to distributed limiter for global enforcement |
| Vector data | `chroma_db` filesystem/PV | Persisted on mounted volume | Requires shared/managed vector strategy for horizontal scale |
| Upload artifacts | `uploads` filesystem/PV | Persisted on mounted volume | Requires shared object storage for stateless scaling |
For full operator-level guidance, use QUICKSTART.md.
```bash
scripts/system.sh setup
scripts/system.sh dev-up --setup
scripts/system.sh health
scripts/system.sh smoke
scripts/system.sh dev-down
```
```bash
docker compose up -d
docker compose ps
```
Endpoints:
- Frontend: `http://localhost:3000`
- RAG API: `http://localhost:5000`
- Backend docs: `http://localhost:3456/docs`
Stop:
```bash
docker compose down
```
Backend:
```bash
cd backend
cp .env.example .env  # first time only
npm install
npm run dev
```
RAG API (repo root):
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python run.py
```
Frontend:
```bash
cd frontend
npm install
npm run dev
```
Key runtime inputs:
- API linkage: `API_BASE_URL`, `API_TOKEN`, `API_TIMEOUT_SECONDS`
- Gateway auth: `ENABLE_GATEWAY_AUTH`, `API_GATEWAY_TOKEN`
- Retrieval controls: `TOP_K`, `CHUNK_SIZE`, `CHUNK_OVERLAP`, `ENABLE_RERANKING`, `ENABLE_HYBRID_SEARCH`
- CORS and upload constraints: `CORS_ORIGINS`, `MAX_CONTENT_LENGTH_MB`, `ALLOWED_UPLOAD_EXTENSIONS`
- Session/cache/rate controls: `MAX_SESSION_MESSAGES`, `RESPONSE_CACHE_SIZE`, `RATE_LIMIT_REQUESTS_PER_MINUTE`
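A minimal sketch of consuming the retrieval controls from the environment is shown below. The default values here are illustrative placeholders, not the project's actual defaults; check the `rag_system` config module for those.

```python
# Sketch of reading the retrieval-control variables listed above.
# Defaults are illustrative, not the project's real defaults.
import os

def load_retrieval_config(env=None):
    env = os.environ if env is None else env

    def flag(name, default):
        # treat "1", "true", "yes" (any case) as enabled
        return env.get(name, str(default)).lower() in ("1", "true", "yes")

    return {
        "top_k": int(env.get("TOP_K", 5)),
        "chunk_size": int(env.get("CHUNK_SIZE", 500)),
        "chunk_overlap": int(env.get("CHUNK_OVERLAP", 50)),
        "enable_reranking": flag("ENABLE_RERANKING", False),
        "enable_hybrid_search": flag("ENABLE_HYBRID_SEARCH", True),
    }
```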
Required:
- `MONGO_URI` (defaults to `mongodb://localhost:27017/rag_db` if unset in current code)
- `PORT` (default `3456`)
Template file:
`backend/.env.example`
Optional variables:
- `VITE_API_BASE_URL`
- `VITE_SOCKET_URL`
- `VITE_API_GATEWAY_TOKEN`
- Never commit live secrets to git.
- Use cloud secret manager integration for Kubernetes deployments.
- Rotate gateway/API tokens by release window.
- Enforce TLS termination at ingress/load balancer.
- Health and contract: `GET /health`, `GET /livez`, `GET /readyz`, `GET /openapi.json`
- Chat: `POST /api/chat`, `POST /api/chat/completions`
- Session lifecycle: `POST /api/session`, `GET /api/session/<session_id>`, `DELETE /api/session/<session_id>`, `GET /api/sessions`
- Knowledge and metadata: `POST /api/upload`, `GET /api/strategies`, `GET /api/system/info`, `GET /api/tools`
- Auth bootstrap: `GET /auth/token`
- Protected domain routes: `GET /ping`, `GET /api/documents/download`, `GET /api/team`, `GET /api/team/insights`, `GET /api/investments`, `GET /api/investments/insights`, `GET /api/sectors`, `GET /api/consultations`, `GET /api/scrape`
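The session lifecycle can be walked end to end with a small stdlib client. The paths come from the endpoint list above; the response field names (e.g. `session_id`) are assumptions, so consult the unified OpenAPI contract for the authoritative schema.

```python
# Sketch of the session lifecycle: create, load, delete.
# Response field names are assumptions; see openapi.yaml for the
# real schema. BASE matches the local compose port for rag-app.
import json
import urllib.request

BASE = "http://localhost:5000"

def session_paths(session_id):
    """Build the per-session endpoint paths used below."""
    return {
        "load": f"/api/session/{session_id}",
        "delete": f"/api/session/{session_id}",
    }

def _request(method, path, body=None):
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    created = _request("POST", "/api/session")          # create a session
    sid = created["session_id"]                         # assumed field name
    print(_request("GET", session_paths(sid)["load"]))  # load its history
    _request("DELETE", session_paths(sid)["delete"])    # delete it
```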
Unified OpenAPI contract: `openapi.yaml`
Primary operator entrypoint:
```bash
scripts/system.sh help
```
Mapped workflows:
- Setup: `scripts/system.sh setup`
- Local lifecycle: `dev-up`, `dev-down`, `dev-status`, `dev-logs`
- Quality gates: `build`, `test`, `health`, `smoke`
- Docker lifecycle: `docker-up`, `docker-down`, `docker-logs`
- Deployment wrappers: `deploy`, `deploy-smoke`
```mermaid
flowchart LR
A[Code/Config Change] --> B[scripts/system.sh test]
B --> C[scripts/system.sh health]
C --> D[Build/Push Images]
D --> E[rollout.sh apply]
E --> F[rollout.sh status]
F --> G[smoke-test.sh]
G --> H{Pass?}
H -->|Yes| I[promote]
H -->|No| J[abort / rollback]
```
- Base manifests: `deploy/k8s/base`
- Rolling overlays: `deploy/k8s/overlays/aws`, `deploy/k8s/overlays/oci`
- Canary overlays: `deploy/k8s/overlays/aws-canary`, `deploy/k8s/overlays/oci-canary`
- Blue-green overlays: `deploy/k8s/overlays/aws-bluegreen`, `deploy/k8s/overlays/oci-bluegreen`
Rollout helper:
```bash
deploy/scripts/rollout.sh <strategy> <cloud> <action> [service]
```
Examples:
```bash
deploy/scripts/rollout.sh rolling aws apply
deploy/scripts/rollout.sh canary aws status
deploy/scripts/rollout.sh bluegreen oci promote all
```
Live smoke validation:
```bash
deploy/scripts/smoke-test.sh https://rag.example.com
```
- AWS stack: `infra/terraform/aws` (EKS + VPC + ECR + optional canary node group)
- OCI stack: `infra/terraform/oci` (OKE + VCN + optional canary node pool)
```mermaid
graph TD
TF[Terraform Apply] --> CLUSTER[EKS / OKE Cluster]
TF --> REGISTRY[ECR / OCIR]
REGISTRY --> IMAGES[backend, rag-app, frontend images]
IMAGES --> K8S[Overlay apply via rollout.sh]
K8S --> LIVE[Ingress endpoint]
LIVE --> SMOKE[smoke-test.sh]
```
```mermaid
flowchart TD
Change[Code/Config/Image Change] --> Gate1[Static checks + tests]
Gate1 --> Gate2[Build + image publication]
Gate2 --> Gate3[Secrets/config validation]
Gate3 --> Apply[Apply rollout strategy]
Apply --> Observe[Observe probes + metrics + logs]
Observe --> Smoke[Run smoke tests]
Smoke --> Decision{Release healthy?}
Decision -->|Yes| Promote[Promote rollout]
Decision -->|No| Abort[Abort and rollback]
Promote --> Post[Post-deploy verification + report]
Abort --> PostMortem[Incident analysis + corrective action]
```
Release strategies supported:
- Rolling (`deploy/k8s/overlays/aws`, `deploy/k8s/overlays/oci`)
- Canary (`deploy/k8s/overlays/aws-canary`, `deploy/k8s/overlays/oci-canary`)
- Blue-green (`deploy/k8s/overlays/aws-bluegreen`, `deploy/k8s/overlays/oci-bluegreen`)
Primary release controls:
- `deploy/scripts/rollout.sh`
- `deploy/scripts/smoke-test.sh`
- `scripts/system.sh test|health|smoke`
A unified test and quality-gate script serves both local and CI use, running the Python unit tests, TypeScript type checks, and production builds for the backend and frontend.
```bash
scripts/system.sh test
```
What it runs:
- Python tests (`pytest -q`)
- backend TypeScript build (`npm run build`)
- frontend typecheck (`npm run typecheck`)
- frontend production build (`npm run build`)
```bash
scripts/system.sh health
scripts/system.sh smoke
```
- Backend bearer auth currently uses a demo/static token by default (`/auth/token` route and middleware logic); treat it as non-production auth unless replaced by real identity integration.
- RAG gateway auth is optional and controlled by `ENABLE_GATEWAY_AUTH` + `API_GATEWAY_TOKEN`.
- Current rate limiting and session/cache stores are in-memory and process-local.
- Enable hardened ingress, secret management, and centralized telemetry before multi-tenant production rollout.
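The optional gateway auth check can be sketched framework-free as below. The Bearer header scheme and the truthy-flag parsing are assumptions about the implementation, not a copy of the real middleware.

```python
# Framework-free sketch of the optional gateway auth gate: when
# ENABLE_GATEWAY_AUTH is truthy, requests must present the shared
# API_GATEWAY_TOKEN. Bearer scheme is an assumption.
import hmac

def check_gateway_auth(headers, env):
    if env.get("ENABLE_GATEWAY_AUTH", "false").lower() not in ("1", "true", "yes"):
        return True  # auth disabled: allow everything
    expected = env.get("API_GATEWAY_TOKEN", "")
    supplied = headers.get("Authorization", "")
    if not supplied.startswith("Bearer "):
        return False
    # constant-time comparison to avoid timing side channels
    return bool(expected) and hmac.compare_digest(supplied[7:], expected)
```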
If you want to learn more about the concepts and technologies used in this project, as well as essential AI and RAG principles, check out the following resources:
- Agentic RAG Implementation Guide
- AI Agents & Assistants
- AI and Businesses
- Confusion Matrix for LLM Outputs
- Data Science Pipeline with a Business Problem
- Decision Trees & Ensemble Learning
- Deep Learning & Neural Networks
- k-Nearest Neighbors Algorithm
- LLM Mining for Customer Experience
- Regression Analysis & Linear Models
- Representation Learning & Dimensionality Reduction for Recommender Systems
- Retrieval Augmented Generation (RAG) Concepts
- Unstructured Data Textual Analysis
- Storytelling with Data
- Synthetic Experts
