RAG AI Portfolio Support Platform: Product And Operations Handbook

A comprehensive agentic RAG platform for portfolio intelligence, evidence-backed chat, and API-enriched responses.

This repository ships a complete application stack:

frontend (React + Vite + MUI) for chat, strategy controls, sessions, and traceability.
rag-app (Flask + Socket.IO + LangChain) for retrieval, orchestration, and response generation, with reranking support.
backend (Express + MongoDB) for structured portfolio data APIs used by tool chaining.
Deployment and operations assets for Docker, Kubernetes, progressive delivery, and Terraform.

Platform Overview
Core Capabilities
Technology Stack
Architecture Overview
Repository Layout
Runtime Contracts
End-To-End Data Lifecycle
Quick Start
Configuration And Secrets
API Surface
Operations Toolkit
Deployment And Infrastructure
Production Governance And Release Decision Model
Testing And Quality Gates
Security And Production Notes
Further Reading & Resources
Documentation Index

Platform Overview

The platform is designed around a single product goal: deliver high-confidence assistant responses grounded in retrieved documents and structured backend evidence.

graph LR
    U[End User] --> FE[Frontend UI - React + Socket.IO]
    FE --> RAG[RAG API - Flask + Chat Service]
    RAG --> RET[Retrieval Layer - Chroma + BM25 + Reranker]
    RAG --> ORCH[Agentic Orchestrator]
    ORCH --> BE[Backend API - Express]
    BE --> DB[(MongoDB)]
    RAG --> RESP[Source-backed Response + Trace]
    RESP --> FE

Core Capabilities

Multi-strategy retrieval:
- semantic
- hybrid
- multi_query
- decomposed
Hybrid retrieval stack:
- Chroma vector retrieval
- BM25 lexical retrieval
- optional cross-encoder reranking
Agentic backend tool chaining:
- team profile + insights
- investment profile + insights
- sector profile
- consultations
- scrape simulation
OpenAI-compatible endpoint:
- POST /api/chat/completions
Real-time frontend UX:
- streaming chunks over Socket.IO
- REST fallback
- session create/load/delete
- source cards + tool trace panel
Production controls:
- request IDs (X-Request-ID)
- optional gateway auth
- in-memory rate limiting for /api/*
- liveness/readiness/health endpoints

Technology Stack

Languages And Formats

RAG, AI, And Python Runtime

Backend API Stack

Frontend Stack

Data, Infra, And Operations

Quality And Developer Tooling

graph LR
  subgraph App
    FE[React + Vite + MUI]
    RAG[Flask + LangChain + Ollama]
    BE[Express + Mongoose]
  end
  subgraph Data
    C[ChromaDB + BM25 + FAISS]
    M[MongoDB]
    R[Redis]
  end
  subgraph Platform
    D[Docker + Compose]
    K[Kubernetes + Kustomize + Argo Rollouts]
    T[Terraform AWS/OCI]
  end
  FE --> RAG
  RAG --> C
  RAG --> BE
  BE --> M
  RAG -. optional .-> R
  D --> K
  T --> K

Architecture Overview

High-Level Service Topology

graph TB
  subgraph Client
    Browser[Browser]
  end

  subgraph App
    FE[frontend - Vite/NGINX]
    RAG[rag-app - Flask + Socket.IO]
    BE[backend - Express]
  end

  subgraph Data
    Mongo[(MongoDB)]
    Chroma[(Chroma Persist Dir)]
    Uploads[(Uploads)]
    Logs[(Logs)]
  end

  Browser --> FE
  FE --> RAG
  RAG --> BE
  BE --> Mongo
  RAG --> Chroma
  RAG --> Uploads
  RAG --> Logs

Request Lifecycle (REST Chat)

sequenceDiagram
    autonumber
    participant User
    participant FE as Frontend
    participant RAG as RAG API
    participant ENG as RAG Engine
    participant ORCH as Agentic Orchestrator
    participant BE as Backend API

    User->>FE: Submit query + strategy
    FE->>RAG: POST /api/chat
    RAG->>ENG: retrieve_documents(strategy)
    ENG->>ORCH: plan + execute tool calls
    ORCH->>BE: /api/team, /api/investments, ...
    BE-->>ORCH: JSON payloads
    ORCH-->>ENG: api_data + api_chain_trace
    ENG-->>RAG: response + sources + metadata
    RAG-->>FE: success payload
    FE-->>User: rendered answer + citations + trace

Retrieval Strategy Routing

flowchart TD
    Q[Incoming Query] --> S{Strategy}
    S -->|semantic| A[Vector Retriever]
    S -->|hybrid| B[Ensemble Retriever - Vector + BM25]
    S -->|multi_query| C[Generate alternatives - then hybrid retrieval]
    S -->|decomposed| D[Decompose query - then hybrid retrieval]

    A --> RR{Reranking enabled?}
    B --> RR
    C --> RR
    D --> RR

    RR -->|yes| X[Cross-Encoder Rerank]
    RR -->|no| Y[Use raw retrieval order]
    X --> G[LLM Response Generation]
    Y --> G

Progressive Delivery Modes

graph LR
    A[Rolling Overlay] --> A1[deploy/k8s/overlays/aws]
    A --> A2[deploy/k8s/overlays/oci]

    B[Canary Overlay] --> B1[deploy/k8s/overlays/aws-canary]
    B --> B2[deploy/k8s/overlays/oci-canary]

    C[Blue-Green Overlay] --> C1[deploy/k8s/overlays/aws-bluegreen]
    C --> C2[deploy/k8s/overlays/oci-bluegreen]

Repository Layout

.
├── backend/                    # Express + MongoDB API service
├── frontend/                   # React/Vite chat application
├── rag_system/                 # Flask RAG app (API, engine, services, storage)
├── scripts/                    # Unified local/dev/build/test/deploy wrappers
├── deploy/                     # K8s overlays, rollout scripts, runbooks
├── infra/terraform/            # AWS/OCI infrastructure definitions
├── tests/                      # Python tests
├── run.py                      # Canonical local Python entrypoint
├── Dockerfile                  # Root production RAG container definition
├── Dockerfile.rag              # RAG image variant used by compose/deploy docs
├── docker-compose.yml          # Local full-stack compose environment
├── openapi.yaml                # Unified API contract (RAG + backend)
├── QUICKSTART.md               # End-to-end operator quickstart
└── ARCHITECTURE.md             # Deep technical architecture

Runtime Contracts

Service Ports

Service	Port	Purpose
`frontend`	`3000`	Browser UI
`rag-app`	`5000`	RAG API + Socket.IO
`backend`	`3456`	Portfolio data API + Swagger docs
`mongodb`	`27017`	Backend persistence
`redis`	`6379`	Optional infra cache service

Component Responsibilities

Layer	Primary Responsibility
`frontend`	User interaction, streaming UX, sessions, trace/citation rendering
`rag-app/api`	Request handling, auth/rate-limit hooks, health endpoints
`rag-app/services`	Session/cache management, query flow orchestration
`rag-app/engine`	Retrieval + rerank + prompt construction + response generation
`rag-app/clients`	Backend API tool client wrappers
`backend`	Structured domain data APIs for agentic enrichment

End-To-End Data Lifecycle

Ingestion, Retrieval, Enrichment, And Delivery

flowchart TD
  SourceDocs[backend/documents + uploaded files] --> Parse[Document parsing - TXT/PDF/DOCX/MD]
  Parse --> Chunk[Chunking + metadata]
  Chunk --> Index[Vector index - Chroma - + BM25 corpus]
  Query[User query] --> Strategy[Retrieval strategy selection]
  Strategy --> Retrieve[Semantic/Hybrid/Multi-query/Decomposed retrieval]
  Retrieve --> Rerank[Cross-encoder reranking]
  Rerank --> Evidence[Top evidence bundle]
  Evidence --> Agent[Agentic orchestrator]
  Agent --> Tools[Backend API tool chain]
  Tools --> Compose[Prompt composition + context fusion]
  Compose --> LLM[LLM response generation]
  LLM --> Output[Response + citations + tool trace]
  Output --> Session[Session store + response cache]

Runtime State Matrix

State	Current Placement	Durability	Scale Consideration
Session history	In-memory (`rag_system/storage/session_store.py`)	process-local	externalize for multi-replica consistency
Response cache	In-memory LRU (`rag_system/storage/response_cache.py`)	process-local TTL	externalize for shared cache hit rate
Rate limiting	In-memory sliding window (`rag_system/storage/rate_limiter.py`)	process-local	move to distributed limiter for global enforcement
Vector data	`chroma_db` filesystem/PV	persisted on mounted volume	requires shared/managed vector strategy for horizontal scale
Upload artifacts	`uploads` filesystem/PV	persisted on mounted volume	requires shared object storage for stateless scaling

Quick Start

For full operator-level guidance, use QUICKSTART.md.

Option 1: Unified Script CLI (recommended)

scripts/system.sh setup
scripts/system.sh dev-up --setup
scripts/system.sh health
scripts/system.sh smoke
scripts/system.sh dev-down

Option 2: Docker Compose

docker compose up -d
docker compose ps

Endpoints:

Frontend: http://localhost:3000
RAG API: http://localhost:5000
Backend docs: http://localhost:3456/docs

Stop:

docker compose down

Option 3: Manual Local (3 terminals)

Backend:

cd backend
cp .env.example .env  # first time only
npm install
npm run dev

RAG API (repo root):

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python run.py

Frontend:

cd frontend
npm install
npm run dev

Configuration And Secrets

RAG Runtime (`rag_system/config.py`)

Key runtime inputs:

API linkage: API_BASE_URL, API_TOKEN, API_TIMEOUT_SECONDS
Gateway auth: ENABLE_GATEWAY_AUTH, API_GATEWAY_TOKEN
Retrieval controls: TOP_K, CHUNK_SIZE, CHUNK_OVERLAP, ENABLE_RERANKING, ENABLE_HYBRID_SEARCH
CORS and upload constraints: CORS_ORIGINS, MAX_CONTENT_LENGTH_MB, ALLOWED_UPLOAD_EXTENSIONS
Session/cache/rate controls: MAX_SESSION_MESSAGES, RESPONSE_CACHE_SIZE, RATE_LIMIT_REQUESTS_PER_MINUTE

Backend Runtime (`backend/.env`)

Required:

MONGO_URI (defaults to mongodb://localhost:27017/rag_db if unset in current code)
PORT (default 3456)

Template file:

backend/.env.example

Frontend Runtime (`Vite`)

Optional variables:

VITE_API_BASE_URL
VITE_SOCKET_URL
VITE_API_GATEWAY_TOKEN

Production Security Baseline

Never commit live secrets to git.
Use cloud secret manager integration for Kubernetes deployments.
Rotate gateway/API tokens by release window.
Enforce TLS termination at ingress/load balancer.

API Surface

RAG API (port `5000`)

Health and contract:
- GET /health
- GET /livez
- GET /readyz
- GET /openapi.json
Chat:
- POST /api/chat
- POST /api/chat/completions
Session lifecycle:
- POST /api/session
- GET /api/session/<session_id>
- DELETE /api/session/<session_id>
- GET /api/sessions
Knowledge and metadata:
- POST /api/upload
- GET /api/strategies
- GET /api/system/info
- GET /api/tools

Backend API (port `3456`)

Auth bootstrap:
- GET /auth/token
Protected domain routes:
- GET /ping
- GET /api/documents/download
- GET /api/team
- GET /api/team/insights
- GET /api/investments
- GET /api/investments/insights
- GET /api/sectors
- GET /api/consultations
- GET /api/scrape

Unified OpenAPI contract:

openapi.yaml

Operations Toolkit

Root Scripts

Primary operator entrypoint:

scripts/system.sh help

Mapped workflows:

setup: scripts/system.sh setup
local lifecycle: dev-up, dev-down, dev-status, dev-logs
quality gates: build, test, health, smoke
docker lifecycle: docker-up, docker-down, docker-logs
deployment wrappers: deploy, deploy-smoke

Day-2 Operations Flow

flowchart LR
    A[Code/Config Change] --> B[scripts/system.sh test]
    B --> C[scripts/system.sh health]
    C --> D[Build/Push Images]
    D --> E[rollout.sh apply]
    E --> F[rollout.sh status]
    F --> G[smoke-test.sh]
    G --> H{Pass?}
    H -->|Yes| I[promote]
    H -->|No| J[abort / rollback]

Deployment And Infrastructure

Kubernetes + Progressive Delivery

Base manifests: deploy/k8s/base
Rolling overlays: deploy/k8s/overlays/aws, deploy/k8s/overlays/oci
Canary overlays: deploy/k8s/overlays/aws-canary, deploy/k8s/overlays/oci-canary
Blue-green overlays: deploy/k8s/overlays/aws-bluegreen, deploy/k8s/overlays/oci-bluegreen

Rollout helper:

deploy/scripts/rollout.sh <strategy> <cloud> <action> [service]

Examples:

deploy/scripts/rollout.sh rolling aws apply
deploy/scripts/rollout.sh canary aws status
deploy/scripts/rollout.sh bluegreen oci promote all

Live smoke validation:

deploy/scripts/smoke-test.sh https://rag.example.com

Terraform

AWS stack: infra/terraform/aws
- EKS + VPC + ECR + optional canary node group
OCI stack: infra/terraform/oci
- OKE + VCN + optional canary node pool

graph TD
    TF[Terraform Apply] --> CLUSTER[EKS / OKE Cluster]
    TF --> REGISTRY[ECR / OCIR]
    REGISTRY --> IMAGES[backend, rag-app, frontend images]
    IMAGES --> K8S[Overlay apply via rollout.sh]
    K8S --> LIVE[Ingress endpoint]
    LIVE --> SMOKE[smoke-test.sh]

Production Governance And Release Decision Model

flowchart TD
  Change[Code/Config/Image Change] --> Gate1[Static checks + tests]
  Gate1 --> Gate2[Build + image publication]
  Gate2 --> Gate3[Secrets/config validation]
  Gate3 --> Apply[Apply rollout strategy]
  Apply --> Observe[Observe probes + metrics + logs]
  Observe --> Smoke[Run smoke tests]
  Smoke --> Decision{Release healthy?}
  Decision -->|Yes| Promote[Promote rollout]
  Decision -->|No| Abort[Abort and rollback]
  Promote --> Post[Post-deploy verification + report]
  Abort --> PostMortem[Incident analysis + corrective action]

Release strategies supported:

Rolling (deploy/k8s/overlays/aws, deploy/k8s/overlays/oci)
Canary (deploy/k8s/overlays/aws-canary, deploy/k8s/overlays/oci-canary)
Blue-green (deploy/k8s/overlays/aws-bluegreen, deploy/k8s/overlays/oci-bluegreen)

Primary release controls:

deploy/scripts/rollout.sh
deploy/scripts/smoke-test.sh
scripts/system.sh test|health|smoke

Testing And Quality Gates

We provide a unified test and quality gate script for local and CI use. It comprehensively runs all unit tests, type checks, and production builds for both backend and frontend components.

Unified Gate

scripts/system.sh test

What it runs:

Python tests (pytest -q)
backend TypeScript build (npm run build)
frontend typecheck (npm run typecheck)
frontend production build (npm run build)

Additional Checks

scripts/system.sh health
scripts/system.sh smoke

Security And Production Notes

Backend bearer auth currently uses a demo/static token behavior by default (/auth/token route and middleware logic); treat it as non-production auth unless replaced by real identity integration.
RAG gateway auth is optional and controlled by ENABLE_GATEWAY_AUTH + API_GATEWAY_TOKEN.
Current rate limiting and session/cache stores are in-memory and process-local.
Enable hardened ingress, secret management, and centralized telemetry before multi-tenant production rollout.

Name		Name	Last commit message	Last commit date
Latest commit History 69 Commits
.devcontainer		.devcontainer
.github		.github
.idea		.idea
backend		backend
data		data
deploy		deploy
frontend		frontend
infra/terraform		infra/terraform
packages		packages
rag_system		rag_system
resources		resources
scripts		scripts
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
AGENTIC_RAG.md		AGENTIC_RAG.md
ARCHITECTURE.md		ARCHITECTURE.md
Dockerfile		Dockerfile
Dockerfile.rag		Dockerfile.rag
LICENSE		LICENSE
Makefile		Makefile
QUICKSTART.md		QUICKSTART.md
RAG_Experimental.ipynb		RAG_Experimental.ipynb
README.md		README.md
docker-compose.yml		docker-compose.yml
index.html		index.html
openapi.yaml		openapi.yaml
push_image.sh		push_image.sh
pytest.ini		pytest.ini
requirements.txt		requirements.txt
run.py		run.py
serve-wiki.sh		serve-wiki.sh

Folders and files

Latest commit

History

Repository files navigation

RAG AI Portfolio Support Platform: Product And Operations Handbook

Table Of Contents

Platform Overview

Core Capabilities

Technology Stack

Languages And Formats

RAG, AI, And Python Runtime

Backend API Stack

Frontend Stack

Data, Infra, And Operations

Quality And Developer Tooling

Architecture Overview

High-Level Service Topology

Request Lifecycle (REST Chat)

Retrieval Strategy Routing

Progressive Delivery Modes

Repository Layout

Runtime Contracts

Service Ports

Component Responsibilities

End-To-End Data Lifecycle

Ingestion, Retrieval, Enrichment, And Delivery

Runtime State Matrix

Quick Start

Option 1: Unified Script CLI (recommended)

Option 2: Docker Compose

Option 3: Manual Local (3 terminals)

Configuration And Secrets

RAG Runtime (rag_system/config.py)

Backend Runtime (backend/.env)

Frontend Runtime (Vite)

Production Security Baseline

API Surface

RAG API (port 5000)

Backend API (port 3456)

Operations Toolkit

Root Scripts

Day-2 Operations Flow

Deployment And Infrastructure

Kubernetes + Progressive Delivery

Terraform

Production Governance And Release Decision Model

Testing And Quality Gates

Unified Gate

Additional Checks

Security And Production Notes

Further Reading & Resources

Documentation Index

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors 2

Languages

RAG Runtime (`rag_system/config.py`)

Backend Runtime (`backend/.env`)

Frontend Runtime (`Vite`)

RAG API (port `5000`)

Backend API (port `3456`)

Packages