Distributed Hybrid Search System

A production-grade distributed search engine that combines BM25 keyword matching with semantic vector search. Built from scratch in Go, it indexes 24,765 Wikipedia documents across 8 shards.

Features

Vector Search (Phase 6, work in progress)

384-dimensional embeddings via Ollama (all-minilm model)
24,765 documents with stored vectors
Semantic similarity matching
Hybrid BM25 + cosine fusion

Smart Routing (Phase 4)

Hot-term shard affinity (80% of traffic is served by 2 shards instead of all 8)
etcd-based routing configuration

Caching (Phase 3)

Redis cache with 5-minute TTL
Thundering herd protection
0.5ms p99 latency on hits

Distribution (Phase 2)

8-shard MD5 partitioning
3-node etcd cluster
Automatic shard discovery

Foundation (Phase 1)

Bleve full-text search
BM25 relevance scoring
3.3k docs/sec indexing

Architecture

Quick Start

Prerequisites

Docker & Docker Compose
Go 1.24+
4GB RAM minimum

Deploy

git clone https://github.com/Devanshusharma2005/distributed-search.git
cd distributed-search

docker compose -f docker-compose.yml up -d --build

sleep 30

docker exec ollama ollama pull all-minilm

docker compose -f docker-compose.yml ps

Verify

curl http://localhost:8090/health

curl http://localhost:8090/shards | jq '.count'

curl 'http://localhost:8090/search?q=biodiversity&limit=3' | jq '.total_hits'

API Endpoints

`/search` - Keyword Search

curl 'http://localhost:8090/search?q=distributed&limit=5' | jq

Response:

{
  "query": "distributed",
  "shards": 8,
  "total_hits": 47,
  "routing_type": "hot",
  "hits": [{
    "id": "wiki_1234",
    "score": 12.456,
    "title": "Distributed computing",
    "shard": "shard-0:8080"
  }],
  "took": "3.2ms"
}

Parameters:

q (required): Query string
limit (optional, default=20): Results to return
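
For callers written in Go, a minimal client-side sketch might look like the following. The struct fields are taken from the sample response above; the type and function names (`SearchResponse`, `searchURL`, `decodeSearch`) are illustrative and not part of the project.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// Hit and SearchResponse mirror the fields shown in the sample response.
type Hit struct {
	ID    string  `json:"id"`
	Score float64 `json:"score"`
	Title string  `json:"title"`
	Shard string  `json:"shard"`
}

type SearchResponse struct {
	Query       string `json:"query"`
	Shards      int    `json:"shards"`
	TotalHits   int    `json:"total_hits"`
	RoutingType string `json:"routing_type"`
	Hits        []Hit  `json:"hits"`
	Took        string `json:"took"`
}

// searchURL builds a /search URL with properly escaped parameters.
func searchURL(base, q string, limit int) string {
	v := url.Values{}
	v.Set("q", q)
	v.Set("limit", strconv.Itoa(limit))
	return base + "/search?" + v.Encode()
}

// decodeSearch parses a /search response body.
func decodeSearch(body []byte) (SearchResponse, error) {
	var r SearchResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

// sampleBody is the example response from this README.
const sampleBody = `{
  "query": "distributed",
  "shards": 8,
  "total_hits": 47,
  "routing_type": "hot",
  "hits": [{"id": "wiki_1234", "score": 12.456,
            "title": "Distributed computing", "shard": "shard-0:8080"}],
  "took": "3.2ms"
}`

func main() {
	fmt.Println(searchURL("http://localhost:8090", "distributed", 5))
	r, err := decodeSearch([]byte(sampleBody))
	if err != nil {
		panic(err)
	}
	fmt.Println(r.TotalHits, r.Hits[0].Title, r.RoutingType)
}
```

Building the query string with `url.Values` keeps multi-word or non-ASCII queries correctly escaped.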

`/hybrid` - Semantic + Keyword Search

curl 'http://localhost:8090/hybrid?q=biodiversity&limit=5' | jq

Response:

{
  "query": "biodiversity",
  "query_vector": [0.123, -0.456, ...],
  "keyword_hits": 256,
  "semantic_topk": 5,
  "fusion_alpha": 0.7,
  "hits": [{
    "id": "wiki_3467",
    "title": "Convention on Biological Diversity",
    "keyword_score": 1.007,
    "semantic_score": 0.0,
    "hybrid_score": 0.705,
    "shard": "shard-7:8080"
  }],
  "took": "18ms",
  "routing_type": "cold"
}

Parameters:

q (required): Query string
limit (optional, default=10): Results to return
alpha (optional, default=0.7): Keyword weight (0.0-1.0)

Alpha values:

1.0: Pure keyword (100% BM25)
0.7: Default (70% keyword, 30% semantic)
0.5: Balanced
0.3: Semantic-heavy (30% keyword, 70% semantic)
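
The alpha weighting above is consistent with a simple linear blend, and the sample response fits it (0.7 × 1.007 + 0.3 × 0.0 ≈ 0.705). A minimal sketch, assuming both scores are already on comparable scales; the function name `fuse` and the clamping of out-of-range alpha values are illustrative choices, not confirmed project behavior:

```go
package main

import "fmt"

// fuse blends a keyword score and a semantic score with weight alpha.
// alpha is clamped to [0, 1]: 1.0 is pure keyword, 0.0 is pure semantic.
// Assumption: both inputs are already normalized to comparable ranges.
func fuse(keyword, semantic, alpha float64) float64 {
	if alpha < 0 {
		alpha = 0
	} else if alpha > 1 {
		alpha = 1
	}
	return alpha*keyword + (1-alpha)*semantic
}

func main() {
	// Values from the sample /hybrid response.
	fmt.Printf("%.3f\n", fuse(1.007, 0.0, 0.7)) // prints 0.705
}
```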

`/shards` - Active Shards

curl http://localhost:8090/shards | jq

`/hot-terms` - Routing Configuration

curl http://localhost:8090/hot-terms | jq

`/health` - Health Check

curl http://localhost:8090/health

Performance

Metric	Result
Maximum Throughput	10,000 QPS
Mean Latency (10k QPS)	5.18ms
P99 Latency (10k QPS)	92.72ms
Success Rate (10k QPS)	100%
Cache hit latency	0.5ms
Embedding generation	~10ms
Indexing speed (with embeddings)	30-50 docs/sec

Load Test Results

10k QPS Test (100,000 requests):

echo 'GET http://localhost:8090/search?q=distributed&limit=5' | \
  vegeta attack -rate=10000 -duration=10s | \
  vegeta report

Results:

Requests      100,000
Rate          9,998.61/sec
Success       100.00%
Duration      10.003s

Latencies:
  Mean        5.18ms
  50th        4.62ms
  95th        8.33ms
  99th        92.72ms
  Max         131.92ms

Throughput    9,991.53/sec
Bytes In      60.2 MB
Bytes Out     10.8 MB

Internal Operations:
  800,000 shard RPCs (8 per query)
  100% success rate
  Zero packet loss

The system sustained 10k QPS on a single MacBook, with:

8 shards processing 1,250 QPS each
Redis handling 95%+ cache hit rate
etcd coordinating 10k service discoveries/sec
Zero failures, zero timeouts, zero degradation

System Architecture

Services (15 containers)

Service	Count	Port	Purpose
coordinator	1	8090	Query routing, cache, fusion
etcd	3	2379	Service discovery, hot-terms
redis	1	6379	Cache (256MB LRU)
ollama	1	11434	Embedding generation
shards	8	8080	Bleve indexes
setup	1	-	Hot-term seeding
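
The coordinator's fan-out and merge role in the table above could be sketched roughly as follows. This is an illustration under stated assumptions: `searchFn` is a hypothetical stand-in for a per-shard RPC, and merging by raw score is the simplest possible strategy, not necessarily what the coordinator does.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type hit struct {
	ID    string
	Score float64
	Shard string
}

// searchFn stands in for a per-shard search RPC (hypothetical signature).
type searchFn func(query string) []hit

// scatterGather queries every shard concurrently and returns the
// top `limit` hits ordered by descending score.
func scatterGather(query string, shards []searchFn, limit int) []hit {
	results := make([][]hit, len(shards))
	var wg sync.WaitGroup
	for i, s := range shards {
		wg.Add(1)
		go func(i int, s searchFn) {
			defer wg.Done()
			results[i] = s(query) // each goroutine writes only its own slot
		}(i, s)
	}
	wg.Wait()

	var merged []hit
	for _, r := range results {
		merged = append(merged, r...)
	}
	sort.SliceStable(merged, func(a, b int) bool {
		return merged[a].Score > merged[b].Score
	})
	if len(merged) > limit {
		merged = merged[:limit]
	}
	return merged
}

func main() {
	shard0 := func(q string) []hit {
		return []hit{{"wiki_1", 1.0, "shard-0"}, {"wiki_2", 3.0, "shard-0"}}
	}
	shard1 := func(q string) []hit {
		return []hit{{"wiki_3", 2.0, "shard-1"}}
	}
	for _, h := range scatterGather("distributed", []searchFn{shard0, shard1}, 2) {
		fmt.Println(h.ID, h.Score, h.Shard)
	}
}
```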

Data Distribution

Shard	Docs	Index Size
shard-0	3,195	45MB
shard-1	3,122	43MB
shard-2	3,032	42MB
shard-3	3,113	43MB
shard-4	3,028	41MB
shard-5	3,128	44MB
shard-6	3,071	43MB
shard-7	3,076	42MB
Total	24,765	343MB

Partitioning: MD5(doc_id) % 8
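
As a rough illustration of this partitioning rule, the sketch below hashes a document ID and reduces it modulo 8. How the 16-byte digest is turned into an integer is an assumption here (first 8 bytes, big-endian); the indexer may reduce it differently, so shard numbers from this sketch need not match the real layout.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

const numShards = 8

// shardFor maps a document ID to a shard index in [0, numShards).
// Assumption: the first 8 bytes of the MD5 digest are read as a
// big-endian uint64 before taking the modulo.
func shardFor(docID string) int {
	sum := md5.Sum([]byte(docID))
	return int(binary.BigEndian.Uint64(sum[:8]) % numShards)
}

func main() {
	for _, id := range []string{"wiki_1234", "wiki_3467"} {
		fmt.Printf("%s -> shard-%d\n", id, shardFor(id))
	}
}
```

Because the mapping depends only on the ID, any process can compute a document's shard without a lookup table. Changing the shard count remaps most documents.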

Vector storage: ~38MB (24,765 docs × 384 floats × 4 bytes)

Rebuilding Indexes with Vectors

If you need to rebuild the indexes with embeddings:

chmod +x rebuild-with-vectors.sh
./rebuild-with-vectors.sh

This will:

Back up existing indexes
Generate embeddings for all 24,765 documents
Build new indexes with 384-dim vectors

The rebuild takes roughly 10-15 minutes.

Skip vectors (keyword-only):

./rebuild-with-vectors.sh --skip-vectors

Reduce batch size (if memory issues):

./rebuild-with-vectors.sh --batch-size 50

Manual Index Build

for i in {0..7}; do
  go run cmd/indexer/main.go \
    -input=shard-$i.jsonl \
    -index=search.bleve \
    -shard-id=$i \
    -batch-size=100 \
    -ollama=http://localhost:11434
done
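
The `-ollama` flag points the indexer at an Ollama server for embeddings. A minimal sketch of such a call, assuming Ollama's `/api/embeddings` endpoint with `model` and `prompt` request fields and an `embedding` response field; the function names are illustrative and this is not the project's `internal/embed` client:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

type embedRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

type embedResponse struct {
	Embedding []float32 `json:"embedding"`
}

// parseEmbedding extracts the vector from an embeddings response body.
func parseEmbedding(body []byte) ([]float32, error) {
	var out embedResponse
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, err
	}
	if len(out.Embedding) == 0 {
		return nil, fmt.Errorf("response contained no embedding")
	}
	return out.Embedding, nil
}

// embed posts text to Ollama and returns its embedding vector.
func embed(baseURL, model, text string) ([]float32, error) {
	payload, err := json.Marshal(embedRequest{Model: model, Prompt: text})
	if err != nil {
		return nil, err
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post(baseURL+"/api/embeddings", "application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama returned %s", resp.Status)
	}
	return parseEmbedding(body)
}

func main() {
	vec, err := embed("http://localhost:11434", "all-minilm", "biodiversity")
	if err != nil {
		fmt.Println("embed failed:", err)
		return
	}
	fmt.Println("dimensions:", len(vec))
}
```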

Development

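Local (No Docker)

The coordinator and shards run as local Go binaries; etcd, Redis, and Ollama still run in containers.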

docker compose -f docker-compose.yml up -d etcd0 redis ollama

docker exec ollama ollama pull all-minilm

go build -o coord cmd/coordinator/main.go
go build -o shard cmd/searcher/main.go

for i in {0..7}; do
  ./shard --shard-id=$i --port=$((8080+$i)) --hostname=localhost \
          --index=search.bleve --etcd=localhost:2379 &
done

./coord --port=8090 --etcd=localhost:2379 --redis=localhost:6379

Add Hot Terms

docker exec etcd0 etcdctl put /hot_terms/algorithm/shards "1,3,5"

curl http://localhost:8090/hot-terms | jq

curl 'http://localhost:8090/search?q=algorithm' | jq '.routing_type'
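
To illustrate how a value such as "1,3,5" could drive routing, here is a minimal sketch. The matching rule (lowercased, whitespace-separated tokens; first hot term wins; otherwise all shards) is an assumption for illustration, and `parseShardList` and `route` are hypothetical names, not the coordinator's actual functions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseShardList turns an etcd value such as "1,3,5" into shard IDs.
func parseShardList(value string) ([]int, error) {
	var ids []int
	for _, part := range strings.Split(value, ",") {
		id, err := strconv.Atoi(strings.TrimSpace(part))
		if err != nil {
			return nil, fmt.Errorf("bad shard id %q: %w", part, err)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

// route returns the shards to query and the routing type.
// Assumption: a query is "hot" if any lowercased token appears in
// hotTerms; otherwise every shard is queried ("cold").
func route(query string, hotTerms map[string][]int, totalShards int) ([]int, string) {
	for _, tok := range strings.Fields(strings.ToLower(query)) {
		if shards, ok := hotTerms[tok]; ok {
			return shards, "hot"
		}
	}
	all := make([]int, totalShards)
	for i := range all {
		all[i] = i
	}
	return all, "cold"
}

func main() {
	ids, err := parseShardList("1,3,5")
	if err != nil {
		panic(err)
	}
	hot := map[string][]int{"algorithm": ids}

	shards, kind := route("algorithm", hot, 8)
	fmt.Println(kind, shards)
	shards, kind = route("biodiversity", hot, 8)
	fmt.Println(kind, shards)
}
```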

Monitor Cache

docker exec redis redis-cli INFO stats | grep hits

docker exec redis redis-cli KEYS "search:*"

docker exec redis redis-cli GET "search:biodiversity:5" | jq

Troubleshooting

No Results

ls -lh search.bleve-*/

docker compose -f docker-compose.yml restart shard-{0..7}

curl http://localhost:8090/shards | jq '.count'

Ollama Not Connected

docker ps | grep ollama

docker compose -f docker-compose.yml up -d ollama

docker exec ollama ollama pull all-minilm

docker compose -f docker-compose.yml restart coordinator

Shards Not Registering

docker exec etcd0 etcdctl get --prefix /shards/active/

docker compose -f docker-compose.yml logs shard-0

docker compose -f docker-compose.yml restart shard-{0..7}

etcd Unhealthy

docker exec etcd0 etcdctl endpoint health

docker compose -f docker-compose.yml down
# Caution: this also removes unused volumes that belong to other projects on this host
docker volume prune -f
docker compose -f docker-compose.yml up -d

Cache Verification

curl -i 'http://localhost:8090/search?q=test&limit=3'

curl -i 'http://localhost:8090/search?q=test&limit=3'

First request: X-Cache: MISS
Second request: X-Cache: HIT
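
The MISS-then-HIT behavior, together with the thundering herd protection listed under Caching, can be illustrated with a request-coalescing sketch. An in-memory map stands in for Redis, the TTL is omitted, and the `coalescer` type is hypothetical; the coordinator's real mechanism is not shown in this README.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// call tracks one in-flight backend fetch that other callers can wait on.
type call struct {
	wg  sync.WaitGroup
	val string
}

// coalescer collapses concurrent misses for the same key into one fetch.
// Assumption: a plain map replaces Redis, and entries never expire.
type coalescer struct {
	mu       sync.Mutex
	cache    map[string]string
	inflight map[string]*call
}

func newCoalescer() *coalescer {
	return &coalescer{cache: map[string]string{}, inflight: map[string]*call{}}
}

// get returns the cached value, or runs fetch once per key while
// concurrent callers for that key wait for the same result.
func (c *coalescer) get(key string, fetch func() string) (val string, hit bool) {
	c.mu.Lock()
	if v, ok := c.cache[key]; ok {
		c.mu.Unlock()
		return v, true
	}
	if cl, ok := c.inflight[key]; ok {
		c.mu.Unlock()
		cl.wg.Wait() // another goroutine is already fetching this key
		return cl.val, false
	}
	cl := &call{}
	cl.wg.Add(1)
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.val = fetch()

	c.mu.Lock()
	c.cache[key] = cl.val
	delete(c.inflight, key)
	c.mu.Unlock()
	cl.wg.Done()
	return cl.val, false
}

func main() {
	c := newCoalescer()
	var fetches int32
	key := fmt.Sprintf("search:%s:%d", "test", 3) // key shape seen in the Redis examples
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.get(key, func() string {
				atomic.AddInt32(&fetches, 1)
				return "results"
			})
		}()
	}
	wg.Wait()
	fmt.Println("backend fetches:", atomic.LoadInt32(&fetches)) // prints 1
}
```

In a real service, `golang.org/x/sync/singleflight` provides the same coalescing with error handling included.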

Project Structure

distributed-search/
├── cmd/
│   ├── coordinator/main.go     (query router, cache, hybrid)
│   ├── indexer/main.go         (document indexing + vectors)
│   ├── ingester/main.go        (Wikipedia XML → JSONL pipeline)
│   └── searcher/main.go        (shard service)
├── internal/
│   ├── embed/client.go         (Ollama embedding client)
│   ├── hybrid/search.go        (hybrid search logic)
│   ├── index/indexer.go        (Bleve indexer)
│   └── model/doc.go            (document model)
├── docker/
│   ├── Dockerfile.coordinator
│   └── Dockerfile.shard
├── docker-compose.yml
├── rebuild-with-vectors.sh
├── test-vectors.sh
├── shard-{0-7}.jsonl           (partitioned data)
└── search.bleve-{0-7}/         (indexes with vectors)

Technologies

Go 1.24: Primary language
Bleve: Full-text search (BM25)
etcd: Service discovery (Raft)
Redis: Caching (LRU)
Ollama: Local embeddings (all-minilm)
Docker Compose: Orchestration

Algorithms

BM25: Best Match 25 scoring
Cosine Similarity: Vector similarity
MD5: Document partitioning
LRU: Cache eviction
Raft: Distributed consensus
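
Of the algorithms listed, cosine similarity is the one most easily shown in a few lines. A minimal sketch, assuming embeddings are held as `[]float32` slices (the storage type is an assumption), that returns 0 for mismatched lengths or zero vectors:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
// It returns 0 if the lengths differ or either vector has zero magnitude.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosine([]float32{1, 0}, []float32{1, 0}))  // identical direction
	fmt.Println(cosine([]float32{1, 0}, []float32{0, 1}))  // orthogonal
	fmt.Println(cosine([]float32{1, 0}, []float32{-1, 0})) // opposite direction
}
```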

License

MIT

Author

Devanshu Sharma
GitHub: @Devanshusharma2005
