Production-grade distributed search engine combining BM25 keyword matching with semantic vector search. Built from scratch in Go with 24,765 Wikipedia documents across 8 shards.
Vector Search (Phase 6)(Work In Progress...)
- 384-dimensional embeddings via Ollama (all-minilm model)
- 24,765 documents with stored vectors
- Semantic similarity matching
- Hybrid BM25 + cosine fusion
Smart Routing (Phase 4)
- Hot-term shard affinity (80% traffic uses 2 shards vs 8)
- etcd-based routing configuration
Caching (Phase 3)
- Redis cache with 5-minute TTL
- Thundering herd protection
- 0.5ms p99 latency on hits
Distribution (Phase 2)
- 8-shard MD5 partitioning
- 3-node etcd cluster
- Automatic shard discovery
Foundation (Phase 1)
- Bleve full-text search
- BM25 relevance scoring
- 3.3k docs/sec indexing
--
- Docker & Docker Compose
- Go 1.24+
- 4GB RAM minimum
git clone https://github.com/Devanshusharma2005/distributed-search.git
cd distributed-search
docker compose -f docker-compose.yml up -d --build
sleep 30
docker exec ollama ollama pull all-minilm
docker compose -f docker-compose.yml pscurl http://localhost:8090/health
curl http://localhost:8090/shards | jq '.count'
curl 'http://localhost:8090/search?q=biodiversity&limit=3' | jq '.total_hits'curl 'http://localhost:8090/search?q=distributed&limit=5' | jqResponse:
{
"query": "distributed",
"shards": 8,
"total_hits": 47,
"routing_type": "hot",
"hits": [{
"id": "wiki_1234",
"score": 12.456,
"title": "Distributed computing",
"shard": "shard-0:8080"
}],
"took": "3.2ms"
}Parameters:
q(required): Query stringlimit(optional, default=20): Results to return
curl 'http://localhost:8090/hybrid?q=biodiversity&limit=5' | jqResponse:
{
"query": "biodiversity",
"query_vector": [0.123, -0.456, ...],
"keyword_hits": 256,
"semantic_topk": 5,
"fusion_alpha": 0.7,
"hits": [{
"id": "wiki_3467",
"title": "Convention on Biological Diversity",
"keyword_score": 1.007,
"semantic_score": 0.0,
"hybrid_score": 0.705,
"shard": "shard-7:8080"
}],
"took": "18ms",
"routing_type": "cold"
}Parameters:
q(required): Query stringlimit(optional, default=10): Results to returnalpha(optional, default=0.7): Keyword weight (0.0-1.0)
Alpha values:
1.0: Pure keyword (100% BM25)0.7: Default (70% keyword, 30% semantic)0.5: Balanced0.3: Semantic-heavy (30% keyword, 70% semantic)
curl http://localhost:8090/shards | jqcurl http://localhost:8090/hot-terms | jqcurl http://localhost:8090/health| Metric | Result |
|---|---|
| Maximum Throughput | 10,000 QPS |
| Mean Latency (10k QPS) | 5.18ms |
| P99 Latency (10k QPS) | 92.72ms |
| Success Rate (10k QPS) | 100% |
| Cache hit latency | 0.5ms |
| Embedding generation | ~10ms |
| Indexing speed | 30-50 docs/sec |
10k QPS Test (100,000 requests):
echo 'GET http://localhost:8090/search?q=distributed&limit=5' | \
vegeta attack -rate=10000 -duration=10s | \
vegeta reportResults:
Requests 100,000
Rate 9,998.61/sec
Success 100.00%
Duration 10.003s
Latencies:
Mean 5.18ms
50th 4.62ms
95th 8.33ms
99th 92.72ms
Max 131.92ms
Throughput 9,991.53/sec
Bytes In 60.2 MB
Bytes Out 10.8 MB
Internal Operations:
800,000 shard RPCs (8 per query)
100% success rate
Zero packet loss
System survived 10k QPS on a single MacBook with:
- 8 shards processing 1,250 QPS each
- Redis handling 95%+ cache hit rate
- etcd coordinating 10k service discoveries/sec
- Zero failures, zero timeouts, zero degradation
| Service | Count | Port | Purpose |
|---|---|---|---|
| coordinator | 1 | 8090 | Query routing, cache, fusion |
| etcd | 3 | 2379 | Service discovery, hot-terms |
| redis | 1 | 6379 | Cache (256MB LRU) |
| ollama | 1 | 11434 | Embedding generation |
| shards | 8 | 8080 | Bleve indexes |
| setup | 1 | - | Hot-term seeding |
| Shard | Docs | Index Size |
|---|---|---|
| shard-0 | 3,195 | 45MB |
| shard-1 | 3,122 | 43MB |
| shard-2 | 3,032 | 42MB |
| shard-3 | 3,113 | 43MB |
| shard-4 | 3,028 | 41MB |
| shard-5 | 3,128 | 44MB |
| shard-6 | 3,071 | 43MB |
| shard-7 | 3,076 | 42MB |
| Total | 24,765 | 343MB |
Partitioning: MD5(doc_id) % 8
Vector storage: ~38MB (24,765 docs × 384 floats × 4 bytes)
If you need to rebuild the indexes with embeddings:
chmod +x rebuild-with-vectors.sh
./rebuild-with-vectors.shThis will:
- Back up existing indexes
- Generate embeddings for all 24,765 documents
- Build new indexes with 384-dim vectors
- Takes ~10-15 minutes
Skip vectors (keyword-only):
./rebuild-with-vectors.sh --skip-vectorsReduce batch size (if memory issues):
./rebuild-with-vectors.sh --batch-size 50for i in {0..7}; do
go run cmd/indexer/main.go \
-input=shard-$i.jsonl \
-index=search.bleve \
-shard-id=$i \
-batch-size=100 \
-ollama=http://localhost:11434
donedocker compose -f docker-compose.yml up -d etcd0 redis ollama
docker exec ollama ollama pull all-minilm
go build -o coord cmd/coordinator/main.go
go build -o shard cmd/searcher/main.go
for i in {0..7}; do
./shard --shard-id=$i --port=$((8080+$i)) --hostname=localhost \
--index=search.bleve --etcd=localhost:2379 &
done
./coord --port=8090 --etcd=localhost:2379 --redis=localhost:6379docker exec etcd0 etcdctl put /hot_terms/algorithm/shards "1,3,5"
curl http://localhost:8090/hot-terms | jq
curl 'http://localhost:8090/search?q=algorithm' | jq '.routing_type'docker exec redis redis-cli INFO stats | grep hits
docker exec redis redis-cli KEYS "search:*"
docker exec redis redis-cli GET "search:biodiversity:5" | jqls -lh search.bleve-*/
docker compose -f docker-compose.yml restart shard-{0..7}
curl http://localhost:8090/shards | jq '.count'docker ps | grep ollama
docker compose -f docker-compose.yml up -d ollama
docker exec ollama ollama pull all-minilm
docker compose -f docker-compose.yml restart coordinatordocker exec etcd0 etcdctl get --prefix /shards/active/
docker compose -f docker-compose.yml logs shard-0
docker compose -f docker-compose.yml restart shard-{0..7}docker exec etcd0 etcdctl endpoint health
docker compose -f docker-compose.yml down
docker volume prune -f
docker compose -f docker-compose.yml up -dcurl -i 'http://localhost:8090/search?q=test&limit=3'
curl -i 'http://localhost:8090/search?q=test&limit=3'First request: X-Cache: MISS
Second request: X-Cache: HIT
distributed-search/
├── cmd/
│ ├── coordinator/main.go (query router, cache, hybrid)
│ ├── indexer/main.go (document indexing + vectors)
│ ├── ingester/main.go (Wikipedia XML → JSONL pipeline)
│ └── searcher/main.go (shard service)
├── internal/
│ ├── embed/client.go (Ollama embedding client)
│ ├── hybrid/search.go (hybrid search logic)
│ ├── index/indexer.go (Bleve indexer)
│ └── model/doc.go (document model)
├── docker/
│ ├── Dockerfile.coordinator
│ └── Dockerfile.shard
├── docker-compose.yml
├── rebuild-with-vectors.sh
├── test-vectors.sh
├── shard-{0-7}.jsonl (partitioned data)
└── search.bleve-{0-7}/ (indexes with vectors)
- Go 1.24: Primary language
- Bleve: Full-text search (BM25)
- etcd: Service discovery (Raft)
- Redis: Caching (LRU)
- Ollama: Local embeddings (all-minilm)
- Docker Compose: Orchestration
- BM25: Best Match 25 scoring
- Cosine Similarity: Vector similarity
- MD5: Document partitioning
- LRU: Cache eviction
- Raft: Distributed consensus
MIT
Devanshu Sharma
GitHub: @Devanshusharma2005
