A RAG (Retrieval-Augmented Generation) powered chat interface for your Obsidian vault. Ask questions about your notes and get AI-powered responses with proper source citations.
- Semantic search using vector embeddings for accurate retrieval
- Hybrid search combining semantic similarity with keyword boosting
- Date-aware queries that automatically prioritize recent content
- Smart citations with links to source articles
- Progressive disclosure showing 5 results initially, with the option to see more
- Profile system for separating public and private content
- Powered by Anthropic's Claude Sonnet 4.5
- Automatic re-indexing via cron jobs
The system consists of:
- Indexer: Processes Obsidian markdown files, chunks them, and generates embeddings
- LanceDB: Vector database storing embeddings and metadata
- RAG Search: Hybrid search combining vector similarity and keyword matching
- Express Backend: API server handling chat requests
- Claude AI: Generates natural language responses with proper citations
- Web UI: Simple chat interface
- Node.js 18 or higher
- An Obsidian vault
- Anthropic API key (get one at https://console.anthropic.com/)
- Clone the repository:

```bash
git clone https://github.com/YOUR_USERNAME/obsidian-vault-chat.git
cd obsidian-vault-chat
```

- Install dependencies:

```bash
npm install
```

- Configure environment variables:

```bash
cp dot_env .env
nano .env
```

Edit the `.env` file with your settings:
```
ANTHROPIC_API_KEY=your_api_key_here
VAULT_PATH=/path/to/your/obsidian/vault
PORT=3000
USE_RAG=true
DEFAULT_PROFILE=public
MAX_RESULTS=5
MAX_SEARCH_RESULTS=20
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2
CHUNK_SIZE=500
CHUNK_OVERLAP=50
```

- Configure profiles in `config/profiles.js`:
```javascript
export const profileConfig = {
  public: {
    name: 'Public Knowledge',
    directories: [
      'clippings',
      '03-Blog/benjamin-mendes/content/posts'
    ],
    // ... system prompt and settings
  }
};
```

- Index your vault:

```bash
node scripts/index-vault.js
```

This will:
- Read all markdown files from configured directories
- Extract metadata (dates, sources, tags)
- Chunk documents into 500-word pieces with 50-word overlap
- Generate embeddings using the MiniLM model (downloads ~80MB on first run)
- Store vectors in LanceDB
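The chunking step above can be sketched as a word-based split with overlap (a simplified illustration, not the actual `lib/chunker.js` code):

```javascript
// Split a document into overlapping word-based chunks.
// Simplified sketch; the real logic lives in lib/chunker.js.
function chunkText(text, chunkSize = 500, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = chunkSize - overlap; // each chunk starts 450 words after the last
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // final chunk reached
  }
  return chunks;
}
```

With the defaults, a 1,000-word document yields three chunks starting at words 0, 450, and 900, so neighbouring chunks share 50 words of context.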
- Start the server:

```bash
npm start
```

The chat interface will be available at http://localhost:3000.
Profiles control which content is searchable. Edit `config/profiles.js` to customize:
```javascript
public: {
  directories: [
    'clippings',
    '03-Blog/benjamin-mendes/content/posts'
  ],
  systemPrompt: 'Your custom prompt here...',
  enabled: true
}
```

The system extracts metadata from markdown frontmatter:
For blog posts:
```yaml
---
source: https://your-blog.com/post-url
publishDate: 2024-01-15
tags: [AI, Technology]
---
```

For clippings:
```yaml
---
url: https://source-article.com
published: 2024-01-15
tags: [Research]
---
```

Priority: the `source` property is checked first, falling back to the `url` property.
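This fallback can be sketched with a small parser (hypothetical helper; the real extraction lives in `lib/vault-reader.js` and handles full frontmatter):

```javascript
// Parse a minimal frontmatter block and resolve the canonical link,
// preferring a blog post's `source` over a clipping's `url`.
// Hypothetical sketch of the priority rule, not the vault-reader code itself.
function extractMetadata(markdown) {
  const match = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return {};
  const meta = {};
  for (const line of match[1].split('\n')) {
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  // `source` (blog posts) wins over `url` (clippings)
  meta.link = meta.source ?? meta.url ?? null;
  return meta;
}
```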
Adjust search behavior in `.env`:

- `MAX_RESULTS`: Results shown per batch (default: 5)
- `MAX_SEARCH_RESULTS`: Total results found before filtering (default: 20)
- `CHUNK_SIZE`: Words per chunk (default: 500)
- `CHUNK_OVERLAP`: Overlapping words between chunks (default: 50)
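At startup these tuning values are plain environment reads; a minimal sketch assuming the defaults match the documentation (the backend's actual config loading may differ):

```javascript
// Read search tuning parameters from the environment, falling back to
// the documented defaults. Illustrative sketch only.
function loadSearchConfig(env = process.env) {
  const intOr = (value, fallback) => {
    const n = parseInt(value, 10);
    return Number.isNaN(n) ? fallback : n;
  };
  return {
    maxResults: intOr(env.MAX_RESULTS, 5),
    maxSearchResults: intOr(env.MAX_SEARCH_RESULTS, 20),
    chunkSize: intOr(env.CHUNK_SIZE, 500),
    chunkOverlap: intOr(env.CHUNK_OVERLAP, 50),
  };
}
```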
- Create a service file at `/etc/systemd/system/obsidian-chat.service`:

```ini
[Unit]
Description=Obsidian Vault Chat Service
After=network.target

[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/path/to/obsidian-vault-chat
Environment="NODE_ENV=production"
ExecStart=/usr/bin/node backend-server.js
Restart=always
RestartSec=10
StandardOutput=append:/var/log/obsidian-chat/output.log
StandardError=append:/var/log/obsidian-chat/error.log

[Install]
WantedBy=multi-user.target
```

- Enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable obsidian-chat
sudo systemctl start obsidian-chat
```

- Create a re-index script at `/usr/local/bin/reindex-vault.sh`:
```bash
#!/bin/bash
PROJECT_DIR="/path/to/obsidian-vault-chat"
cd "$PROJECT_DIR"
/usr/bin/node scripts/index-vault.js
sudo systemctl restart obsidian-chat
```

- Make it executable:

```bash
sudo chmod +x /usr/local/bin/reindex-vault.sh
```

- Configure sudoers for a passwordless restart:
```bash
sudo visudo
# Add: YOUR_USERNAME ALL=(ALL) NOPASSWD: /bin/systemctl restart obsidian-chat
```

- Add a cron job (twice daily at 6 AM and 6 PM):
```bash
crontab -e
# Add:
0 6 * * * /usr/local/bin/reindex-vault.sh
0 18 * * * /usr/local/bin/reindex-vault.sh
```

- File Reading: Scans configured directories for markdown files
- Metadata Extraction: Parses frontmatter for dates, sources, tags
- Chunking: Splits documents into overlapping chunks for better retrieval
- Embedding: Generates vector embeddings using Xenova transformers
- Storage: Stores chunks with metadata in LanceDB
- Query Embedding: Converts user question to vector
- Vector Search: Finds semantically similar chunks using cosine similarity
- Keyword Boosting: Applies bonus scoring for exact keyword matches in titles and content
- Deduplication: Groups chunks by file, preferring blog posts with sources
- Date Sorting: For queries with temporal indicators (recent, latest), prioritizes by date
- Diversification: Returns one chunk per unique article
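The vector search and keyword boosting above can be sketched as cosine similarity plus a flat per-keyword bonus (hypothetical weights; the actual scoring lives in `lib/rag-search.js`):

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hybrid score: semantic similarity plus a bonus for each query keyword
// found in the chunk's title or text. Weights here are illustrative.
function hybridScore(queryVec, chunk, keywords, boost = 0.1) {
  let score = cosineSimilarity(queryVec, chunk.vector);
  const haystack = (chunk.title + ' ' + chunk.text).toLowerCase();
  for (const kw of keywords) {
    if (haystack.includes(kw.toLowerCase())) score += boost;
  }
  return score;
}
```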
- Context Building: Formats top results with source links
- Claude Processing: Sends context + query to Claude with system prompt
- Citation Enforcement: System prompt requires linking every mentioned article
- Progressive Disclosure: Offers to show more if additional results available
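The progressive-disclosure step amounts to slicing the ranked results; a minimal sketch (the field names mirror the API response, the helper itself is hypothetical):

```javascript
// Return the first batch of ranked results plus the flags the API uses
// to offer more. Sketch only; field names follow the documented response.
function paginateResults(ranked, maxResults = 5) {
  return {
    shown: ranked.slice(0, maxResults),
    hasMoreResults: ranked.length > maxResults,
    remainingCount: Math.max(0, ranked.length - maxResults),
  };
}
```

With the defaults, 20 search results produce a first batch of 5 and a `remainingCount` of 15, matching the example response below.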
Chat with your vault.
Request:
```json
{
  "message": "What are the latest articles on AI?",
  "conversationHistory": [],
  "sessionId": "session_123"
}
```

Response:
```json
{
  "response": "Hey! I've researched some great content...",
  "sourcesUsed": [
    {
      "name": "Article Title",
      "url": "https://source-url.com"
    }
  ],
  "hasMoreResults": true,
  "remainingCount": 15
}
```

Health check endpoint.
Response:
```json
{
  "status": "ok",
  "rag": "ready",
  "profile": "public",
  "maxResults": 5,
  "maxSearchResults": 20
}
```

```
obsidian-vault-chat/
├── backend-server.js       # Express API server
├── config/
│   └── profiles.js         # Profile configuration
├── lib/
│   ├── chunker.js          # Text chunking logic
│   ├── embeddings.js       # Embedding generation
│   ├── rag-search.js       # RAG search implementation
│   └── vault-reader.js     # Obsidian vault parser
├── scripts/
│   └── index-vault.js      # Indexing script
├── public/
│   └── index.html          # Chat UI
├── lancedb/                # Vector database (generated)
├── package.json
└── .env                    # Configuration (not in git)
```
Service Management:

```bash
sudo systemctl status obsidian-chat
sudo systemctl restart obsidian-chat
sudo journalctl -u obsidian-chat -f
```

View Logs:
```bash
tail -f /var/log/obsidian-chat/output.log
tail -f /var/log/obsidian-chat/error.log
tail -f /var/log/obsidian-chat/reindex.log
```

Manual Re-index:

```bash
node scripts/index-vault.js
```

Embeddings not generating:
- The first run downloads the ~80MB embedding model
- Check your internet connection
- Verify `EMBEDDING_MODEL` in `.env`
No search results:
- Verify `VAULT_PATH` points to the correct directory
- Check profile configuration in `config/profiles.js`
- Ensure files have been indexed (the `lancedb/` directory exists)
- Check that file paths match the configured directories
Links not showing:
- Verify frontmatter has a `source` or `url` property
- Re-index after adding sources
- Check logs for metadata extraction
Service won't start:
- Check logs: `sudo journalctl -u obsidian-chat -xe`
- Verify paths in the service file
- Ensure log directory exists and has correct permissions
MIT
Built with:
- Anthropic Claude - AI language model
- LanceDB - Vector database
- Xenova Transformers - Embeddings
- Express - Web framework