2 changes: 2 additions & 0 deletions README.md
@@ -84,6 +84,8 @@

## Installation

> **📦 Offline Deployment**: For offline or air-gapped environments, see the [Offline Deployment Guide](./docs/OfflineDeployment.md) for instructions on pre-installing all dependencies and cache files.

### Install LightRAG Server

The LightRAG Server is designed to provide Web UI and API support. The Web UI facilitates document indexing, knowledge graph exploration, and a simple RAG query interface. The LightRAG Server also provides an Ollama-compatible interface, aiming to emulate LightRAG as an Ollama chat model. This allows AI chat bots, such as Open WebUI, to access LightRAG easily.
311 changes: 311 additions & 0 deletions docs/OfflineDeployment.md
@@ -0,0 +1,311 @@
# LightRAG Offline Deployment Guide

This guide provides comprehensive instructions for deploying LightRAG in offline environments where internet access is limited or unavailable.

## Table of Contents

- [Overview](#overview)
- [Quick Start](#quick-start)
- [Layered Dependencies](#layered-dependencies)
- [Tiktoken Cache Management](#tiktoken-cache-management)
- [Complete Offline Deployment Workflow](#complete-offline-deployment-workflow)
- [Troubleshooting](#troubleshooting)

## Overview

LightRAG uses dynamic package installation (`pipmaster`) for optional features based on file types and configurations. In offline environments, these dynamic installations will fail. This guide shows you how to pre-install all necessary dependencies and cache files.

### What Gets Dynamically Installed?

LightRAG dynamically installs packages for:

- **Document Processing**: `docling`, `pypdf2`, `python-docx`, `python-pptx`, `openpyxl`
- **Storage Backends**: `redis`, `neo4j`, `pymilvus`, `pymongo`, `asyncpg`, `qdrant-client`
- **LLM Providers**: `openai`, `anthropic`, `ollama`, `zhipuai`, `aioboto3`, `voyageai`, `llama-index`, `lmdeploy`, `transformers`, `torch`
- **Tiktoken Models**: BPE encoding models downloaded from OpenAI CDN
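Rather than letting a dynamic `pipmaster` install fail at runtime, you can check ahead of time which optional packages are already importable. A minimal sketch (the module list is illustrative, not exhaustive; note that the importable name can differ from the PyPI package name, e.g. `python-docx` imports as `docx`):

```python
import importlib.util

def missing_packages(module_names):
    """Return the subset of module_names that cannot be imported."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# Illustrative importable module names for some of the optional features above.
optional_modules = ["docx", "pptx", "openpyxl", "redis", "neo4j", "openai"]
print(missing_packages(optional_modules))
```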

## Quick Start

### Option 1: Using pip with Offline Extras

```bash
# Online environment: Install all offline dependencies
pip install lightrag-hku[offline]

# Download tiktoken cache
lightrag-download-cache

# Create offline package
pip download lightrag-hku[offline] -d ./offline-packages
tar -czf lightrag-offline.tar.gz ./offline-packages ~/.tiktoken_cache

# Transfer to offline server
scp lightrag-offline.tar.gz user@offline-server:/path/to/

# Offline environment: Install
tar -xzf lightrag-offline.tar.gz
pip install --no-index --find-links=./offline-packages lightrag-hku[offline]
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache
```

### Option 2: Using Requirements Files

```bash
# Online environment: Download packages
pip download -r requirements-offline.txt -d ./packages

# Transfer to offline server
tar -czf packages.tar.gz ./packages
scp packages.tar.gz user@offline-server:/path/to/

# Offline environment: Install
tar -xzf packages.tar.gz
pip install --no-index --find-links=./packages -r requirements-offline.txt
```

## Layered Dependencies

LightRAG provides flexible dependency groups for different use cases:

### Available Dependency Groups

| Group | Description | Use Case |
|-------|-------------|----------|
| `offline-docs` | Document processing | PDF, DOCX, PPTX, XLSX files |
| `offline-storage` | Storage backends | Redis, Neo4j, MongoDB, PostgreSQL, etc. |
| `offline-llm` | LLM providers | OpenAI, Anthropic, Ollama, etc. |
| `offline` | All of the above | Complete offline deployment |

### Installation Examples

```bash
# Install only document processing dependencies
pip install lightrag-hku[offline-docs]

# Install document processing and storage backends
pip install lightrag-hku[offline-docs,offline-storage]

# Install all offline dependencies
pip install lightrag-hku[offline]
```

### Using Individual Requirements Files

```bash
# Document processing only
pip install -r requirements-offline-docs.txt

# Storage backends only
pip install -r requirements-offline-storage.txt

# LLM providers only
pip install -r requirements-offline-llm.txt

# All offline dependencies
pip install -r requirements-offline.txt
```

## Tiktoken Cache Management

Tiktoken downloads BPE encoding models on first use. In offline environments, you must pre-download these models.

### Using the CLI Command

After installing LightRAG, use the built-in command:

```bash
# Download to default location (~/.tiktoken_cache)
lightrag-download-cache

# Download to specific directory
lightrag-download-cache --cache-dir ./tiktoken_cache

# Download specific models only
lightrag-download-cache --models gpt-4o-mini gpt-4
```

### Default Models Downloaded

- `gpt-4o-mini` (LightRAG default)
- `gpt-4o`
- `gpt-4`
- `gpt-3.5-turbo`
- `text-embedding-ada-002`
- `text-embedding-3-small`
- `text-embedding-3-large`
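Under the hood, tiktoken stores each downloaded BPE file under a name derived from the SHA-1 hash of its source URL (this reflects tiktoken's current caching scheme and is an implementation detail that may change between versions). Knowing the naming convention lets you sanity-check a transferred cache directory:

```python
import hashlib

def tiktoken_cache_filename(blob_url: str) -> str:
    """Cache file name tiktoken derives from a BPE blob URL (SHA-1 of the URL)."""
    return hashlib.sha1(blob_url.encode()).hexdigest()

# Hypothetical URL for illustration; the real URLs are embedded in tiktoken itself.
url = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"
print(tiktoken_cache_filename(url))
```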

### Setting Cache Location in Offline Environment

```bash
# Option 1: Environment variable (temporary)
export TIKTOKEN_CACHE_DIR=/path/to/tiktoken_cache

# Option 2: Add to ~/.bashrc or ~/.zshrc (persistent)
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
source ~/.bashrc

# Option 3: Copy to default location
mkdir -p ~/.tiktoken_cache
cp -r /path/to/tiktoken_cache/* ~/.tiktoken_cache/
```
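For reference, tiktoken resolves its cache directory in a fixed order: `TIKTOKEN_CACHE_DIR` first, then the legacy `DATA_GYM_CACHE_DIR`, and finally a directory under the system temp path. A sketch of that lookup, based on tiktoken's current behavior (treat the exact fallback path as an implementation detail):

```python
import os
import tempfile

def resolve_tiktoken_cache_dir() -> str:
    """Mirror tiktoken's cache-dir precedence: env vars first, temp dir last."""
    for var in ("TIKTOKEN_CACHE_DIR", "DATA_GYM_CACHE_DIR"):
        value = os.environ.get(var)
        if value:
            return value
    return os.path.join(tempfile.gettempdir(), "data-gym-cache")

print(resolve_tiktoken_cache_dir())
```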

## Complete Offline Deployment Workflow

### Step 1: Prepare in Online Environment

```bash
# 1. Install LightRAG with offline dependencies
pip install lightrag-hku[offline]

# 2. Download tiktoken cache
lightrag-download-cache --cache-dir ./offline_cache/tiktoken

# 3. Download all Python packages
pip download lightrag-hku[offline] -d ./offline_cache/packages

# 4. Create archive for transfer
tar -czf lightrag-offline-complete.tar.gz ./offline_cache

# 5. Verify contents
tar -tzf lightrag-offline-complete.tar.gz | head -20
```

### Step 2: Transfer to Offline Environment

```bash
# Using scp
scp lightrag-offline-complete.tar.gz user@offline-server:/tmp/

# Or using USB/physical media
# Copy lightrag-offline-complete.tar.gz to USB drive
```

### Step 3: Install in Offline Environment

```bash
# 1. Extract archive
cd /tmp
tar -xzf lightrag-offline-complete.tar.gz

# 2. Install Python packages
pip install --no-index \
--find-links=/tmp/offline_cache/packages \
lightrag-hku[offline]

# 3. Set up tiktoken cache
mkdir -p ~/.tiktoken_cache
cp -r /tmp/offline_cache/tiktoken/* ~/.tiktoken_cache/
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache

# 4. Add to shell profile for persistence
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
```

### Step 4: Verify Installation

```bash
# Test Python import
python -c "from lightrag import LightRAG; print('✓ LightRAG imported')"

# Test tiktoken
python -c "from lightrag.utils import TiktokenTokenizer; t = TiktokenTokenizer(); print('✓ Tiktoken working')"

# Test optional dependencies (if installed)
python -c "import docling; print('✓ Docling available')"
python -c "import redis; print('✓ Redis available')"
```
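The one-liners above can also be folded into a single script that reports every missing component at once instead of stopping at the first failure. A sketch (the module list mirrors this guide and is an assumption about your deployment, not a fixed requirement):

```python
import importlib

def verify_modules(module_names):
    """Try to import each module; return (ok, failed) name lists."""
    ok, failed = [], []
    for name in module_names:
        try:
            importlib.import_module(name)
            ok.append(name)
        except ImportError:
            failed.append(name)
    return ok, failed

ok, failed = verify_modules(["lightrag", "tiktoken", "docling", "redis"])
for name in ok:
    print(f"✓ {name} available")
for name in failed:
    print(f"✗ {name} missing")
```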

## Troubleshooting

### Issue: Tiktoken fails with network error

**Problem**: `Unable to load tokenizer for model gpt-4o-mini`

**Solution**:
```bash
# Ensure TIKTOKEN_CACHE_DIR is set
echo $TIKTOKEN_CACHE_DIR

# Verify cache files exist
ls -la ~/.tiktoken_cache/

# If empty, you need to download cache in online environment first
```

### Issue: Dynamic package installation fails

**Problem**: `Error installing package xxx`

**Solution**:
```bash
# Pre-install the specific package you need
# For document processing:
pip install lightrag-hku[offline-docs]

# For storage backends:
pip install lightrag-hku[offline-storage]

# For LLM providers:
pip install lightrag-hku[offline-llm]
```

### Issue: Missing dependencies at runtime

**Problem**: `ModuleNotFoundError: No module named 'xxx'`

**Solution**:
```bash
# Check what you have installed
pip list | grep -i xxx

# Install missing component
pip install lightrag-hku[offline] # Install all offline deps
```

### Issue: Permission denied on tiktoken cache

**Problem**: `PermissionError: [Errno 13] Permission denied`

**Solution**:
```bash
# Ensure cache directory has correct permissions
chmod 755 ~/.tiktoken_cache
chmod 644 ~/.tiktoken_cache/*

# Or use a user-writable directory
export TIKTOKEN_CACHE_DIR=~/my_tiktoken_cache
mkdir -p ~/my_tiktoken_cache
```

## Best Practices

1. **Test in Online Environment First**: Always test your complete setup in an online environment before going offline.

2. **Keep Cache Updated**: Periodically update your offline cache when new models are released.

3. **Document Your Setup**: Keep notes on which optional dependencies you actually need.

4. **Version Pinning**: Consider pinning specific versions in production:
```bash
pip freeze > requirements-production.txt
```

5. **Minimal Installation**: Only install what you need:
```bash
# If you only process PDFs with OpenAI
pip install lightrag-hku[offline-docs]
# Then manually add: pip install openai
```

## Additional Resources

- [LightRAG GitHub Repository](https://github.com/HKUDS/LightRAG)
- [Docker Deployment Guide](./DockerDeployment.md)
- [API Documentation](../lightrag/api/README.md)

## Support

If you encounter issues not covered in this guide:

1. Check the [GitHub Issues](https://github.com/HKUDS/LightRAG/issues)
2. Review the [project documentation](../README.md)
3. Create a new issue with your offline deployment details
12 changes: 12 additions & 0 deletions lightrag/api/README-zh.md
@@ -118,15 +118,23 @@ services:
lightrag:
container_name: lightrag
image: ghcr.io/hkuds/lightrag:latest
build:
context: .
dockerfile: Dockerfile
tags:
- ghcr.io/hkuds/lightrag:latest
ports:
- "${PORT:-9621}:9621"
volumes:
- ./data/rag_storage:/app/data/rag_storage
- ./data/inputs:/app/data/inputs
- ./data/tiktoken:/app/data/tiktoken
- ./config.ini:/app/config.ini
- ./.env:/app/.env
env_file:
- .env
environment:
- TIKTOKEN_CACHE_DIR=/app/data/tiktoken
restart: unless-stopped
extra_hosts:
- "host.docker.internal:host-gateway"
@@ -140,6 +148,10 @@ docker compose up
```
> You can get the official docker compose file from this link: [docker-compose.yml]( https://raw.githubusercontent.com/HKUDS/LightRAG/refs/heads/main/docker-compose.yml). For historical versions of LightRAG docker images, visit: [LightRAG Docker Images]( https://github.com/HKUDS/LightRAG/pkgs/container/lightrag)

### Offline Deployment

For offline or air-gapped environments, see the [Offline Deployment Guide](./../../docs/OfflineDeployment.md) for instructions on pre-installing all dependencies and cache files.

### Starting Multiple LightRAG Instances

There are two ways to start multiple LightRAG instances. The first way is to configure a completely independent working environment for each instance. This requires creating a separate working directory for each instance and placing a dedicated `.env` configuration file in that directory. The server listening ports in the configuration files of different instances cannot be the same. Then, you can start the service by running `lightrag-server` in the working directory.
12 changes: 12 additions & 0 deletions lightrag/api/README.md
@@ -120,15 +120,23 @@ services:
lightrag:
container_name: lightrag
image: ghcr.io/hkuds/lightrag:latest
build:
context: .
dockerfile: Dockerfile
tags:
- ghcr.io/hkuds/lightrag:latest
ports:
- "${PORT:-9621}:9621"
volumes:
- ./data/rag_storage:/app/data/rag_storage
- ./data/inputs:/app/data/inputs
- ./data/tiktoken:/app/data/tiktoken
- ./config.ini:/app/config.ini
- ./.env:/app/.env
env_file:
- .env
environment:
- TIKTOKEN_CACHE_DIR=/app/data/tiktoken
restart: unless-stopped
extra_hosts:
- "host.docker.internal:host-gateway"
@@ -143,6 +151,10 @@ docker compose up

> You can get the official docker compose file from here: [docker-compose.yml](https://raw.githubusercontent.com/HKUDS/LightRAG/refs/heads/main/docker-compose.yml). For historical versions of LightRAG docker images, visit this link: [LightRAG Docker Images](https://github.com/HKUDS/LightRAG/pkgs/container/lightrag)

### Offline Deployment

For offline or air-gapped environments, see the [Offline Deployment Guide](./../../docs/OfflineDeployment.md) for instructions on pre-installing all dependencies and cache files.

### Starting Multiple LightRAG Instances

There are two ways to start multiple LightRAG instances. The first way is to configure a completely independent working environment for each instance. This requires creating a separate working directory for each instance and placing a dedicated `.env` configuration file in that directory. The server listening ports in the configuration files of different instances cannot be the same. Then, you can start the service by running `lightrag-server` in the working directory.