2 changes: 2 additions & 0 deletions README.md
@@ -84,6 +84,8 @@

## Installation

> **📦 Offline Deployment**: For offline or air-gapped environments, see the [Offline Deployment Guide](./docs/OfflineDeployment.md) for instructions on pre-installing all dependencies and cache files.

### Install LightRAG Server

The LightRAG Server is designed to provide Web UI and API support. The Web UI facilitates document indexing, knowledge graph exploration, and a simple RAG query interface. The LightRAG Server also provides an Ollama-compatible interface, aiming to emulate LightRAG as an Ollama chat model. This allows AI chat bots, such as Open WebUI, to access LightRAG easily.
311 changes: 311 additions & 0 deletions docs/OfflineDeployment.md
@@ -0,0 +1,311 @@
# LightRAG Offline Deployment Guide

This guide provides comprehensive instructions for deploying LightRAG in offline environments where internet access is limited or unavailable.

## Table of Contents

- [Overview](#overview)
- [Quick Start](#quick-start)
- [Layered Dependencies](#layered-dependencies)
- [Tiktoken Cache Management](#tiktoken-cache-management)
- [Complete Offline Deployment Workflow](#complete-offline-deployment-workflow)
- [Troubleshooting](#troubleshooting)

## Overview

LightRAG uses dynamic package installation (`pipmaster`) for optional features based on file types and configurations. In offline environments, these dynamic installations will fail. This guide shows you how to pre-install all necessary dependencies and cache files.

### What Gets Dynamically Installed?

LightRAG dynamically installs packages for:

- **Document Processing**: `docling`, `pypdf2`, `python-docx`, `python-pptx`, `openpyxl`
- **Storage Backends**: `redis`, `neo4j`, `pymilvus`, `pymongo`, `asyncpg`, `qdrant-client`
- **LLM Providers**: `openai`, `anthropic`, `ollama`, `zhipuai`, `aioboto3`, `voyageai`, `llama-index`, `lmdeploy`, `transformers`, `torch`
- **Tiktoken Models**: BPE encoding models downloaded from OpenAI CDN
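Rather than letting a dynamic `pipmaster` install fail at runtime, you can check ahead of time which optional packages are already importable. A minimal sketch (the module list is illustrative, not exhaustive; note that the importable name can differ from the PyPI package name, e.g. `python-docx` imports as `docx`):

```python
import importlib.util

def missing_packages(module_names):
    """Return the subset of module_names that cannot be imported."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# Illustrative importable module names for some of the optional features above.
optional_modules = ["docx", "pptx", "openpyxl", "redis", "neo4j", "openai"]
print(missing_packages(optional_modules))
```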

## Quick Start

### Option 1: Using pip with Offline Extras

```bash
# Online environment: Install all offline dependencies
pip install lightrag-hku[offline]

# Download tiktoken cache
lightrag-download-cache

# Create offline package
pip download lightrag-hku[offline] -d ./offline-packages
tar -czf lightrag-offline.tar.gz ./offline-packages ~/.tiktoken_cache

# Transfer to offline server
scp lightrag-offline.tar.gz user@offline-server:/path/to/

# Offline environment: Install
tar -xzf lightrag-offline.tar.gz
pip install --no-index --find-links=./offline-packages lightrag-hku[offline]
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache
```

### Option 2: Using Requirements Files

```bash
# Online environment: Download packages
pip download -r requirements-offline.txt -d ./packages

# Transfer to offline server
tar -czf packages.tar.gz ./packages
scp packages.tar.gz user@offline-server:/path/to/

# Offline environment: Install
tar -xzf packages.tar.gz
pip install --no-index --find-links=./packages -r requirements-offline.txt
```

## Layered Dependencies

LightRAG provides flexible dependency groups for different use cases:

### Available Dependency Groups

| Group | Description | Use Case |
|-------|-------------|----------|
| `offline-docs` | Document processing | PDF, DOCX, PPTX, XLSX files |
| `offline-storage` | Storage backends | Redis, Neo4j, MongoDB, PostgreSQL, etc. |
| `offline-llm` | LLM providers | OpenAI, Anthropic, Ollama, etc. |
| `offline` | All of the above | Complete offline deployment |

### Installation Examples

```bash
# Install only document processing dependencies
pip install lightrag-hku[offline-docs]

# Install document processing and storage backends
pip install lightrag-hku[offline-docs,offline-storage]

# Install all offline dependencies
pip install lightrag-hku[offline]
```

### Using Individual Requirements Files

```bash
# Document processing only
pip install -r requirements-offline-docs.txt

# Storage backends only
pip install -r requirements-offline-storage.txt

# LLM providers only
pip install -r requirements-offline-llm.txt

# All offline dependencies
pip install -r requirements-offline.txt
```

## Tiktoken Cache Management

Tiktoken downloads BPE encoding models on first use. In offline environments, you must pre-download these models.

### Using the CLI Command

After installing LightRAG, use the built-in command:

```bash
# Download to default location (~/.tiktoken_cache)
lightrag-download-cache

# Download to specific directory
lightrag-download-cache --cache-dir ./tiktoken_cache

# Download specific models only
lightrag-download-cache --models gpt-4o-mini gpt-4
```

### Default Models Downloaded

- `gpt-4o-mini` (LightRAG default)
- `gpt-4o`
- `gpt-4`
- `gpt-3.5-turbo`
- `text-embedding-ada-002`
- `text-embedding-3-small`
- `text-embedding-3-large`
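Under the hood, tiktoken stores each downloaded BPE file under a name derived from the SHA-1 hash of its source URL (this reflects tiktoken's current caching scheme and is an implementation detail that may change between versions). Knowing the naming convention lets you sanity-check a transferred cache directory:

```python
import hashlib

def tiktoken_cache_filename(blob_url: str) -> str:
    """Cache file name tiktoken derives from a BPE blob URL (SHA-1 of the URL)."""
    return hashlib.sha1(blob_url.encode()).hexdigest()

# Hypothetical URL for illustration; the real URLs are embedded in tiktoken itself.
url = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"
print(tiktoken_cache_filename(url))
```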

### Setting Cache Location in Offline Environment

```bash
# Option 1: Environment variable (temporary)
export TIKTOKEN_CACHE_DIR=/path/to/tiktoken_cache

# Option 2: Add to ~/.bashrc or ~/.zshrc (persistent)
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
source ~/.bashrc

# Option 3: Copy to default location
mkdir -p ~/.tiktoken_cache
cp -r /path/to/tiktoken_cache/* ~/.tiktoken_cache/
```
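For reference, tiktoken resolves its cache directory in a fixed order: `TIKTOKEN_CACHE_DIR` first, then the legacy `DATA_GYM_CACHE_DIR`, and finally a directory under the system temp path. A sketch of that lookup, based on tiktoken's current behavior (treat the exact fallback path as an implementation detail):

```python
import os
import tempfile

def resolve_tiktoken_cache_dir() -> str:
    """Mirror tiktoken's cache-dir precedence: env vars first, temp dir last."""
    for var in ("TIKTOKEN_CACHE_DIR", "DATA_GYM_CACHE_DIR"):
        value = os.environ.get(var)
        if value:
            return value
    return os.path.join(tempfile.gettempdir(), "data-gym-cache")

print(resolve_tiktoken_cache_dir())
```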

## Complete Offline Deployment Workflow

### Step 1: Prepare in Online Environment

```bash
# 1. Install LightRAG with offline dependencies
pip install lightrag-hku[offline]

# 2. Download tiktoken cache
lightrag-download-cache --cache-dir ./offline_cache/tiktoken

# 3. Download all Python packages
pip download lightrag-hku[offline] -d ./offline_cache/packages

# 4. Create archive for transfer
tar -czf lightrag-offline-complete.tar.gz ./offline_cache

# 5. Verify contents
tar -tzf lightrag-offline-complete.tar.gz | head -20
```

### Step 2: Transfer to Offline Environment

```bash
# Using scp
scp lightrag-offline-complete.tar.gz user@offline-server:/tmp/

# Or using USB/physical media
# Copy lightrag-offline-complete.tar.gz to USB drive
```

### Step 3: Install in Offline Environment

```bash
# 1. Extract archive
cd /tmp
tar -xzf lightrag-offline-complete.tar.gz

# 2. Install Python packages
pip install --no-index \
--find-links=/tmp/offline_cache/packages \
lightrag-hku[offline]

# 3. Set up tiktoken cache
mkdir -p ~/.tiktoken_cache
cp -r /tmp/offline_cache/tiktoken/* ~/.tiktoken_cache/
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache

# 4. Add to shell profile for persistence
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
```

### Step 4: Verify Installation

```bash
# Test Python import
python -c "from lightrag import LightRAG; print('✓ LightRAG imported')"

# Test tiktoken
python -c "from lightrag.utils import TiktokenTokenizer; t = TiktokenTokenizer(); print('✓ Tiktoken working')"

# Test optional dependencies (if installed)
python -c "import docling; print('✓ Docling available')"
python -c "import redis; print('✓ Redis available')"
```
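The one-liners above can also be folded into a single script that reports every missing component at once instead of stopping at the first failure. A sketch (the module list mirrors this guide and is an assumption about your deployment, not a fixed requirement):

```python
import importlib

def verify_modules(module_names):
    """Try to import each module; return (ok, failed) name lists."""
    ok, failed = [], []
    for name in module_names:
        try:
            importlib.import_module(name)
            ok.append(name)
        except ImportError:
            failed.append(name)
    return ok, failed

ok, failed = verify_modules(["lightrag", "tiktoken", "docling", "redis"])
for name in ok:
    print(f"✓ {name} available")
for name in failed:
    print(f"✗ {name} missing")
```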

## Troubleshooting

### Issue: Tiktoken fails with network error

**Problem**: `Unable to load tokenizer for model gpt-4o-mini`

**Solution**:
```bash
# Ensure TIKTOKEN_CACHE_DIR is set
echo $TIKTOKEN_CACHE_DIR

# Verify cache files exist
ls -la ~/.tiktoken_cache/

# If empty, you need to download cache in online environment first
```

### Issue: Dynamic package installation fails

**Problem**: `Error installing package xxx`

**Solution**:
```bash
# Pre-install the specific package you need
# For document processing:
pip install lightrag-hku[offline-docs]

# For storage backends:
pip install lightrag-hku[offline-storage]

# For LLM providers:
pip install lightrag-hku[offline-llm]
```

### Issue: Missing dependencies at runtime

**Problem**: `ModuleNotFoundError: No module named 'xxx'`

**Solution**:
```bash
# Check what you have installed
pip list | grep -i xxx

# Install missing component
pip install lightrag-hku[offline] # Install all offline deps
```

### Issue: Permission denied on tiktoken cache

**Problem**: `PermissionError: [Errno 13] Permission denied`

**Solution**:
```bash
# Ensure cache directory has correct permissions
chmod 755 ~/.tiktoken_cache
chmod 644 ~/.tiktoken_cache/*

# Or use a user-writable directory
export TIKTOKEN_CACHE_DIR=~/my_tiktoken_cache
mkdir -p ~/my_tiktoken_cache
```

## Best Practices

1. **Test in Online Environment First**: Always test your complete setup in an online environment before going offline.

2. **Keep Cache Updated**: Periodically update your offline cache when new models are released.

3. **Document Your Setup**: Keep notes on which optional dependencies you actually need.

4. **Version Pinning**: Consider pinning specific versions in production:
```bash
pip freeze > requirements-production.txt
```

5. **Minimal Installation**: Only install what you need:
```bash
# If you only process PDFs with OpenAI
pip install lightrag-hku[offline-docs]
# Then manually add: pip install openai
```

## Additional Resources

- [LightRAG GitHub Repository](https://github.com/HKUDS/LightRAG)
- [Docker Deployment Guide](./DockerDeployment.md)
- [API Documentation](../lightrag/api/README.md)

## Support

If you encounter issues not covered in this guide:

1. Check the [GitHub Issues](https://github.com/HKUDS/LightRAG/issues)
2. Review the [project documentation](../README.md)
3. Create a new issue with your offline deployment details
12 changes: 12 additions & 0 deletions lightrag/api/README-zh.md
@@ -118,15 +118,23 @@ services:
lightrag:
container_name: lightrag
image: ghcr.io/hkuds/lightrag:latest
build:
context: .
dockerfile: Dockerfile
tags:
- ghcr.io/hkuds/lightrag:latest
ports:
- "${PORT:-9621}:9621"
volumes:
- ./data/rag_storage:/app/data/rag_storage
- ./data/inputs:/app/data/inputs
- ./data/tiktoken:/app/data/tiktoken
- ./config.ini:/app/config.ini
- ./.env:/app/.env
env_file:
- .env
environment:
- TIKTOKEN_CACHE_DIR=/app/data/tiktoken
restart: unless-stopped
extra_hosts:
- "host.docker.internal:host-gateway"
@@ -140,6 +148,10 @@ docker compose up
```
> You can get the official docker compose file from this link: [docker-compose.yml]( https://raw.githubusercontent.com/HKUDS/LightRAG/refs/heads/main/docker-compose.yml). For historical versions of LightRAG docker images, visit: [LightRAG Docker Images]( https://github.com/HKUDS/LightRAG/pkgs/container/lightrag)

### Offline Deployment

For offline or air-gapped environments, see the [Offline Deployment Guide](./../../docs/OfflineDeployment.md) for instructions on pre-installing all dependencies and cache files.

### Starting Multiple LightRAG Instances

There are two ways to start multiple LightRAG instances. The first way is to configure a completely independent working environment for each instance. This requires creating a separate working directory for each instance and placing a dedicated `.env` configuration file in that directory. The server listening ports in the configuration files of different instances cannot be the same. Then, you can start the service by running `lightrag-server` in the working directory.
12 changes: 12 additions & 0 deletions lightrag/api/README.md
@@ -120,15 +120,23 @@ services:
lightrag:
container_name: lightrag
image: ghcr.io/hkuds/lightrag:latest
build:
context: .
dockerfile: Dockerfile
tags:
- ghcr.io/hkuds/lightrag:latest
ports:
- "${PORT:-9621}:9621"
volumes:
- ./data/rag_storage:/app/data/rag_storage
- ./data/inputs:/app/data/inputs
- ./data/tiktoken:/app/data/tiktoken
- ./config.ini:/app/config.ini
- ./.env:/app/.env
env_file:
- .env
environment:
- TIKTOKEN_CACHE_DIR=/app/data/tiktoken
restart: unless-stopped
extra_hosts:
- "host.docker.internal:host-gateway"
@@ -143,6 +151,10 @@ docker compose up

> You can get the official docker compose file from here: [docker-compose.yml](https://raw.githubusercontent.com/HKUDS/LightRAG/refs/heads/main/docker-compose.yml). For historical versions of LightRAG docker images, visit this link: [LightRAG Docker Images](https://github.com/HKUDS/LightRAG/pkgs/container/lightrag)

### Offline Deployment

For offline or air-gapped environments, see the [Offline Deployment Guide](./../../docs/OfflineDeployment.md) for instructions on pre-installing all dependencies and cache files.

### Starting Multiple LightRAG Instances

There are two ways to start multiple LightRAG instances. The first way is to configure a completely independent working environment for each instance. This requires creating a separate working directory for each instance and placing a dedicated `.env` configuration file in that directory. The server listening ports in the configuration files of different instances cannot be the same. Then, you can start the service by running `lightrag-server` in the working directory.