Add google embedder support #192

Open · wants to merge 8 commits into `main`
README.md: 97 additions & 3 deletions
@@ -26,6 +26,7 @@
- **Ask Feature**: Chat with your repository using RAG-powered AI to get accurate answers
- **DeepResearch**: Multi-turn research process that thoroughly investigates complex topics
- **Multiple Model Providers**: Support for Google Gemini, OpenAI, OpenRouter, and local Ollama models
- **Flexible Embeddings**: Choose between OpenAI, Google AI, or local Ollama embeddings for optimal performance

## 🚀 Quick Start (Super Easy!)

@@ -39,6 +40,8 @@ cd deepwiki-open
# Create a .env file with your API keys
echo "GOOGLE_API_KEY=your_google_api_key" > .env
echo "OPENAI_API_KEY=your_openai_api_key" >> .env
# Optional: Use Google AI embeddings instead of OpenAI (recommended if using Google models)
echo "DEEPWIKI_EMBEDDER_TYPE=google" >> .env
# Optional: Add OpenRouter API key if you want to use OpenRouter models
echo "OPENROUTER_API_KEY=your_openrouter_api_key" >> .env
# Optional: Add Ollama host if not local (defaults to http://localhost:11434)
@@ -67,6 +70,8 @@ Create a `.env` file in the project root with these keys:
```
GOOGLE_API_KEY=your_google_api_key
OPENAI_API_KEY=your_openai_api_key
# Optional: Use Google AI embeddings (recommended if using Google models)
DEEPWIKI_EMBEDDER_TYPE=google
# Optional: Add this if you want to use OpenRouter models
OPENROUTER_API_KEY=your_openrouter_api_key
# Optional: Add this if you want to use Azure OpenAI models
@@ -269,6 +274,89 @@ If you want to use embedding models compatible with the OpenAI API (such as Alib

This allows you to seamlessly switch to any OpenAI-compatible embedding service without code changes.
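As a sketch, pointing an OpenAI-compatible provider at DeepWiki usually comes down to two `.env` entries (the base-URL variable name is an assumption here — check the variable your DeepWiki version documents for custom endpoints):

```bash
# Illustrative .env sketch for an OpenAI-compatible embedding service.
# OPENAI_BASE_URL is assumed; verify the exact variable name in your setup.
OPENAI_API_KEY=your_provider_api_key
OPENAI_BASE_URL=https://your-openai-compatible-endpoint/v1
```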

## 🧠 Using Google AI Embeddings

DeepWiki now supports Google AI's latest embedding models as an alternative to OpenAI embeddings. This provides better integration when you're already using Google Gemini models for text generation.

### Features

- **Latest Model**: Uses Google's `text-embedding-004` model
- **Same API Key**: Uses your existing `GOOGLE_API_KEY` (no additional setup required)
- **Better Integration**: Optimized for use with Google Gemini text generation models
- **Task-Specific**: Supports semantic similarity, retrieval, and classification tasks
- **Batch Processing**: Efficient processing of multiple texts
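The batch-processing behavior can be sketched as follows — an illustrative helper, not DeepWiki's actual client code; the batch size of 100 matches the `embedder_google` configuration in this PR's `embedder.json`:

```python
# Minimal sketch of batched embedding requests (illustrative only --
# not DeepWiki's GoogleEmbedderClient). Texts are split into batches,
# and each batch would be sent as one API request, e.g. with model
# "text-embedding-004" and task_type "SEMANTIC_SIMILARITY".
def make_batches(texts, batch_size=100):
    """Split a list of texts into batches of at most batch_size."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

batches = make_batches([f"doc {i}" for i in range(250)])
print([len(b) for b in batches])  # [100, 100, 50]
```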

### How to Enable Google AI Embeddings

**Option 1: Environment Variable (Recommended)**

Set the embedder type in your `.env` file:

```bash
# Your existing Google API key
GOOGLE_API_KEY=your_google_api_key

# Enable Google AI embeddings
DEEPWIKI_EMBEDDER_TYPE=google
```

**Option 2: Docker Environment**

```bash
docker run -p 8001:8001 -p 3000:3000 \
-e GOOGLE_API_KEY=your_google_api_key \
-e DEEPWIKI_EMBEDDER_TYPE=google \
-v ~/.adalflow:/root/.adalflow \
ghcr.io/asyncfuncai/deepwiki-open:latest
```

**Option 3: Docker Compose**

Add to your `.env` file:

```bash
GOOGLE_API_KEY=your_google_api_key
DEEPWIKI_EMBEDDER_TYPE=google
```

Then run:

```bash
docker-compose up
```

### Available Embedder Types

| Type | Description | API Key Required | Notes |
|------|-------------|------------------|-------|
| `openai` | OpenAI embeddings (default) | `OPENAI_API_KEY` | Uses `text-embedding-3-small` model |
| `google` | Google AI embeddings | `GOOGLE_API_KEY` | Uses `text-embedding-004` model |
| `ollama` | Local Ollama embeddings | None | Requires local Ollama installation |

### Why Use Google AI Embeddings?

- **Consistency**: If you're using Google Gemini for text generation, using Google embeddings provides better semantic consistency
- **Performance**: Google's latest embedding model offers excellent performance for retrieval tasks
- **Cost**: Competitive pricing compared to OpenAI
- **No Additional Setup**: Uses the same API key as your text generation models

### Switching Between Embedders

You can easily switch between different embedding providers:

```bash
# Use OpenAI embeddings (default)
export DEEPWIKI_EMBEDDER_TYPE=openai

# Use Google AI embeddings
export DEEPWIKI_EMBEDDER_TYPE=google

# Use local Ollama embeddings
export DEEPWIKI_EMBEDDER_TYPE=ollama
```

**Note**: When switching embedders, you may need to regenerate your repository embeddings as different models produce different vector spaces.
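The mismatch is easy to see from the embedding dimensions of each default model (a hypothetical check, not part of DeepWiki's codebase; dimensions are the published values for these models):

```python
# Embedding dimensions of the default model for each embedder type
# (illustrative check -- not DeepWiki code).
EMBEDDING_DIMS = {
    "openai": 1536,  # text-embedding-3-small
    "google": 768,   # text-embedding-004
    "ollama": 768,   # nomic-embed-text
}

def same_dimension(old_type: str, new_type: str) -> bool:
    """A dimension mismatch alone breaks the stored index; note that even
    equal-sized vectors from different models are not comparable."""
    return EMBEDDING_DIMS[old_type] == EMBEDDING_DIMS[new_type]

print(same_dimension("openai", "google"))  # False -- regeneration required
```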

### Logging

DeepWiki uses Python's built-in `logging` module for diagnostic output. You can configure the verbosity and log file destination via environment variables:
@@ -311,19 +399,25 @@ docker-compose up

| Variable | Description | Required | Note |
|----------------------|--------------------------------------------------------------|----------|----------------------------------------------------------------------------------------------------------|
| `GOOGLE_API_KEY` | Google Gemini API key for AI generation | No | Required only if you want to use Google Gemini models |
| `OPENAI_API_KEY` | OpenAI API key for embeddings | Yes | Note: This is required even if you're not using OpenAI models, as it's used for embeddings. |
| `GOOGLE_API_KEY` | Google Gemini API key for AI generation and embeddings | No | Required for Google Gemini models and Google AI embeddings |
| `OPENAI_API_KEY` | OpenAI API key for embeddings and models | Conditional | Required if using OpenAI embeddings or models |
| `OPENROUTER_API_KEY` | OpenRouter API key for alternative models | No | Required only if you want to use OpenRouter models |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key | No | Required only if you want to use Azure OpenAI models |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint | No | Required only if you want to use Azure OpenAI models |
| `AZURE_OPENAI_VERSION` | Azure OpenAI version | No | Required only if you want to use Azure OpenAI models |
| `OLLAMA_HOST` | Ollama Host (default: http://localhost:11434) | No | Required only if you want to use external Ollama server |
| `DEEPWIKI_EMBEDDER_TYPE` | Embedder type: `openai`, `google`, or `ollama` (default: `openai`) | No | Controls which embedding provider to use |
| `PORT` | Port for the API server (default: 8001) | No | If you host the API and frontend on the same machine, make sure to change the port in `SERVER_BASE_URL` accordingly |
| `SERVER_BASE_URL` | Base URL for the API server (default: http://localhost:8001) | No | |
| `DEEPWIKI_AUTH_MODE` | Set to `true` or `1` to enable authorization mode. | No | Defaults to `false`. If enabled, `DEEPWIKI_AUTH_CODE` is required. |
| `DEEPWIKI_AUTH_CODE` | The secret code required for wiki generation when `DEEPWIKI_AUTH_MODE` is enabled. | No | Only used if `DEEPWIKI_AUTH_MODE` is `true` or `1`. |

If you're not using ollama mode, you need to configure an OpenAI API key for embeddings. Other API keys are only required when configuring and using models from the corresponding providers.
**API Key Requirements:**
- If using `DEEPWIKI_EMBEDDER_TYPE=openai` (default): `OPENAI_API_KEY` is required
- If using `DEEPWIKI_EMBEDDER_TYPE=google`: `GOOGLE_API_KEY` is required
- If using `DEEPWIKI_EMBEDDER_TYPE=ollama`: No API key required (local processing)

Other API keys are only required when configuring and using models from the corresponding providers.

## Authorization Mode

api/api.py: 1 addition & 1 deletion
@@ -507,7 +507,7 @@ async def delete_wiki_cache(

if WIKI_AUTH_MODE:
logger.info("check the authorization code")
if WIKI_AUTH_CODE != authorization_code:
if not authorization_code or WIKI_AUTH_CODE != authorization_code:
raise HTTPException(status_code=401, detail="Authorization code is invalid")

logger.info(f"Attempting to delete wiki cache for {owner}/{repo} ({repo_type}), lang: {language}")
api/config.py: 49 additions & 4 deletions
@@ -10,6 +10,7 @@
from api.openai_client import OpenAIClient
from api.openrouter_client import OpenRouterClient
from api.bedrock_client import BedrockClient
from api.google_embedder_client import GoogleEmbedderClient
from api.azureai_client import AzureAIClient
from adalflow import GoogleGenAIClient, OllamaClient

@@ -43,12 +44,16 @@
WIKI_AUTH_MODE = raw_auth_mode.lower() in ['true', '1', 't']
WIKI_AUTH_CODE = os.environ.get('DEEPWIKI_AUTH_CODE', '')

# Embedder settings
EMBEDDER_TYPE = os.environ.get('DEEPWIKI_EMBEDDER_TYPE', 'openai').lower()

# Get configuration directory from environment variable, or use default if not set
CONFIG_DIR = os.environ.get('DEEPWIKI_CONFIG_DIR', None)

# Client class mapping
CLIENT_CLASSES = {
"GoogleGenAIClient": GoogleGenAIClient,
"GoogleEmbedderClient": GoogleEmbedderClient,
"OpenAIClient": OpenAIClient,
"OpenRouterClient": OpenRouterClient,
"OllamaClient": OllamaClient,
@@ -141,7 +146,7 @@ def load_embedder_config():
embedder_config = load_json_config("embedder.json")

# Process client classes
for key in ["embedder", "embedder_ollama"]:
for key in ["embedder", "embedder_ollama", "embedder_google"]:
if key in embedder_config and "client_class" in embedder_config[key]:
class_name = embedder_config[key]["client_class"]
if class_name in CLIENT_CLASSES:
@@ -151,12 +156,18 @@

def get_embedder_config():
"""
Get the current embedder configuration.
Get the current embedder configuration based on DEEPWIKI_EMBEDDER_TYPE.

Returns:
dict: The embedder configuration with model_client resolved
"""
return configs.get("embedder", {})
embedder_type = EMBEDDER_TYPE
if embedder_type == 'google' and 'embedder_google' in configs:
return configs.get("embedder_google", {})
elif embedder_type == 'ollama' and 'embedder_ollama' in configs:
return configs.get("embedder_ollama", {})
else:
return configs.get("embedder", {})

def is_ollama_embedder():
"""
@@ -178,6 +189,40 @@ def is_ollama_embedder():
client_class = embedder_config.get("client_class", "")
return client_class == "OllamaClient"

def is_google_embedder():
"""
Check if the current embedder configuration uses GoogleEmbedderClient.

Returns:
bool: True if using GoogleEmbedderClient, False otherwise
"""
embedder_config = get_embedder_config()
if not embedder_config:
return False

# Check if model_client is GoogleEmbedderClient
model_client = embedder_config.get("model_client")
if model_client:
return model_client.__name__ == "GoogleEmbedderClient"

# Fallback: check client_class string
client_class = embedder_config.get("client_class", "")
return client_class == "GoogleEmbedderClient"

def get_embedder_type():
"""
Get the current embedder type based on configuration.

Returns:
str: 'ollama', 'google', or 'openai' (default)
"""
if is_ollama_embedder():
return 'ollama'
elif is_google_embedder():
return 'google'
else:
return 'openai'

# Load repository and file filters configuration
def load_repo_config():
return load_json_config("repo.json")
@@ -265,7 +310,7 @@ def load_lang_config():

# Update embedder configuration
if embedder_config:
for key in ["embedder", "embedder_ollama", "retriever", "text_splitter"]:
for key in ["embedder", "embedder_ollama", "embedder_google", "retriever", "text_splitter"]:
if key in embedder_config:
configs[key] = embedder_config[key]

api/config/embedder.json: 14 additions & 0 deletions
@@ -8,6 +8,20 @@
"encoding_format": "float"
}
},
"embedder_ollama": {
"client_class": "OllamaClient",
"model_kwargs": {
"model": "nomic-embed-text"
}
},
"embedder_google": {
"client_class": "GoogleEmbedderClient",
"batch_size": 100,
"model_kwargs": {
"model": "text-embedding-004",
"task_type": "SEMANTIC_SIMILARITY"
}
},
"retriever": {
"top_k": 20
},