Skip to content

robbyczgw-cla/web-search-plus

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

53 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Web Search Plus

Unified multi-provider web search with Intelligent Auto-Routing β€” uses multi-signal analysis to automatically select between Serper, Tavily, Querit, Exa, Perplexity (Sonar Pro), You.com, and SearXNG with confidence scoring.

ClawHub Version GitHub


🧠 Features (v2.9.0)

Intelligent Multi-Signal Routing β€” The skill uses sophisticated query analysis:

  • Intent Classification: Shopping vs Research vs Discovery vs RAG/Real-time vs Privacy
  • Linguistic Patterns: "how much" (price) vs "how does" (research) vs "privately" (privacy)
  • Entity Detection: Product+brand combos, URLs, domains
  • Complexity Analysis: Long queries favor research providers
  • Confidence Scoring: Know how reliable the routing decision is
python3 scripts/search.py -q "how much does iPhone 16 cost"     # β†’ Serper (68% confidence)
python3 scripts/search.py -q "how does quantum entanglement work"  # β†’ Tavily (86% HIGH)
python3 scripts/search.py -q "startups similar to Notion"       # β†’ Exa (76% HIGH)
python3 scripts/search.py -q "companies like stripe.com"        # β†’ Exa (100% HIGH - URL detected)
python3 scripts/search.py -q "summarize key points on AI"       # β†’ You.com (68% MEDIUM - RAG intent)
python3 scripts/search.py -q "search privately without tracking" # β†’ SearXNG (74% HIGH - privacy intent)

πŸ” When to Use Which Provider

Built-in Brave Search (OpenClaw default)

  • βœ… General web searches
  • βœ… Privacy-focused
  • βœ… Quick lookups
  • βœ… Default fallback

Serper (Google Results)

  • πŸ› Product specs, prices, shopping
  • πŸ“ Local businesses, places
  • 🎯 "Google it" - explicit Google results
  • πŸ“° Shopping/images needed
  • πŸ† Knowledge Graph data

Tavily (AI-Optimized Research)

  • πŸ“š Research questions, deep dives
  • πŸ”¬ Complex multi-part queries
  • πŸ“„ Need full page content (not just snippets)
  • πŸŽ“ Academic/technical research
  • πŸ”’ Domain filtering (trusted sources)

Querit (Multilingual AI Search)

  • 🌏 Multilingual AI search across 10+ languages
  • ⚑ Fast real-time answers with ~400ms latency
  • πŸ—ΊοΈ International / cross-language queries
  • πŸ“° Recency-aware results for current information
  • πŸ€– Good fit for AI workflows with clean metadata

Exa (Neural Semantic Search)

  • πŸ”— Find similar pages
  • 🏒 Company/startup discovery
  • πŸ“ Research papers
  • πŸ’» GitHub projects
  • πŸ“… Date-specific content

Perplexity (Sonar Pro via Kilo Gateway)

  • ⚑ Direct answers (great for β€œwho/what/define”)
  • 🧾 Cited, answer-first output
  • πŸ•’ Current events / β€œas of” questions
  • πŸ”‘ Auth via KILOCODE_API_KEY (routes to https://api.kilo.ai)

You.com (RAG/Real-time)

  • πŸ€– RAG applications (LLM-ready snippets)
  • πŸ“° Combined web + news (single API call)
  • ⚑ Real-time information (current events)
  • πŸ“‹ Summarization context ("What's the latest...")
  • πŸ”„ Live crawling (full page content on demand)

SearXNG (Privacy-First/Self-Hosted)

  • πŸ”’ Privacy-preserving search (no tracking)
  • 🌐 Multi-source aggregation (70+ engines)
  • πŸ’° $0 API cost (self-hosted)
  • 🎯 Diverse perspectives (results from multiple engines)
  • 🏠 Self-hosted environments (full control)

Table of Contents


Quick Start

Option A: Interactive Setup (Recommended)

# Run the setup wizard - it guides you through everything
python3 scripts/setup.py

The wizard explains each provider, collects your API keys, and creates config.json automatically.

Option B: Manual Setup

# 1. Set up at least one API key (or SearXNG instance)
export SERPER_API_KEY="your-key"   # https://serper.dev
export TAVILY_API_KEY="your-key"   # https://tavily.com
export QUERIT_API_KEY="your-key"   # https://querit.ai
export EXA_API_KEY="your-key"      # https://exa.ai
export KILOCODE_API_KEY="your-key" # enables Perplexity Sonar Pro via https://api.kilo.ai
export YOU_API_KEY="your-key"      # https://api.you.com
export SEARXNG_INSTANCE_URL="https://your-instance.example.com"  # Self-hosted

# 2. Run a search (auto-routed!)
python3 scripts/search.py -q "best laptop 2024"

Run a Search

# Auto-routed to best provider
python3 scripts/search.py -q "best laptop 2024"

# Or specify a provider explicitly
python3 scripts/search.py -p serper -q "iPhone 16 specs"
python3 scripts/search.py -p tavily -q "quantum computing explained" --depth advanced
python3 scripts/search.py -p querit -q "latest AI policy updates in Germany"
python3 scripts/search.py -p exa -q "AI startups 2024" --category company
python3 scripts/search.py -p perplexity -q "Who is the president of Austria?"

Smart Auto-Routing

How It Works

When you don't specify a provider, the skill analyzes your query and routes it to the best provider:

Query Contains Routes To Example
"price", "buy", "shop", "cost" Serper "iPhone 16 price"
"near me", "restaurant", "hotel" Serper "pizza near me"
"weather", "news", "latest" Serper "weather Berlin"
"how does", "explain", "what is" Tavily "how does TCP work"
"research", "study", "analyze" Tavily "climate research"
"tutorial", "guide", "learn" Tavily "python tutorial"
multilingual, current status, latest updates Querit "latest AI policy updates in Germany"
"similar to", "companies like" Exa "companies like Stripe"
"startup", "Series A" Exa "AI startups Series A"
"github", "research paper" Exa "LLM papers arxiv"
"private", "anonymous", "no tracking" SearXNG "search privately"
"multiple sources", "aggregate" SearXNG "results from all engines"

Examples

# These are all auto-routed to the optimal provider:
python3 scripts/search.py -q "MacBook Pro M3 price"           # β†’ Serper
python3 scripts/search.py -q "how does HTTPS work"            # β†’ Tavily
python3 scripts/search.py -q "latest AI policy updates in Germany" # β†’ Querit
python3 scripts/search.py -q "startups like Notion"           # β†’ Exa
python3 scripts/search.py -q "best sushi restaurant near me"  # β†’ Serper
python3 scripts/search.py -q "explain attention mechanism"    # β†’ Tavily
python3 scripts/search.py -q "alternatives to Figma"          # β†’ Exa
python3 scripts/search.py -q "search privately without tracking" # β†’ SearXNG

Result Caching (introduced in v2.7.x)

Search results are automatically cached for 1 hour to save API costs:

# First request: fetches from API ($)
python3 scripts/search.py -q "AI startups 2024"

# Second request: uses cache (FREE!)
python3 scripts/search.py -q "AI startups 2024"
# Output includes: "cached": true

# Bypass cache (force fresh results)
python3 scripts/search.py -q "AI startups 2024" --no-cache

# View cache stats
python3 scripts/search.py --cache-stats

# Clear all cached results
python3 scripts/search.py --clear-cache

# Custom TTL (in seconds, default: 3600 = 1 hour)
python3 scripts/search.py -q "query" --cache-ttl 7200

Cache location: .cache/ in skill directory (override with WSP_CACHE_DIR environment variable)

Debug Auto-Routing

See exactly why a provider was selected:

python3 scripts/search.py --explain-routing -q "best laptop to buy"

Output:

{
  "query": "best laptop to buy",
  "selected_provider": "serper",
  "reason": "matched_keywords (score=2)",
  "matched_keywords": ["buy", "best"],
  "available_providers": ["serper", "tavily", "exa"]
}

Routing Info in Results

Every search result includes routing information:

{
  "provider": "serper",
  "query": "iPhone 16 price",
  "results": [...],
  "routing": {
    "auto_routed": true,
    "selected_provider": "serper",
    "reason": "matched_keywords (score=1)",
    "matched_keywords": ["price"]
  }
}

Configuration Guide

Environment Variables

Create a .env file or set these in your shell:

# Required: Set at least one
export SERPER_API_KEY="your-serper-key"
export TAVILY_API_KEY="your-tavily-key"
export EXA_API_KEY="your-exa-key"

Config File (config.json)

The config.json file lets you customize auto-routing and provider defaults:

{
  "defaults": {
    "provider": "serper",
    "max_results": 5
  },
  
  "auto_routing": {
    "enabled": true,
    "fallback_provider": "serper",
    "provider_priority": ["serper", "tavily", "exa"],
    "disabled_providers": [],
    "keyword_mappings": {
      "serper": ["price", "buy", "shop", "cost", "deal", "near me", "weather"],
      "tavily": ["how does", "explain", "research", "what is", "tutorial"],
      "exa": ["similar to", "companies like", "alternatives", "startup", "github"]
    }
  },
  
  "serper": {
    "country": "us",
    "language": "en"
  },
  
  "tavily": {
    "depth": "basic",
    "topic": "general"
  },
  
  "exa": {
    "type": "neural"
  }
}

Configuration Examples

Example 1: Disable Exa (Only Use Serper + Tavily)

{
  "auto_routing": {
    "disabled_providers": ["exa"]
  }
}

Example 2: Make Tavily the Default

{
  "auto_routing": {
    "fallback_provider": "tavily"
  }
}

Example 3: Add Custom Keywords

{
  "auto_routing": {
    "keyword_mappings": {
      "serper": [
        "price", "buy", "shop", "amazon", "ebay", "walmart",
        "deal", "discount", "coupon", "sale", "cheap"
      ],
      "tavily": [
        "how does", "explain", "research", "what is",
        "coursera", "udemy", "learn", "course", "certification"
      ],
      "exa": [
        "similar to", "companies like", "competitors",
        "YC company", "funded startup", "Series A", "Series B"
      ]
    }
  }
}

Example 4: German Locale for Serper

{
  "serper": {
    "country": "de",
    "language": "de"
  }
}

Example 5: Disable Auto-Routing

{
  "auto_routing": {
    "enabled": false
  },
  "defaults": {
    "provider": "serper"
  }
}

Example 6: Research-Heavy Config

{
  "auto_routing": {
    "fallback_provider": "tavily",
    "provider_priority": ["tavily", "serper", "exa"]
  },
  "tavily": {
    "depth": "advanced",
    "include_raw_content": true
  }
}

Provider Deep Dives

Serper (Google Search API)

What it is: Direct access to Google Search results via API β€” the same results you'd see on google.com.

Strengths

Strength Description
🎯 Accuracy Google's search quality, knowledge graph, featured snippets
πŸ›’ Shopping Product prices, reviews, shopping results
πŸ“ Local Business listings, maps, places
πŸ“° News Real-time news with Google News integration
πŸ–Ό Images Google Images search
⚑ Speed Fastest response times (~200-400ms)

Best Use Cases

  • βœ… Product specifications and comparisons
  • βœ… Shopping and price lookups
  • βœ… Local business searches ("restaurants near me")
  • βœ… Quick factual queries (weather, conversions, definitions)
  • βœ… News headlines and current events
  • βœ… Image searches
  • βœ… When you need "what Google shows"

Getting Your API Key

  1. Go to serper.dev
  2. Sign up with email or Google
  3. Copy your API key from the dashboard
  4. Set SERPER_API_KEY environment variable

Tavily (Research Search)

What it is: AI-optimized search engine built for research and RAG applications β€” returns synthesized answers plus full content.

Strengths

Strength Description
πŸ“š Research Quality Optimized for comprehensive, accurate research
πŸ’¬ AI Answers Returns synthesized answers, not just links
πŸ“„ Full Content Can return complete page content (raw_content)
🎯 Domain Filtering Include/exclude specific domains
πŸ”¬ Deep Mode Advanced search for thorough research
πŸ“° Topic Modes Specialized for general vs news content

Best Use Cases

  • βœ… Research questions requiring synthesized answers
  • βœ… Academic or technical deep dives
  • βœ… When you need actual page content (not just snippets)
  • βœ… Multi-source information comparison
  • βœ… Domain-specific research (filter to authoritative sources)
  • βœ… News research with context
  • βœ… RAG/LLM applications

Getting Your API Key

  1. Go to tavily.com
  2. Sign up and verify email
  3. Navigate to API Keys section
  4. Generate and copy your key
  5. Set TAVILY_API_KEY environment variable

Exa (Neural Search)

What it is: Neural/semantic search engine that understands meaning, not just keywords β€” finds conceptually similar content.

Strengths

Strength Description
🧠 Semantic Understanding Finds results by meaning, not keywords
πŸ”— Similar Pages Find pages similar to a reference URL
🏒 Company Discovery Excellent for finding startups, companies
πŸ“‘ Category Filters Filter by type (company, paper, tweet, etc.)
πŸ“… Date Filtering Precise date range searches
πŸŽ“ Academic Great for research papers and technical content

Best Use Cases

  • βœ… Conceptual queries ("companies building X")
  • βœ… Finding similar companies or pages
  • βœ… Startup and company discovery
  • βœ… Research paper discovery
  • βœ… Finding GitHub projects
  • βœ… Date-filtered searches for recent content
  • βœ… When keyword matching fails

Getting Your API Key

  1. Go to exa.ai
  2. Sign up with email or Google
  3. Navigate to API section in dashboard
  4. Copy your API key
  5. Set EXA_API_KEY environment variable

SearXNG (Privacy-First Meta-Search)

What it is: Open-source, self-hosted meta-search engine that aggregates results from 70+ search engines without tracking.

Strengths

Strength Description
πŸ”’ Privacy-First No tracking, no profiling, no data collection
🌐 Multi-Engine Aggregates Google, Bing, DuckDuckGo, and 70+ more
πŸ’° Free $0 API cost (self-hosted, unlimited queries)
🎯 Diverse Results Get perspectives from multiple search engines
βš™ Customizable Choose which engines to use, SafeSearch, language
🏠 Self-Hosted Full control over your search infrastructure

Best Use Cases

  • βœ… Privacy-sensitive searches (no tracking)
  • βœ… When you want diverse results from multiple engines
  • βœ… Budget-conscious (no API fees)
  • βœ… Self-hosted/air-gapped environments
  • βœ… Fallback when paid APIs are rate-limited
  • βœ… When "aggregate everything" is the goal

Setting Up Your Instance

# Docker (recommended, 5 minutes)
docker run -d -p 8080:8080 searxng/searxng

# Enable JSON API in settings.yml:
# search:
#   formats: [html, json]
  1. See docs.searxng.org
  2. Deploy via Docker, pip, or your preferred method
  3. Enable JSON format in settings.yml
  4. Set SEARXNG_INSTANCE_URL environment variable

Usage Examples

Auto-Routed Searches (Recommended)

# Just search β€” the skill picks the best provider
python3 scripts/search.py -q "Tesla Model 3 price"
python3 scripts/search.py -q "how do neural networks learn"
python3 scripts/search.py -q "YC startups like Stripe"
python3 scripts/search.py -q "search privately without tracking"

Serper Options

# Different search types
python3 scripts/search.py -p serper -q "gaming monitor" --type shopping
python3 scripts/search.py -p serper -q "coffee shop" --type places
python3 scripts/search.py -p serper -q "AI news" --type news

# With time filter
python3 scripts/search.py -p serper -q "OpenAI news" --time-range day

# Include images
python3 scripts/search.py -p serper -q "iPhone 16 Pro" --images

# Different locale
python3 scripts/search.py -p serper -q "Wetter Wien" --country at --language de

Tavily Options

# Deep research mode
python3 scripts/search.py -p tavily -q "quantum computing applications" --depth advanced

# With full page content
python3 scripts/search.py -p tavily -q "transformer architecture" --raw-content

# Domain filtering
python3 scripts/search.py -p tavily -q "AI research" --include-domains arxiv.org nature.com

Exa Options

# Category filtering
python3 scripts/search.py -p exa -q "AI startups Series A" --category company
python3 scripts/search.py -p exa -q "attention mechanism" --category "research paper"

# Date filtering
python3 scripts/search.py -p exa -q "YC companies" --start-date 2024-01-01

# Find similar pages
python3 scripts/search.py -p exa --similar-url "https://stripe.com" --category company

SearXNG Options

# Basic search
python3 scripts/search.py -p searxng -q "linux distros"

# Specific engines only
python3 scripts/search.py -p searxng -q "AI news" --engines "google,bing,duckduckgo"

# SafeSearch (0=off, 1=moderate, 2=strict)
python3 scripts/search.py -p searxng -q "privacy tools" --searxng-safesearch 2

# With time filter
python3 scripts/search.py -p searxng -q "open source projects" --time-range week

# Custom instance URL
python3 scripts/search.py -p searxng -q "test" --searxng-url "http://localhost:8080"

Workflow Examples

πŸ›’ Product Research Workflow

# Step 1: Get product specs (auto-routed to Serper)
python3 scripts/search.py -q "MacBook Pro M3 Max specs"

# Step 2: Check prices (auto-routed to Serper)
python3 scripts/search.py -q "MacBook Pro M3 Max price comparison"

# Step 3: In-depth reviews (auto-routed to Tavily)
python3 scripts/search.py -q "detailed MacBook Pro M3 Max review"

πŸ“š Academic Research Workflow

# Step 1: Understand the topic (auto-routed to Tavily)
python3 scripts/search.py -q "explain transformer architecture in deep learning"

# Step 2: Find recent papers (Exa)
python3 scripts/search.py -p exa -q "transformer improvements" --category "research paper" --start-date 2024-01-01

# Step 3: Find implementations (Exa)
python3 scripts/search.py -p exa -q "transformer implementation" --category github

🏒 Competitive Analysis Workflow

# Step 1: Find competitors (auto-routed to Exa)
python3 scripts/search.py -q "companies like Notion"

# Step 2: Find similar products (Exa)
python3 scripts/search.py -p exa --similar-url "https://notion.so" --category company

# Step 3: Deep dive comparison (Tavily)
python3 scripts/search.py -p tavily -q "Notion vs Coda comparison" --depth advanced

Optimization Tips

Cost Optimization

Tip Savings
Use SearXNG for routine queries $0 API cost
Use auto-routing (defaults to Serper, cheapest paid) Best value
Use Tavily basic before advanced ~50% cost reduction
Set appropriate max_results Linear cost savings
Use Exa only for semantic queries Avoid waste

Performance Optimization

Tip Impact
Serper is fastest (~200ms) Use for time-sensitive queries
Tavily basic faster than advanced ~2x faster
Lower max_results = faster response Linear improvement

FAQ & Troubleshooting

General Questions

Q: Do I need API keys for all three providers?

No. You only need keys for providers you want to use. Auto-routing skips providers without keys.

Q: Which provider should I start with?

Serper β€” it's the fastest, cheapest, and has the largest free tier (2,500 queries).

Q: Can I use multiple providers in one workflow?

Yes! That's the recommended approach. See Workflow Examples.

Q: How do I reduce API costs?

Use auto-routing (defaults to cheapest), start with lower max_results, use Tavily basic before advanced.

Auto-Routing Questions

Q: Why did my query go to the wrong provider?

Use --explain-routing to debug. Add custom keywords to config.json if needed.

Q: Can I add my own keywords?

Yes! Edit config.json β†’ auto_routing.keyword_mappings.

Q: How does keyword scoring work?

Multi-word phrases get higher weights. "companies like" (2 words) scores higher than "like" (1 word).

Q: What if no keywords match?

Uses the fallback provider (default: Serper).

Q: Can I force a specific provider?

Yes, use -p serper, -p tavily, or -p exa.

Troubleshooting

Error: "Missing API key"

# Check if key is set
echo $SERPER_API_KEY

# Set it
export SERPER_API_KEY="your-key"

Error: "API Error (401)"

Your API key is invalid or expired. Generate a new one.

Error: "API Error (429)"

Rate limited. Wait and retry, or upgrade your plan.

Empty results?

Try a different provider, broaden your query, or remove restrictive filters.

Slow responses?

Reduce max_results, use Tavily basic, or use Serper (fastest).


API Reference

Output Format

All providers return unified JSON:

{
  "provider": "serper|tavily|exa",
  "query": "original search query",
  "results": [
    {
      "title": "Page Title",
      "url": "https://example.com/page",
      "snippet": "Content excerpt...",
      "score": 0.95,
      "date": "2024-01-15",
      "raw_content": "Full page content (Tavily only)"
    }
  ],
  "images": ["url1", "url2"],
  "answer": "Synthesized answer",
  "knowledge_graph": { },
  "routing": {
    "auto_routed": true,
    "selected_provider": "serper",
    "reason": "matched_keywords (score=1)",
    "matched_keywords": ["price"]
  }
}

CLI Options Reference

Option Providers Description
-q, --query All Search query
-p, --provider All Provider: auto, serper, tavily, querit, exa, perplexity, you, searxng
-n, --max-results All Max results (default: 5)
--auto All Force auto-routing
--explain-routing All Debug auto-routing
--images Serper, Tavily Include images
--country Serper, You Country code (default: us)
--language Serper, SearXNG Language code (default: en)
--type Serper search/news/images/videos/places/shopping
--time-range Serper, SearXNG hour/day/week/month/year
--depth Tavily basic/advanced
--topic Tavily general/news
--raw-content Tavily Include full page content
--querit-base-url Querit Override Querit API base URL
--querit-base-path Querit Override Querit API path
--exa-type Exa neural/keyword
--category Exa company/research paper/news/pdf/github/tweet
--start-date Exa Start date (YYYY-MM-DD)
--end-date Exa End date (YYYY-MM-DD)
--similar-url Exa Find similar pages
--searxng-url SearXNG Instance URL
--searxng-safesearch SearXNG 0=off, 1=moderate, 2=strict
--engines SearXNG Specific engines (google,bing,duckduckgo)
--categories SearXNG Search categories (general,images,news)
--include-domains Tavily, Exa Only these domains
--exclude-domains Tavily, Exa Exclude these domains
--compact All Compact JSON output

License

MIT


Links

  • Serper β€” Google Search API
  • Tavily β€” AI Research Search
  • Exa β€” Neural Search
  • ClawHub β€” OpenClaw Skills

About

Unified search skill for Clawdbot supporting Serper (Google), Tavily (AI-optimized), and Exa (neural semantic) search providers

Topics

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors