Commit 4fb956b

Merge pull request #224 from om-ai-lab:develop/v0.2.4
update main to v0.2.4
2 parents fd627c3 + 979c2d9

File tree

293 files changed: +23816 −301 lines


README.md

Lines changed: 2 additions & 16 deletions
````diff
@@ -28,6 +28,7 @@ OmAgent is python library for building multimodal language agents with ease. We
 - Native multimodal interaction support include VLM models, real-time API, computer vision models, mobile connection and etc.
 - A suite of state-of-the-art unimodal and multimodal agent algorithms that goes beyond simple LLM reasoning, e.g. ReAct, CoT, SC-Cot etc.
 - Supports local deployment of models. You can deploy your own models locally by using Ollama[Ollama](./docs/concepts/models/Ollama.md) or [LocalAI](./examples/video_understanding/docs/local-ai.md).
+- Fully distributed architecture, supports custom scaling. Also supports Lite mode, eliminating the need for middleware deployment.
 
 
 ## 🛠️ How To Install
@@ -41,11 +42,6 @@ OmAgent is python library for building multimodal language agents with ease. We
 ```bash
 pip install -e omagent-core
 ```
-- Set Up Conductor Server (Docker-Compose) Docker-compose includes conductor-server, Elasticsearch, and Redis.
-```bash
-cd docker
-docker-compose up -d
-```
 
 ## 🚀 Quick Start
 ### Configuration
@@ -57,9 +53,7 @@ The container.yaml file is a configuration file that manages dependencies and se
 cd examples/step1_simpleVQA
 python compile_container.py
 ```
-This will create a container.yaml file with default settings under `examples/step1_simpleVQA`.
-
-
+This will create a container.yaml file with default settings under `examples/step1_simpleVQA`. For more information about the container.yaml configuration, please refer to the [container module](./docs/concepts/container.md)
 
 2. Configure your LLM settings in `configs/llms/gpt.yml`:
 
@@ -70,14 +64,6 @@ The container.yaml file is a configuration file that manages dependencies and se
 ```
 You can use a locally deployed Ollama to call your own language model. The tutorial is [here](docs/concepts/models/Ollama.md).
 
-3. Update settings in the generated `container.yaml`:
-- Configure Redis connection settings, including host, port, credentials, and both `redis_stream_client` and `redis_stm_client` sections.
-- Update the Conductor server URL under conductor_config section
-- Adjust any other component settings as needed
-
-
-For more information about the container.yaml configuration, please refer to the [container module](./docs/concepts/container.md)
-
 ### Run the demo
 
 1. Run the simple VQA demo with webpage GUI:
````

docs/images/reflexion.png

Binary image changed: −85.6 KB

docs/tutorials/run_agent_full.md

Lines changed: 73 additions & 0 deletions
````diff
@@ -0,0 +1,73 @@
+# Run the full version of OmAgent
+OmAgent now supports free switching between Full and Lite versions, the differences between the two versions are as follows:
+- The Full version has better concurrency performance, can view workflows as well as run logs with the help of the orchestration system GUI, and supports more device types (e.g. smartphone apps). Note that running the Full version requires a Docker deployment middleware dependencies.
+- The Lite version is suitable for developers who want to get started faster. It eliminates the steps of installing and deploying Docker, and is suitable for rapid prototyping and debugging.
+
+## Instruction of how to use Full version
+### 🛠️ How To Install
+- python >= 3.10
+- Install omagent_core
+Use pip to install omagent_core latest release.
+```bash
+pip install omagent-core
+```
+Or install the latest version from the source code like below.
+```bash
+pip install -e omagent-core
+```
+- Set Up Conductor Server (Docker-Compose) Docker-compose includes conductor-server, Elasticsearch, and Redis.
+```bash
+cd docker
+docker-compose up -d
+```
+
+### 🚀 Quick Start
+#### Configuration
+
+The container.yaml file is a configuration file that manages dependencies and settings for different components of the system. To set up your configuration:
+
+1. Generate the container.yaml file:
+```bash
+cd examples/step1_simpleVQA
+python compile_container.py
+```
+This will create a container.yaml file with default settings under `examples/step1_simpleVQA`.
+
+
+
+2. Configure your LLM settings in `configs/llms/gpt.yml`:
+
+- Set your OpenAI API key or compatible endpoint through environment variable or by directly modifying the yml file
+```bash
+export custom_openai_key="your_openai_api_key"
+export custom_openai_endpoint="your_openai_endpoint"
+```
+You can use a locally deployed Ollama to call your own language model. The tutorial is [here](docs/concepts/models/Ollama.md).
+
+3. Update settings in the generated `container.yaml`:
+- Configure Redis connection settings, including host, port, credentials, and both `redis_stream_client` and `redis_stm_client` sections.
+- Update the Conductor server URL under conductor_config section
+- Adjust any other component settings as needed
+
+
+For more information about the container.yaml configuration, please refer to the [container module](./docs/concepts/container.md)
+
+#### Run the demo
+
+1. Set the OmAgent to Full version by setting environment variable `OMAGENT_MODE`
+```bash
+export OMAGENT_MODE=full
+```
+or
+```pyhton
+os.environ["OMAGENT_MODE"] = "full"
+```
+2. Run the simple VQA demo with webpage GUI:
+
+For WebpageClient usage: Input and output are in the webpage
+```bash
+cd examples/step1_simpleVQA
+python run_webpage.py
+```
+Open the webpage at `http://127.0.0.1:7860`, you will see the following interface:
+<img src="docs/images/simpleVQA_webpage.png" width="400"/>
````
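The `OMAGENT_MODE` switch described above is a plain environment variable, so it can be set either from the shell or from Python before the omagent_core imports run. A minimal sketch (only the variable name comes from the tutorial; the branch at the end is illustrative):

```python
import os

# Select the Full version; the flag must be set before any
# omagent_core import reads it.
os.environ["OMAGENT_MODE"] = "full"

# Illustrative: downstream code could branch on the mode, e.g. to skip
# the Docker middleware steps when running Lite.
use_middleware = os.environ.get("OMAGENT_MODE", "lite") == "full"
```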

examples/PoT/eval_aqua_zeroshot.py

Lines changed: 7 additions & 3 deletions
````diff
@@ -1,10 +1,13 @@
 # Import required modules and components
+import os
+os.environ["OMAGENT_MODE"] = "lite"
+
 from omagent_core.utils.container import container
 from omagent_core.engine.workflow.conductor_workflow import ConductorWorkflow
 from omagent_core.advanced_components.workflow.pot.workflow import PoTWorkflow
 from pathlib import Path
 from omagent_core.utils.registry import registry
-from omagent_core.clients.devices.programmatic.client import ProgrammaticClient
+from omagent_core.clients.devices.programmatic import ProgrammaticClient
 from omagent_core.utils.logger import logging
 import argparse
 import json
@@ -50,6 +53,7 @@ def main():
     # Setup logging and paths
     logging.init_logger("omagent", "omagent", level="INFO")
     CURRENT_PATH = Path(__file__).parents[0]
+    container.register_stm("SharedMemSTM")
 
     # Initialize agent modules and configuration
     registry.import_module(project_path=CURRENT_PATH.joinpath('agent'))
@@ -84,7 +88,7 @@ def main():
     for r, w in zip(res, workflow_input_list):
         output_json.append({
             "id": w['id'],
-            "question": w['query'],
+            "question": w['query']+'\nOptions: '+str(question['options']),
             "last_output": r['last_output'],
             "prompt_tokens": r['prompt_tokens'],
             "completion_tokens": r['completion_tokens']
@@ -101,7 +105,7 @@ def main():
     # Save results to output file
     if not os.path.exists(args.output_path):
         os.makedirs(args.output_path)
-    with open(f'{args.output_path}/{dataset_name}_{model_id}_POT_output.json', 'w') as f:
+    with open(f'{args.output_path}/{dataset_name}_{model_id.replace("/","-")}_POT_output.json', 'w') as f:
         json.dump(final_output, f, indent=4)
 
     # Cleanup
````
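The last hunk above swaps `model_id` for `model_id.replace("/","-")` in the output filename. The motivation is that Hugging Face-style model IDs such as `org/model` contain a path separator, which `open()` would treat as a subdirectory and fail on. A minimal sketch of the fix in isolation (the model ID below is a made-up example, not one from the repo):

```python
def output_filename(dataset_name: str, model_id: str) -> str:
    # A "/" in the model id (e.g. "org/model") would be read as a
    # directory separator, so map it to "-" before building the name.
    safe_id = model_id.replace("/", "-")
    return f"{dataset_name}_{safe_id}_POT_output.json"

name = output_filename("aqua", "om-ai-lab/demo-model")  # hypothetical id
```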

examples/PoT/eval_gsm8k_fewshot.py

Lines changed: 6 additions & 2 deletions
````diff
@@ -1,10 +1,13 @@
+import os
+os.environ["OMAGENT_MODE"] = "lite"
+
 # Import required modules and components
 from omagent_core.utils.container import container
 from omagent_core.engine.workflow.conductor_workflow import ConductorWorkflow
 from omagent_core.advanced_components.workflow.pot.workflow import PoTWorkflow
 from pathlib import Path
 from omagent_core.utils.registry import registry
-from omagent_core.clients.devices.programmatic.client import ProgrammaticClient
+from omagent_core.clients.devices.programmatic import ProgrammaticClient
 from omagent_core.utils.logger import logging
 import argparse
 import json
@@ -114,6 +117,7 @@ def main():
     # Setup logging and paths
     logging.init_logger("omagent", "omagent", level="INFO")
     CURRENT_PATH = Path(__file__).parents[0]
+    container.register_stm("SharedMemSTM")
 
     # Initialize agent modules and configuration
     registry.import_module(project_path=CURRENT_PATH.joinpath('agent'))
@@ -164,7 +168,7 @@ def main():
     # Save results to output file
     if not os.path.exists(args.output_path):
         os.makedirs(args.output_path)
-    with open(f'{args.output_path}/{dataset_name}_{model_id}_POT_output.json', 'w') as f:
+    with open(f'{args.output_path}/{dataset_name}_{model_id.replace("/","-")}_POT_output.json', 'w') as f:
         json.dump(final_output, f, indent=4)
 
     # Cleanup
````

examples/PoT/run_cli.py

Lines changed: 5 additions & 1 deletion
````diff
@@ -1,12 +1,16 @@
 # Import core modules and components for the Program of Thought (PoT) workflow
+import os
+os.environ["OMAGENT_MODE"] = "lite"
+
 from omagent_core.utils.container import container
 from omagent_core.engine.workflow.conductor_workflow import ConductorWorkflow
 from omagent_core.engine.workflow.task.simple_task import simple_task
 from agent.input_interface.input_interface import PoTInputInterface
 from omagent_core.advanced_components.workflow.pot.workflow import PoTWorkflow
 from pathlib import Path
 from omagent_core.utils.registry import registry
-from omagent_core.clients.devices.cli.client import DefaultClient
+from omagent_core.clients.devices.cli import DefaultClient
+
 from omagent_core.utils.logger import logging
 
 
````

examples/PoT/run_programmatic.py

Lines changed: 7 additions & 2 deletions
````diff
@@ -1,10 +1,13 @@
-# Import required modules and components
+# Import core modules and components for the Program of Thought (PoT) workflow
+import os
+os.environ["OMAGENT_MODE"] = "lite"
+
 from omagent_core.utils.container import container
 from omagent_core.engine.workflow.conductor_workflow import ConductorWorkflow
 from omagent_core.advanced_components.workflow.pot.workflow import PoTWorkflow
 from pathlib import Path
 from omagent_core.utils.registry import registry
-from omagent_core.clients.devices.programmatic.client import ProgrammaticClient
+from omagent_core.clients.devices.programmatic import ProgrammaticClient
 from omagent_core.utils.logger import logging
 
 
@@ -17,6 +20,8 @@
 # Load custom agent modules from the project directory
 registry.import_module(project_path=CURRENT_PATH.joinpath('agent'))
 
+container.register_stm("SharedMemSTM")
+
 # Load container configuration from YAML file
 container.from_config(CURRENT_PATH.joinpath('container.yaml'))
 
````
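Each updated example script now registers a `SharedMemSTM` short-term-memory backend, which in Lite mode stands in for the Redis-backed store the Full version uses. As a rough illustration of what an in-process STM can look like, here is a toy analogue (this is hypothetical, not OmAgent's actual class or API):

```python
class ToySharedMemSTM:
    """Toy in-process short-term memory keyed by workflow instance,
    standing in for the Redis-backed store used in Full mode."""

    def __init__(self):
        self._store = {}

    def set(self, workflow_id, key, value):
        # Create the per-workflow dict on first write, then store the value.
        self._store.setdefault(workflow_id, {})[key] = value

    def get(self, workflow_id, key, default=None):
        return self._store.get(workflow_id, {}).get(key, default)

stm = ToySharedMemSTM()
stm.set("wf-1", "last_output", "42")
```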

examples/ToT/README.md

Lines changed: 104 additions & 0 deletions
````diff
@@ -0,0 +1,104 @@
+# Tree-of-Thought Example
+
+In computer science, "Tree of Thought" (ToT) is a tree-structured solution that maintains a tree of thoughts represented by a coherent sequence of language, which is the intermediate steps in solving the problem. Using this approach, LLM is able to evaluate the intermediate thoughts of a rigorous reasoning process. LLM combines the ability to generate and evaluate thoughts with search algorithms such as breadth-first search and depth-first search, which can verify forward and backward when systematically exploring thoughts.
+
+This example demonstrates how to use the framework for a simple, ToT task. The example code can be found in the "examples/ToT" directory.
+
+```bash
+cd examples/ToT
+```
+
+## Overview
+
+This example implements a general ToT workflow that consists of following components:
+
+1. **ToT Input**
+- Handles user input containing questions, requirements, and examples(if has)
+
+2. **ToT Workflow**
+- Generate possible next thoughts
+- Evaluate the thoughts
+- Search the best thought
+
+3. **ToT Output**
+- Output the last output in final result
+
+### This whole workflow is looked like the following diagram:
+
+![ToT Workflow](./docs/images/tot_run_structure.png)
+
+## Prerequisites
+
+- Python 3.11+
+- Required packages installed (see requirements.txt)
+- Access to OpenAI API or compatible endpoint (see configs/llms/*.yml)
+- Redis server running locally or remotely
+- Conductor server running locally or remotely
+
+## Configuration
+
+The container.yaml file is a configuration file that manages dependencies and settings for different components of the system, including Conductor connections, Redis connections, and other service configurations. To set up your configuration:
+
+1. Generate the container.yaml file:
+```bash
+python compile_container.py
+```
+This will create a container.yaml file with default settings under `examples/ToT`.
+
+
+2. Configure your LLM and tool settings in `configs/llms/*.yml` and `configs/tools/*.yml`:
+- Set your OpenAI API key or compatible endpoint through environment variable or by directly modifying the yml file
+```bash
+export custom_model_id="your_model_id"
+export custom_openai_key="your_openai_api_key"
+export custom_openai_endpoint="your_openai_endpoint"
+```
+
+- Configure other model settings like temperature as needed through environment variable or by directly modifying the yml file
+
+3. Update settings in the generated `container.yaml`:
+- Modify Redis connection settings:
+- Set the host, port and credentials for your Redis instance
+- Configure both `redis_stream_client` and `redis_stm_client` sections
+- Update the Conductor server URL under conductor_config section
+- Adjust any other component settings as needed
+
+## Running the Example
+
+3. Run the general ToT example:
+
+For terminal/CLI usage:
+```bash
+python run_cli.py
+```
+
+You can now run the ToT workflow in `pro` mode or `lite` mode by changing the `OMAGENT_MODE` environment variable. The default mode is `pro` which use the conductor and redis server to run the workflow. The `lite` mode will run the workflow in the current python process without using the conductor and redis server.
+
+For pro mode:
+```bash
+export OMAGENT_MODE="pro"
+python run_cli.py
+```
+
+For lite mode:
+```bash
+export OMAGENT_MODE="lite"
+python run_cli.py
+```
+
+
+
+## Troubleshooting
+
+If you encounter issues:
+- Verify Redis is running and accessible
+- Check your OpenAI API key is valid
+- Check your Bing API key is valid if search results are not as expected
+- Ensure all dependencies are installed correctly
+- Review logs for any error messages
+- **Open an issue on GitHub if you can't find a solution, we will do our best to help you out!**
+
+
+## Set up your own ToT workflow
+
+ToT has many other complex settings. If you want to learn more about ToT settings, please follow this document.[ToT User Book](./docs/files/user_book.md)
````
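The generate/evaluate/search loop the ToT README's Overview describes can be sketched as a small breadth-first search. This is purely illustrative, not OmAgent's implementation: `generate_thoughts` and `score_thought` are hypothetical stand-ins for the LLM calls a real ToT workflow would make.

```python
def generate_thoughts(state):
    # Hypothetical stand-in: a real workflow would ask an LLM to propose
    # candidate next reasoning steps for the partial solution `state`.
    return [state + (step,) for step in ("step_a", "step_b")]

def score_thought(state):
    # Hypothetical stand-in: a real workflow would ask an LLM to rate
    # how promising this partial chain of thought is.
    return len(state)

def tot_bfs(root=(), depth=3, beam=2):
    """Breadth-first ToT: keep the `beam` best partial thought chains
    at each depth, then return the top chain at the final depth."""
    frontier = [root]
    for _ in range(depth):
        candidates = [t for s in frontier for t in generate_thoughts(s)]
        frontier = sorted(candidates, key=score_thought, reverse=True)[:beam]
    return frontier[0]

best_chain = tot_bfs()
```

Swapping the loop for a stack-based expansion would give the depth-first variant the README also mentions; the generate/evaluate split stays the same.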

examples/ToT/__init__.py

Whitespace-only changes.
