Commit fc71480

Merge pull request #14 from kyusonglee/develop/v0.2.5 (Develop/v0.2.5)
2 parents c323939 + 74a9eab

File tree: 470 files changed (+32877, -1000 lines)


.github/workflows/workflow.yml

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v3
         with:
-          python-version: '3.10'
+          python-version: '3.11'
       - name: Install Poetry
         uses: snok/install-poetry@v1
       - name: Install dependencies

.gitignore

Lines changed: 6 additions & 1 deletion
@@ -154,4 +154,9 @@ video_cache/
 *.db

 # vscode
-.vscode
+.vscode
+
+# JSON files
+*.json
+!mcp.json
+import os

omagent-core/src/omagent_core/advanced_components/workflow/self_consist_cot/agent/cot_conclude/sys_prompt.prompt renamed to 1.txt

File renamed without changes.

README.md

Lines changed: 8 additions & 17 deletions
@@ -27,6 +27,8 @@ OmAgent is python library for building multimodal language agents with ease. We
 - A flexible agent architecture that provides graph-based workflow orchestration engine and various memory type enabling contextual reasoning.
 - Native multimodal interaction support include VLM models, real-time API, computer vision models, mobile connection and etc.
 - A suite of state-of-the-art unimodal and multimodal agent algorithms that goes beyond simple LLM reasoning, e.g. ReAct, CoT, SC-Cot etc.
+- Supports local deployment of models. You can deploy your own models locally by using Ollama[Ollama](./docs/concepts/models/Ollama.md) or [LocalAI](./examples/video_understanding/docs/local-ai.md).
+- Fully distributed architecture, supports custom scaling. Also supports Lite mode, eliminating the need for middleware deployment.


 ## 🛠️ How To Install
@@ -40,11 +42,6 @@ OmAgent is python library for building multimodal language agents with ease. We
 ```bash
 pip install -e omagent-core
 ```
-- Set Up Conductor Server (Docker-Compose) Docker-compose includes conductor-server, Elasticsearch, and Redis.
-```bash
-cd docker
-docker-compose up -d
-```

 ## 🚀 Quick Start
 ### Configuration
@@ -56,9 +53,7 @@ The container.yaml file is a configuration file that manages dependencies and se
 cd examples/step1_simpleVQA
 python compile_container.py
 ```
-This will create a container.yaml file with default settings under `examples/step1_simpleVQA`.
-
-
+This will create a container.yaml file with default settings under `examples/step1_simpleVQA`. For more information about the container.yaml configuration, please refer to the [container module](./docs/concepts/container.md)

 2. Configure your LLM settings in `configs/llms/gpt.yml`:

@@ -69,14 +64,6 @@ The container.yaml file is a configuration file that manages dependencies and se
 ```
 You can use a locally deployed Ollama to call your own language model. The tutorial is [here](docs/concepts/models/Ollama.md).

-3. Update settings in the generated `container.yaml`:
-- Configure Redis connection settings, including host, port, credentials, and both `redis_stream_client` and `redis_stm_client` sections.
-- Update the Conductor server URL under conductor_config section
-- Adjust any other component settings as needed
-
-
-For more information about the container.yaml configuration, please refer to the [container module](./docs/concepts/container.md)
-
 ### Run the demo

 1. Run the simple VQA demo with webpage GUI:
@@ -91,7 +78,11 @@ For more information about the container.yaml configuration, please refer to the

 ## 🤖 Example Projects
 ### 1. Video QA Agents
-Build a system that can answer any questions about uploaded videos with video understanding agents. See Details [here](examples/video_understanding/README.md).
+Build a system that can answer any questions about uploaded videos with video understanding agents. we provide a gradio based application, see details [here](examples/video_understanding/README.md).
+<p >
+<img src="docs/images/video_understanding_gradio.png" width="500"/>
+</p>
+
 More about the video understanding agent can be found in [paper](https://arxiv.org/abs/2406.16620).
 <p >
 <img src="docs/images/OmAgent.png" width="500"/>
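Editor's note on the Quick Start changes above: a minimal sketch (not part of this commit) of setting the two environment variables the README references before launching the simple VQA demo. The variable names `custom_openai_key` and `custom_openai_endpoint` come from the README; the placeholder values, the assumed default endpoint, and the `subprocess` launch are illustrative only.

```python
# Sketch only: export the endpoint settings that configs/llms/gpt.yml reads,
# then start the webpage demo. Placeholder values must be replaced.
import os
import subprocess

os.environ.setdefault("custom_openai_key", "your_openai_api_key")             # name from the README
os.environ.setdefault("custom_openai_endpoint", "https://api.openai.com/v1")  # assumed default endpoint

# Equivalent to running `python run_webpage.py` inside examples/step1_simpleVQA
subprocess.run(["python", "run_webpage.py"], cwd="examples/step1_simpleVQA", check=True)
```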

docs/concepts/tool_system/mcp.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Model Control Protocol (MCP)
+
+OmAgent's Model Control Protocol (MCP) system enables seamless integration with external AI models and services through a standardized interface. This protocol allows OmAgent to dynamically discover, register, and execute tools from multiple external servers, extending its capabilities without modifying the core codebase.
+
+## MCP Configuration File
+
+MCP servers are configured in a JSON file, typically named `mcp.json`. This file defines the servers that OmAgent can connect to. Each server has a unique name, command to execute, arguments, and environment variables.
+
+Here's an example of a basic `mcp.json` file that configures multiple MCP servers:
+
+```json
+{
+  "mcpServers": {
+    "desktop-commander": {
+      "command": "npx",
+      "args": [
+        "-y",
+        "@smithery/cli@latest",
+        "run",
+        "@wonderwhy-er/desktop-commander",
+        "--key",
+        "your-api-key-here"
+      ]
+    },
+    .....
+}
+```
+
+By default, OmAgent looks for this file in the following locations (in order):
+1. Inside the tool_system directory `omagent-cor/src/omagnet_core/tool_system/mcp.json`
+it will be automatically loaded.
+
+## Executing MCP Tools
+
+MCP tools can be executed just like any other tool using the ToolManager:
+
+```python
+# Let the ToolManager choose the appropriate tool
+x = tool_manager.execute_task("command ls -l for the current directory")
+print (x)
+```
+
+For more details on creating MCP servers, refer to the [MCP specification](https://github.com/modelcontextprotocol/python-sdk).
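As a side note on the `mcp.json` format documented above: a minimal sketch (not part of this commit) that loads such a file and prints the configured servers. It assumes only the `mcpServers` layout shown in the example and uses no OmAgent internals; the function name is illustrative.

```python
# Sketch: list the MCP servers declared in an mcp.json before handing it to OmAgent.
import json
from pathlib import Path

def list_mcp_servers(path: str = "mcp.json") -> dict:
    """Return the mcpServers mapping and print one line per configured server."""
    config = json.loads(Path(path).read_text())
    servers = config.get("mcpServers", {})
    for name, spec in servers.items():
        command = spec.get("command", "")
        args = " ".join(spec.get("args", []))
        print(f"{name}: {command} {args}")
    return servers

if __name__ == "__main__":
    list_mcp_servers()
```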

docs/images/reflexion.png

Lines changed: 3 additions & 0 deletions

docs/tutorials/run_agent_full.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+# Run the full version of OmAgent
+OmAgent now supports free switching between Full and Lite versions, the differences between the two versions are as follows:
+- The Full version has better concurrency performance, can view workflows as well as run logs with the help of the orchestration system GUI, and supports more device types (e.g. smartphone apps). Note that running the Full version requires a Docker deployment middleware dependencies.
+- The Lite version is suitable for developers who want to get started faster. It eliminates the steps of installing and deploying Docker, and is suitable for rapid prototyping and debugging.
+
+## Instruction of how to use Full version
+### 🛠️ How To Install
+- python >= 3.10
+- Install omagent_core
+Use pip to install omagent_core latest release.
+```bash
+pip install omagent-core
+```
+Or install the latest version from the source code like below.
+```bash
+pip install -e omagent-core
+```
+- Set Up Conductor Server (Docker-Compose) Docker-compose includes conductor-server, Elasticsearch, and Redis.
+```bash
+cd docker
+docker-compose up -d
+```
+
+### 🚀 Quick Start
+#### Configuration
+
+The container.yaml file is a configuration file that manages dependencies and settings for different components of the system. To set up your configuration:
+
+1. Generate the container.yaml file:
+```bash
+cd examples/step1_simpleVQA
+python compile_container.py
+```
+This will create a container.yaml file with default settings under `examples/step1_simpleVQA`.
+
+
+
+2. Configure your LLM settings in `configs/llms/gpt.yml`:
+
+- Set your OpenAI API key or compatible endpoint through environment variable or by directly modifying the yml file
+```bash
+export custom_openai_key="your_openai_api_key"
+export custom_openai_endpoint="your_openai_endpoint"
+```
+You can use a locally deployed Ollama to call your own language model. The tutorial is [here](docs/concepts/models/Ollama.md).
+
+3. Update settings in the generated `container.yaml`:
+- Configure Redis connection settings, including host, port, credentials, and both `redis_stream_client` and `redis_stm_client` sections.
+- Update the Conductor server URL under conductor_config section
+- Adjust any other component settings as needed
+
+
+For more information about the container.yaml configuration, please refer to the [container module](./docs/concepts/container.md)
+
+#### Run the demo
+
+1. Set the OmAgent to Full version by setting environment variable `OMAGENT_MODE`
+```bash
+export OMAGENT_MODE=full
+```
+or
+```pyhton
+os.environ["OMAGENT_MODE"] = "full"
+```
+2. Run the simple VQA demo with webpage GUI:
+
+For WebpageClient usage: Input and output are in the webpage
+```bash
+cd examples/step1_simpleVQA
+python run_webpage.py
+```
+Open the webpage at `http://127.0.0.1:7860`, you will see the following interface:
+<img src="docs/images/simpleVQA_webpage.png" width="400"/>
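To complement the tutorial above: a minimal sketch (not part of this commit) that picks Full or Lite mode from a command-line flag. Only the `OMAGENT_MODE` variable and its `full`/`lite` values come from the tutorial; the flag name is illustrative, and the variable must be set before any `omagent_core` imports.

```python
# Sketch: choose the OmAgent mode at launch time. Set OMAGENT_MODE before
# importing omagent_core modules, as the tutorial and example scripts do.
import argparse
import os

parser = argparse.ArgumentParser(description="Run an OmAgent demo in Full or Lite mode")
parser.add_argument("--mode", choices=["full", "lite"], default="lite",
                    help="'full' requires the Docker middleware (Conductor, Elasticsearch, Redis)")
args = parser.parse_args()

os.environ["OMAGENT_MODE"] = args.mode
print(f"OMAGENT_MODE set to {os.environ['OMAGENT_MODE']}")
```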

examples/PoT/eval_aqua_zeroshot.py

Lines changed: 7 additions & 3 deletions
@@ -1,10 +1,13 @@
 # Import required modules and components
+import os
+os.environ["OMAGENT_MODE"] = "lite"
+
 from omagent_core.utils.container import container
 from omagent_core.engine.workflow.conductor_workflow import ConductorWorkflow
 from omagent_core.advanced_components.workflow.pot.workflow import PoTWorkflow
 from pathlib import Path
 from omagent_core.utils.registry import registry
-from omagent_core.clients.devices.programmatic.client import ProgrammaticClient
+from omagent_core.clients.devices.programmatic import ProgrammaticClient
 from omagent_core.utils.logger import logging
 import argparse
 import json
@@ -50,6 +53,7 @@ def main():
     # Setup logging and paths
     logging.init_logger("omagent", "omagent", level="INFO")
     CURRENT_PATH = Path(__file__).parents[0]
+    container.register_stm("SharedMemSTM")

     # Initialize agent modules and configuration
     registry.import_module(project_path=CURRENT_PATH.joinpath('agent'))
@@ -84,7 +88,7 @@ def main():
     for r, w in zip(res, workflow_input_list):
         output_json.append({
             "id": w['id'],
-            "question": w['query'],
+            "question": w['query']+'\nOptions: '+str(question['options']),
             "last_output": r['last_output'],
             "prompt_tokens": r['prompt_tokens'],
             "completion_tokens": r['completion_tokens']
@@ -101,7 +105,7 @@ def main():
     # Save results to output file
     if not os.path.exists(args.output_path):
         os.makedirs(args.output_path)
-    with open(f'{args.output_path}/{dataset_name}_{model_id}_POT_output.json', 'w') as f:
+    with open(f'{args.output_path}/{dataset_name}_{model_id.replace("/","-")}_POT_output.json', 'w') as f:
         json.dump(final_output, f, indent=4)

     # Cleanup
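A brief illustration (not part of this commit) of why the `model_id.replace("/","-")` change above matters: model identifiers that contain a slash would otherwise be treated as a directory component in the output path. The identifier and dataset name below are hypothetical.

```python
# Hypothetical values showing the effect of sanitizing the model id in the filename.
model_id = "openai/gpt-4o"   # hypothetical model identifier containing a slash
dataset_name = "aqua"        # hypothetical dataset name

unsafe = f"outputs/{dataset_name}_{model_id}_POT_output.json"
safe = f"outputs/{dataset_name}_{model_id.replace('/', '-')}_POT_output.json"

print(unsafe)  # outputs/aqua_openai/gpt-4o_POT_output.json -> unintended subdirectory
print(safe)    # outputs/aqua_openai-gpt-4o_POT_output.json -> single flat filename
```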

examples/PoT/eval_gsm8k_fewshot.py

Lines changed: 6 additions & 2 deletions
@@ -1,10 +1,13 @@
+import os
+os.environ["OMAGENT_MODE"] = "lite"
+
 # Import required modules and components
 from omagent_core.utils.container import container
 from omagent_core.engine.workflow.conductor_workflow import ConductorWorkflow
 from omagent_core.advanced_components.workflow.pot.workflow import PoTWorkflow
 from pathlib import Path
 from omagent_core.utils.registry import registry
-from omagent_core.clients.devices.programmatic.client import ProgrammaticClient
+from omagent_core.clients.devices.programmatic import ProgrammaticClient
 from omagent_core.utils.logger import logging
 import argparse
 import json
@@ -114,6 +117,7 @@ def main():
     # Setup logging and paths
     logging.init_logger("omagent", "omagent", level="INFO")
     CURRENT_PATH = Path(__file__).parents[0]
+    container.register_stm("SharedMemSTM")

     # Initialize agent modules and configuration
     registry.import_module(project_path=CURRENT_PATH.joinpath('agent'))
@@ -164,7 +168,7 @@ def main():
     # Save results to output file
     if not os.path.exists(args.output_path):
         os.makedirs(args.output_path)
-    with open(f'{args.output_path}/{dataset_name}_{model_id}_POT_output.json', 'w') as f:
+    with open(f'{args.output_path}/{dataset_name}_{model_id.replace("/","-")}_POT_output.json', 'w') as f:
         json.dump(final_output, f, indent=4)

     # Cleanup
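Both PoT evaluation scripts gain the same Lite-mode setup: set `OMAGENT_MODE` before any `omagent_core` imports, import `ProgrammaticClient` from its new path, and register the `SharedMemSTM` short-term memory. Below is a condensed sketch of that shared boilerplate; the import paths and names are taken from the diffs above, while the rest of a real script (workflow construction, inputs, execution) is omitted.

```python
# Condensed Lite-mode setup shared by the PoT evaluation scripts.
import os
os.environ["OMAGENT_MODE"] = "lite"   # must be set before omagent_core imports

from pathlib import Path
from omagent_core.utils.container import container
from omagent_core.utils.registry import registry
from omagent_core.clients.devices.programmatic import ProgrammaticClient  # new import path in this commit
from omagent_core.utils.logger import logging

logging.init_logger("omagent", "omagent", level="INFO")
CURRENT_PATH = Path(__file__).parents[0]
container.register_stm("SharedMemSTM")   # short-term memory backend registered for Lite mode
registry.import_module(project_path=CURRENT_PATH.joinpath("agent"))
# ... build the PoT workflow and run it with ProgrammaticClient (omitted here) ...
```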
