This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit bd9271c

Feat/check status python process (#1848)
* init python engine documentation
* add docs for python engine
* update: add explanation for CI params input
* update check status python process
* fix: CI build
1 parent 76956dc commit bd9271c

File tree

5 files changed: +392 −5 lines

docs/docs/engines/python-engine.mdx

Lines changed: 317 additions & 0 deletions
---
title: Python Engine
description: Interface for running Python processes through cortex
---

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::
# Guide to the Python Engine
## Introduction
To run a Python program, cortex needs a Python environment and a Python interpreter running in a separate process from the main cortex process. All requests to the Python program are routed through cortex over the HTTP API.

The python-engine acts as a process manager and manages all Python processes.
Each Python program is treated as a model and has its own `model.yml` template.

## Python engine cpp implementation
The python-engine implements the [EngineI interface](/docs/engines/engine-extension) with the following mapping:
- LoadModel: load the Python program and start the Python process
- UnloadModel: stop the Python process
- GetModelStatus: send health-check requests to the Python processes
- GetModels: list the running Python programs

Besides the EngineI interface, the python-engine also implements the HandleInference and HandleRouteRequest methods:
- HandleInference: send inference requests to the Python process
- HandleRouteRequest: route any type of request to the Python process

The Python engine is built into cortex-cpp, so it is loaded automatically when a model is loaded; users do not need to download the engine or load/unload it as they do with the llama-cpp engine.

## Python program implementation

Each Python program is treated as a Python model, and each model has its own `model.yml` template:

```yaml
id: ichigo-0.5:fp16-linux-amd64
model: ichigo-0.5:fp16-linux-amd64
name: Ichigo Wrapper
version: 1

port: 22310
script: src/app.py
log_path: ichigo-wrapper.log
log_level: INFO
command:
  - python
files:
  - /home/thuan/cortexcpp/models/cortex.so/ichigo-0.5/fp16-linux-amd64
depends:
  - ichigo-0.4:8b-gguf-q4-km
  - whispervq:fp16-linux-amd64
  - fish-speech:fp16-linux-amd64
engine: python-engine
extra_params:
  device_id: 0
  fish_speech_port: 22312
  ichigo_model: ichigo-0.4:8b-gguf-q4-km
  ichigo_port: 39281
  whisper_port: 3348
```

| **Parameter** | **Description** | **Required** |
|-----------------|-----------------------------------------------------------------------------------------------------------|--------------|
| `id` | Unique identifier for the model; typically includes version and platform information. | Yes |
| `model` | Specifies the variant of the model, often denoting size or quantization details. | Yes |
| `name` | The human-readable name for the model, used as the `model_id`. | Yes |
| `version` | The specific version number of the model. | Yes |
| `port` | The network port on which the Python program listens for requests. | Yes |
| `script` | Path to the main Python script executed by the engine, relative to the model folder. | Yes |
| `log_path` | File where logs of the Python program's execution are stored, relative to the cortex data folder. | No |
| `log_level` | The level of logging detail (e.g., INFO, DEBUG). | No |
| `command` | The command used to launch the Python program, typically starting with `python`. | Yes |
| `files` | Path to the folder containing all Python scripts, model binaries, and the environment needed to run the program. | No |
| `depends` | Dependencies required by the model, specified by the identifiers of other models. | No |
| `engine` | Specifies the engine to use; in this context, `python-engine`. | Yes |
| `extra_params` | Additional parameters the model may require, often device IDs and the network ports of dependency models. These are passed to the Python script when it is started. | No |

## Ichigo python with cortex

[Ichigo python](https://github.com/janhq/ichigo) is a built-in model in cortex that supports chat with audio.
### Download models
Ichigo python requires four models to run:
- ichigo-0.5
- whispervq
- ichigo-0.4
- fish-speech (required only if you want to use text-to-speech mode)

First, download these models, remembering to choose the correct version for your device and operating system.
For example, on linux amd64:
```sh
> curl --location '127.0.0.1:39281/v1/models/pull' \
--header 'Content-Type: application/json' \
--data '{"model":"ichigo-0.5:fp16-linux-amd64"}'

> curl --location '127.0.0.1:39281/v1/models/pull' \
--header 'Content-Type: application/json' \
--data '{"model":"ichigo-0.4:8b-gguf-q4-km"}'

> curl --location '127.0.0.1:39281/v1/models/pull' \
--header 'Content-Type: application/json' \
--data '{"model":"whispervq:fp16-linux-amd64"}'

> curl --location '127.0.0.1:39281/v1/models/pull' \
--header 'Content-Type: application/json' \
--data '{"model":"fish-speech:fp16-linux-amd64"}'
```
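
If you prefer Python, here is an equivalent sketch of the four pull requests above, assuming the `requests` package is installed and cortex is listening locally on port 39281:

```python
# Pull the four Ichigo models through the cortex API (sketch of the curl
# commands above).
import requests

MODELS = [
    "ichigo-0.5:fp16-linux-amd64",
    "ichigo-0.4:8b-gguf-q4-km",
    "whispervq:fp16-linux-amd64",
    "fish-speech:fp16-linux-amd64",
]

for model in MODELS:
    res = requests.post("http://127.0.0.1:39281/v1/models/pull", json={"model": model})
    res.raise_for_status()
    print(f"pull requested for {model}")
```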
### Start model

Each Python model runs its own server on a port defined in `model.yml`; you can update `model.yml` to change the port.
The ichigo-0.5 model has `extra_params` that need to be defined correctly:

```yaml
extra_params:
  device_id: 0
  fish_speech_port: 22312
  ichigo_model: ichigo-0.4:8b-gguf-q4-km
  ichigo_port: 39281
  whisper_port: 3348
```

To start the model, just send an API request:
```sh
> curl --location '127.0.0.1:39281/v1/models/start' \
--header 'Content-Type: application/json' \
--data '{
    "model":"ichigo-0.5:fp16-linux-amd64"
}'
```

The model will then start all of ichigo's dependency models.

### Check Status

You can check the status of the model by sending an API request:
```sh
curl --location '127.0.0.1:39281/v1/models/status/fish-speech:fp16-linux-amd64'
```
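
While a model is starting, the health check can fail even though the process is alive; in that case cortex reports that the model is loading rather than returning an error (see the GetModelStatus change in python_engine.cc below). A polling sketch, assuming the `requests` package; the exact response shape may vary between cortex versions:

```python
# Poll the status endpoint until the Python process answers its health check.
import time

import requests

BASE_URL = "http://127.0.0.1:39281"
MODEL = "fish-speech:fp16-linux-amd64"


def wait_until_ready(timeout_s: float = 300.0) -> None:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        res = requests.get(f"{BASE_URL}/v1/models/status/{MODEL}")
        # A 400 means the health check failed and the process is not running.
        res.raise_for_status()
        message = res.json().get("message", "")
        if "is loading" not in message:
            return  # the health endpoint answered: the model is ready
        time.sleep(2)
    raise TimeoutError(f"{MODEL} did not become healthy within {timeout_s}s")


wait_until_ready()
```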

### Inference

You can send an inference request to the model via the API:
```sh
> curl --location '127.0.0.1:39281/v1/inference' \
--header 'Content-Type: application/json' \
--data '{
    "model":"ichigo-0.5:fp16-linux-amd64",
    "engine":"python-engine",
    "body":{
        "messages": [
            {
                "role":"system",
                "content":"you are a helpful assistant, you must answer questions short and concise!"
            }
        ],
        "input_audio": {
            "data": "base64_encoded_audio_data",
            "format": "wav"
        },
        "model": "ichigo-0.4:8b-gguf-q4km",
        "stream": true,
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 2048,
        "presence_penalty": 0,
        "frequency_penalty": 0,
        "stop": [
            "<|eot_id|>"
        ],
        "output_audio": true
    }
}'
```
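
The `input_audio.data` field carries base64-encoded audio. A minimal Python sketch of building the same request (the wav filename is hypothetical; assumes the `requests` package):

```python
# Build and send the /v1/inference request from Python; "question.wav" is a
# placeholder for your own audio file.
import base64

import requests

with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "ichigo-0.5:fp16-linux-amd64",
    "engine": "python-engine",
    "body": {
        "messages": [
            {"role": "system",
             "content": "you are a helpful assistant, you must answer questions short and concise!"},
        ],
        "input_audio": {"data": audio_b64, "format": "wav"},
        "model": "ichigo-0.4:8b-gguf-q4km",
        "stream": False,  # set to True for streaming, as in the curl example
        "temperature": 0.7,
        "max_tokens": 2048,
        "output_audio": True,
    },
}
res = requests.post("http://127.0.0.1:39281/v1/inference", json=payload)
print(res.json())
```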

### Stop Model

You can stop the model by sending an API request:
```sh
> curl --location '127.0.0.1:39281/v1/models/stop' \
--header 'Content-Type: application/json' \
--data '{
    "model":"ichigo-0.5:fp16-linux-amd64"
}'
```

Cortex also stops all dependencies of this model.

### Route requests

Besides that, cortex can also route any kind of request to the Python program through the route-request endpoint. The `transform_response` field is a template used to reshape the Python server's response before cortex returns it.

```sh
curl --location '127.0.0.1:39281/v1/route/request' \
--header 'Content-Type: application/json' \
--data '{
    "model":"whispervq:fp16",
    "path":"/inference",
    "engine":"python-engine",
    "method":"post",
    "transform_response":"{ {%- set first = true -%} {%- for key, value in input_request -%} {%- if key == \"tokens\" -%} {%- if not first -%},{%- endif -%} \"{{ key }}\": {{ tojson(value) }} {%- set first = false -%} {%- endif -%} {%- endfor -%} }",
    "body": {
        "data": "base64 data",
        "format": "wav"
    }
}'
```
## Add new python model

### Python model implementation

The implementation of a Python program can follow this [implementation](https://github.com/janhq/ichigo/pull/154).
The Python server should expose at least two endpoints (a minimal `/health` router is sketched after this list):
- /health : for checking the status of the server.
- /inference : for inference purposes.
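
For reference, here is a minimal, hypothetical `/health` router in the style of the entrypoint below; the real HealthController in the ichigo repo may differ:

```python
# Minimal health router sketch; cortex's GetModelStatus treats any successful
# response from this endpoint as "healthy".
from fastapi import APIRouter


class HealthController(APIRouter):
    def __init__(self) -> None:
        super().__init__()
        self.add_api_route("/health", self.health, methods=["GET"])

    async def health(self) -> dict:
        return {"status": "ok"}
```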

Example of the main entrypoint `src/app.py`:

```python
import argparse
import os
import sys
from pathlib import Path

from contextlib import asynccontextmanager

from typing import AsyncGenerator, List

import uvicorn
from dotenv import load_dotenv
from fastapi import APIRouter, FastAPI

from common.utility.logger_utility import LoggerUtility
from services.audio.audio_controller import AudioController
from services.audio.implementation.audio_service import AudioService
from services.health.health_controller import HealthController


def create_app() -> FastAPI:
    routes: List[APIRouter] = [
        HealthController(),
        AudioController()
    ]
    app = FastAPI()
    for route in routes:
        app.include_router(route)
    return app


def parse_argument():
    parser = argparse.ArgumentParser(description="Ichigo-wrapper Application")
    parser.add_argument('--log_path', type=str,
                        default='Ichigo-wrapper.log', help='The log file path')
    parser.add_argument('--log_level', type=str, default='INFO',
                        choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'TRACE'], help='The log level')
    parser.add_argument('--port', type=int, default=22310,
                        help='The port to run the Ichigo-wrapper app on')
    parser.add_argument('--device_id', type=str, default="0",
                        help='The device id to run the Ichigo-wrapper app on')
    parser.add_argument('--package_dir', type=str, default="",
                        help='The package-dir to be extended to sys.path')
    parser.add_argument('--whisper_port', type=int, default=3348,
                        help='The port of the whisper vq model')
    parser.add_argument('--ichigo_port', type=int, default=39281,
                        help='The port of the ichigo model')
    parser.add_argument('--fish_speech_port', type=int, default=22312,
                        help='The port of the fish speech model')
    parser.add_argument('--ichigo_model', type=str, default="ichigo:8b-gguf-q4-km",
                        help='The ichigo model name')
    args = parser.parse_args()
    return args


if __name__ == "__main__":
    args = parse_argument()
    LoggerUtility.init_logger(__name__, args.log_level, args.log_path)

    env_path = Path(os.path.dirname(os.path.realpath(__file__))
                    ) / "variables" / ".env"
    AudioService.initialize(
        args.whisper_port, args.ichigo_port, args.fish_speech_port, args.ichigo_model)
    load_dotenv(dotenv_path=env_path)
    app: FastAPI = create_app()
    print("Server is running at: 0.0.0.0:", args.port)
    uvicorn.run(app=app, host="0.0.0.0", port=args.port)
```

The parse_argument function must include the parameters cortex uses to integrate with the process:
- port
- log_path
- log_level

The Python server can also take extra parameters, which need to be defined in the `extra_params` section of `model.yml`.
When the server starts, these parameters are overridden by the values in `model.yml`.
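
For illustration only, the launch command assembled from the `model.yml` above would look roughly like this (a sketch; the exact flag order and the assembly itself are handled internally by the engine):

```python
# Hypothetical argv: `command` + `script` + the standard flags + extra_params,
# with each extra_params key becoming a --flag.
argv = [
    "python", "src/app.py",
    "--port", "22310",
    "--log_path", "ichigo-wrapper.log",
    "--log_level", "INFO",
    # flattened from extra_params:
    "--device_id", "0",
    "--fish_speech_port", "22312",
    "--ichigo_model", "ichigo-0.4:8b-gguf-q4-km",
    "--ichigo_port", "39281",
    "--whisper_port", "3348",
]
```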

When the Python code is finished, you need to trigger this [CI](https://github.com/janhq/cortex.cpp/actions/workflows/python-script-package.yml)
so that the latest code is pushed to the cortexso Hugging Face. Once pushed to HF, users can download and use it.
The CI clones and checks out the appropriate branch of your repo and navigates to the correct folder based on the input parameters. The CI needs 5 parameters:
- Path to model directory in github repo: the path to the folder containing all the model scripts for running the Python program
- name of repo to be checked out: the name of the github repo
- branch to be checked out: the name of the branch to check out
- name of huggingface repo to be pushed: the name of the huggingface repo to push to (e.g. cortexso/ichigo-0.5)
- prefix of hf branch: the prefix of the branch name (e.g. `fp16`)

### Python venv package
To package the Python venv, prepare a `requirements.txt` and a `requirements.cuda.txt` file in the root of your project.
The `requirements.txt` file should contain all the dependencies of your project, and the `requirements.cuda.txt` file should contain all the dependencies that require CUDA.
The `requirements.txt` is used to build the venv for macOS; the `requirements.cuda.txt` is used to build the venvs for Linux and Windows.

When finished, you need to trigger this [CI](https://github.com/janhq/cortex.cpp/actions/workflows/python-venv-package.yml).
After the CI finishes, the venvs for the 4 OSes are built and pushed to Hugging Face, where users can download and use them.
The CI clones and checks out the appropriate branch of your repo and navigates to the correct folder containing `requirements.txt` based on the input parameters. The CI needs 6 parameters:
- Path to model directory in github repo: the path to the folder containing all the model scripts for running the Python program
- name of repo to be checked out: the name of the github repo
- name of model to be released: the name of the model the venv is built for (e.g. whispervq)
- branch to be checked out: the name of the branch to check out
- name of huggingface repo to be pushed: the name of the huggingface repo to push to (e.g. cortexso/ichigo-0.5)
- prefix of hf branch: the prefix of the branch name (e.g. `fp16`)

docs/sidebars.ts

Lines changed: 1 addition & 0 deletions
```diff
@@ -148,6 +148,7 @@ const sidebars: SidebarsConfig = {
       collapsed: true,
       items: [
         { type: "doc", id: "engines/llamacpp", label: "llama.cpp" },
+        { type: "doc", id: "engines/python-engine", label: "python engine" },
         // { type: "doc", id: "engines/tensorrt-llm", label: "TensorRT-LLM" },
         // { type: "doc", id: "engines/onnx", label: "ONNX" },
         {
```

engine/extensions/python-engine/python_engine.cc

Lines changed: 17 additions & 4 deletions
```diff
@@ -867,17 +867,30 @@ void PythonEngine::GetModelStatus(
   auto model = json_body->get("model", "").asString();
   auto model_config = models_[model];
   auto health_endpoint = model_config.heath_check;
+  auto pid = processMap[model];
+  auto is_process_live = process_status_utils::IsProcessRunning(pid);
   auto response_health = MakeGetRequest(model, health_endpoint.path);
 
-  if (response_health.error) {
+  if (response_health.error && is_process_live) {
+    Json::Value status;
+    status["is_done"] = true;
+    status["has_error"] = false;
+    status["is_stream"] = false;
+    status["status_code"] = k200OK;
+    Json::Value message;
+    message["message"] = "model '" + model + "' is loading";
+    callback(std::move(status), std::move(message));
+    return;
+  }
+  else if (response_health.error && !is_process_live) {
     Json::Value status;
     status["is_done"] = true;
     status["has_error"] = true;
     status["is_stream"] = false;
     status["status_code"] = k400BadRequest;
-    Json::Value error;
-    error["error"] = response_health.error_message;
-    callback(std::move(status), std::move(error));
+    Json::Value message;
+    message["message"] = response_health.error_message;
+    callback(std::move(status), std::move(message));
     return;
   }
```

engine/extensions/python-engine/python_engine.h

Lines changed: 1 addition & 1 deletion
```diff
@@ -14,7 +14,7 @@
 #include "extensions/template_renderer.h"
 #include "utils/file_logger.h"
 #include "utils/file_manager_utils.h"
-
+#include "utils/process_status_utils.h"
 #include "utils/curl_utils.h"
 #ifdef _WIN32
 #include <process.h>
```

0 commit comments
