
Commit 785669c (parent 4bf087d)

Authored by ProKil, XuhuiZhou, yangalan123, Chenghao Yang, and autofix-ci[bot]

[Automated] Merge release into main (#199)

* bump the version, test release to PyPi
* Update README.md
* Update README.md
* Update README.md
* bump version to 0.0.9
* Update Sotopia presentation information in README.md
* bump version to 0.0.10
* bump version
* add merge release back to main action
* change checkout v4 -> v3
* fix merge-back-to-main and pin mypy to <1.11.0
* merge bug fix
* upgrade the default model for handling badly formatted outputs to gpt-4o-mini, as gpt-3.5-turbo is deprecated (#183)
* update pull request -> pull request target
* bump version
* Add `bad_output_process_model` and `use_fixed_model_version` options for all generation methods, so that future OpenAI API changes do not break Sotopia runs (#196). Two major updates: 1) add a `bad_output_process_model` option to all `agenerate_xxx()` methods so users can decide which model handles bad outputs (`gpt-4o-mini` by default); 2) add a `use_fixed_model_version` option to all generation methods, as some fixed model versions may no longer be available in the future, and users should be able to bypass the fixed-model-version mapping instead of getting stuck in an error. The documentation (`generation.md`) has been updated accordingly for these two changes.
* [autofix.ci] apply automated fixes
* fix gpt-3.5
* replace gpt-3.5-turbo for tests
* update gpt-3.5-turbo to gpt-4o-mini
* bug fix for the return-fixed-model-version function

Co-authored-by: XuhuiZhou <[email protected]>
Co-authored-by: Chenghao (Alan) Yang <[email protected]>
Co-authored-by: Chenghao Yang <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

File tree: 18 files changed (+118, -46 lines)

README.md (2 additions, 2 deletions)

@@ -74,8 +74,8 @@ asyncio.run(
     run_async_server(
         model_dict={
             "env": "gpt-4",
-            "agent1": "gpt-3.5-turbo",
-            "agent2": "gpt-3.5-turbo",
+            "agent1": "gpt-4o-mini",
+            "agent2": "gpt-4o-mini",
         },
         sampler=UniformSampler(),
     )

docs/pages/concepts/agents.md (2 additions, 2 deletions)

@@ -11,7 +11,7 @@ class LLMAgent(BaseAgent[Observation, AgentAction]):
     agent_name: str | None = None,
     uuid_str: str | None = None,
     agent_profile: AgentProfile | None = None,
-    model_name: str = "gpt-3.5-turbo",
+    model_name: str = "gpt-4o-mini",
     script_like: bool = False,
 ) -> None:
 ```
@@ -26,7 +26,7 @@ class ScriptWritingAgent(LLMAgent):
     agent_name: str | None = None,
     uuid_str: str | None = None,
     agent_profile: AgentProfile | None = None,
-    model_name: str = "gpt-3.5-turbo",
+    model_name: str = "gpt-4o-mini",
     agent_names: list[str] = [],
     background: ScriptBackground | None = None,
 ) -> None:

docs/pages/concepts/generation.md (16 additions, 0 deletions)

@@ -12,6 +12,8 @@ async def agenerate(
     output_parser: BaseOutputParser[OutputType],
     temperature: float = 0.7,
     structured_output: bool = False,
+    bad_output_process_model: str = DEFAULT_BAD_OUTPUT_PROCESS_MODEL,
+    use_fixed_model_version: bool = True
 ) -> OutputType:
     input_variables = re.findall(r"(?<!{){([^{}]+)}(?!})", template)
 ```
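The `input_variables` line in the hunk above is how `agenerate` discovers which placeholders a prompt template expects. A quick self-contained check of how that regex behaves: single-brace placeholders are captured, while double-brace `{{literal}}` escapes are skipped.

```python
import re

# Same pattern as in agenerate(): capture {var}, but skip {{literal}} braces.
TEMPLATE_VAR_PATTERN = r"(?<!{){([^{}]+)}(?!})"

template = "You are {agent}. Given {history}, respond in {{json}} format."
input_variables = re.findall(TEMPLATE_VAR_PATTERN, template)
print(input_variables)  # → ['agent', 'history']
```

The lookbehind `(?<!{)` and lookahead `(?!})` are what reject the doubled braces, so literal JSON examples can live in a template without being mistaken for variables.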
@@ -23,6 +25,12 @@ The `agenerate` function is versatile by taking the output_parser as an argument
 * `gpt-4o-mini-2024-07-18` and later
 * `gpt-4o-2024-08-06` and later
 
+The `bad_output_process_model` is the model used to process badly formatted outputs. `DEFAULT_BAD_OUTPUT_PROCESS_MODEL` is set to `gpt-4o-mini`. (At the time Sotopia was published, this was `gpt-3.5-turbo-0613`; that model has since been retired by OpenAI.)
+
+The `use_fixed_model_version` flag determines whether to use a fixed model version. If `True`, the model version is pinned to the one used in the Sotopia paper; if `False`, the latest available version is used.
+
+Warning: since some fixed model versions may no longer be available in the OpenAI API, setting `use_fixed_model_version = True` might result in an error.
+
 </Callout>
 
 Here are a few examples of how to use the `agenerate` function:
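The `bad_output_process_model` option amounts to a parse-then-retry pattern: if the primary model's output fails to parse, a second model is asked to fix the formatting. A rough sketch of that control flow, where `parse_with_fallback`, `parser`, and `reformat_with_model` are hypothetical stand-ins rather than Sotopia APIs:

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def parse_with_fallback(
    raw_output: str,
    parser: Callable[[str], T],
    reformat_with_model: Callable[[str, str], str],
    bad_output_process_model: str = "gpt-4o-mini",  # mirrors DEFAULT_BAD_OUTPUT_PROCESS_MODEL
) -> T:
    """Try to parse; on failure, ask the fallback model to fix the format once."""
    try:
        return parser(raw_output)
    except ValueError:
        fixed = reformat_with_model(bad_output_process_model, raw_output)
        return parser(fixed)

# Toy usage: an int parser, and a stand-in "model" that strips prose around the number.
result = parse_with_fallback(
    "The answer is 42",
    parser=lambda s: int(s),
    reformat_with_model=lambda model, s: s.rsplit(" ", 1)[-1],
)
print(result)  # → 42
```

The real implementation routes the reformatting step through an actual LLM call; the point of the option is simply that the user, not the library, picks which model handles that step.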
@@ -37,6 +45,8 @@ async def agenerate_env_profile(
     inspiration_prompt: str = "asking my boyfriend to stop being friends with his ex",
     examples: str = "",
     temperature: float = 0.7,
+    bad_output_process_model: str = DEFAULT_BAD_OUTPUT_PROCESS_MODEL,
+    use_fixed_model_version: bool = True
 ) -> tuple[EnvironmentProfile, str]:
     """
     Using langchain to generate the background
@@ -56,6 +66,8 @@ async def agenerate_env_profile(
         ),
         output_parser=PydanticOutputParser(pydantic_object=EnvironmentProfile),
         temperature=temperature,
+        bad_output_process_model=bad_output_process_model,
+        use_fixed_model_version=use_fixed_model_version
     )
 ```
 
 ### Other generation functions
@@ -66,6 +78,8 @@ Similarly, there are other utility functions that builds upon the `agenerate` fu
 async def agenerate_relationship_profile(
     model_name: str,
     agents_profiles: list[str],
+    bad_output_process_model: str = DEFAULT_BAD_OUTPUT_PROCESS_MODEL,
+    use_fixed_model_version: bool = True
 ) -> tuple[RelationshipProfile, str]
 ```
 
@@ -78,5 +92,7 @@ async def agenerate_script(
     agent_name: str = "",
     history: str = "",
     single_step: bool = False,
+    bad_output_process_model: str = DEFAULT_BAD_OUTPUT_PROCESS_MODEL,
+    use_fixed_model_version: bool = True
 ) -> tuple[ScriptInteractionReturnType, str]
 ```
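The `use_fixed_model_version` flag can be pictured as a lookup that is consulted only when enabled: pin the model name to a known version when requested, otherwise pass it through. A hypothetical sketch; the mapping values here are illustrative, not the exact table Sotopia ships.

```python
# Illustrative pinning table; Sotopia's real mapping pins the versions used in its paper.
FIXED_MODEL_VERSIONS = {
    "gpt-4": "gpt-4-0613",
    "gpt-4o-mini": "gpt-4o-mini-2024-07-18",
}

def resolve_model(model_name: str, use_fixed_model_version: bool = True) -> str:
    """Return the pinned version when requested and known; otherwise pass through."""
    if use_fixed_model_version:
        return FIXED_MODEL_VERSIONS.get(model_name, model_name)
    return model_name

print(resolve_model("gpt-4o-mini"))                                  # pinned version
print(resolve_model("gpt-4o-mini", use_fixed_model_version=False))   # name unchanged
```

This also shows why the flag exists: if a pinned version is later retired by OpenAI, passing `use_fixed_model_version=False` lets the caller bypass the mapping instead of hitting an API error.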

docs/pages/index.mdx (2 additions, 2 deletions)

@@ -206,8 +206,8 @@ asyncio.run(
     run_async_server(
         model_dict={
             "env": "gpt-4",
-            "agent1": "gpt-3.5-turbo",
-            "agent2": "gpt-3.5-turbo",
+            "agent1": "gpt-4o-mini",
+            "agent2": "gpt-4o-mini",
         },
         sampler=UniformSampler(),
     )

examples/benchmark_evaluator.py (2 additions, 2 deletions)

@@ -15,8 +15,8 @@
 
 target_model_patterns: list[list[str]] = [
     ["gpt-4", "gpt-4", "gpt-3.5-turbo"],
-    ["gpt-4", "gpt-3.5-turbo", "gpt-4"],
-    ["gpt-4", "gpt-3.5-turbo", "togethercomputer/llama-2-70b-chat"],
+    ["gpt-4", "gpt-4o-mini", "gpt-4"],
+    ["gpt-4", "gpt-4o-mini", "togethercomputer/llama-2-70b-chat"],
     ["gpt-4", "togethercomputer/llama-2-70b-chat", "gpt-3.5-turbo"],
 ]
examples/experiment_eval.py (2 additions, 2 deletions)

@@ -170,8 +170,8 @@ def run_async_server_in_batch(
     batch_size: int = 1,
     model_names: dict[str, LLM_Name] = {
         "env": "gpt-4",
-        "agent1": "gpt-3.5-turbo",
-        "agent2": "gpt-3.5-turbo",
+        "agent1": "gpt-4o-mini",
+        "agent2": "gpt-4o-mini",
     },
     tag: str | None = None,
     verbose: bool = False,

examples/fix_missing_episodes.py (2 additions, 2 deletions)

@@ -252,8 +252,8 @@ def re_run_missing_episodes(
     combo_with_models: dict[tuple[LLM_Name, LLM_Name], list[tuple[str, str, str]]],
     model_names: dict[str, LLM_Name] = {
         "env": "gpt-4",
-        "agent1": "gpt-3.5-turbo",
-        "agent2": "gpt-3.5-turbo",
+        "agent1": "gpt-4o-mini",
+        "agent2": "gpt-4o-mini",
     },
     batch_size: int = 5,
     verbose: bool = False,

examples/fix_missing_episodes_with_tag.py (2 additions, 2 deletions)

@@ -350,8 +350,8 @@ def re_run_missing_episodes(
     env_agent_ids: List[Tuple[str, str, str]] = [],
     model_names: dict[str, LLM_Name] = {
         "env": "gpt-4",
-        "agent1": "gpt-3.5-turbo",
-        "agent2": "gpt-3.5-turbo",
+        "agent1": "gpt-4o-mini",
+        "agent2": "gpt-4o-mini",
     },
     batch_size: int = 5,
     rerun_tag: str = "missing_episodes",

examples/generate_script.py (1 addition, 1 deletion)

@@ -175,7 +175,7 @@ def full_freeform(
 def run_async_server_in_batch_script(
     *,
     batch_size: int = 10,
-    model: LLM_Name = "gpt-3.5-turbo",
+    model: LLM_Name = "gpt-4o-mini",
     tag: str | None = None,
     push_to_db: bool = True,
     json_in_script: bool = False,

examples/minimalist_demo.py (2 additions, 2 deletions)

@@ -28,8 +28,8 @@
 run_async_server(
     model_dict={
         "env": "gpt-4",
-        "agent1": "gpt-3.5-turbo",
-        "agent2": "gpt-3.5-turbo",
+        "agent1": "gpt-4o-mini",
+        "agent2": "gpt-4o-mini",
     },
     sampler=UniformSampler(),
 )
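Nearly every example file in this commit received the same mechanical edit: `gpt-3.5-turbo` replaced by `gpt-4o-mini` inside a `model_dict` or `model_names` default. A small hypothetical helper (not part of Sotopia) shows how such a remap can be expressed once instead of edited file by file:

```python
# gpt-3.5-turbo is deprecated; remap it (and only it) to the new default model.
DEPRECATED_MODEL_REMAP = {"gpt-3.5-turbo": "gpt-4o-mini"}

def upgrade_model_dict(model_dict: dict[str, str]) -> dict[str, str]:
    """Replace deprecated model names, leaving every other entry untouched."""
    return {
        role: DEPRECATED_MODEL_REMAP.get(name, name)
        for role, name in model_dict.items()
    }

old = {"env": "gpt-4", "agent1": "gpt-3.5-turbo", "agent2": "gpt-3.5-turbo"}
print(upgrade_model_dict(old))
# → {'env': 'gpt-4', 'agent1': 'gpt-4o-mini', 'agent2': 'gpt-4o-mini'}
```

In the actual commit the defaults were edited in place, which keeps each example self-describing; the helper is only an illustration of the pattern being applied.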
