sotopia-lab
diff --git a/.github/.codecov.yml b/.github/.codecov.yml
Lines changed: 1 addition & 0 deletions
diff --git a/.github/workflows/cli_tests.yml b/.github/workflows/cli_tests.yml
Lines changed: 2 additions & 2 deletions
diff --git a/.github/workflows/mypy.yml b/.github/workflows/mypy.yml
Lines changed: 1 addition & 1 deletion
diff --git a/.github/workflows/tests.sh b/.github/workflows/tests.sh
Lines changed: 1 addition & 1 deletion
diff --git a/.github/workflows/tests_in_docker.yml b/.github/workflows/tests_in_docker.yml
Lines changed: 1 addition & 1 deletion
diff --git a/docs/pages/concepts/evaluation_dimension.md b/docs/pages/concepts/evaluation_dimension.md
Lines changed: 116 additions & 0 deletions
diff --git a/docs/pages/contribution/contribution.md b/docs/pages/contribution/contribution.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/pages/examples/deployment.md b/docs/pages/examples/deployment.md
Lines changed: 6 additions & 0 deletions
diff --git a/docs/pages/index.mdx b/docs/pages/index.mdx
Lines changed: 8 additions & 3 deletions
diff --git a/docs/pages/python_API/database/evaluation_dimensions.md b/docs/pages/python_API/database/evaluation_dimensions.md
Lines changed: 54 additions & 0 deletions
@@ -6,6 +6,7 @@ ignore:
   - ".github" # ignore the .github directory
   - "docs" # ignore the tests directory
   - "figs" # ignore the figs directory
+  - "ui" # ignore the ui directory
 
 coverage:
   status:
 
@@ -20,7 +20,7 @@ jobs:
     strategy:
       max-parallel: 5
       matrix:
-        os: [ubuntu-latest, macos-13]
+        os: [ubuntu-latest, macos-latest]
 
     runs-on: ${{ matrix.os }}
 
@@ -38,7 +38,7 @@ jobs:
       run: |
           python -m pip install --upgrade pip
           python -m pip install uv
-          uv sync --extra test --extra chat
+          uv sync --extra test --extra api
     - name: Test with pytest
       run: |
         uv run pytest tests/cli/test_install.py --cov=. --cov-report=xml
 
@@ -35,7 +35,7 @@ jobs:
       run: |
         python -m pip install --upgrade pip
         python -m pip install uv
-        uv sync --extra test --extra chat
+        uv sync --extra test --extra api
     - name: Type-checking package with mypy
       run: |
         # Run this mypy instance against our main package.
 
@@ -1 +1 @@
-uv run --extra test --extra chat pytest --ignore tests/cli --cov=. --cov-report=xml
+uv run --extra test --extra api pytest --ignore tests/cli --cov=. --cov-report=xml
@@ -28,7 +28,7 @@ jobs:
     - name: Docker Compose
       run: docker compose -f .devcontainer/docker-compose.yml up -d
     - name: Run tests
-      run: docker compose -f .devcontainer/docker-compose.yml run --rm -u root -v /home/runner/work/sotopia/sotopia:/workspaces/sotopia devcontainer /bin/sh -c "cd /workspaces/sotopia; ls; uv sync --extra test --extra chat; uv run pytest --ignore tests/cli --cov=. --cov-report=xml"
+      run: docker compose -f .devcontainer/docker-compose.yml run --rm -u root -v /home/runner/work/sotopia/sotopia:/workspaces/sotopia devcontainer /bin/sh -c "cd /workspaces/sotopia; ls; uv sync --extra test --extra api; uv run pytest --ignore tests/cli --cov=. --cov-report=xml"
     - name: Upload coverage report to Codecov
       uses: codecov/[email protected]
       with:
 
@@ -0,0 +1,116 @@
+## Overview
+
+Evaluation dimensions are used to evaluate the quality of social interactions.
+In the original Sotopia paper, seven dimensions are used to evaluate the quality of social interactions. We refer to them as the `sotopia` evaluation dimensions:
+- believability
+- relationship
+- knowledge
+- secret
+- social rules
+- financial and material benefits
+- goal
+
+`SotopiaDimensions` can be used directly without initializing the database. It provides a set of predefined evaluation dimensions that are ready to use for evaluating social interactions. For example:
+
+```python
+from sotopia.envs.parallel import ParallelSotopiaEnv
+from sotopia.envs.evaluators import EvaluationForTwoAgents, ReachGoalLLMEvaluator, RuleBasedTerminatedEvaluator, SotopiaDimensions
+
+# `env_profile` and `model_names` are assumed to be defined by the caller.
+env = ParallelSotopiaEnv(
+    env_profile=env_profile,
+    model_name=model_names["env"],
+    action_order="round-robin",
+    evaluators=[
+        RuleBasedTerminatedEvaluator(max_turn_number=20, max_stale_turn=2),
+    ],
+    terminal_evaluators=[
+        ReachGoalLLMEvaluator(
+            model_names["env"],
+            EvaluationForTwoAgents[SotopiaDimensions],  # type: ignore
+            # TODO check how to do type annotation
+        ),
+    ],
+)
+```
+
+
+However, in many use cases people may want to evaluate with customized metrics, so we provide a way to build custom evaluation dimensions.
+For a quick reference, see `examples/use_custom_dimensions.py`.
+
+### CustomEvaluationDimension
+The [`CustomEvaluationDimension`](/python_API/database/evaluation_dimensions) is a class that can be used to create a custom evaluation dimension.
+There are four parameters:
+- name: the name of the dimension
+- description: the description of the dimension
+- range_low: the minimum score of the dimension (should be an integer)
+- range_high: the maximum score of the dimension (should be an integer)
+
+### CustomEvaluationDimensionList
+The [`CustomEvaluationDimensionList`](/python_API/database/evaluation_dimensions) is a class that can be used to create a custom evaluation dimension list based on the existing dimensions. It helps one to group multiple dimensions together for a specific use case.
+There are two parameters:
+- name: the name of the dimension list
+- dimension_pks: the primary keys of the dimensions in the dimension list
+
+### EvaluationDimensionBuilder
+The [`EvaluationDimensionBuilder`](/python_API/database/evaluation_dimensions) is a class that can be used to generate a custom evaluation dimension model based on the existing dimensions.
+
+
+## Usage
+### Initialize the database
+The default evaluation metric is still `SotopiaDimensions` in `sotopia.envs.evaluators`. There is no `CustomEvaluationDimension` in the database by default. To initialize the database, please refer to `examples/use_custom_dimensions.py`.
+
+
+### Use the custom evaluation dimensions
+After you initialize your customized evaluation dimensions, you can use any of the methods below:
+
+#### Method 1: Choose dimensions by names
+```python
+evaluation_dimensions = (
+    EvaluationDimensionBuilder.select_existing_dimension_model_by_name(
+        ["transactivity", "verbal_equity"]
+    )
+)
+```
+
+#### Method 2: Directly choose the grouped evaluation dimension list
+```python
+evaluation_dimensions = (
+    EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
+        "sotopia"
+    )
+)
+```
+
+#### Method 3: Build a custom evaluation dimension model temporarily
+`EvaluationDimensionBuilder` provides multiple ways to build a custom evaluation dimension model:
+- `build_dimension_model`: build an evaluation dimension model from existing dimension primary keys.
+- `build_dimension_model_from_dict`: build an evaluation dimension model from a list of dictionaries, each specifying the parameters of a `CustomEvaluationDimension`. For example:
+```json
+[
+    {
+        "name": "believability",
+        "description": "The believability of the interaction",
+        "range_low": 0,
+        "range_high": 10
+    },
+    ...
+]
+```
+- `select_existing_dimension_model_by_name`: build an evaluation dimension model from existing dimension names, for example `['believability', 'goal']`.
+- `select_existing_dimension_model_by_list_name`: build an evaluation dimension model from the name of an existing `CustomEvaluationDimensionList`, for example `sotopia`.
+
+
+After you get the evaluation dimension model, you can pass it to the evaluator. For example:
+```python
+evaluation_dimensions = (
+    EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
+        "sotopia"
+    )
+)
+# Pass this list as the `terminal_evaluators` argument of `ParallelSotopiaEnv`.
+terminal_evaluators = [
+    ReachGoalLLMEvaluator(
+        model_names["env"],
+        EvaluationForTwoAgents[evaluation_dimensions],  # type: ignore
+    ),
+]
+```
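Review note on the hunk above: the four documented `CustomEvaluationDimension` parameters (`name`, `description`, `range_low`, `range_high`) can be illustrated without a Redis database. The following is a minimal sketch only. `DimensionSpec` and `check_score` are hypothetical names introduced for illustration and are not part of Sotopia, and treating the score range as inclusive on both ends is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionSpec:
    """Hypothetical stand-in holding the four documented parameters."""

    name: str
    description: str
    range_low: int
    range_high: int

    def __post_init__(self) -> None:
        # An inverted range could never accept a score, so reject it early.
        if self.range_low > self.range_high:
            raise ValueError(f"{self.name}: range_low exceeds range_high")

    def check_score(self, score: int) -> int:
        """Return the score unchanged if it lies in [range_low, range_high]."""
        # Assumption: both bounds are inclusive.
        if not self.range_low <= score <= self.range_high:
            raise ValueError(
                f"{self.name}: score {score} is outside "
                f"[{self.range_low}, {self.range_high}]"
            )
        return score


# Values taken from the JSON example in the documentation above.
believability = DimensionSpec(
    name="believability",
    description="The believability of the interaction",
    range_low=0,
    range_high=10,
)
```

With this sketch, `believability.check_score(7)` returns `7`, while `believability.check_score(11)` raises `ValueError`.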
@@ -133,7 +133,7 @@ Please refer to [Dev Containers](https://containers.dev/supporting#editors) to s
 
 You can also set up the development environment without Dev Containers. There are three things you will need to set up manually:
 
-- Python and uv: Please start from an environment supporting Python 3.10+ and install uv using `pip install uv; uv sync --all-extra`.
+- Python and uv: Please start from an environment supporting Python 3.10+ and install uv using `pip install uv; uv sync --all-extras`. (Note that this installs all the extra dependencies.)
 - Redis: Please refer to introduction page for the set up of Redis.
 - Local LLM (optional): If you don't have access to model endpoints (e.g. OpenAI, Anthropic or others), you can use a local model. You can use Ollama, Llama.cpp,  vLLM or many others which support OpenAI compatible endpoints.
 
 
@@ -0,0 +1,6 @@
+# Deploy Sotopia Python API to Modal
+We offer a script to deploy the Sotopia Python API to [Modal](https://modal.com/).
+To do so, go to the `sotopia/sotopia/ui` directory and run the following command:
+```bash
+modal deploy sotopia/ui/modal_api_server.py
+```
@@ -117,7 +117,7 @@ export REDIS_OM_URL="redis://localhost:6379"
           ```
           if you are developing Sotopia using uv, you can sync your dependency with
           ```bash
-          uv sync --extra examples --extra chat
+          uv sync --extra examples --extra api
           ```
         </AccordionContent>
       </AccordionItem>
@@ -144,13 +144,18 @@ or manual setup:
       <AccordionItem value="item-1">
         <AccordionTrigger>Docker is my thing.</AccordionTrigger>
         <AccordionContent>
-        Please follow the [instruction](https://redis.io/docs/stack/get-started/install/docker/) to start a redis-stack server or use an existing server. You can also check [Q&A](/docs/troubleshooting.md) to initiate the redis server with the Sotopia data.
+        Please follow the [instructions](https://redis.io/docs/stack/get-started/install/docker/) to start a redis-stack server or use an existing server. If you want to use the existing data in Sotopia, you can download the `dump.rdb` file from [here](https://cmu.box.com/shared/static/xiivc5z8rnmi1zr6vmk1ohxslylvynur). Feel free to check more datasets related to Sotopia [here](https://huggingface.co/collections/cmu-lti/sotopia-65f312c1bd04a8c4a9225e5b).
+
+        After downloading the `dump.rdb` file, make a `redis-data` folder in a desired `<your_path>` directory. You can then start the server with the following command:
+        ```bash
+        docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 -v <your_path>/redis-data:/data/ redis/redis-stack:latest
+        ```
 
         The `REDIS_OM_URL` need to be set before loading and saving agents:
         ```bash
         conda env config vars set REDIS_OM_URL="redis://user:password@host:port"
         ```
-        </AccordionContent>
+      </AccordionContent>
       </AccordionItem>
       <AccordionItem value="item-2">
         <AccordionTrigger>No, I don't want to use Docker.</AccordionTrigger>
 
@@ -0,0 +1,54 @@
+# `evaluation_dimensions.py`
+
+This module provides classes and utilities for defining and managing custom evaluation dimensions within the Sotopia environment. It includes classes for individual dimensions, lists of dimensions, and a builder for creating dimension models.
+
+## Classes
+
+### `CustomEvaluationDimension`
+
+Represents a custom evaluation dimension with specific attributes such as name, description, and score range.
+
+#### Attributes
+- `name`: `str`. The name of the dimension.
+- `description`: `str`. A brief description of the dimension.
+- `range_low`: `int`. The minimum score for the dimension.
+- `range_high`: `int`. The maximum score for the dimension.
+
+### `CustomEvaluationDimensionList`
+
+Groups multiple custom evaluation dimensions together.
+
+#### Attributes
+- `name`: `str`. The name of the dimension list.
+- `dimension_pks`: `list[str]`. A list of primary keys for the dimensions included in the list.
+
+### `EvaluationDimensionBuilder`
+
+Provides utility methods to create and manage evaluation dimension models.
+
+#### Methods
+- `create_range_validator(low: int, high: int)`: Creates a validator for score ranges.
+
+  **Arguments:**
+  - `low`: `int`. The minimum score allowed.
+  - `high`: `int`. The maximum score allowed.
+
+- `build_dimension_model(dimension_ids: list[str])`: Builds a dimension model from primary keys.
+
+  **Arguments:**
+  - `dimension_ids`: `list[str]`. A list of dimension primary keys.
+
+- `build_dimension_model_from_dict(dimensions: list[dict[str, Union[str, int]]])`: Builds a dimension model from a list of dictionaries.
+
+  **Arguments:**
+  - `dimensions`: `list[dict[str, Union[str, int]]]`. A list of dictionaries specifying dimension attributes.
+
+- `select_existing_dimension_model_by_name(dimension_names: list[str])`: Selects a dimension model by dimension names.
+
+  **Arguments:**
+  - `dimension_names`: `list[str]`. A list of dimension names.
+
+- `select_existing_dimension_model_by_list_name(list_name: str)`: Selects a dimension model by list name.
+
+  **Arguments:**
+  - `list_name`: `str`. The name of the dimension list.
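Review note on the hunk above: the reference lists a range validator and a builder that takes a list of dictionaries, but does not show how the two might fit together. The sketch below is one possible reading, written with the standard library only. `make_range_validator`, `build_validators`, and `validate_scores` are hypothetical helpers, not Sotopia functions, and the inclusive range check is an assumption rather than the documented behavior.

```python
from typing import Callable, Union

DimensionDict = dict[str, Union[str, int]]
Validator = Callable[[int], int]


def make_range_validator(low: int, high: int) -> Validator:
    """Return a function that accepts only integers in [low, high]."""

    def validator(score: int) -> int:
        # Assumption: both bounds are inclusive.
        if not isinstance(score, int) or not low <= score <= high:
            raise ValueError(f"Score must be an integer between {low} and {high}")
        return score

    return validator


def build_validators(dimensions: list[DimensionDict]) -> dict[str, Validator]:
    """Map each dimension name to a validator for its score range."""
    validators: dict[str, Validator] = {}
    for dim in dimensions:
        validators[str(dim["name"])] = make_range_validator(
            int(dim["range_low"]), int(dim["range_high"])
        )
    return validators


def validate_scores(
    validators: dict[str, Validator], scores: dict[str, int]
) -> dict[str, int]:
    """Check one score per known dimension; unknown keys in `scores` are ignored."""
    missing = validators.keys() - scores.keys()
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    return {name: check(scores[name]) for name, check in validators.items()}


# Same shape as the `dimensions` argument described in the reference above.
validators = build_validators(
    [
        {
            "name": "believability",
            "description": "The believability of the interaction",
            "range_low": 0,
            "range_high": 10,
        },
    ]
)
```

Calling `validate_scores(validators, {"believability": 8})` returns the scores unchanged, and an out-of-range or missing score raises `ValueError`.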