Feature implementation from commits edfebd5..ce6a26d #2

Open: wants to merge 15 commits into base `feature-base-branch-2`
1 change: 1 addition & 0 deletions .gitignore
@@ -29,6 +29,7 @@ outputs

# VS Code
.vscode
.devcontainer

# HPC
nautilus/*.yaml
13 changes: 13 additions & 0 deletions README.md
@@ -418,6 +418,19 @@ Additionally, if you are using any of the particular policy architecture, pretra
year={2024}
}
```


- [HIL-SERL](https://hil-serl.github.io/)
```bibtex
@Article{luo2024hilserl,
title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
author={Jianlan Luo and Charles Xu and Jeffrey Wu and Sergey Levine},
year={2024},
eprint={2410.21845},
archivePrefix={arXiv},
primaryClass={cs.RO}
}
```
## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=huggingface/lerobot&type=Timeline)](https://star-history.com/#huggingface/lerobot&Timeline)
20 changes: 18 additions & 2 deletions docs/source/_toctree.yml
@@ -5,11 +5,23 @@
title: Installation
title: Get started
- sections:
- local: getting_started_real_world_robot
title: Getting Started with Real-World Robots
- local: il_robots
title: Imitation Learning for Robots
- local: il_sim
title: Imitation Learning in Sim
- local: cameras
title: Cameras
- local: integrate_hardware
title: Bring Your Own Hardware
- local: hilserl
title: Train a Robot with RL
- local: hilserl_sim
title: Train RL in Simulation
title: "Tutorials"
- sections:
- local: smolvla
title: Finetune SmolVLA
title: "Policies"
- sections:
- local: so101
title: SO-101
@@ -20,6 +32,10 @@
- local: lekiwi
title: LeKiwi
title: "Robots"
- sections:
- local: notebooks
title: Notebooks
title: "Resources"
- sections:
- local: contributing
title: Contribute to LeRobot
547 changes: 547 additions & 0 deletions docs/source/hilserl.mdx

Large diffs are not rendered by default.

120 changes: 120 additions & 0 deletions docs/source/hilserl_sim.mdx
@@ -0,0 +1,120 @@
# Train RL in Simulation

This guide explains how to use the `gym_hil` simulation environments as an alternative to real robots when working with the LeRobot framework for Human-In-the-Loop (HIL) reinforcement learning.

`gym_hil` is a package that provides Gymnasium-compatible simulation environments specifically designed for Human-In-the-Loop reinforcement learning. These environments allow you to:

- Train policies in simulation to test the RL stack before training on real robots
- Collect demonstrations in sim using external devices like gamepads or keyboards
- Perform human interventions during policy learning

Currently, the main environment is a Franka Panda robot simulation based on MuJoCo, with tasks like picking up a cube.


## Installation

First, install the `gym_hil` package within the LeRobot environment:

```bash
pip install -e ".[hilserl]"
```

## What do I need?

- A gamepad or keyboard to control the robot
- An NVIDIA GPU



## Configuration

To use `gym_hil` with LeRobot, you need to create a configuration file. An example is provided [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/gym_hil_env.json). Key configuration sections include:

### Environment Type and Task

```json
{
"type": "hil",
"name": "franka_sim",
"task": "PandaPickCubeGamepad-v0",
"device": "cuda"
}
```

Available tasks:
- `PandaPickCubeBase-v0`: Basic environment
- `PandaPickCubeGamepad-v0`: With gamepad control
- `PandaPickCubeKeyboard-v0`: With keyboard control
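
As a quick sanity check outside of the LeRobot scripts, these environments can also be instantiated directly with Gymnasium. Below is a minimal sketch; it assumes the tasks above are registered under the `gym_hil/` namespace, which may differ in your installed version:

```python
import gymnasium as gym
import gym_hil  # noqa: F401  # importing registers the gym_hil environments

# The "gym_hil/" prefix is an assumption; use the plain task ID if your
# installed version registers the environments differently.
env = gym.make("gym_hil/PandaPickCubeBase-v0")

obs, info = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # random actions, just to step the simulation
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```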

### Gym Wrappers Configuration

```json
"wrapper": {
"gripper_penalty": -0.02,
"control_time_s": 15.0,
"use_gripper": true,
"fixed_reset_joint_positions": [0.0, 0.195, 0.0, -2.43, 0.0, 2.62, 0.785],
"end_effector_step_sizes": {
"x": 0.025,
"y": 0.025,
"z": 0.025
},
"control_mode": "gamepad"
}
```

Important parameters:
- `gripper_penalty`: Penalty for excessive gripper movement
- `use_gripper`: Whether to enable gripper control
- `end_effector_step_sizes`: Step size along the x, y, and z axes of the end-effector
- `control_mode`: Set to `"gamepad"` to use a gamepad controller

## Running with LeRobot's HIL RL stack

### Basic Usage

To simply run the environment, set the `mode` field in the config file to `null`:

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path path/to/gym_hil_env.json
```

### Recording a Dataset

To collect a dataset, set `mode` to `record` and define the `repo_id` and the number of episodes to record:

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path path/to/gym_hil_env.json
```
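
For reference, here is a minimal sketch of the relevant fields (the field names come from the example config linked above; the repo id value is a placeholder and the exact nesting should follow that example file):

```json
{
  "mode": "record",
  "repo_id": "your_hf_username/franka_sim_pick_cube",
  "num_episodes": 30
}
```

Set `"mode"` back to `null` when you only want to run the environment without recording.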

### Training a Policy

To train a policy, check out the configuration example available [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/train_gym_hil_env.json) and run the actor and learner servers. Start the actor server with:

```bash
python lerobot/scripts/rl/actor.py --config_path path/to/train_gym_hil_env.json
```

In a different terminal, run the learner server:

```bash
python lerobot/scripts/rl/learner.py --config_path path/to/train_gym_hil_env.json
```

The simulation environment provides a safe and repeatable way to develop and test your Human-In-the-Loop reinforcement learning components before deploying to real robots.

Congrats 🎉, you have finished this tutorial!

> [!TIP]
> If you have any questions or need help, please reach out on [Discord](https://discord.com/invite/s3KuuzsPFb).

Paper citation:
```bibtex
@article{luo2024precise,
title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
author={Luo, Jianlan and Xu, Charles and Wu, Jeffrey and Levine, Sergey},
journal={arXiv preprint arXiv:2410.21845},
year={2024}
}
```
docs/source/il_robots.mdx (renamed from docs/source/getting_started_real_world_robot.mdx)
@@ -1,4 +1,4 @@
# Getting Started with Real-World Robots
# Imitation Learning on Real-World Robots

This tutorial will explain how to train a neural network to control a real robot autonomously.

@@ -273,6 +273,9 @@ python lerobot/scripts/train.py \
--resume=true
```

#### Train using Colab
If your local computer doesn't have a powerful GPU, you can train your model on Google Colab by following the [ACT training notebook](./notebooks#training-act).

#### Upload policy checkpoints

Once training is done, upload the latest checkpoint with:
@@ -297,12 +300,13 @@ python -m lerobot.record \
--robot.port=/dev/ttyACM1 \
--robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
--robot.id=my_awesome_follower_arm \
--teleop.type=so100_leader \
--teleop.port=/dev/ttyACM0 \
--teleop.id=my_awesome_leader_arm \
--display_data=false \
--dataset.repo_id=$HF_USER/eval_so100 \
--dataset.single_task="Put lego brick into the transparent box" \
# <- Teleop optional if you want to teleoperate in between episodes \
# --teleop.type=so100_leader \
# --teleop.port=/dev/ttyACM0 \
# --teleop.id=my_awesome_leader_arm \
--policy.path=${HF_USER}/my_policy
```

152 changes: 152 additions & 0 deletions docs/source/il_sim.mdx
@@ -0,0 +1,152 @@
# Imitation Learning in Sim

This tutorial will explain how to train a neural network to control a robot in simulation with imitation learning.

**You'll learn:**
1. How to record a dataset in simulation with [gym-hil](https://github.com/huggingface/gym-hil) and visualize the dataset.
2. How to train a policy using your data.
3. How to evaluate your policy in simulation and visualize the results.

For the simulation environment we use the same [repo](https://github.com/huggingface/gym-hil) that is also being used by the Human-In-the-Loop (HIL) reinforcement learning algorithm.
This environment is based on [MuJoCo](https://mujoco.org) and allows you to record datasets in LeRobotDataset format.
Teleoperation is easiest with a controller like the Logitech F710, but you can also use your keyboard if you are up for the challenge.

## Installation

First, install the `gym_hil` package within the LeRobot environment: go to your LeRobot folder and run this command:

```bash
pip install -e ".[hilserl]"
```

## Teleoperate and Record a Dataset

To use `gym_hil` with LeRobot, you need to use a configuration file. An example config file can be found [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/env_config_gym_hil_il.json).

To teleoperate and collect a dataset, modify this config file: add your `repo_id` (for example `"repo_id": "il_gym"`), set `"num_episodes": 30`, and make sure `"mode"` is set to `"record"`.

If you do not have an NVIDIA GPU, also change the `"device": "cuda"` parameter in the config file (for example, to `"mps"` on macOS).

By default the config file assumes you use a controller. To use your keyboard instead, change the environment specified at `"task"` in the config file and set it to `"PandaPickCubeKeyboard-v0"`.
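
Putting those edits together, a minimal sketch of the fields to change looks like this (shown flat for brevity; keep the nesting used in the example config file):

```json
{
  "mode": "record",
  "repo_id": "il_gym",
  "num_episodes": 30,
  "device": "cuda",
  "task": "PandaPickCubeGamepad-v0"
}
```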

Then we can run this command to start:

<hfoptions id="teleop_sim">
<hfoption id="Linux">

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path path/to/env_config_gym_hil_il.json
```

</hfoption>
<hfoption id="MacOS">

```bash
mjpython lerobot/scripts/rl/gym_manipulator.py --config_path path/to/env_config_gym_hil_il.json
```

</hfoption>
</hfoptions>

Once the environment is rendered, you can teleoperate the robot with the gamepad or keyboard. The gamepad and keyboard controls are listed below.

Note that to teleoperate the robot you have to hold the "Human Take Over Pause Policy" button (`RB`) to enable control!

**Gamepad Controls**

<p align="center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/gamepad_guide.jpg?raw=true" alt="Figure shows the control mappings on a Logitech gamepad." title="Gamepad Control Mapping" width="100%"></img>
</p>
<p align="center"><i>Gamepad button mapping for robot control and episode management</i></p>

**Keyboard controls**

For keyboard controls use the `spacebar` to enable control and the following keys to move the robot:
```bash
Arrow keys: Move in X-Y plane
Shift and Shift_R: Move in Z axis
Right Ctrl and Left Ctrl: Open and close gripper
ESC: Exit
```

## Visualize a dataset

If you uploaded your dataset to the hub, you can [visualize your dataset online](https://huggingface.co/spaces/lerobot/visualize_dataset) by copy-pasting your repo id.

<p align="center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/dataset_visualizer_sim.png" alt="Figure shows the dataset visualizer" title="Dataset visualization" width="100%"></img>
</p>
<p align="center"><i>Dataset visualizer</i></p>
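
If you prefer to inspect the dataset locally instead, LeRobot also ships a dataset visualization script. A sketch, assuming the script path and flags from the main LeRobot README are unchanged (replace the repo id with your own dataset):

```bash
python lerobot/scripts/visualize_dataset.py \
  --repo-id ${HF_USER}/il_gym \
  --episode-index 0
```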


## Train a policy

To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
```bash
python lerobot/scripts/train.py \
--dataset.repo_id=${HF_USER}/il_gym \
--policy.type=act \
--output_dir=outputs/train/il_sim_test \
--job_name=il_sim_test \
--policy.device=cuda \
--wandb.enable=true
```

Let's explain the command:
1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/il_gym`.
2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](../lerobot/common/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor states, motor actions and cameras of your robot which have been saved in your dataset.
3. We provided `policy.device=cuda` since we are training on an NVIDIA GPU, but you could use `policy.device=mps` to train on Apple silicon.
4. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional but if you use it, make sure you are logged in by running `wandb login`.

Training should take several hours: 100k steps (the default) takes about 1 hour on an NVIDIA A100. You will find checkpoints in `outputs/train/il_sim_test/checkpoints`.

#### Train using Colab
If your local computer doesn't have a powerful GPU, you can train your model on Google Colab by following the [ACT training notebook](./notebooks#training-act).

#### Upload policy checkpoints

Once training is done, upload the latest checkpoint with:
```bash
huggingface-cli upload ${HF_USER}/il_sim_test \
outputs/train/il_sim_test/checkpoints/last/pretrained_model
```

You can also upload intermediate checkpoints with:
```bash
CKPT=010000
huggingface-cli upload ${HF_USER}/il_sim_test${CKPT} \
outputs/train/il_sim_test/checkpoints/${CKPT}/pretrained_model
```

## Evaluate your policy in Sim

To evaluate your policy, use the config file that can be found [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/eval_config_gym_hil.json).

Make sure to replace the `repo_id` with the dataset you trained on, for example `pepijn223/il_sim_dataset`, and replace the `pretrained_policy_name_or_path` with your model id, for example `pepijn223/il_sim_model`.
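
A minimal sketch of those two fields (the values below are the examples from the paragraph above; keep the nesting used in the linked config file):

```json
{
  "repo_id": "pepijn223/il_sim_dataset",
  "pretrained_policy_name_or_path": "pepijn223/il_sim_model"
}
```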

Then you can run this command to visualize your trained policy:

<hfoptions id="eval_policy">
<hfoption id="Linux">

```bash
python lerobot/scripts/rl/eval_policy.py --config_path=path/to/eval_config_gym_hil.json
```

</hfoption>
<hfoption id="MacOS">

```bash
mjpython lerobot/scripts/rl/eval_policy.py --config_path=path/to/eval_config_gym_hil.json
```

</hfoption>
</hfoptions>

> [!WARNING]
> While the main workflow of training ACT in simulation is straightforward, there is significant room for exploring how to set up the task, define the initial state of the environment, and determine the type of data required during collection to learn the most effective policy. If your trained policy doesn't perform well, investigate the quality of the dataset it was trained on using our visualizers, as well as the action values and various hyperparameters related to ACT and the simulation.

Congrats 🎉, you have finished this tutorial. If you want to continue using LeRobot in simulation, follow this [Tutorial on reinforcement learning in sim with HIL-SERL](https://huggingface.co/docs/lerobot/hilserl_sim).

> [!TIP]
> If you have any questions or need help, please reach out on [Discord](https://discord.com/invite/s3KuuzsPFb).
2 changes: 2 additions & 0 deletions docs/source/installation.mdx
@@ -68,3 +68,5 @@ To use [Weights and Biases](https://docs.wandb.ai/quickstart) for experiment tra
```bash
wandb login
```

You can now assemble your robot if it's not ready yet; look for your robot type on the left. Then follow the link below to use LeRobot with your robot.