
Conversation

mshukor (Contributor) commented Dec 8, 2024

What this does

Based on this PR. It includes:

  • The ability to keep training without accelerate (see the single-process sketch after the note below)
  • An update to the recent main branch
  • Some minor fixes

Note: we still need to merge with the vla branch before merging this PR.
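
A minimal sketch of the non-accelerate path, assuming the same train.py entry point and Hydra overrides as the tested command below (the exact flag set is an assumption, not confirmed by this PR):

# Hypothetical single-process run: no accelerate launcher, same train.py script.
# training.batch_size=8 assumes one process carries the full global batch.
python lerobot/scripts/train.py \
 dataset_repo_id=lerobot/aloha_sim_transfer_cube_human \
 policy=act \
 env=aloha env.task=AlohaTransferCube-v0 \
 training.batch_size=8 \
 wandb.enable=true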

How it was tested

ENV=aloha
ENV_TASK=AlohaTransferCube-v0
dataset_repo_id=lerobot/aloha_sim_transfer_cube_human
policy=act
LR=1e-5
LR_SCHEDULER=
USE_AMP=false
ASYNC_ENV=false

GPUS=2
EVAL_FREQ=10000
OFFLINE_STEPS=100000
TRAIN_BATCH_SIZE=4 # per-GPU batch size = global batch size / number of GPUs
EVAL_BATCH_SIZE=50

TASK_NAME=lerobot_${ENV}_transfer_cube_${policy}_2gpus

python -m accelerate.commands.launch --num_processes=$GPUS --mixed_precision=fp16 lerobot/scripts/train.py \
 hydra.job.name=base_distributed_aloha_transfer_cube \
 hydra.run.dir=/data/mshukor/logs/lerobot/${TASK_NAME} \
 dataset_repo_id=$dataset_repo_id \
 policy=$policy \
 env=$ENV env.task=$ENV_TASK \
 training.offline_steps=$OFFLINE_STEPS training.batch_size=$TRAIN_BATCH_SIZE \
 training.eval_freq=$EVAL_FREQ eval.n_episodes=50 eval.use_async_envs=$ASYNC_ENV eval.batch_size=$EVAL_BATCH_SIZE \
 training.lr_scheduler=$LR_SCHEDULER training.lr=$LR \
 wandb.enable=true 
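
With these values the effective global batch size is TRAIN_BATCH_SIZE × GPUS = 4 × 2 = 8 (each of the 2 processes gets a per-GPU batch of 4), which is what training.batch_size=8 in the single-process sketch above assumes.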
