## 🐛 Bug

Running DDP on a devgpu with 4 GPUs with `--nprocs_per_node=2` and `--nnodes=2` does not work when the script uses `LOCAL_RANK` to set the CUDA device.

```
torchx run dist.ddp -j 2x2
```

Module (check all that apply):

* [ ] `torchx.spec`
* [ ] `torchx.component`
* [ ] `torchx.apps`
* [ ] `torchx.runtime`
* [ ] `torchx.cli`
* [x] `torchx.schedulers`
* [ ] `torchx.pipelines`
* [ ] `torchx.aws`
* [ ] `torchx.examples`
* [ ] `other`

## To Reproduce

See the description above; this reproduces easily with a training script:

```
import os

import torch

if __name__ == "__main__":
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
```

Try running the above with:

```
torchx run dist.ddp -j 2x2 main.py
```

## Expected behavior

The TorchX local scheduler should set `CUDA_VISIBLE_DEVICES=0,1` on the first two workers and `CUDA_VISIBLE_DEVICES=2,3` on the next two workers.

## Environment

- torchx version (e.g. 0.1.0rc1):
- Python version:
- OS (e.g., Linux):
- How you installed torchx (`conda`, `pip`, source, `docker`):
- Docker image and tag (if using docker):
- Git commit (if installed from source):
- Execution environment (on-prem, AWS, GCP, Azure etc.):
- Any other relevant information:

## Additional context
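For illustration, the device partitioning described under "Expected behavior" could be computed as follows. This is a minimal sketch of the expected assignment, not the TorchX implementation; the function name is hypothetical.

```
# Sketch only: illustrates the expected CUDA_VISIBLE_DEVICES assignment
# for -j 2x2 (2 node groups x 2 procs per node on a 4-GPU host).
# visible_devices() is a hypothetical helper, not part of the TorchX API.

def visible_devices(node_rank: int, nproc_per_node: int) -> str:
    """Return the CUDA_VISIBLE_DEVICES value for one node group's workers."""
    start = node_rank * nproc_per_node
    return ",".join(str(d) for d in range(start, start + nproc_per_node))

# Node group 0 sees GPUs 0,1; node group 1 sees GPUs 2,3. LOCAL_RANK (0 or 1)
# then indexes into the visible devices rather than the physical device ids,
# so torch.cuda.set_device(LOCAL_RANK) works on both node groups.
print(visible_devices(0, 2))  # -> "0,1"
print(visible_devices(1, 2))  # -> "2,3"
```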