[CI] Build and test rocm Python packages as part of ci.yml #1559

Issues like https://github.com/ROCm/TheRock/issues/1347 and https://github.com/ROCm/TheRock/issues/1552 went undetected because the only place we install the rocm packages we built and run `rocm-sdk test` is during PyTorch release builds:
* https://github.com/ROCm/TheRock/blob/bee658b0d7c89e636fa04356711edb5366c8b213/.github/workflows/test_pytorch_wheels.yml#L115-L117
* https://github.com/ROCm/TheRock/blob/bee658b0d7c89e636fa04356711edb5366c8b213/external-builds/pytorch/build_prod_wheels.py#L720-L728

The [`ci.yml`](https://github.com/ROCm/TheRock/blob/main/.github/workflows/ci.yml) workflow that we run on pull requests and merged commits could do more than build and test native ROCm packages: it could build and test ROCm Python packages too. Looking at these Windows workflows, we could merge them or have one reuse the other:
* https://github.com/ROCm/TheRock/blob/main/.github/workflows/build_windows_packages.yml (for CI)
* https://github.com/ROCm/TheRock/blob/main/.github/workflows/release_windows_packages.yml (for releases)

The build step that is missing is:
```yml
      - name: Build Python Packages
        run: |
          python ./build_tools/build_python_packages.py \
            --artifact-dir=${{ env.BUILD_DIR }}/artifacts \
            --dest-dir=${{ env.BUILD_DIR }}/packages \
            --version=${{ needs.setup_metadata.outputs.version }}
```

See a recent release workflow run for an example: https://github.com/ROCm/TheRock/actions/runs/17904146218/job/50902443944#step:15:66. Note that this step takes about 7 minutes, after a 1h40m build (62% cache hits).

Beyond just building, we probably want to upload the wheels to some sort of dev release bucket, install them for testing on machines with GPUs, and eventually trigger dev PyTorch builds and tests too.
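As a rough illustration of the "install them for testing" part, here is a minimal sketch of a helper a CI test job might call. It is not an existing script in the repository: `build_commands` and `run_commands` are hypothetical names, and the wheel directory and requirement string are placeholders the caller would supply. The only pieces taken from above are `rocm-sdk test` and the `+++` log prefix; `--find-links` is a standard pip option for resolving packages from a local directory.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_commands(wheel_dir: Path, requirement: str) -> list[list[str]]:
    """Commands to install locally built wheels, then run the SDK self-tests.

    ASSUMPTION: `wheel_dir` and `requirement` are placeholders; the real
    directory layout and package spec depend on how the wheels are published.
    """
    return [
        # --find-links lets pip resolve `requirement` from a local directory of wheels.
        [sys.executable, "-m", "pip", "install", f"--find-links={wheel_dir}", requirement],
        # The check that currently only runs during PyTorch release builds.
        ["rocm-sdk", "test"],
    ]


def run_commands(commands: list[list[str]], dry_run: bool = False) -> list[str]:
    """Echo each command and, unless dry_run is set, run it and raise on a non-zero exit."""
    echoed = []
    for cmd in commands:
        line = shlex.join(cmd)
        print(f"+++ {line}")
        echoed.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return echoed
```

Calling `run_commands(build_commands(some_dir, some_requirement), dry_run=True)` only prints the commands, which is handy for checking the step locally before wiring it into a workflow. Uploading to a dev bucket and triggering PyTorch builds are left out, since those depend on infrastructure decisions not made here.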

	print("+++ Sanity checking installed torch (unavailable is okay on CPU machines):")
	sanity_check_output = capture(
	    [sys.executable, "-c", "import torch; print(torch.cuda.is_available())"],
	    cwd=tempfile.gettempdir(),
	)
	if not sanity_check_output:
	    raise RuntimeError("torch package sanity check failed (see output above)")
	else:
	    print(f"Sanity check output:\n{sanity_check_output}")
