TESTING ONLY (TPU test) #21528
base: master
Conversation
… hosted TPU based runner
updated build file path
updated build file path
Updated tpu_build job of actions.yml with specific runner label
Added container section
Progress bar would always report the starting batch + 1 at the end of the batch. Now it takes into account `steps_per_execution` for the last batch reported. Fixes keras-team#20861
Using `keras.ops.math.logsumexp` with an int for `axis` in a functional model would throw an error.
…eras-team#21429) Arbitrary functions and classes are not allowed.
- Made `Operation` extend `KerasSaveable`; this required moving imports to avoid circular imports
- `Layer` no longer needs to extend `KerasSaveable` directly
- Made feature space `Cross` and `Feature` extend `KerasSaveable`
- Also disallow public function `enable_unsafe_deserialization`
…developed dtypes_new_test.py to use requires_tpu marker
…workflow and added a step to install docker client
…ames (keras-team#21526)
- [OpenVINO backend] fix openvino model exported names to match keras names
- Update keras/src/export/openvino.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary of Changes
Hello @sachinprasadhs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request primarily focuses on establishing and verifying the infrastructure for running Keras tests on Tensor Processing Units (TPUs). I've set up a Dockerized environment, integrated TPU initialization into our testing framework, and introduced new and modified existing tests to specifically validate dtype behavior on TPUs. Additionally, I've updated dependency configurations to support these new testing capabilities. There's also a minor, unrelated enhancement to the OpenVINO export utility.
Highlights
- TPU Test Environment Setup: I've introduced a new Dockerfile to create a standardized environment for running Keras tests on TPUs, ensuring consistent setup across different testing scenarios.
- Pytest TPU Integration: I've integrated robust TPU initialization and management directly into our `pytest` framework via `conftest.py` fixtures, making it easier to run and manage TPU-dependent tests.
- TPU-Specific Dtype Testing: I've added and modified several dtype-related tests to specifically run on TPUs, verifying that Keras's type promotion rules behave as expected in a distributed TPU context.
- Dependency Management for TPU: I've updated our dependency management by adding new requirements files tailored for JAX and TensorFlow TPU environments, streamlining the setup for these specialized configurations.
- OpenVINO Export Enhancement: I've also included an unrelated change to the OpenVINO export functionality, adding the ability to set friendly names for model inputs during conversion.
Code Review
This pull request sets up testing on TPUs. It introduces several new test files and fixtures for this purpose. My main feedback is to consolidate the TPU testing strategy. Currently, there are multiple approaches for TPU initialization (in `conftest.py`, `dtypes_test.py`, `dtypes_TPU_test.py`), leading to duplicated and sometimes inconsistent code. Using the pytest fixture from `conftest.py` seems like the most robust and reusable approach. Additionally, there are several hardcoded values like the TPU name that should be made configurable. Finally, some generated files like logs and test lists seem to have been accidentally committed and should be removed.
```python
os.environ["KERAS_BACKEND"] = "tensorflow"  # Moved to test_case module in Keras

# Set TPU_NAME if connecting to a specific TPU worker
os.environ["TPU_NAME"] = "harshith-tf-4"
# JAX_PLATFORMS is typically for JAX-specific environments, not directly for TF/Keras TPU.
os.environ["JAX_PLATFORMS"] = ""
```
Setting environment variables at the module level can lead to side effects that are hard to debug, as it affects the entire test session globally. It's better to configure these settings outside the test suite or, if necessary, within a test setup method using a context manager or `monkeypatch` to isolate the changes.
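As a sketch of that isolation, the stdlib's `unittest.mock.patch.dict` (an alternative to pytest's `monkeypatch.setenv`) scopes an environment change to a single test; the value `"my-tpu"` below is illustrative, not a name from the PR:

```python
import os
from unittest import mock

def tpu_name_during_test():
    # The env var exists only inside this context manager; on exit,
    # os.environ is restored to exactly its prior state.
    with mock.patch.dict(os.environ, {"TPU_NAME": "my-tpu"}):
        return os.environ["TPU_NAME"]

before = os.environ.get("TPU_NAME")
during = tpu_name_during_test()
after = os.environ.get("TPU_NAME")
print(during)           # my-tpu
print(before == after)  # True: no global side effect leaks out
```

This keeps each test hermetic, which is exactly what a module-level `os.environ[...] = ...` cannot guarantee.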
```python
try:
    tf.config.experimental_disconnect_from_cluster()
except:
    pass
```
Using a bare `except:` can catch system-exiting exceptions like `SystemExit` or `KeyboardInterrupt`, making it harder to debug or stop the program. It's better to catch `Exception` to handle standard errors while letting system-level exceptions propagate.
Suggested change:

```diff
 try:
     tf.config.experimental_disconnect_from_cluster()
-except:
+except Exception:
     pass
```
```python
try:
    tf.config.experimental_reset_memory_stats("TPU_SYSTEM")
except:
    pass
```
Using a bare `except:` can catch system-exiting exceptions like `SystemExit` or `KeyboardInterrupt`, making it harder to debug or stop the program. It's better to catch `Exception` to handle standard errors while letting system-level exceptions propagate.
Suggested change:

```diff
 try:
     tf.config.experimental_reset_memory_stats("TPU_SYSTEM")
-except:
+except Exception:
     pass
```
```python
import tensorflow as tf

os.environ["TPU_NAME"] = "harshith-tf-4"
```
```python
@classmethod
def setUpClass(cls):
    super().setUpClass()
    cls.tpu_available = False
    cls.tpu_strategy = None

    # Only attempt TPU initialization if the Keras backend is TensorFlow
    if backend.backend() == "tensorflow":
        print("\nAttempting TPU initialization from DtypesTest.setUpClass...")
        try:
            # Use empty string '' for auto-detection or 'grpc://<ip_address>:8470'
            # or your specific TPU_NAME from env var
            resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
            tf.config.experimental_connect_to_cluster(resolver)
            tf.tpu.experimental.initialize_tpu_system(resolver)
            cls.tpu_strategy = tf.distribute.TPUStrategy(resolver)
            cls.tpu_available = True
            print("✓ TPU initialization successful from DtypesTest.setUpClass!")
            print(f"Number of TPU devices: {cls.tpu_strategy.num_replicas_in_sync}")
            print(f"Logical TPU devices: {tf.config.list_logical_devices('TPU')}")
        except Exception as e:
            print(f"✗ TPU initialization failed from DtypesTest.setUpClass: {e}")
            print("Falling back to CPU/GPU testing for this class.")
            cls.tpu_available = False
    else:
        print(f"Skipping TPU initialization for backend: {backend.backend()}")
```
This `setUpClass` method duplicates the TPU initialization logic. This is the third implementation of it in this PR. It's important to have a single, reliable way to set up the TPU environment for tests to avoid inconsistencies and maintenance overhead. The pytest fixture in `conftest.py` is the recommended approach.
```python
# Ensure the backend is set to TensorFlow
os.environ["KERAS_BACKEND"] = "tensorflow"

os.environ["TPU_NAME"] = "harshith-tf-4"
```
```python
@classmethod
def setUpClass(cls):
    super().setUpClass()
    cls.tpu_available = False
    cls.tpu_strategy = None

    try:
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        cls.tpu_strategy = tf.distribute.TPUStrategy(resolver)
        cls.tpu_available = True
        print("✓ TPU initialization successful!")
        print(f"Number of TPU devices: {cls.tpu_strategy.num_replicas_in_sync}")
        print(f"Logical TPU devices: {tf.config.list_logical_devices('TPU')}")
    except Exception as e:
        print(f"✗ TPU initialization failed: {e}")
        print("Falling back to CPU/GPU testing")
        cls.tpu_available = False
```
This `setUpClass` method duplicates the TPU initialization logic that is also present in `conftest.py` and `keras/src/backend/common/dtypes_test.py`. The logic here is also less robust as it lacks the retry mechanism. To keep the code DRY and maintainable, it would be better to consolidate on a single TPU setup mechanism, preferably the pytest fixture defined in `conftest.py`, and use it across all TPU tests with the `@pytest.mark.requires_tpu` marker.
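A minimal sketch of what that consolidated `conftest.py` could look like, assuming a session-scoped fixture plus marker registration. The names `tpu_strategy` and `requires_tpu` mirror the review's suggestion; the initialization sequence reuses the same TensorFlow APIs shown in the diffs above, but this is an illustrative sketch, not the PR's actual file:

```python
# Hypothetical conftest.py sketch: one session-scoped fixture replaces the
# per-class setUpClass copies. Tests opt in with @pytest.mark.requires_tpu.
import pytest

def pytest_configure(config):
    # Register the marker so pytest does not emit an unknown-marker warning.
    config.addinivalue_line(
        "markers", "requires_tpu: test needs an initialized TPU runtime"
    )

@pytest.fixture(scope="session")
def tpu_strategy():
    # Skip cleanly on machines without TensorFlow or without a reachable TPU.
    tf = pytest.importorskip("tensorflow")
    try:
        # Same initialization sequence as the diffs, done exactly once per session.
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        return tf.distribute.TPUStrategy(resolver)
    except Exception as e:
        pytest.skip(f"TPU unavailable: {e}")
```

A test would then accept `tpu_strategy` as an argument and carry the `@pytest.mark.requires_tpu` marker, so CI runners without TPUs skip it instead of failing.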
```python
from keras.src.testing import test_case
from keras.src.testing.test_utils import named_product

os.environ["TPU_NAME"] = "harshith-tf-4"
```
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master   #21528      +/-   ##
===========================================
- Coverage   82.72%   64.28%   -18.44%
===========================================
  Files         567      568        +1
  Lines       56245    56324       +79
  Branches     8790     8802       +12
===========================================
- Hits        46527    36210    -10317
- Misses       7561    18215    +10654
+ Partials     2157     1899      -258
```
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Draft setup, testing only.