
[bazel] Move HAVE_GETAUXVAL from config.h to config.bzl #127637


Merged
merged 2 commits into llvm:main from macfix on Feb 18, 2025

Conversation

dklimkin
Member

This fixes build errors on macOS.
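
For context, `getauxval` is a Linux/glibc facility that does not exist on macOS, so `HAVE_GETAUXVAL` should only be defined on platforms that provide it; defining it in the Bazel configuration rather than a shared config.h presumably lets the build set it per platform. Below is a minimal sketch of how such a guard is typically consumed on the C/C++ side (illustrative only, not the exact LLVM call site):

```
// Sketch: HAVE_GETAUXVAL is expected to be defined by the build system only
// on platforms that actually provide getauxval (Linux/glibc), never on macOS.
#if defined(HAVE_GETAUXVAL)
#include <sys/auxv.h>
#endif

static unsigned long GetHardwareCaps() {
#if defined(HAVE_GETAUXVAL)
  return getauxval(AT_HWCAP);  // Queries the Linux auxiliary vector.
#else
  return 0;  // Fallback for macOS and other platforms without getauxval.
#endif
}
```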

@llvmbot added the bazel ("Peripheral" support tier build system: utils/bazel) label on Feb 18, 2025
@dklimkin merged commit 27fe2c9 into llvm:main on Feb 18, 2025
6 checks passed
@dklimkin deleted the macfix branch on February 18, 2025 at 14:12
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 18, 2025
copybara-service bot pushed a commit to openxla/shardy that referenced this pull request Feb 18, 2025
copybara-service bot pushed a commit to google/tsl that referenced this pull request Feb 18, 2025
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Feb 18, 2025
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 18, 2025
wldfngrs pushed a commit to wldfngrs/llvm-project that referenced this pull request Feb 19, 2025
saisindhuri91 added a commit to linux-on-ibm-z/tensorflow that referenced this pull request Feb 24, 2025
commit a3df3d5f334210847fdef8a91a5e78cb5625ce6a
Author: Ilia Sergachev <[email protected]>
Date:   Wed Feb 19 04:22:24 2025 -0800

    PR #22830: Fix macOS build.

    Imported from GitHub PR https://github.com/openxla/xla/pull/22830

    This fixes builds on macOS without Xcode (with Clang from Homebrew for example).
    Copybara import of the project:

    --
    4b5afd519ff62e8fe9d013030789edfc56ad6110 by Ilia Sergachev <[email protected]>:

    Update apple_support to 1.18.0.

    --
    9a9c42f6836ce758add28550cf157552a426ae92 by Ilia Sergachev <[email protected]>:

    Add bazel_features 1.25.0.

    Merging this change closes #22830

    PiperOrigin-RevId: 728612172

commit e4814241af383db57176d6509e5851029598f20b
Author: Sergei Lebedev <[email protected]>
Date:   Wed Feb 19 04:20:49 2025 -0800

    [pjrt] Removed unused `PjRtBuffer::CopyToRemoteDeviceScattered` and `ScatterDetails`

    PiperOrigin-RevId: 728611683

commit f326dfa0eb5efee65b34bfcc7601b40514f41800
Author: Sergei Lebedev <[email protected]>
Date:   Wed Feb 19 03:15:08 2025 -0800

    [pjrt] Removed unused `PjRtClient::MakeCrossHostReceiveBufferForGather` and `GatherDetails`

    PiperOrigin-RevId: 728593081

commit bba7fc38d654b7430a1bddfd154639b3043c174c
Author: Aliia Khasanova <[email protected]>
Date:   Wed Feb 19 03:08:32 2025 -0800

    Use custom HLO deserialization for HloUnoptimizedSnapshot.

    This change makes it possible to read HloUnoptimizedSnapshot protos that are over 2GiB in size and were dumped using custom proto serialization.

    PiperOrigin-RevId: 728590991

commit 6433e8ec4fb107a8ba02ea1c57c642d9b2a4abe7
Author: Alexander Belyaev <[email protected]>
Date:   Wed Feb 19 02:40:49 2025 -0800

    [XLA:GPU][TMA] Add an attribute to label func args with the TMA data.

    PiperOrigin-RevId: 728582980

commit a0f83c2e142b4f2c73989f4f9efba2132acd3dd7
Author: A. Unique TensorFlower <[email protected]>
Date:   Wed Feb 19 02:28:51 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728579638

commit 6ee026808f7d0f51e2ecc7952f9f3ee14b6f6ec9
Author: Will Froom <[email protected]>
Date:   Wed Feb 19 02:23:25 2025 -0800

    [XLA:CPU] Improve invariant argument checking in KernelThunk execution

    PiperOrigin-RevId: 728577867

commit bd590a478a6dda4d333b64658ec36360a533b3ea
Author: Aliia Khasanova <[email protected]>
Date:   Wed Feb 19 01:02:57 2025 -0800

    Use custom HLO serialization for HloUnoptimizedSnapshot.

    This change makes it possible to dump HloUnoptimizedSnapshot protos that are over 2GiB in size (the proto binary serialization limit).

    PiperOrigin-RevId: 728553680
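
The 2GiB ceiling mentioned above comes from protobuf's limit on the size of a single serialized message. One common workaround, sketched below purely as an illustration (the actual HloUnoptimizedSnapshot format may differ), is to write the snapshot's contained messages as separate length-prefixed records so that no single protobuf serialization has to exceed the limit:

```
// Sketch, not the real format: frame each already-serialized sub-message with
// a fixed-width length prefix so records can be written and read one by one.
#include <cstdint>
#include <fstream>
#include <string>

void WriteRecord(std::ofstream& out, const std::string& serialized_proto) {
  const uint64_t size = serialized_proto.size();
  out.write(reinterpret_cast<const char*>(&size), sizeof(size));
  out.write(serialized_proto.data(), static_cast<std::streamsize>(size));
}

bool ReadRecord(std::ifstream& in, std::string* serialized_proto) {
  uint64_t size = 0;
  if (!in.read(reinterpret_cast<char*>(&size), sizeof(size))) return false;
  serialized_proto->resize(size);
  return static_cast<bool>(
      in.read(&(*serialized_proto)[0], static_cast<std::streamsize>(size)));
}
```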

commit e49cc07356322efd128b3b7d7e9123e562e81c51
Author: A. Unique TensorFlower <[email protected]>
Date:   Wed Feb 19 01:02:30 2025 -0800

    Update GraphDef version to 2143.

    PiperOrigin-RevId: 728553521

commit f1430d2a49e796146e41e554a42c3fa541e9a465
Author: A. Unique TensorFlower <[email protected]>
Date:   Wed Feb 19 01:02:20 2025 -0800

    compat: Update forward compatibility horizon to 2025-02-19

    PiperOrigin-RevId: 728553462

commit c5a063e3e094ce35bc6a51916223a37dc31d0806
Author: A. Unique TensorFlower <[email protected]>
Date:   Wed Feb 19 00:48:54 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728549276

commit e9b25b8f6f753b37af11ff7c9cbc4569a221b44b
Author: Luke Boyer <[email protected]>
Date:   Tue Feb 18 23:23:59 2025 -0800

    Add composite test model

    PiperOrigin-RevId: 728523510

commit 8e457cf300410ff635b0b34cd82b9fef6d2cefb1
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 23:20:42 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728522632

commit 29bf5fe0d5e36cf2110788d2cc73c76ecce6efd8
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 23:07:06 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728518571

commit 23f7adfeb1803ee6aa5050420c6cf8dcc899cba5
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 23:01:22 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728516819

commit b5f455d9805a1071394abb18f50efa49c9074970
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 22:46:47 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728513284

commit d170d89e5d0ed2abfc576af944b6fdb6a5e40c6c
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 21:08:49 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728486781

commit c2ee958e2ccc3b6b6ee8e88490b66719801c3531
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 21:06:58 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728486295

commit d5344e29c5938c461dd0437223d75ec62529ecc9
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 20:57:17 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728483972

commit fb546cb77826f679dc617b6ba63d712810655307
Merge: 5c9432bc5b6 1ee2e1dabdf
Author: TensorFlower Gardener <[email protected]>
Date:   Tue Feb 18 18:43:37 2025 -0800

    Merge pull request #86715 from jiunkaiy:dev/chuntl/validate_soc_table

    PiperOrigin-RevId: 728443070

commit 5c9432bc5b6b79829fbdc3a0d87a12f8771c35fa
Merge: 3187d932578 84abc58d90e
Author: TensorFlower Gardener <[email protected]>
Date:   Tue Feb 18 18:28:35 2025 -0800

    Merge pull request #87305 from jiunkaiy:dev/chunhsue/new_DUS

    PiperOrigin-RevId: 728437834

commit 3187d932578087f48c0792fe1113e21e296357c3
Author: Vamsi Manchala <[email protected]>
Date:   Tue Feb 18 18:09:31 2025 -0800

    Fix TFL::ReshapeOp to allow correct per-axis quantization types.

    PiperOrigin-RevId: 728436491

commit d120e39920c0e61cc1227bc1abe50fd6ecd3ce66
Author: Andrew Zhang <[email protected]>
Date:   Tue Feb 18 17:35:33 2025 -0800

    Partially roll back to old ways of initializing QNN manager.

    PiperOrigin-RevId: 728426746

commit 72a11de4bb8a1520acf5d2676e0d6d682e3ddbd0
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 16:53:04 2025 -0800

    We need to insert a copy-from-host before X64SplitLow/High custom-calls
    for host tensors.

    PiperOrigin-RevId: 728414481

commit d3215aa5a464f67a8016886de35def11587ea829
Author: Abhinav Gunjal <[email protected]>
Date:   Tue Feb 18 16:39:48 2025 -0800

    [HLO-OPT] Tool : register HWI passes from hlo/transforms/ directory

    PiperOrigin-RevId: 728410275

commit b6a432412edfdf80e003e7d943ec0b0cf50cf3a9
Author: Bryan Massoth <[email protected]>
Date:   Tue Feb 18 16:35:12 2025 -0800

    Add AllReduceInfo to step data for TPUs.

    PiperOrigin-RevId: 728408466

commit 7c12d8d156f4f71a34ec36d980a78a5c5156348b
Author: Michael Whittaker <[email protected]>
Date:   Tue Feb 18 16:26:26 2025 -0800

    Avoid operating on aborted NCCL communicator.

    PiperOrigin-RevId: 728405699

commit be956f3df3cd1d65c565b1d383a17e75afd95742
Author: Andrew Zhang <[email protected]>
Date:   Tue Feb 18 16:17:08 2025 -0800

    No need to include QNN headers in dispatch delegate BUILD.

    PiperOrigin-RevId: 728402643

commit 90ef4630f475c474899ab70b99dfbbd3dc7c1440
Merge: 38ce2b08906 ab600f07a7c
Author: TensorFlower Gardener <[email protected]>
Date:   Tue Feb 18 16:07:14 2025 -0800

    Merge pull request #86942 from Flamefire:add_default_shell_env

    PiperOrigin-RevId: 728393629

commit 38ce2b0890691571abd0d6d0ac830be1b5d78cc5
Author: Quentin Khan <[email protected]>
Date:   Tue Feb 18 15:27:40 2025 -0800

    Fix typos in `lite/core:model_building` (2nd edition)

    PiperOrigin-RevId: 728385051

commit 2218aba0ea222c44139135e5f770950c61a0069e
Author: Terry Heo <[email protected]>
Date:   Tue Feb 18 15:26:03 2025 -0800

    Add subgraph_input_names(), subgraph_output_names() to the SignatureRunner

    In LiteRT, it is assumed that the order of names is aligned with the Subgraph.
    The existing input_names() and output_names() APIs don't follow that order, and
    there are customers who rely on the legacy order.

    This CL adds new methods that return the names in the underlying
    Subgraph I/O order.

    PiperOrigin-RevId: 728384297

commit 4aac416b232bf991c7be2deefe7755ec548a5755
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 15:20:39 2025 -0800

    Move utility aspect rules to the `third_party/py/python_wheel.bzl`.

    The aspects can be used in wheel build rules across all Google ML projects. They provide the capability to build wheels for the cross-compile configuration.

    PiperOrigin-RevId: 728381803

commit e17a2ccbf89bc4ff560da2e15922c1d09e3760c7
Author: Seher Ellis <[email protected]>
Date:   Tue Feb 18 15:20:01 2025 -0800

    [XLA] Fix a bug in SchedulingAnnotationCrossesOverlapLimit.

    Before this code change, we checked whether scheduling each instruction in a given group individually would cross any limits. This check was not strong enough because
    i) async instructions in the group are supposed to overlap each other and
    ii) total resource usage of the group can still exceed the limit while individual usages do not.

    For example, if the all-gather limit is 1, we should not allow a group with 2 async all-gathers to be scheduled.

    With this code change, we compute the "accumulated" resource usage of the annotation group and compare that against the limit.

    PiperOrigin-RevId: 728381430
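
A small sketch of the distinction described above, with hypothetical types (the real pass works on HLO instructions and scheduler resource hazards): checking each group member against the limit in isolation can pass, while the accumulated usage of the whole group is what actually has to stay within the limit.

```
// Sketch with hypothetical types: the limit is checked against the group's
// accumulated resource usage, not each member's usage in isolation.
#include <map>
#include <string>
#include <vector>

struct AnnotatedInstruction {
  std::map<std::string, int> resource_usage;  // e.g. {"all-gather": 1}
};

bool GroupCrossesLimit(const std::vector<AnnotatedInstruction>& group,
                       const std::map<std::string, int>& limits) {
  std::map<std::string, int> accumulated;
  for (const auto& instr : group)
    for (const auto& [resource, usage] : instr.resource_usage)
      accumulated[resource] += usage;  // members overlap, so usages add up
  for (const auto& [resource, usage] : accumulated) {
    auto it = limits.find(resource);
    if (it != limits.end() && usage > it->second) return true;
  }
  return false;
}
```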

commit bd7ad11857e29776e09517c0f6248e5ae4677dcc
Author: Matthias Guenther <[email protected]>
Date:   Tue Feb 18 15:14:01 2025 -0800

    Tweak formatting of a few lines in `generate_hlo_test_checks.py`.

    PiperOrigin-RevId: 728378722

commit 258d574ed53a5a05cc12d0b067e31dbfd92e8f44
Author: Michael Whittaker <[email protected]>
Date:   Tue Feb 18 15:03:00 2025 -0800

    Removed unused AsNcclUniqueIds function.

    PiperOrigin-RevId: 728373887

commit 9968751a81f349d3d7fd5d3aee8732e27984fc51
Author: Allan Renucci <[email protected]>
Date:   Tue Feb 18 14:54:58 2025 -0800

    Reduce dependencies on `MultiHostHloRunner::LoadHloModuleAndArguments`.

    Most users just need to load the module from a text file.

    PiperOrigin-RevId: 728370230

commit f2f4e6e4b7eb60ce07914c90c7f4cc5f243d9bca
Author: Niklas Vangerow <[email protected]>
Date:   Tue Feb 18 14:38:59 2025 -0800

    HloRunnerPjRt should respect static device layout if present.

    `ExecuteReplicated` was overwriting the device layout even if one was present in
    the module config.

    PiperOrigin-RevId: 728363119

commit ca4715ea0d399e6fc536e511ca3ee2d90db03318
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 14:38:13 2025 -0800

    Increase the TF GPU wheel size to 630 MB.

    PiperOrigin-RevId: 728362828

commit bef0b9e3ae6f23eebe469546ce8b3f45a7e5973a
Author: David Dunleavy <[email protected]>
Date:   Tue Feb 18 14:36:21 2025 -0800

    Remove `ci_continuous_only.yml` for now as we have no continuous only GitHub Actions builds

    PiperOrigin-RevId: 728362155

commit ad4decaf5c3952a75a20e7fab469f45d76c2272a
Author: David Dunleavy <[email protected]>
Date:   Tue Feb 18 13:58:15 2025 -0800

    Remove `mac_arm.patch` now that we use an LLVM with https://github.com/llvm/llvm-project/pull/127637

    PiperOrigin-RevId: 728347518

commit adf9b275b63068c5fc00b20d34260452cf49c429
Author: Quoc Truong <[email protected]>
Date:   Tue Feb 18 12:32:31 2025 -0800

    Create two new ml-build images: one with libcudnn 9.1 and CUDA 12.1, and the other with libcudnn 9.1 and CUDA 12.3.

    PiperOrigin-RevId: 728316388

commit ae4f6742234f80dc5178606e3ef05c060a01ba53
Author: Yin Zhang <[email protected]>
Date:   Tue Feb 18 11:07:39 2025 -0800

    Create data_table_utils preparing for the profiler conversion

    PiperOrigin-RevId: 728282467

commit 4f7980f1c5b5dadc42d2860aed35833e5ee268d9
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 10:28:21 2025 -0800

    Integrate LLVM at llvm/llvm-project@9d24f9437944

    Updates LLVM usage to match
    [9d24f9437944](https://github.com/llvm/llvm-project/commit/9d24f9437944)

    PiperOrigin-RevId: 728265165

commit ea72e93812837a2a9d8218e8cadac68795b0b86e
Author: Kanglan Tang <[email protected]>
Date:   Tue Feb 18 09:40:39 2025 -0800

    Install free threaded python3.13t to the ml_build arm64 docker image

    python3.13-nogil is a free-threaded build of python3.13.

    PiperOrigin-RevId: 728246352

commit 5efb5dfaa6aa57658adb678dce4e20205ae3c88b
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 09:19:18 2025 -0800

    Fix a typo in third_party/tensorflow/core/framework/tensor.h.

    PiperOrigin-RevId: 728238232

commit e75fae832796b06b026dd957f34b000b5e9a6fa6
Author: Frederik Gossen <[email protected]>
Date:   Tue Feb 18 09:11:05 2025 -0800

    Use separate collective resource when scheduling p2p communication

    This is in preparation for removing all 4 existing p2p resources.
    We are simplifying the pipeline parallelism implementation here.

    PiperOrigin-RevId: 728235315

commit 9fa9fed3a4471e847b7f87e04f78e9887c76b9e9
Author: Yash Katariya <[email protected]>
Date:   Tue Feb 18 08:58:35 2025 -0800

    Add a patch to llvm to fix jax's mac arm builds

    PiperOrigin-RevId: 728230563

commit d094cccdf72ff8e276e8f2de21a9cb76bc67430b
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 08:50:12 2025 -0800

    Fix neuron library loading when libneuron_adapter.so is installed system-wide.

    PiperOrigin-RevId: 728227684

commit 54a1b798ae6369ef18c7786620871be78b792457
Author: Eugene Zhulenev <[email protected]>
Date:   Tue Feb 18 08:45:37 2025 -0800

    [xla:cpu] Optimize non-grouped convolution

    ```
    BM_Conv2D<F32>/8/5/5/1/1/1/32/process_time                     1.57µs ± 8%  1.54µs ± 8%   -1.84%  (p=0.007 n=40+40)
    BM_Conv2D<F32>/8/5/5/4/1/1/32/process_time                     3.09µs ± 4%  3.09µs ± 4%     ~     (p=0.770 n=40+40)
    BM_Conv2D<F32>/8/128/128/4/1/1/8/process_time                   872µs ±10%   899µs ±14%   +3.09%  (p=0.044 n=38+40)
    BM_Conv2D<F32>/8/32/32/128/1/1/1024/process_time               71.4ms ±11%  70.7ms ±10%     ~     (p=0.382 n=38+35)
    BM_Conv2D<F32>/16/32/32/128/1/1/1024/process_time               187ms ±13%   189ms ±11%     ~     (p=0.622 n=40+40)
    BM_Conv2D<F32>/32/32/32/128/1/1/1024/process_time               347ms ±10%   345ms ±10%     ~     (p=0.642 n=40+40)
    BM_Conv2D<F32>/32/64/64/32/1/1/64/process_time                 43.2ms ± 7%  43.6ms ±13%     ~     (p=0.418 n=40+40)
    BM_Conv2D<F32>/32/256/256/4/1/1/16/process_time                 127ms ± 4%   128ms ± 4%     ~     (p=0.074 n=35+39)
    BM_Conv2D<F32>/32/64/64/4/1/1/16/process_time                  2.46ms ±11%  2.46ms ±15%     ~     (p=0.717 n=38+40)
    BM_Conv2D<F32>/32/32/32/96/1/1/96/process_time                 23.7ms ± 7%  23.7ms ± 6%     ~     (p=0.939 n=40+37)
    BM_Conv2D<F32>/8/5/5/1/3/3/32/process_time                     20.4µs ± 1%  21.3µs ± 2%   +4.27%  (p=0.000 n=39+38)
    BM_Conv2D<F32>/8/5/5/4/3/3/32/process_time                     51.0µs ± 1%  27.0µs ± 2%  -47.13%  (p=0.000 n=37+35)
    BM_Conv2D<F32>/8/128/128/4/3/3/8/process_time                  72.3ms ± 3%  21.9ms ± 3%  -69.71%  (p=0.000 n=40+39)
    BM_Conv2D<F32>/8/32/32/128/3/3/1024/process_time                761ms ± 5%   637ms ± 7%  -16.30%  (p=0.000 n=38+40)
    BM_Conv2D<F32>/16/32/32/128/3/3/1024/process_time               1.28s ± 6%   1.10s ± 5%  -14.50%  (p=0.000 n=38+39)
    BM_Conv2D<F32>/32/32/32/128/3/3/1024/process_time               2.55s ± 3%   2.23s ± 5%  -12.41%  (p=0.000 n=40+39)
    BM_Conv2D<F32>/32/64/64/32/3/3/64/process_time                  301ms ± 6%   180ms ±10%  -40.25%  (p=0.000 n=38+38)
    BM_Conv2D<F32>/32/256/256/4/3/3/16/process_time                 1.46s ± 4%   0.39s ±14%  -73.50%  (p=0.000 n=38+37)
    BM_Conv2D<F32>/32/64/64/4/3/3/16/process_time                  80.0ms ± 4%  24.0ms ± 3%  -69.97%  (p=0.000 n=40+39)
    BM_Conv2D<F32>/32/32/32/96/3/3/96/process_time                  259ms ± 6%   201ms ± 9%  -22.40%  (p=0.000 n=39+38)
    BM_GroupedConv2D/1/45/45/1024/5/5/1024/1024/process_time        491ms ± 6%   509ms ± 8%   +3.55%  (p=0.001 n=40+40)
    BM_Conv1DStrided/1/129/process_time                            29.2ms ± 6%  25.7ms ± 9%  -11.94%  (p=0.000 n=39+40)
    BM_Conv1DStrided/3/129/process_time                             136ms ± 4%    76ms ± 8%  -44.51%  (p=0.000 n=40+40)
    BM_Conv1DTransposedStrided/129/1/process_time                  28.1ms ± 5%  28.2ms ± 6%     ~     (p=0.950 n=40+40)
    BM_Conv1DTransposedStrided/129/3/process_time                  67.8ms ± 7%  66.5ms ± 7%   -2.00%  (p=0.010 n=38+40)
    BM_Conv1DTransposedStridedNonDefaultLayout/129/1/process_time  25.0ms ± 5%  24.8ms ± 6%     ~     (p=0.074 n=39+39)
    BM_Conv1DTransposedStridedNonDefaultLayout/129/3/process_time  67.1ms ± 4%  65.5ms ± 7%   -2.33%  (p=0.001 n=38+40)
    BM_Conv2DStrided/process_time                                  31.6ms ± 6%  28.0ms ± 5%  -11.50%  (p=0.000 n=40+40)
    BM_Conv2DTransposedStrided/process_time                        28.0ms ± 4%  27.9ms ± 4%     ~     (p=0.555 n=39+40)
    BM_GroupedConv2DStrided/128/128/128/process_time               64.7ms ± 4%  64.8ms ± 3%     ~     (p=0.429 n=40+40)
    BM_GroupedConv2DTransposedStrided/128/128/128/process_time      5.58s ± 1%   5.58s ± 1%     ~     (p=0.172 n=39+39)
    BM_GroupedConv2DStrided/128/128/16/process_time                37.3ms ± 3%  37.6ms ± 2%   +0.72%  (p=0.010 n=40+38)
    BM_GroupedConv2DTransposedStrided/128/128/16/process_time       1.02s ± 2%   1.02s ± 2%     ~     (p=0.195 n=37+38)
    ```

    PiperOrigin-RevId: 728226255

commit 7f4b5452d790385e1f26755fa4311e2eccea2ee4
Author: Alexander Pivovarov <[email protected]>
Date:   Tue Feb 18 07:47:02 2025 -0800

    PR #22723: Fix call of overloaded Tile is ambiguous

    Imported from GitHub PR https://github.com/openxla/xla/pull/22723

    #### Fix GCC-13 Build Error in AutoSharding Due to vector<vector> vs. absl::Span Ambiguity

    When building auto_sharding with GCC-13, the following build error occurred:

    ```
    xla/hlo/experimental/auto_sharding/auto_sharding.cc:895:37: error: call of overloaded 'Tile(const xla::Shape&, <brace-enclosed initializer list>, <brace-enclosed initializer list>, const xla::spmd::DeviceMesh&)' is ambiguous
      895 |       HloSharding output_spec = Tile(shape, {i}, {j}, device_mesh);
          |                                 ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In file included from ./xla/hlo/experimental/auto_sharding/cluster_environment.h:33,
                     from ./xla/hlo/experimental/auto_sharding/auto_sharding.h:41:
    ./xla/hlo/experimental/auto_sharding/auto_sharding_util.h:499:13: note: candidate: 'xla::HloSharding xla::spmd::Tile(const xla::Shape&, absl::lts_20230802::Span<const long int>, const std::vector<std::vector<long int> >&, const DeviceMesh&)'
      499 | HloSharding Tile(const Shape& tensor_shape,
          |             ^~~~
    ./xla/hlo/experimental/auto_sharding/auto_sharding_util.h:504:13: note: candidate: 'xla::HloSharding xla::spmd::Tile(const xla::Shape&, absl::lts_20230802::Span<const long int>, absl::lts_20230802::Span<const long int>, const DeviceMesh&)'
      504 | HloSharding Tile(const Shape& tensor_shape,
          |             ^~~~
    ```

    #### Solution:
    To resolve the ambiguity between `std::vector<std::vector<int64_t>>` and `absl::Span<const int64_t>` in `Tile()`, I introduced an overloaded Tile() function that takes `std::initializer_list<int64_t> mesh_dims`.

    Expressions like the following now compile successfully with GCC-13:

    ```
    Tile(shape, {0}, {0}, device_mesh);
    ```

    #### Additional Changes
    - Removed the `Tile()` declaration from `auto_sharding.h` since it is already declared in `auto_sharding_util.h`.

    Copybara import of the project:

    --
    a28207f171c086130097d520079648b1548976dc by Alexander Pivovarov <[email protected]>:

    Fix call of overloaded Tile is ambiguous

    Merging this change closes #22723

    PiperOrigin-RevId: 728206576
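
A stripped-down sketch of the fix described in the commit above (signatures simplified; the real overloads also take a Shape and a DeviceMesh). In the original code, GCC-13 reports the braced call as ambiguous between the span and nested-vector overloads; adding a `std::initializer_list<int64_t>` overload gives braced integer lists an exact-enough match that wins overload resolution:

```
// Sketch: an extra initializer_list overload disambiguates braced arguments.
#include <cstdint>
#include <initializer_list>
#include <vector>
#include "absl/types/span.h"

void Tile(absl::Span<const int64_t> dims,
          const std::vector<std::vector<int64_t>>& mesh_dims) {}
void Tile(absl::Span<const int64_t> dims,
          absl::Span<const int64_t> mesh_dims) {}
// Added overload: preferred for a braced list of integers such as {0}.
void Tile(absl::Span<const int64_t> dims,
          std::initializer_list<int64_t> mesh_dims) {}

void Example() {
  Tile({0}, {0});  // resolves to the initializer_list overload
}
```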

commit 07a69f256be7b5b53c6771b0e90af8f1527bdbe4
Author: Frederik Gossen <[email protected]>
Date:   Tue Feb 18 07:46:31 2025 -0800

    Add more vlogs to p2p pipeliner to aid debugging

    PiperOrigin-RevId: 728206464

commit 0baaf5df819670e6aba9a35563a7d09649c54db9
Author: Dan Foreman-Mackey <[email protected]>
Date:   Tue Feb 18 07:14:40 2025 -0800

    [xla:ffi] Add context decoding for some FFI internals.

    PiperOrigin-RevId: 728197028

commit 9d04d11fefa754b4ddfabe9b00a0021c392b71f0
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 04:07:03 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728144120

commit 40021bd73e52681f1b3c9f2ce2cf8637110362b5
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 03:43:09 2025 -0800

    Add TensorAdapter for dynamic loading of Google Tensor Compiler API

    This commit introduces the `Adapter` class, which provides an abstraction for dynamically loading and interfacing with the Google Tensor Compiler API. The adapter uses `dlopen` and `dlsym` to load the shared library at runtime and expose the `CompileFlatbuffer` function for compiling TFLite buffers.

    1. Supports dynamic loading to decouple client code from the compiler implementation.
    2. Provides a C++ interface to interact with the compiler API.
    3. Utilizes C-style linkage for compatibility and to avoid name mangling.
    4. Includes error handling for library loading and symbol resolution.

    This change ensures modularity, allowing flexibility in loading different versions of the compiler without recompiling the client code.

    PiperOrigin-RevId: 728137559
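
A minimal sketch of the dlopen/dlsym pattern described above; the library name, exported signature, and error handling here are illustrative assumptions rather than the actual adapter:

```
// Sketch: load a compiler shared library at runtime and resolve a C-linkage
// entry point, so client code does not link against the compiler directly.
#include <dlfcn.h>
#include <cstddef>
#include <iostream>

// Hypothetical signature of the exported compile function.
using CompileFlatbufferFn = int (*)(const char* buffer, std::size_t size);

int main() {
  void* handle = dlopen("libtensor_compiler.so", RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    std::cerr << "dlopen failed: " << dlerror() << "\n";
    return 1;
  }
  auto compile = reinterpret_cast<CompileFlatbufferFn>(
      dlsym(handle, "CompileFlatbuffer"));
  if (compile == nullptr) {
    std::cerr << "dlsym failed: " << dlerror() << "\n";
    dlclose(handle);
    return 1;
  }
  // ... pass a serialized TFLite buffer to compile(...) here ...
  dlclose(handle);
  return 0;
}
```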

commit 01197637cb4eee7da2f1645f7367268ee44efa87
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 03:04:38 2025 -0800

    Integrate LLVM at llvm/llvm-project@34cf04b59b8d

    Updates LLVM usage to match
    [34cf04b59b8d](https://github.com/llvm/llvm-project/commit/34cf04b59b8d)

    PiperOrigin-RevId: 728126746

commit 894bcccb7005a5d88f32aae20f9ae5da2181f778
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 01:02:38 2025 -0800

    Update GraphDef version to 2142.

    PiperOrigin-RevId: 728091405

commit 3c7c34dc2e166c95088e1bb556eb24f0fecedf4a
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 01:02:33 2025 -0800

    compat: Update forward compatibility horizon to 2025-02-18

    PiperOrigin-RevId: 728091377

commit 19aa387f08b12381c38b5076168ff9ed07c987f9
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 18 00:14:15 2025 -0800

    Reverts 35fb4a99534dc85e47eeedf760ab691ddda31219

    PiperOrigin-RevId: 728078996

commit 98e2e6f75033f658c752245173b24f44561a2f3c
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 17 23:39:03 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728069637

commit dd7bc1258d68b852e30a733707159dbfdb8e4940
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 17 23:08:48 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728062876

commit af1040e7fe6d54dba051c03c6ca9e5a7756576a1
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 17 23:08:47 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728062865

commit 4fe72bae5981e7aa701a0d1420d8dc15aaec262c
Author: Adrian Kuegel <[email protected]>
Date:   Mon Feb 17 23:07:13 2025 -0800

    Extract SizeAndStrideExpression into its own target (NFC).

    We want to reuse some of the logic to compute size and stride from affine
    expressions. This move is a preparation for that.

    PiperOrigin-RevId: 728062444

commit 307e016343f238baa09a7e57f21c03d5ef608bb9
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 17 22:06:46 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 728046575

commit 1e0f639a26b2aafd2732ae0e64817b7cdd387c81
Author: Terry Heo <[email protected]>
Date:   Mon Feb 17 21:45:08 2025 -0800

    Reverts 37bffc34ec432bcc331fa0cc7634c3890418c48f

    PiperOrigin-RevId: 728041551

commit b219e9ef94e40c3eaf70cc428b4dfff728cba26f
Author: Frederik Gossen <[email protected]>
Date:   Mon Feb 17 20:10:34 2025 -0800

    Fix length of separators

    PiperOrigin-RevId: 728020445

commit e8aeed09867b67054e9869e6ed9929c7f50c64b9
Author: pemeliya <[email protected]>
Date:   Mon Feb 17 10:22:14 2025 -0800

    PR #21886: [ROCM][NFC] BlasLt interface refactoring & simplifying: part I

    Imported from GitHub PR https://github.com/openxla/xla/pull/21886

    After this PR https://github.com/tensorflow/tensorflow/pull/73926 is merged, we can remove unnecessary low-level DoMatmul functions from GpuBlasLt interface (which otherwise looks scary and unnecessarily complicated).

    Furthermore, we can also remove **ValidateInputs** function from the interface and derived classes since a high-level **ExecuteOnStream** function already handles data-types correctly. This also greatly simplifies the code.

    Also, I have packed the input arguments of ExecuteOnStream calls into a struct **MemoryArgs** to simplify argument passing in derived classes and improve code readability.

    Finally, in the original GpuBlasLt PR (https://github.com/openxla/xla/pull/5911), I made a sort of mistake by adding a reference to **blas_lt** to the MatmulPlan class [here](https://github.com/openxla/xla/blob/main/xla/stream_executor/rocm/hip_blas_lt.h#L135), thereby making MatmulPlans bound to a **particular BlasLt instance**. This resulted in some further bugfixes and, most importantly, complicated the GpuBlasLt cache design in gpublas_lt_matmul_thunk.cc/.h. In this PR, I remove this reference from the MatmulPlan class; in the next NFC PR the cache mechanics can also be simplified.

    Unfortunately, this change also requires a tandem PR for Tensorflow: https://github.com/tensorflow/tensorflow/pull/85835

    @xla-rotation Would you please have a look

    Copybara import of the project:

    --
    e96bb2fbedab3f53b31ef0e1748582c76e9fb105 by Pavel Emeliyanenko <[email protected]>:

    blaslt interface refactoring: removing blas_lt_ref

    added cuda adaptions

    cuda-side adaptions

    cuda side adaptions

    fix

    fixing pointers

    Merging this change closes #21886

    PiperOrigin-RevId: 727898957
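
The argument packing described above can be sketched roughly as follows; the member names are assumptions for illustration, not the real MemoryArgs definition:

```
// Sketch: bundle the per-call device buffers of ExecuteOnStream into one
// struct so derived classes forward a single argument instead of a long list.
#include <cstdint>

struct DeviceMemoryBase {  // stand-in for a stream-executor buffer handle
  void* opaque = nullptr;
  uint64_t size = 0;
};

struct MemoryArgs {
  DeviceMemoryBase a, b, c, d;  // matmul inputs and outputs
  DeviceMemoryBase bias, aux;   // optional epilogue buffers
  DeviceMemoryBase workspace;   // scratch space
};

// Before: ExecuteOnStream(stream, a, b, c, d, bias, aux, workspace, ...)
// After:  ExecuteOnStream(stream, const MemoryArgs& args, ...)
```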

commit 1b71f5e6263f7857154bfd4fae2b88fd372bfecb
Author: Allan Renucci <[email protected]>
Date:   Mon Feb 17 10:13:25 2025 -0800

    Debug Apple Silicon CI disk space issue.

    PiperOrigin-RevId: 727896465

commit 3a4212cb57885758bd00a2c378b36e7d55678ab8
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 17 10:07:50 2025 -0800

    Integrate LLVM at llvm/llvm-project@912b154f3a3f

    Updates LLVM usage to match
    [912b154f3a3f](https://github.com/llvm/llvm-project/commit/912b154f3a3f)

    PiperOrigin-RevId: 727895384

commit 8cdf9deecce7ec6feabb9d051b7fdeeca2d300e0
Author: Shaogang Wang <[email protected]>
Date:   Mon Feb 17 08:18:11 2025 -0800

    PR #22385: [XLA:GPU] Fix test ConditionalOpTest.SwappedInputsInSequentialConditionals failed when enabling command buffer.

    Imported from GitHub PR https://github.com/openxla/xla/pull/22385

    This PR fixes the ConditionalOpTest.SwappedInputsInSequentialConditionals test failure when command buffers are enabled. The issue is that the command buffer lowering process misinterprets the bool predicate as an int32 predicate, so the predicate value is corrupted.

    The original errors are:

    ```
    xla/tests/client_library_test_base.cc:466: Failure
    Value of: LiteralTestUtil::Near(expected, actual, error)
      Actual: false (
    Mismatches in shape (f32[], f32[]) (2 elements):
    Array at shape index {0},
    Mismatch count 1 (100.0000%) in shape f32[] (1 elements), abs bound 0.001, rel bound 0
    Top relative error mismatches:
      actual             5.55000019, expected             11.2399998, index {}, rel error    0.506, abs error     5.69
    : Array at shape index {1},
    Mismatch count 1 (100.0000%) in shape f32[] (1 elements), abs bound 0.001, rel bound 0
    Top relative error mismatches:
      actual             11.2399998, expected             5.55000019, index {}, rel error     1.03, abs error     5.69

    Expected literal:
    (
    f32[] 11.24,
    f32[] 5.55
    )

    Actual literal:
    (
    f32[] 5.55,
    f32[] 11.24
    ))
    ```
    Copybara import of the project:

    --
    272959fb11a5306414ef924536130e4b3456fa37 by Shawn Wang <[email protected]>:

    fix case command buffer unittest failure

    --
    a254286ec86013186380bb0502769bdc01580130 by Shawn Wang <[email protected]>:

    restore default flags

    --
    df91f58d5546edfd625ad3d4825c9bbf5e72971e by Shawn Wang <[email protected]>:

    fix test failure

    --
    0f12478c89a85f2d0bbca6fceae4eb7779f84868 by Shawn Wang <[email protected]>:

    fix

    --
    33befa603257c76897a250130582bde67049cae5 by Shawn Wang <[email protected]>:

    update ptx code

    Merging this change closes #22385

    PiperOrigin-RevId: 727871172

commit f269156fb6748e98482e632d744a4b74e378a928
Author: Terry Sun <[email protected]>
Date:   Mon Feb 17 07:34:23 2025 -0800

    PR #21746: [NVIDIA GPU] Add collective-permute combiner

    Imported from GitHub PR https://github.com/openxla/xla/pull/21746

    For collective-permutes with small message sizes, it is beneficial to combine them into a single collective because
    1. it gets rid of some kernel launch overhead, and allows NCCL to do some message fusion;
    2. fewer collectives make it easier for LHS to make better decisions.

    On top of the multi-operand collective-permute added in https://github.com/openxla/xla/pull/18838, this PR adds a combiner for collective-permutes.
    Copybara import of the project:

    --
    c03a8fb5bd42cf3a365e1684537e78544a75a937 by Terry Sun <[email protected]>:

    add collective permute combiner

    --
    6a3159e89444ea342c25d8d996c994accd68a30d by Terry Sun <[email protected]>:

    polishing and doc string updates

    Merging this change closes #21746

    PiperOrigin-RevId: 727859633

commit 0fdd7a6becdaa14a541c2a6be6787221b3632924
Author: Sergey Kozub <[email protected]>
Date:   Mon Feb 17 07:08:18 2025 -0800

    PR #22575: [XLA:GPU] Fix triton sparse dot lowering on Blackwell

    Imported from GitHub PR https://github.com/openxla/xla/pull/22575

    Sparse dot is supported for MMA v2 and v3 only, and sm100/sm120 should use MMA v2 (v3 is Hopper-only).
    Copybara import of the project:

    --
    bd4c827db0e4adbff629bf0b02d09ff2860e4fb2 by Sergey Kozub <[email protected]>:

    [XLA:GPU] Fix triton sparse dot lowering on Blackwell

    Merging this change closes #22575

    PiperOrigin-RevId: 727853619

commit 1cfb662549aa6e47258883cfcc087a8a1548f7e6
Author: Quentin Khan <[email protected]>
Date:   Mon Feb 17 05:22:35 2025 -0800

    Add a new mechanism to inline StableHLO composite ops.

    PiperOrigin-RevId: 727827608

commit 0bb9a7ea8adb5b06e6ff9f47fc1682b1ab9d40f7
Author: Alexander Belyaev <[email protected]>
Date:   Mon Feb 17 05:07:36 2025 -0800

    [XLA:GPU] Update printers/parsers for the tile/extract/insert ops.

    PiperOrigin-RevId: 727823967

commit 805e38c3103cc30b6d714fb6b6949c77eeceef17
Author: Adrian Kuegel <[email protected]>
Date:   Mon Feb 17 04:58:53 2025 -0800

    Extract ConstraintExpression class into its own target (NFC).

    It is usually a good idea to put separate classes into their own targets.
    This makes merge conflicts less likely and can also mean that fewer tests will
    be triggered (e.g. only the SymbolicTile tests when changing something related
    to SymbolicTile, but not changing ConstraintExpression).

    PiperOrigin-RevId: 727821427

commit d4eafdbcad85796f41ed5f29d73402e68c483ca8
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 17 02:45:41 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 727785601

commit 093187c0279f1fb1516b7ede30ee6bb73904d0cb
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 17 02:26:50 2025 -0800

    Support creating HloModule from proto with compilation environments.

    PiperOrigin-RevId: 727780075

commit 61c2376dbbbdad5536ecdca308be49f5679db553
Author: Alexander Belyaev <[email protected]>
Date:   Mon Feb 17 01:35:12 2025 -0800

    [XLA:GPU] Move lit tests for triton_xla dialect into [ir|transforms]/tests.

    PiperOrigin-RevId: 727764291

commit 612845cdff1d834b19455dc5013cd90d3780d5dd
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 17 01:34:21 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 727763993

commit 5902c52936171932d1ed9a04d7f98bdb4fb019c7
Author: TJ Xu <[email protected]>
Date:   Mon Feb 17 01:22:28 2025 -0800

    PR #22757: Fix MemcpyP2pWhileLoopCorrectness test failure in collective e2e suite

    Imported from GitHub PR https://github.com/openxla/xla/pull/22757

    Copybara import of the project:

    --
    906cec63655f93e2fcf23b1530d721b499cc8750 by TJ Xu <[email protected]>:

    Fix MemcpyP2pWhileLoopCorrectness test failure in collective e2e suite

    Merging this change closes #22757

    PiperOrigin-RevId: 727760726

commit 9084954bccc612e3dce5deb9d02c549ce96b6163
Author: Kasper Nielsen <[email protected]>
Date:   Mon Feb 17 01:16:07 2025 -0800

    PR #22706: Increase the compilation thread stack size to 4MB

    Imported from GitHub PR https://github.com/openxla/xla/pull/22706

    When JIT compiling modules using a process runner, the default stack size on OS X is too low. For example, [this StableHLO module](https://gist.github.com/kasper0406/6e73ebd7c2e2475c69d4043d8c9e7465) will fail to compile with a stack overflow error using this command:
    ```
    bazel run --spawn_strategy=sandboxed //xla/tools:run_hlo_module -- --platform=CPU --input_format=stablehlo stack_overflow.mlir
    ```

    This PR will increase the thread stack size for the compilation thread pool to 4MB per thread.

    Copybara import of the project:

    --
    31d88dbd8cfbb99986202bd450c71fd21b5a4223 by Kasper Nielsen <[email protected]>:

    Increase the compilation thread stack size to 4MB

    Merging this change closes #22706

    PiperOrigin-RevId: 727758662
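
For illustration only, here is a generic pthreads sketch of giving a worker thread a 4MB stack; this shows the underlying mechanism, not the XLA thread-pool API that the PR actually changes:

```
// Sketch: request a 4MB stack for a newly created thread via pthreads.
#include <pthread.h>
#include <cstdio>

void* Work(void*) {
  // Deeply recursive compilation passes need more than the default stack.
  return nullptr;
}

int main() {
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  pthread_attr_setstacksize(&attr, 4 * 1024 * 1024);  // 4MB per thread
  pthread_t tid;
  if (pthread_create(&tid, &attr, Work, nullptr) != 0) {
    std::perror("pthread_create");
    return 1;
  }
  pthread_join(tid, nullptr);
  pthread_attr_destroy(&attr);
  return 0;
}
```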

commit 6b99189d9c114af72ca419e08afb9946abd9f7b9
Author: Shraiysh <[email protected]>
Date:   Mon Feb 17 01:13:12 2025 -0800

    PR #22680: Changing the default value of the flag xla_dump_hlo_as_long_text

    Imported from GitHub PR https://github.com/openxla/xla/pull/22680

    This stems from the difference in HLO dumps from tools like hlo-opt, multihost_hlo_runner, and hlo_runner_main. Although the options related to printing HLO are tied to the HLO via DebugOptions, these tools have different behaviors because the hlo-opt tool uses the `ToString()` function from `HloModule` while `functional_hlo_runner` uses a separate dumping utility from `xla/service/dump.cc`. To standardize this behavior, we are first making the default behavior uniform by setting the default value of long-text HLO dumps to true. This ensures that the HLO dumps will be functional by default (for example, backend_config will be printed).
    Copybara import of the project:

    --
    0818c7871688a8e18cefdaea7bd17fd7aba293d9 by Shraiysh Vaishay <[email protected]>:

    Changing the default value of the flag xla_dump_hlo_as_long_text

    This stems from the difference in HLO dumps from tools like hlo-opt,
    multihost_hlo_runner, and hlo_runner_main. Although the options
    related to printing HLO are tied to the HLO via DebugOptions, these
    tools have different behaviors because the hlo-opt tool uses the `ToString()`
    function from `HloModule` while `functional_hlo_runner` uses a separate
    dumping utility from `xla/service/dump.cc`. To standardize this behavior,
    we are first making the default behavior uniform by setting the default
    value of long text HLO dumps to be true. This ensures that the HLO dumps
    will be functional by default (for example, backend_config will be printed).

    Merging this change closes #22680

    PiperOrigin-RevId: 727757839

commit 0a0fe9d11320ffd9f5c4fc9f626af8e56931ee76
Author: Ilia Sergachev <[email protected]>
Date:   Mon Feb 17 01:11:12 2025 -0800

    PR #22640: [NFC] Deduplicate functions between HLO runners.

    Imported from GitHub PR https://github.com/openxla/xla/pull/22640

    Copybara import of the project:

    --
    e0e1de485c9120c1d52c8f10d843c13097802e07 by Ilia Sergachev <[email protected]>:

    [NFC] Deduplicate functions between HLO runners.

    --
    aeff13d83e159fd8e2cb12082a2549ccdbe0b2b6 by Ilia Sergachev <[email protected]>:

    [NFC] Delete unused functions.

    --
    c073163cd1ce91ddb12c7b52d760194f33de9fe2 by Ilia Sergachev <[email protected]>:

    Update xla/service/hlo_runner_interface.cc

    Co-authored-by: Allan Renucci <[email protected]>
    --
    b043017e52b444fbfff1c49655952899f6c4fd6d by Ilia Sergachev <[email protected]>:

    Update xla/service/hlo_runner_interface.h

    Co-authored-by: Allan Renucci <[email protected]>
    --
    48383f2873e8614afc678c3d7842ca1815d53d40 by Ilia Sergachev <[email protected]>:

    Update xla/service/hlo_runner_interface.h

    Co-authored-by: Allan Renucci <[email protected]>
    --
    d5bca65f50c2e1b05ce36f1d485c107806bfbc94 by Ilia Sergachev <[email protected]>:

    Rename HloProtoToModule to CreateModuleFromProto for consistency.

    --
    be158dda3fd2450e1df9d91e833a563eb5a70119 by Ilia Sergachev <[email protected]>:

    Update xla/service/hlo_runner_interface.h

    Co-authored-by: Allan Renucci <[email protected]>
    --
    e34efb499fa492929d9a3116a9dfbc112667e1b3 by Ilia Sergachev <[email protected]>:

    Update xla/service/hlo_runner_interface.h

    Co-authored-by: Allan Renucci <[email protected]>

    Merging this change closes #22640

    PiperOrigin-RevId: 727757454

commit ebc9b16e275b1e18b99e48a1699ac1d43b4bbcc9
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 17 01:02:34 2025 -0800

    compat: Update forward compatibility horizon to 2025-02-17

    PiperOrigin-RevId: 727754785

commit 724527729596bbd99cdef18eebf6cad406e2f8b4
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 17 01:02:26 2025 -0800

    Update GraphDef version to 2141.

    PiperOrigin-RevId: 727754733

commit 4e9cde9316e98f99e9bac5c6b2ca36ee8c7009ae
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 17 00:07:43 2025 -0800

    Skip custom fusions in ParallelTaskAssigner.

    PiperOrigin-RevId: 727739338

commit 6532c2ce639a81ee667923fd05593e28edde9bc4
Merge: afc26ebcaf8 494c81367dc
Author: TensorFlower Gardener <[email protected]>
Date:   Sun Feb 16 23:14:41 2025 -0800

    Merge pull request #87313 from tensorflow:fixtypos16

    PiperOrigin-RevId: 727724210

commit afc26ebcaf89c469ba069cf4bd2256cd9f6f4f70
Author: A. Unique TensorFlower <[email protected]>
Date:   Sun Feb 16 16:39:37 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 727641736

commit 67dd0dbf66a152804fb98ba3909bad0dae36ec81
Author: A. Unique TensorFlower <[email protected]>
Date:   Sun Feb 16 01:02:44 2025 -0800

    compat: Update forward compatibility horizon to 2025-02-16

    PiperOrigin-RevId: 727478277

commit 5425fdd4037affb0d5febe82721ceb2cb2852de3
Author: A. Unique TensorFlower <[email protected]>
Date:   Sun Feb 16 01:02:42 2025 -0800

    Update GraphDef version to 2140.

    PiperOrigin-RevId: 727478263

commit 166e5fad5fb7b0dd16abeb11c41f1af3e4376d16
Author: Vlad Sytchenko <[email protected]>
Date:   Sat Feb 15 15:09:26 2025 -0800

    [XLA] Googly changes

    Don't mix up host {in,out}feed with other types of {in,out}feed.

    PiperOrigin-RevId: 727370147

commit fb7ef44e762861c0a7fd12fb723950c16c5b0daf
Author: Frederik Gossen <[email protected]>
Date:   Sat Feb 15 13:54:48 2025 -0800

    Fully pipeline recv and recv-done ops and do not pipeline send ops

    This allows receiving of data to start in an earlier loop iteration while the data to be sent may not be available yet.
    In the context of pipeline parallelism, this enables overlap of the stages' compute and communication.

    PiperOrigin-RevId: 727357008

commit 84690d2525240e0560701402c106572e36c25df6
Author: A. Unique TensorFlower <[email protected]>
Date:   Sat Feb 15 11:21:12 2025 -0800

    Regenerate tf_generated_ops.td after adding a new attribute to the WriteTrainingPredictions custom op to support writing vector predictions to file storage.

    PiperOrigin-RevId: 727330085

commit 6662896d468bc8038b6f4183bd7392dd469964b5
Author: Eugene Zhulenev <[email protected]>
Date:   Sat Feb 15 07:59:29 2025 -0800

    [xla:cpu] Add OneDnnFusionThunk

    PiperOrigin-RevId: 727295388

commit 8a6f519eb79493ccbbee12af45732b19498242da
Author: A. Unique TensorFlower <[email protected]>
Date:   Sat Feb 15 01:18:24 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 727222577

commit a4eeee90693544b714c85cc5b338eae1d5be24ba
Author: A. Unique TensorFlower <[email protected]>
Date:   Sat Feb 15 01:02:29 2025 -0800

    compat: Update forward compatibility horizon to 2025-02-15

    PiperOrigin-RevId: 727219116

commit ae00d0966ccc663c39237f279b72fc6a834d47c9
Author: A. Unique TensorFlower <[email protected]>
Date:   Sat Feb 15 01:02:24 2025 -0800

    Update GraphDef version to 2139.

    PiperOrigin-RevId: 727219095

commit 86ff0ed09a0358c3b79e2781a348647f9de9f312
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 23:16:41 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 727197557

commit e97debd0383f6fb575fb40fca2639776bed176da
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 21:58:02 2025 -0800

    Fix litert vendor shared lib names to be distinct.

    PiperOrigin-RevId: 727181626

commit 0b31741f9f495fb837dc8d913548f2c8f66cc662
Author: Abhinav Gunjal <[email protected]>
Date:   Fri Feb 14 17:53:00 2025 -0800

    [XLA:CPU] Remove unneeded MHLO dependencies from XLA CPU compiler

    PiperOrigin-RevId: 727123960

commit 87c216ad7d9d3b0243db35752b998f813bad0b09
Author: Terry Heo <[email protected]>
Date:   Fri Feb 14 17:07:18 2025 -0800

    Reverts 8f0f3a5af6edc9fe9471929245312df6658af888

    PiperOrigin-RevId: 727112701

commit 0d78562e0ffa59c5d1e2beaf4b49d99f891a2a38
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 15:39:16 2025 -0800

    IFRT proxy: Add custom_call_program_serdes to common_serdes

    PiperOrigin-RevId: 727087280

commit 35fb4a99534dc85e47eeedf760ab691ddda31219
Author: Ryan M. Lefever <[email protected]>
Date:   Fri Feb 14 14:47:22 2025 -0800

    1. Replaced the BaseCosts API with the OpCostManager API in MSA's CostAnalysis.
    2. MSA's CostAnalysis directly exposes OperandBytesAccess() and OutputBytesAccess() methods, rather than through base_costs(), as it previously did.

    PiperOrigin-RevId: 727070950

commit da938f7b16b5be37ccdf0dd66dc3fe4cc0bff256
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 14:45:25 2025 -0800

    Support optimization_level and memory_fitting_level XLA compilation options.

    PiperOrigin-RevId: 727070422

commit d97f27edf3f279b7b79eab7b0e8bad8263146fa5
Author: Allan Renucci <[email protected]>
Date:   Fri Feb 14 13:41:11 2025 -0800

    Don't run the `--nobuild` command for MACOS_CPU_ARM64.

    We can probably install `parallel` via `brew` in an extra setup command but for now align with MACOS_CPU_X86.

    PiperOrigin-RevId: 727049046

commit 49a0e7a5a140b868df26488ae0e0a0db476f875d
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 13:38:41 2025 -0800

    Adds support for string and binary data processing in Colocated Python.

    PiperOrigin-RevId: 727048049

commit 0d872b94c5381dd64dd72193e917e5b9f1fe1144
Author: Seher Ellis <[email protected]>
Date:   Fri Feb 14 13:06:04 2025 -0800

    [XLA:SchedulingAnnotations] Uniquify annotation ids for unrolled scheduling groups.

    PiperOrigin-RevId: 727036617

commit 95938b1f47ed2bf26ea4143fa9c4190fd0c4119a
Author: Parker Schuh <[email protected]>
Date:   Fri Feb 14 12:44:34 2025 -0800

    Update TrackedDeviceBuffer to contain a list of refcounted raw device buffers.

    Note that this still has a problem when donating foreign buffers, but allows this to be fixed as a followup.

    PiperOrigin-RevId: 727029628

commit 7fa06756445457996fc806107dcde272e5eb8a82
Author: Sandeep Dasgupta <[email protected]>
Date:   Fri Feb 14 12:35:33 2025 -0800

    Handling incompatible operand type during HLO -> Mhlo conversion.

    PiperOrigin-RevId: 727026128

commit 0f0b5e889584dd903dea8258da4b4be29e4d1d52
Author: Allan Renucci <[email protected]>
Date:   Fri Feb 14 12:25:59 2025 -0800

    Don't install Bazelisk.

    Install currently fails:
    ```
    INFO:root:Starting process: sudo wget --no-verbose -O /usr/local/bin/bazel https://github.com/bazelbuild/bazelisk/releases/download/v1.11.0/bazelisk-darwin-arm64
    /usr/local/bin/bazel: No such file or directory
    ```
    PiperOrigin-RevId: 727023038

commit 2f8cc0350f1285bee641c79afaebacd7e428aca8
Author: Ionel Gog <[email protected]>
Date:   Fri Feb 14 12:25:30 2025 -0800

    [IFRT] Emit errors after compilations have been dispatched to avoid capturing them in scoped diagnostic handlers.

    PiperOrigin-RevId: 727022873

commit fc59f8fdd15d12ff8380b0846b1fe2ee3758031e
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 12:15:34 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 727019305

commit 119f588434be1db3f7304e7a40605dc2ff7ee117
Author: David Dunleavy <[email protected]>
Date:   Fri Feb 14 12:03:23 2025 -0800

    Use machines with 4 GPUs for TensorFlow CI

    PiperOrigin-RevId: 727013948

commit cd3b8103f9445502d1495d29ca1c3ce5da8bc74a
Author: Matthias Guenther <[email protected]>
Date:   Fri Feb 14 10:52:45 2025 -0800

    Make `generate_hlo_test_checks.py` backwards-compatible with Python 3.9.

    PiperOrigin-RevId: 726986626

commit 78f255fcc59febe26338104a449d86259ec5358f
Author: Derek Murray <[email protected]>
Date:   Fri Feb 14 10:47:49 2025 -0800

    [Function runtime] Avoid copying reachable function definitions when graph collection is disabled.

    Additionally, avoid copying each function during the `UpdateTPUEmbeddingModePass` for the common case where the function does not include any TPUEmbedding layer ops.

    PiperOrigin-RevId: 726984736

commit 37bffc34ec432bcc331fa0cc7634c3890418c48f
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 10:27:50 2025 -0800

    Allow user to pass model file descriptor through Dispatch API

    PiperOrigin-RevId: 726976815

commit 17dadc492fc84f747313968ff0cd22739ffe144a
Author: Allan Renucci <[email protected]>
Date:   Fri Feb 14 09:57:47 2025 -0800

    Add macOS ARM64 Kokoro config.

    PiperOrigin-RevId: 726965851

commit b80271fb915f9ef553bb6a1fc1f209aa90eb2104
Author: Quentin Khan <[email protected]>
Date:   Fri Feb 14 09:31:39 2025 -0800

    Fix typos in `lite/core:model_building`

    PiperOrigin-RevId: 726957451

commit 02eb5dd743af820794891313b08ee28a3a0dfe2d
Author: Greg Olechwierowicz <[email protected]>
Date:   Fri Feb 14 07:22:06 2025 -0800

    [XLA:GPU] Add merge functionality to collective perf table gen.

    Convenience function to merge multiple perf tables as one.

    PiperOrigin-RevId: 726919223

commit 99e54320371802b9b77e05967acb19c9af7ab6a8
Author: Greg Olechwierowicz <[email protected]>
Date:   Fri Feb 14 06:38:07 2025 -0800

    [XLA:GPU] Add python bindings to collective perf table generator.

    Also add logical defaults for search space and pass in replica groups as string instead of plumbing through IotaReplicaGroupList to python bindings.

    PiperOrigin-RevId: 726907010

commit 438f6a735ae52e4efaca53c4ad0331c35426a36e
Merge: 61dba176d02 03d141bf566
Author: TensorFlower Gardener <[email protected]>
Date:   Fri Feb 14 06:50:14 2025 -0800

    Merge pull request #87213 from fujunwei:enable_opencl_delegate

    PiperOrigin-RevId: 726905007

commit 61dba176d02bb5d9e6628221b04ae952856b0a4c
Author: Frederik Gossen <[email protected]>
Date:   Fri Feb 14 05:59:53 2025 -0800

    Use IsOkAndHolds in all-reduce combiner test

    PiperOrigin-RevId: 726896504

commit 21c5f140cf2030efa7570aa07667b863d2641b35
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 05:16:09 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 726884970

commit 9de9aea6277c3eb7d6a40d4fec290a2e4e9974a2
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 03:32:59 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 726857207

commit 494c81367dcc7c28a17bf285f076865f5834a548
Author: Venkat6871 <[email protected]>
Date:   Fri Feb 14 17:06:21 2025 +0530

    Fix typos in documentation strings

commit c32dbe1b258fca03576c45a62cf70459ccf6842d
Author: Mikhail Goncharov <[email protected]>
Date:   Fri Feb 14 02:18:30 2025 -0800

    Integrate LLVM at llvm/llvm-project@5586541d220e

    Updates LLVM usage to match
    [5586541d220e](https://github.com/llvm/llvm-project/commit/5586541d220e)

    PiperOrigin-RevId: 726837599

commit 33318b8d1264a4d72a5c190fce55489b17f41c9f
Author: Dimitris Vardoulakis <[email protected]>
Date:   Fri Feb 14 01:11:17 2025 -0800

    PR #22645: Fix error in the gpu_specs README. The spec is the TargetConfig, which includes the device description.

    Imported from GitHub PR https://github.com/openxla/xla/pull/22645

    Copybara import of the project:

    --
    5fefad4bdb1b947844c7c2d8ff1029f5c99aace5 by Dimitris Vardoulakis <[email protected]>:

    Fix error in the specs README. The spec is the TargetConfig, which includes the device description.

    --
    39f8e9f343de45b4678a6760b9a9720d48163299 by Dimitris Vardoulakis <[email protected]>:

    Update xla/tools/hlo_opt/gpu_specs/README.md

    Co-authored-by: Allan Renucci <[email protected]>

    Merging this change closes #22645

    PiperOrigin-RevId: 726817625

commit e879b2214b018f96f5cbd4c0d97b001987ede2e0
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 01:02:32 2025 -0800

    compat: Update forward compatibility horizon to 2025-02-14

    PiperOrigin-RevId: 726815010

commit bfafd54974540a60feb3ee974fec946ff515151b
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 01:02:24 2025 -0800

    Update GraphDef version to 2138.

    PiperOrigin-RevId: 726814947

commit b894623899670545ed1c5691411bb50854e04864
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 00:11:35 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 726800949

commit 9863432fa84bd65f8591f378259e1557284dc76d
Author: A. Unique TensorFlower <[email protected]>
Date:   Fri Feb 14 00:02:02 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 726798360

commit 84abc58d90e7504acc000fe16e164422e05f111d
Author: chunhsue <[email protected]>
Date:   Fri Feb 14 16:07:29 2025 +0800

    Qualcomm AI Engine Direct - Another implementation of DUS without accuracy issue.

commit a5bbbebcc2d4a8bb68907bba60020965b401f071
Author: A. Unique TensorFlower <[email protected]>
Date:   Thu Feb 13 23:32:03 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 726791347

commit 5f11075f8fceecd2f916b4045a5b2056468fd518
Author: Zixuan Jiang <[email protected]>
Date:   Thu Feb 13 23:11:22 2025 -0800

    Refactor `SpmdPartitioningVisitor::HandleReshape`. No behavior change.

    This change recovers cl/717991433 with modification. The previous one is not a pure refactoring since it assumes that the inference `in_sharding_1 -> out_sharding -> in_sharding_2` will have `in_sharding_1 == in_sharding_2`. This assumption may be false. In the added test target, we reshape 24 -> 6x4, and have the following inferred shardings.
    ```
    in_sharding_1: [4]
    out_sharding: [2,1,2] last_tile_dim_replicate
    in_sharding_2: [2,2] last_tile_dim_replicate
    ```

    This change should be a pure refactoring without behavior change.

    PiperOrigin-RevId: 726786386

commit 3d267b97b713011a52e9a7fc8ddb8d4072e6965e
Author: A. Unique TensorFlower <[email protected]>
Date:   Thu Feb 13 22:41:53 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 726779015

commit 9d767e29e50beba4a104d0217bbe94cd886f6f66
Author: A. Unique TensorFlower <[email protected]>
Date:   Thu Feb 13 22:34:39 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 726777146

commit fa1ae0b9c836735214574260f34820e04b42761d
Author: A. Unique TensorFlower <[email protected]>
Date:   Thu Feb 13 22:29:53 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 726775801

commit 59b5b12dcb0c6fada8c58e3ee0f38f7f26bf8117
Author: A. Unique TensorFlower <[email protected]>
Date:   Thu Feb 13 22:02:02 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 726769224

commit 5b9ce147986715803514e91595fbd29e848f2406
Author: Zixuan Jiang <[email protected]>
Date:   Thu Feb 13 19:36:24 2025 -0800

    Refactor `hlo_sharding_util::ReshapeSharding` by reducing the if-else branches.

    We also highlight a TODO in this cl, which will be revisited later.

    No behavior change.

    PiperOrigin-RevId: 726731325

commit 1f784be30927dfbd42e4b728441119a59b5b2fde
Author: Toli Yevtushenko <[email protected]>
Date:   Thu Feb 13 19:35:26 2025 -0800

    Make simpler to reason about expansion logic from Iota to legacy replica groups.

    PiperOrigin-RevId: 726730891

commit 8b5955f2eb0b5f28363093e25bf3a7ca8a94fc6e
Author: Toli Yevtushenko <[email protected]>
Date:   Thu Feb 13 17:48:52 2025 -0800

    Improve structure of GPU IR emission logic. #cleanup

    PiperOrigin-RevId: 726704548

commit ee8150c49ad8e73cf1dffadd5e09b453311d780b
Author: Julia Guo <[email protected]>
Date:   Thu Feb 13 17:47:57 2025 -0800

    [MultiHostHloRunner] Fix the scope for `GPURunnerProfiler`

    PiperOrigin-RevId: 726704332

commit af3646095fc927bf800e25ee72e0b94f1e623a0a
Author: Andrew Zhang <[email protected]>
Date:   Thu Feb 13 17:27:26 2025 -0800

    Disable weight sharing for legacy chips.

    PiperOrigin-RevId: 726699150

commit 439ed32feec3544499f8917fdc3cc0b3ac18b173
Author: A. Unique TensorFlower <[email protected]>
Date:   Thu Feb 13 17:14:48 2025 -0800

    Exit checkpoints_iterator promptly when no more checkpoints are expected.

    PiperOrigin-RevId: 726695438

commit c4ea8b99943d6dc07be941dc8f73dbdf1334e6c2
Author: Yin Zhang <[email protected]>
Date:   Thu Feb 13 17:11:08 2025 -0800

    Removed fingerprint compute in TfOpStats write time.

    PiperOrigin-RevId: 726694304

commit 7cc71bf4b43ef2dc1c9abf5ac82698ad127bb22e
Author: David Dunleavy <[email protected]>
Date:   Thu Feb 13 17:04:04 2025 -0800

    Clean up obsolete logic for Kokoro builds now that all builds except MacOS are on GitHub Actions

    `generate_index_html.sh` is unneeded for the MacOS build due to the build script calling it anyway: https://github.com/openxla/xla/blob/main/.kokoro/macos/build.sh#L23, and otherwise this change is an NFC

    PiperOrigin-RevId: 726692254

commit 068c9bad32335df487aa9b0ef0a626d974ad22a5
Author: Praveen Narayanan <[email protected]>
Date:   Thu Feb 13 16:57:06 2025 -0800

    Vectorize group_sizes by including more lhs dimensions.

    PiperOrigin-RevId: 726690023

commit 312141ffd1294fe663f9c28fa70834aa18dd1a57
Merge: 8b4f9966239 2a54437cf8f
Author: TensorFlower Gardener <[email protected]>
Date:   Thu Feb 13 16:40:18 2025 -0800

    Merge pull request #86630 from jiunkaiy:dev/chunhsue/add_DUS_and_Pack

    PiperOrigin-RevId: 726681462

commit 8b4f99662391806b6001bec15b4dfae4aa081cfc
Author: David Dunleavy <[email protected]>
Date:   Thu Feb 13 15:06:32 2025 -0800

    Remove docker logic from `build.py`.

    Docker images are fetched by GitHub Actions, and the only build that should remain on Kokoro is macOS, which doesn't use Docker anyway.

    PiperOrigin-RevId: 726652107

commit b99e64062b9d5a4e5ea794858bb3b6ae7891bb62
Author: Kevin Gleason <[email protected]>
Date:   Thu Feb 13 14:53:58 2025 -0800

    [MHLO->StableHLO] Allow MHLO with XLA features to be partially imported to StableHLO+CHLO

    PiperOrigin-RevId: 726647695

commit a520eea12705b1e633ddad84391d0db7af6cd340
Author: Sergei Lebedev <[email protected]>
Date:   Thu Feb 13 14:52:07 2025 -0800

    [pjrt] Removed PjRtDevice overloads of `PjRtClient::CreateBuffersForAsyncHostToDevice`

    I also pulled the `Shape`->`ShapeSpec` conversion code into the default
    implementation, since it was duplicated across a few clients.

    PiperOrigin-RevId: 726646980

commit bbe34850eadc09a0d19f58d36009a22918129112
Author: Frederik Gossen <[email protected]>
Date:   Thu Feb 13 14:42:21 2025 -0800

    Do not add control dependency from send to recv-done in decomposed collective-permute

    This control dependency is not needed on GPU.

    PiperOrigin-RevId: 726643669

commit 526895e26e6f1808737b524f2bf2fe8ed8456d93
Author: TJ Xu <[email protected]>
Date:   Thu Feb 13 14:41:34 2025 -0800

    PR #22588: Use cuda event and Rendezvous instead of nccl allreduce as a barrier

    Imported from GitHub PR https://github.com/openxla/xla/pull/22588

    Copybara import of the project:

    --
    5acbea5c6f5a1ed2deaae4bdbf472b9e72c6bf5e by TJ Xu <[email protected]>:

    Use cuda event and Rendezvous instead of nccl allreduce as a barrier

    --
    3d78e81091eb0b6cb1dc22f1ff27c3a8de4cff4f by TJ Xu <[email protected]>:

    Improve comment for the motivation

    Merging this change closes #22588

    PiperOrigin-RevId: 726643328
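
    For intuition, here is a generic, self-contained C++ sketch of the pattern: a CUDA event per participant plus a host-side barrier standing in for a rendezvous. The host-side `std::barrier` is an assumption for illustration; it is not XLA's Rendezvous API, and the actual change's stream and event handling may differ.

    ```
    // Generic sketch: synchronize participants with a CUDA event per rank plus
    // a host-side rendezvous, instead of launching an NCCL all-reduce.
    // Requires C++20 (<barrier>) and the CUDA runtime.
    #include <cuda_runtime.h>

    #include <barrier>
    #include <thread>
    #include <vector>

    int main() {
      constexpr int kNumParticipants = 2;
      std::barrier rendezvous(kNumParticipants);  // host-side rendezvous

      std::vector<std::thread> workers;
      for (int rank = 0; rank < kNumParticipants; ++rank) {
        workers.emplace_back([&, rank] {
          cudaSetDevice(rank);
          cudaStream_t stream;
          cudaStreamCreate(&stream);
          cudaEvent_t done;
          cudaEventCreateWithFlags(&done, cudaEventDisableTiming);

          // ... enqueue this participant's device work on `stream` ...

          // Mark the end of the device work, wait for it locally, then meet the
          // other participants on the host. When everyone has arrived, all prior
          // device work is complete -- the property the all-reduce barrier gave.
          cudaEventRecord(done, stream);
          cudaEventSynchronize(done);
          rendezvous.arrive_and_wait();

          cudaEventDestroy(done);
          cudaStreamDestroy(stream);
        });
      }
      for (auto& t : workers) t.join();
      return 0;
    }
    ```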

commit 55b6f82e929407962054b901340b89f527ea4367
Author: Frederik Gossen <[email protected]>
Date:   Thu Feb 13 13:52:08 2025 -0800

    Do not combine all-reduces with control dependencies.

    These control dependencies would otherwise be transferred to some get-tuple-element op, which triggers an assertion when replacing the op.

    PiperOrigin-RevId: 726624437

commit 103b18b5f169adb8a559fe50f6f7728e8bd917d7
Author: Dan Foreman-Mackey <[email protected]>
Date:   Thu Feb 13 13:41:12 2025 -0800

    [xla:python] Remove unused _single_device_array_to_np_array on ArrayImpl.

    PiperOrigin-RevId: 726620574

commit eed3fdd4f5a2fb82ece1e6908ad78d6755ded3fd
Author: A. Unique TensorFlower <[email protected]>
Date:   Thu Feb 13 13:30:32 2025 -0800

    Make tsl_cc_test default to linkstatic to catch duplicate symbols at build time.

    PiperOrigin-RevId: 726616180

commit c4226f427e5197b1f78ff28a3f90a50562999e48
Author: Kevin Gleason <[email protected]>
Date:   Thu Feb 13 13:04:00 2025 -0800

    [hlo-translate] Accept VHLO in hlo-translate tool

    PiperOrigin-RevId: 726606437

commit d5180bd9a33c729c62df8ea3d74db975ede3a790
Author: Matthias Guenther <[email protected]>
Date:   Thu Feb 13 12:11:48 2025 -0800

    Remake tool for inserting FileCheck directives in HLO optimization-pass tests.

    The tool previously required the user to perform most of the steps manually, only automating the replacement of hard-coded symbols with regex captures. It now automatically runs an optimizer on the test file, writes FileCheck directives based on the optimized HLO, replaces symbols with regex captures, and inserts the FileCheck directives above their respective test cases.

    The step of replacing explicit symbols with regex captures has also been improved to support capturing function names and to only add disambiguation suffixes when necessary.

    PiperOrigin-RevId: 726586730

commit 9c3b200fb5bbf4b50ce02ca0bee79c3f0162e057
Author: Abhinav Gunjal <[email protected]>
Date:   Thu Feb 13 12:01:56 2025 -0800

    [ODML] StablehloFuseConvolutionPass : migrated from MHLO -> StableHLO

    PiperOrigin-RevId: 726582966

commit a6cb1bb34dc0f3a848a2603b5107ec9c848b5005
Author: David Dunleavy <[email protected]>
Date:   Thu Feb 13 12:01:38 2025 -0800

    Enable `bazel build --nobuild` to prevent network flakes for TensorFlow builds

    Removes the usage of TensorFlow's `py_cpp_test_filters` config, which is incompatible with `bazel build --nobuild`, and instead replicates the effect of the config by specifying the Bazel options explicitly.

    PiperOrigin-RevId: 726582864

commit 3b7c491196733ecd842fb728a7b57d3195b7aa14
Author: Ezekiel Calubaquib <[email protected]>
Date:   Thu Feb 13 11:55:18 2025 -0800

    Add a flag to exclude tensorflow.lite from tf_python_api_gen_v2 to fix duplicate registration under the LiteRT repo.

    PiperOrigin-RevId: 726580613

commit 1eef54580ec6977c7dea249906b55dddcc32f6f8
Author: Matthias Guenther <[email protected]>
Date:   Thu Feb 13 11:45:57 2025 -0800

    Make `hlo-opt` return error status when given an invalid `--passes` argument.

    `hlo-opt` previously logged an error in this case but did not indicate an error in its exit status.

    Note that the program now exits immediately upon encountering an invalid `--passes` argument; it previously logged an error and continued executing.

    Fixing this bug also revealed that the `algebraic_simplifier.hlo` test file wasn't doing anything because the `AlgebraicSimplifier` pass wasn't registered in `hlo-opt`. This CL therefore also registers that pass and updates its test accordingly.

    PiperOrigin-RevId: 726576856

commit 5cd8f67e3e5dd19457187d91ae44931d408819e5
Author: A. Unique TensorFlower <[email protected]>
Date:   Thu Feb 13 11:39:55 2025 -0800

    Add model_name as a field in the v1 compat graph conversion count streamz metrics.

    PiperOrigin-RevId: 726574223

commit b8ffdbdf19bcc43639bc0f3c05ef7d9a4667527f
Author: Derek Murray <[email protected]>
Date:   Thu Feb 13 11:21:13 2025 -0800

    Factor out CheckpointIterator test from disabled test file.

    PiperOrigin-RevId: 726566643

commit 4f5c472843c638b2e3a45b35f43a25e9ee65fe67
Author: Reed Wanderman-Milne <[email protected]>
Date:   Thu Feb 13 10:44:18 2025 -0800

    In cudnn_fused_conv_rewriter.h, allow the clamp to be omitted when converting f32 to s8.

    As an implementation detail, XLA already clamps when converting float to int, so it's ok to pattern-match a fused_conv_outputting_f32->convert_to_s8 into a fused_conv_outputting_s8, even without a clamp in between the fused conv and convert.

    Even so, I would still recommend users have a clamp in their code, since the implicit clamping behavior is unspecified.

    PiperOrigin-RevId: 726550567
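
    As a plain C++ analogy (not the rewriter itself) of why keeping an explicit clamp is the safer pattern: converting an out-of-range float to a narrow integer type has no defined result, whereas clamping first pins the result down.

    ```
    // Plain C++ analogy: float -> int8 conversion of an out-of-range value is
    // undefined in C++, much like the implicit clamping behavior mentioned above
    // is unspecified at the HLO level. An explicit clamp makes the result exact.
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    int main() {
      float x = 300.0f;
      // int8_t bad = static_cast<int8_t>(x);  // out of range: don't rely on this
      int8_t good = static_cast<int8_t>(std::clamp(x, -128.0f, 127.0f));
      std::printf("%d\n", good);  // prints 127
      return 0;
    }
    ```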

commit daceccba5f5435bbabdc021cf27a8c9da64eb3d1
Author: A. Unique TensorFlower <[email protected]>
Date:   Thu Feb 13 10:33:33 2025 -0800

    [XLA] Add debug option for detecting cycles in fixed-point loops.

    Due to the way the "changed" signal is reported by passes within a fixed-point loop today, there are various scenarios in which a fixed-point loop that is "converged" may continue to run forever:

    *  A composite pipeline is being run to fixed-point, and one pass exactly undoes the effect of another.
    *  An individual pass falsely reports that it changed a module (perhaps because it undoes its own change).
    *  The fixed-point loop sees the module go through a cycle of states.

    While this check is too expensive to enable by default, it is a useful debug option. If we have reason to suspect one of the above scenarios is occurring, this option will allow us to identify the passes involved and address the root cause on an individual basis.

    PiperOrigin-RevId: 726546024
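
    A minimal sketch of the idea behind such a check (illustrative only; the fingerprinting and pipeline hook here are stand-ins, not the actual debug option): remember a fingerprint of the module after each fixed-point iteration and flag the run as cyclic when a previously seen state recurs.

    ```
    // Illustrative only: detect a repeated module state in a fixed-point loop by
    // hashing the module after every iteration. RunOneIteration is a stand-in
    // for running the pass pipeline once over an HloModule.
    #include <cstdio>
    #include <functional>
    #include <string>
    #include <unordered_set>

    bool RunOneIteration(std::string& module_text) {
      (void)module_text;  // a real pipeline would read and possibly mutate it
      return true;        // pretend a pass always reports "changed"
    }

    int main() {
      std::string module_text = "ENTRY main { ... }";
      std::unordered_set<size_t> seen = {std::hash<std::string>{}(module_text)};

      for (int i = 0; i < 100; ++i) {
        if (!RunOneIteration(module_text)) break;  // converged normally
        const size_t fingerprint = std::hash<std::string>{}(module_text);
        if (!seen.insert(fingerprint).second) {
          std::printf("cycle detected at iteration %d\n", i);
          break;  // a previously seen state recurred: the loop would never end
        }
      }
      return 0;
    }
    ```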

commit de57f8004bff2bfab99692cf8e2faa0bd5c2871f
Author: Junwhan Ahn <[email protected]>
Date:   Thu Feb 13 10:21:05 2025 -0800

    Split `BasicDeviceList` into its own BUILD target and make it visible only to IFRT implementations

    After this CL, IFRT users will no longer have visibility to `BasicDeviceList`. This ensures that IFRT users use `Client::MakeDeviceList()` to create a device list instead of directly calling `BasicDeviceList::Create()`.

    PiperOrigin-RevId: 726540745

commit 44b2c2f8d49666cd971a68efae512dd36d67678f
Author: David Dunleavy <[email protected]>
Date:   Thu Feb 13 10:06:58 2025 -0800

    Delete Kokoro GPU builds now that the GitHub Actions GPU builds are blocking.

    PiperOrigin-RevId: 726535011

commit 00cb7a92e10c2af2b79cef1788acde745b3b3178
Author: Dan Foreman-Mackey <[email protected]>
Date:   Thu Feb 13 09:34:32 2025 -0800

    Only cache jax.Array._npy_value when a copy is required.

    As discovered in https://github.com/jax-ml/jax/issues/26216, for non-standard dtypes, calling `np.array` on a JAX array will unnecessarily cache the constructed `_npy_value` even when a copy isn't require…