@google-labs-jules
Contributor

Fix ML fit ordering issue with partial mode and eval data.


PR created automatically by Jules for task 4750522966926378079 started by @tswast

Modified `bigframes.ml.utils.combine_training_and_evaluation_data` to:
1. Join training `X` and `y` into a single DataFrame (and similarly for eval data) before concatenation. This ensures row identity/alignment is preserved through the concat operation, resolving issues where separate concats could drift apart in `ordering_mode="partial"`.
2. Operate on copies of input DataFrames to avoid side-effects (mutating user's input).
3. Safely handle column name collisions between `X` and `y` by temporarily renaming `y` columns during the join/merge process.

This change fixes a bug where providing validation data to `fit()` could fail or produce incorrect results when using partial ordering mode.
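
The join-then-concat idea in the list above can be sketched roughly as follows. This is a minimal illustration using plain pandas as a stand-in, not the actual `bigframes.ml.utils` implementation: the function name `combine_train_and_eval`, the `is_eval` marker column, and the `__y_{i}` temporary names are all hypothetical. pandas always has an index, so the sketch joins on it; how the real change aligns rows without relying on a null index is not shown here.

```python
import pandas as pd


def combine_train_and_eval(X_train, y_train, X_eval, y_eval, split_col="is_eval"):
    """Hypothetical pandas stand-in: join X and y first, then concat once."""

    def join_xy(X, y):
        # Work on copies so the caller's DataFrames are never mutated.
        X = X.copy()
        y = y.copy()
        # Temporarily rename y columns so they cannot collide with X columns.
        # Assumes no X column is already named "__y_<i>".
        temp_names = {col: f"__y_{i}" for i, col in enumerate(y.columns)}
        return X.join(y.rename(columns=temp_names)), temp_names

    train, temp_names = join_xy(X_train, y_train)
    evaluation, _ = join_xy(X_eval, y_eval)  # assumes y_eval has the same columns

    # Mark which rows came from the evaluation set (assumed column name).
    train[split_col] = False
    evaluation[split_col] = True

    # A single concat keeps each feature row attached to its own label.
    combined = pd.concat([train, evaluation])

    X = combined[list(X_train.columns) + [split_col]]
    restore = {temp: original for original, temp in temp_names.items()}
    y = combined[list(temp_names.values())].rename(columns=restore)
    return X, y


# Usage: X and y deliberately share the column name "label".
X_train = pd.DataFrame({"a": [1, 2], "label": [10, 20]})
y_train = pd.DataFrame({"label": [0, 1]})
X_eval = pd.DataFrame({"a": [3], "label": [30]})
y_eval = pd.DataFrame({"label": [1]})
X, y = combine_train_and_eval(X_train, y_train, X_eval, y_eval)
```

Because features and labels travel through the concat as one frame, the result does not depend on two separate concats producing rows in the same order.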
@product-auto-label bot added the `size: s` (Pull request size is small.) and `api: bigquery` (Issues related to the googleapis/python-bigquery-dataframes API.) labels on Dec 23, 2025
@tswast added the `kokoro:force-run` label (Add this label to force Kokoro to re-run the tests.) on Dec 23, 2025
@yoshi-kokoro removed the `kokoro:force-run` label on Dec 23, 2025
Updated `tests/system/large/ml/test_linear_model.py`:
- Parameterized `test_linear_regression_configure_fit_with_eval_score` to run with both `penguins_df_default_index` and `penguins_df_null_index` fixtures.
- This ensures the fix is robust against different index configurations (default sequential vs potential null/arbitrary indices).
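
One common way to run a single test against several fixtures is to parametrize over fixture names and resolve them with `request.getfixturevalue`. The sketch below assumes that pattern; the actual diff may do it differently. The fixture bodies are stand-ins built on pandas (the real fixtures live in the repository's test configuration and return BigQuery DataFrames), and since pandas has no null index, the second fixture only approximates one with an arbitrary, non-sequential index. The test name and its assertions are hypothetical; the real test calls `fit()` and `score()`.

```python
import pandas as pd
import pytest


@pytest.fixture
def penguins_df_default_index():
    # Stand-in data; the real fixture loads the penguins table.
    return pd.DataFrame(
        {
            "culmen_length_mm": [39.1, 46.5, 50.0, 38.8],
            "body_mass_g": [3750.0, 4500.0, 5700.0, 3400.0],
        }
    )


@pytest.fixture
def penguins_df_null_index(penguins_df_default_index):
    # Approximation only: an arbitrary, non-sequential index.
    df = penguins_df_default_index.copy()
    df.index = [7, 3, 9, 1]
    return df


@pytest.mark.parametrize(
    "df_fixture",
    ["penguins_df_default_index", "penguins_df_null_index"],
)
def test_train_eval_rows_stay_aligned(df_fixture, request):
    # Resolve the fixture by name so one test body covers both index setups.
    df = request.getfixturevalue(df_fixture)
    train, evaluation = df.iloc[:3], df.iloc[3:]
    combined = pd.concat([train, evaluation])
    # Row count and label order must not depend on the index configuration.
    assert len(combined) == len(df)
    assert combined["body_mass_g"].tolist() == df["body_mass_g"].tolist()
```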

@tswast added the `kokoro:force-run` label (Add this label to force Kokoro to re-run the tests.) on Dec 23, 2025
@bigframes-bot removed the `kokoro:force-run` label on Dec 23, 2025
@tswast
Collaborator

tswast commented Dec 23, 2025

Tests pass locally: `pytest tests/system/large/ml/test_linear_model.py::test_linear_regression_configure_fit_with_eval_score`

@tswast changed the title from "Fix ML fit ordering issue with partial mode and eval data" to "fix: fit in partial mode and eval data avoids joining on null index" on Dec 23, 2025
@tswast changed the title from "fix: fit in partial mode and eval data avoids joining on null index" to "fix: bigframes.ml fit with eval data in partial mode avoids join on null index" on Dec 23, 2025
@tswast marked this pull request as ready for review on December 23, 2025 21:12
@tswast requested review from a team as code owners on December 23, 2025 21:12
@tswast requested a review from sycai on December 23, 2025 21:12
@tswast requested a review from GarrettWu and removed the request for sycai on December 23, 2025 21:17
@tswast enabled auto-merge (squash) on December 23, 2025 21:24
@tswast merged commit 7171d21 into main on Dec 23, 2025 (19 of 26 checks passed)
@tswast deleted the ml-fit-eval-data-fix-4750522966926378079 branch on December 23, 2025 21:42