-
Notifications
You must be signed in to change notification settings - Fork 63
feat: display series in anywidget mode #2346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
This reverts commit c39293a.
… be displayed in one page
…t _literal (#2337) This change can resolve two doctests failures in #2248: `groupby.GroupBy.rank` and `bigframes.ml.metrics.roc_curve` Fixes internal issue 417774347🦕 --------- Co-authored-by: Shenyang Cai <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of this code is nearly identical to the logic in https://github.com/googleapis/python-bigquery-dataframes/blob/main/bigframes/dataframe.py with the following differences:
- Call
self.to_frame()before rendering the TableWidget. - Use the Series object when rendering the
text/htmlandtext/plain.
Could you please refactor dataframe.py and this file to use shared helper functions to reduce duplication? https://github.com/googleapis/python-bigquery-dataframes/blob/main/bigframes/display/html.py seems like it'd be and appropriate place for such helpers.
bigframes/series.py
Outdated
| # Add text representation | ||
| widget_repr["text/plain"] = self._create_text_representation( | ||
| widget._cached_data, widget.row_count | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like with dataframe, we should have the html rendering here too. See
python-bigquery-dataframes/bigframes/dataframe.py
Lines 871 to 883 in 3c54c68
| # At this point, we have already executed the query as part of the | |
| # widget construction. Let's use the information available to render | |
| # the HTML and plain text versions. | |
| widget_repr["text/html"] = self._create_html_representation( | |
| widget._cached_data, | |
| widget.row_count, | |
| len(self.columns), | |
| blob_cols, | |
| ) | |
| widget_repr["text/plain"] = self._create_text_representation( | |
| widget._cached_data, widget.row_count | |
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we'd refactor so that the code is shared. One way to do this would be to call the respective Series._create_html_representation or DataFrame._create_html_representation from the helper function.
tests/js/series_widget.test.js
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't actually need the SeriesWidget class, right? I believe this file can be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no SeriesWidget class. We are reusing the existing TableWidget to display Series data. This file is a test file. It checks if bf DataFrames Series is correctly rendered as a two-column table (one for the index, one for the data). But I agree that the file name is misleading, I will merge all tests to table_widget.test.js to avoid confusion.
| assert "text/plain" in data | ||
|
|
||
|
|
||
| def test_repr_in_anywidget_mode_should_not_be_deferred( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this test removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, it is an important testcase. I will add them back.
bigframes/display/html.py
Outdated
| return "\n".join(table_html) | ||
|
|
||
|
|
||
| def create_text_representation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the html module seems an inappropriate place for this. bigframes.display.plaintext would make more sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, how does this differ from DataFrame._create_text_representation? Seems this is pretty redundant.
Just as I commented on create_html_representation below, it would be more object oriented to avoid the isinstance checks and put _create_text_representation on Series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! I have refactored the display logic significantly in the latest revision to address this:
- Moved to
bigframes.display.plaintext: As suggested, I created bigframes.display.plaintext and moved the text formatting logic there. - Removed Redundancy: I removed _create_text_representation from both DataFrame and Series (and _create_html_representation from DataFrame). Both classes now delegate to the centralized functions in bigframes.display.plaintext and bigframes.display.html.
- Design Choice: I opted to centralize the display logic within the bigframes.display package to separate formatting concerns from the core data structures. This necessitates the isinstance checks within the display modules, but it keeps the DataFrame and Series classes cleaner and avoids circular dependency issues with the display logic.
| from bigframes.display import html | ||
|
|
||
| return {"text/html": html_string, "text/plain": text_representation} | ||
| return html.repr_mimebundle(self, include=include, exclude=exclude) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any checks for deferred mode in html.repr_mimebundle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. repr_mimebundle now checks for the display mode and will delegate to a new repr_mimebundle_deferred function when appropriate.
bigframes/display/html.py
Outdated
| return widget_repr, widget_metadata | ||
|
|
||
|
|
||
| def repr_mimebundle( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's create separate methods for head and deferred, like repr_mimebundle_head and repr_mimebundle_deferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I've split the logic into repr_mimebundle_head for standard display and repr_mimebundle_deferred for deferred execution mode.
bigframes/display/html.py
Outdated
| blob_cols: list[str], | ||
| ) -> str: | ||
| """Create an HTML representation of the DataFrame or Series.""" | ||
| if isinstance(obj, bigframes.series.Series): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we had obj._create_html_representation on Series, we could avoid this isinstance check.
We could even add the same to Index at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to keep the logic centralized in bigframes.display.html rather than adding a new method to Series. This follows the direction of the other refactors in this branch, centralizing all formatting concerns within the bigframes.display package to keep the core data classes cleaner and simplify dependency management.
This reverts commit 593f9ae.
cfbab13 to
bd56992
Compare
…and fix mypy errors
bigframes/display/html.py
Outdated
| else: | ||
| # It's a DataFrame | ||
| opts = options.display | ||
| with display_options.pandas_repr(opts): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should move this out one level so that the pandas formatting options also affect the series repr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I refactored create_html_representation to wrap both the Series and DataFrame logic within the display_options.pandas_repr(opts) context manager.
bigframes/display/html.py
Outdated
| except AttributeError: | ||
| html_string = f"<pre>{pd_series.to_string()}</pre>" | ||
|
|
||
| html_string += f"[{total_rows} rows]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's only do this if len(pd_series.index) < total_rows. We can use the same logic as in the plain text:
| html_string += f"[{total_rows} rows]" | |
| is_truncated = total_rows is not None and total_rows > len(pandas_df) | |
| if is_truncated: | |
| html_string += f"<p>[{total_rows} rows]</p>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I updated the logic to check is_truncated = total_rows is not None and total_rows > len(pandas_df) and only append the row count if that condition is met.
bigframes/display/html.py
Outdated
| if options.display.blob_display and blob_cols: | ||
|
|
||
| def obj_ref_rt_to_html(obj_ref_rt) -> str: | ||
| obj_ref_rt_json = json.loads(obj_ref_rt) | ||
| obj_ref_details = obj_ref_rt_json["objectref"]["details"] | ||
| if "gcs_metadata" in obj_ref_details: | ||
| gcs_metadata = obj_ref_details["gcs_metadata"] | ||
| content_type = typing.cast( | ||
| str, gcs_metadata.get("content_type", "") | ||
| ) | ||
| if content_type.startswith("image"): | ||
| size_str = "" | ||
| if options.display.blob_display_width: | ||
| size_str = ( | ||
| f' width="{options.display.blob_display_width}"' | ||
| ) | ||
| if options.display.blob_display_height: | ||
| size_str = ( | ||
| size_str | ||
| + f' height="{options.display.blob_display_height}"' | ||
| ) | ||
| url = obj_ref_rt_json["access_urls"]["read_url"] | ||
| return f'<img src="{url}"{size_str}>' | ||
|
|
||
| return f'uri: {obj_ref_rt_json["objectref"]["uri"]}, authorizer: {obj_ref_rt_json["objectref"]["authorizer"]}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that we'd want rich display for blob Series too. If possible refactor this out so that it can be used by both series and dataframe. If not, add a todo and internal bug for rich output for blob series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extracted the blob formatting logic into a helper function _obj_ref_rt_to_html for future reusability. I also added a TODO(b/464053870) to implement rich display for Blob Series in a follow-up, linking it to the bug for implementing a proper repr module for non-pandas types
This PR introduces interactive table displays for bigframes.Series objects, aligning their notebook behavior with bigframes.DataFrame. Now, when using the "anywidget" display mode, Series are rendered as interactive tables, significantly improving data exploration. Additionally, the standard text representation for a Series has been enhanced to include the total row count, providing more context at a glance.
User-Impactful Changes
Interactive Series Display: When bigframes.options.display.repr_mode is set to "anywidget", displaying a bigframes.Series in a notebook will now render an interactive widget. The widget presents the Series as a two-column table, showing the index and the values.
Example Usage:
These changes create a more consistent and intuitive user experience between Series and DataFrames within notebook environments.
Verified at:
Fixes #<460860439> 🦕