Skip to content

Conversation

@shuoweil
Copy link
Contributor

@shuoweil shuoweil commented Dec 19, 2025

This PR introduces interactive table displays for bigframes.Series objects, aligning their notebook behavior with bigframes.DataFrame. Now, when using the "anywidget" display mode, Series are rendered as interactive tables, significantly improving data exploration. Additionally, the standard text representation for a Series has been enhanced to include the total row count, providing more context at a glance.

User-Impactful Changes

  • Interactive Series Display: When bigframes.options.display.repr_mode is set to "anywidget", displaying a bigframes.Series in a notebook will now render an interactive widget. The widget presents the Series as a two-column table, showing the index and the values.

    Example Usage:

    import bigframes.pandas as bpd

    # Enable interactive mode
    bpd.options.display.repr_mode = "anywidget"

    # Create or load a DataFrame
    df = bpd.read_gbq(...)
    series = df["my_column"]

   # Displaying the series will now show an interactive table
   series
  • Enhanced Text Representation: The default text representation (e.g., when using print()) for a bigframes.Series now includes the total number of rows at the end, similar to DataFrames. For example:
   0       a
   1       b
   2       c
   ...
   Name: my_column, Length: 10, dtype: string

   [1000 rows]

These changes create a more consistent and intuitive user experience between Series and DataFrames within notebook environments.

Verified at:

  • vs code notebook: screen/6QWpRs5nuNESp6e
  • colab notebook: screen/4SPpQXtHxk5bRpk

Fixes #<460860439> 🦕

@shuoweil shuoweil self-assigned this Dec 19, 2025
@shuoweil shuoweil requested review from a team as code owners December 19, 2025 02:33
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@shuoweil shuoweil marked this pull request as draft December 19, 2025 02:33
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Dec 19, 2025
@shuoweil shuoweil removed the request for review from TrevorBergeron December 19, 2025 02:33
@shuoweil shuoweil marked this pull request as ready for review December 19, 2025 20:35
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of this code is nearly identical to the logic in https://github.com/googleapis/python-bigquery-dataframes/blob/main/bigframes/dataframe.py with the following differences:

  1. Call self.to_frame() before rendering the TableWidget.
  2. Use the Series object when rendering the text/html and text/plain.

Could you please refactor dataframe.py and this file to use shared helper functions to reduce duplication? https://github.com/googleapis/python-bigquery-dataframes/blob/main/bigframes/display/html.py seems like it'd be and appropriate place for such helpers.

Comment on lines 597 to 600
# Add text representation
widget_repr["text/plain"] = self._create_text_representation(
widget._cached_data, widget.row_count
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like with dataframe, we should have the html rendering here too. See

# At this point, we have already executed the query as part of the
# widget construction. Let's use the information available to render
# the HTML and plain text versions.
widget_repr["text/html"] = self._create_html_representation(
widget._cached_data,
widget.row_count,
len(self.columns),
blob_cols,
)
widget_repr["text/plain"] = self._create_text_representation(
widget._cached_data, widget.row_count
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd refactor so that the code is shared. One way to do this would be to call the respective Series._create_html_representation or DataFrame._create_html_representation from the helper function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually need the SeriesWidget class, right? I believe this file can be deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no SeriesWidget class. We are reusing the existing TableWidget to display Series data. This file is a test file. It checks if bf DataFrames Series is correctly rendered as a two-column table (one for the index, one for the data). But I agree that the file name is misleading, I will merge all tests to table_widget.test.js to avoid confusion.

assert "text/plain" in data


def test_repr_in_anywidget_mode_should_not_be_deferred(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this test removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it is an important testcase. I will add them back.

@shuoweil shuoweil requested a review from tswast December 19, 2025 22:55
@shuoweil shuoweil marked this pull request as draft December 19, 2025 23:25
@shuoweil shuoweil marked this pull request as ready for review December 19, 2025 23:25
return "\n".join(table_html)


def create_text_representation(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the html module seems an inappropriate place for this. bigframes.display.plaintext would make more sense to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how does this differ from DataFrame._create_text_representation? Seems this is pretty redundant.

Just as I commented on create_html_representation below, it would be more object oriented to avoid the isinstance checks and put _create_text_representation on Series.

Copy link
Contributor Author

@shuoweil shuoweil Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I have refactored the display logic significantly in the latest revision to address this:

  1. Moved to bigframes.display.plaintext: As suggested, I created bigframes.display.plaintext and moved the text formatting logic there.
  2. Removed Redundancy: I removed _create_text_representation from both DataFrame and Series (and _create_html_representation from DataFrame). Both classes now delegate to the centralized functions in bigframes.display.plaintext and bigframes.display.html.
  3. Design Choice: I opted to centralize the display logic within the bigframes.display package to separate formatting concerns from the core data structures. This necessitates the isinstance checks within the display modules, but it keeps the DataFrame and Series classes cleaner and avoids circular dependency issues with the display logic.

from bigframes.display import html

return {"text/html": html_string, "text/plain": text_representation}
return html.repr_mimebundle(self, include=include, exclude=exclude)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any checks for deferred mode in html.repr_mimebundle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. repr_mimebundle now checks for the display mode and will delegate to a new repr_mimebundle_deferred function when appropriate.

return widget_repr, widget_metadata


def repr_mimebundle(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create separate methods for head and deferred, like repr_mimebundle_head and repr_mimebundle_deferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I've split the logic into repr_mimebundle_head for standard display and repr_mimebundle_deferred for deferred execution mode.

blob_cols: list[str],
) -> str:
"""Create an HTML representation of the DataFrame or Series."""
if isinstance(obj, bigframes.series.Series):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had obj._create_html_representation on Series, we could avoid this isinstance check.

We could even add the same to Index at some point.

Copy link
Contributor Author

@shuoweil shuoweil Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to keep the logic centralized in bigframes.display.html rather than adding a new method to Series. This follows the direction of the other refactors in this branch, centralizing all formatting concerns within the bigframes.display package to keep the core data classes cleaner and simplify dependency management.

@shuoweil shuoweil requested a review from tswast December 22, 2025 22:36
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Dec 23, 2025
@shuoweil shuoweil force-pushed the shuowei-anywidget-series-display branch from cfbab13 to bd56992 Compare December 23, 2025 02:44
else:
# It's a DataFrame
opts = options.display
with display_options.pandas_repr(opts):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move this out one level so that the pandas formatting options also affect the series repr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I refactored create_html_representation to wrap both the Series and DataFrame logic within the display_options.pandas_repr(opts) context manager.

except AttributeError:
html_string = f"<pre>{pd_series.to_string()}</pre>"

html_string += f"[{total_rows} rows]"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's only do this if len(pd_series.index) < total_rows. We can use the same logic as in the plain text:

Suggested change
html_string += f"[{total_rows} rows]"
is_truncated = total_rows is not None and total_rows > len(pandas_df)
if is_truncated:
html_string += f"<p>[{total_rows} rows]</p>"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I updated the logic to check is_truncated = total_rows is not None and total_rows > len(pandas_df) and only append the row count if that condition is met.

Comment on lines 136 to 160
if options.display.blob_display and blob_cols:

def obj_ref_rt_to_html(obj_ref_rt) -> str:
obj_ref_rt_json = json.loads(obj_ref_rt)
obj_ref_details = obj_ref_rt_json["objectref"]["details"]
if "gcs_metadata" in obj_ref_details:
gcs_metadata = obj_ref_details["gcs_metadata"]
content_type = typing.cast(
str, gcs_metadata.get("content_type", "")
)
if content_type.startswith("image"):
size_str = ""
if options.display.blob_display_width:
size_str = (
f' width="{options.display.blob_display_width}"'
)
if options.display.blob_display_height:
size_str = (
size_str
+ f' height="{options.display.blob_display_height}"'
)
url = obj_ref_rt_json["access_urls"]["read_url"]
return f'<img src="{url}"{size_str}>'

return f'uri: {obj_ref_rt_json["objectref"]["uri"]}, authorizer: {obj_ref_rt_json["objectref"]["authorizer"]}'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that we'd want rich display for blob Series too. If possible refactor this out so that it can be used by both series and dataframe. If not, add a todo and internal bug for rich output for blob series.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted the blob formatting logic into a helper function _obj_ref_rt_to_html for future reusability. I also added a TODO(b/464053870) to implement rich display for Blob Series in a follow-up, linking it to the bug for implementing a proper repr module for non-pandas types

@shuoweil shuoweil requested a review from tswast December 23, 2025 20:37
@shuoweil shuoweil merged commit 7395d41 into main Dec 23, 2025
24 of 25 checks passed
@shuoweil shuoweil deleted the shuowei-anywidget-series-display branch December 23, 2025 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: xl Pull request size is extra large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants