Commit 6c6b9b7

Merge remote-tracking branch 'github/main' into polars_semi
2 parents fa02dff + c88a825 commit 6c6b9b7

89 files changed: +3651 −865 lines changed

CHANGELOG.md

Lines changed: 25 additions & 0 deletions

@@ -4,6 +4,31 @@
 
 [1]: https://pypi.org/project/bigframes/#history
 
+## [2.8.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.7.0...v2.8.0) (2025-06-23)
+
+
+### ⚠ BREAKING CHANGES
+
+* add required param 'engine' to multimodal functions ([#1834](https://github.com/googleapis/python-bigquery-dataframes/issues/1834))
+
+
+### Features
+
+* Add `bpd.options.compute.maximum_result_rows` option to limit client data download ([#1829](https://github.com/googleapis/python-bigquery-dataframes/issues/1829)) ([e22a3f6](https://github.com/googleapis/python-bigquery-dataframes/commit/e22a3f61a02cc1b7a5155556e5a07a1a2fea1d82))
+* Add `bpd.options.display.repr_mode = "anywidget"` to create an interactive display of the results ([#1820](https://github.com/googleapis/python-bigquery-dataframes/issues/1820)) ([be0a3cf](https://github.com/googleapis/python-bigquery-dataframes/commit/be0a3cf7711dadc68d8366ea90b99855773e2a2e))
+* Add DataFrame.ai.forecast() support ([#1828](https://github.com/googleapis/python-bigquery-dataframes/issues/1828)) ([7bc7f36](https://github.com/googleapis/python-bigquery-dataframes/commit/7bc7f36fc20d233f4cf5ed688cc5dcaf100ce4fb))
+* Add describe() method to Series ([#1827](https://github.com/googleapis/python-bigquery-dataframes/issues/1827)) ([a4205f8](https://github.com/googleapis/python-bigquery-dataframes/commit/a4205f882012820c034cb15d73b2768ec4ad3ac8))
+* Add required param 'engine' to multimodal functions ([#1834](https://github.com/googleapis/python-bigquery-dataframes/issues/1834)) ([37666e4](https://github.com/googleapis/python-bigquery-dataframes/commit/37666e4c137d52c28ab13477dfbcc6e92b913334))
+
+
+### Performance Improvements
+
+* Produce simpler sql ([#1836](https://github.com/googleapis/python-bigquery-dataframes/issues/1836)) ([cf9c22a](https://github.com/googleapis/python-bigquery-dataframes/commit/cf9c22a09c4e668a598fa1dad0f6a07b59bc6524))
+
+
+### Documentation
+
+* Add ai.forecast notebook ([#1840](https://github.com/googleapis/python-bigquery-dataframes/issues/1840)) ([2430497](https://github.com/googleapis/python-bigquery-dataframes/commit/24304972fdbdfd12c25c7f4ef5a7b280f334801a))
+
 ## [2.7.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.6.0...v2.7.0) (2025-06-16)
bigframes/_config/compute_options.py

Lines changed: 39 additions & 30 deletions

@@ -55,29 +55,7 @@ class ComputeOptions:
         {'test2': 'abc', 'test3': False}
 
     Attributes:
-        maximum_bytes_billed (int, Options):
-            Limits the bytes billed for query jobs. Queries that will have
-            bytes billed beyond this limit will fail (without incurring a
-            charge). If unspecified, this will be set to your project default.
-            See `maximum_bytes_billed`: https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJobConfig#google_cloud_bigquery_job_QueryJobConfig_maximum_bytes_billed.
-
-        enable_multi_query_execution (bool, Options):
-            If enabled, large queries may be factored into multiple smaller queries
-            in order to avoid generating queries that are too complex for the query
-            engine to handle. However this comes at the cost of increase cost and latency.
-
-        extra_query_labels (Dict[str, Any], Options):
-            Stores additional custom labels for query configuration.
-
-        semantic_ops_confirmation_threshold (int, optional):
-            .. deprecated:: 1.42.0
-                Semantic operators are deprecated. Please use AI operators instead
-
-        semantic_ops_threshold_autofail (bool):
-            .. deprecated:: 1.42.0
-                Semantic operators are deprecated. Please use AI operators instead
-
-        ai_ops_confirmation_threshold (int, optional):
+        ai_ops_confirmation_threshold (int | None):
             Guards against unexpected processing of large amount of rows by semantic operators.
             If the number of rows exceeds the threshold, the user will be asked to confirm
             their operations to resume. The default value is 0. Set the value to None
@@ -87,26 +65,57 @@ class ComputeOptions:
             Guards against unexpected processing of large amount of rows by semantic operators.
             When set to True, the operation automatically fails without asking for user inputs.
 
-        allow_large_results (bool):
+        allow_large_results (bool | None):
             Specifies whether query results can exceed 10 GB. Defaults to False. Setting this
             to False (the default) restricts results to 10 GB for potentially faster execution;
             BigQuery will raise an error if this limit is exceeded. Setting to True removes
             this result size limit.
+
+        enable_multi_query_execution (bool | None):
+            If enabled, large queries may be factored into multiple smaller queries
+            in order to avoid generating queries that are too complex for the query
+            engine to handle. However this comes at the cost of increase cost and latency.
+
+        extra_query_labels (Dict[str, Any] | None):
+            Stores additional custom labels for query configuration.
+
+        maximum_bytes_billed (int | None):
+            Limits the bytes billed for query jobs. Queries that will have
+            bytes billed beyond this limit will fail (without incurring a
+            charge). If unspecified, this will be set to your project default.
+            See `maximum_bytes_billed`: https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJobConfig#google_cloud_bigquery_job_QueryJobConfig_maximum_bytes_billed.
+
+        maximum_result_rows (int | None):
+            Limits the number of rows in an execution result. When converting
+            a BigQuery DataFrames object to a pandas DataFrame or Series (e.g.,
+            using ``.to_pandas()``, ``.peek()``, ``.__repr__()``, direct
+            iteration), the data is downloaded from BigQuery to the client
+            machine. This option restricts the number of rows that can be
+            downloaded. If the number of rows to be downloaded exceeds this
+            limit, a ``bigframes.exceptions.MaximumResultRowsExceeded``
+            exception is raised.
+
+        semantic_ops_confirmation_threshold (int | None):
+            .. deprecated:: 1.42.0
+                Semantic operators are deprecated. Please use AI operators instead
+
+        semantic_ops_threshold_autofail (bool):
+            .. deprecated:: 1.42.0
+                Semantic operators are deprecated. Please use AI operators instead
     """
 
-    maximum_bytes_billed: Optional[int] = None
+    ai_ops_confirmation_threshold: Optional[int] = 0
+    ai_ops_threshold_autofail: bool = False
+    allow_large_results: Optional[bool] = None
     enable_multi_query_execution: bool = False
     extra_query_labels: Dict[str, Any] = dataclasses.field(
        default_factory=dict, init=False
    )
+    maximum_bytes_billed: Optional[int] = None
+    maximum_result_rows: Optional[int] = None
     semantic_ops_confirmation_threshold: Optional[int] = 0
     semantic_ops_threshold_autofail = False
 
-    ai_ops_confirmation_threshold: Optional[int] = 0
-    ai_ops_threshold_autofail: bool = False
-
-    allow_large_results: Optional[bool] = None
-
     def assign_extra_query_labels(self, **kwargs: Any) -> None:
         """
         Assigns additional custom labels for query configuration. The method updates the
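The reordered fields above follow a common dataclass pattern. The standalone sketch below (a hypothetical `MiniComputeOptions`, not bigframes itself) shows why `extra_query_labels` needs `dataclasses.field(default_factory=dict, init=False)`: each instance gets its own label dict, populated only after construction.

```python
import dataclasses
from typing import Any, Dict, Optional


@dataclasses.dataclass
class MiniComputeOptions:
    # Field names mirror the diff; this sketch only demonstrates the
    # dataclass patterns used there, not bigframes' real ComputeOptions.
    allow_large_results: Optional[bool] = None
    enable_multi_query_execution: bool = False
    # default_factory gives each instance its own dict (a shared class-level
    # dict would leak labels between instances); init=False keeps the field
    # out of __init__, so labels are only added after construction.
    extra_query_labels: Dict[str, Any] = dataclasses.field(
        default_factory=dict, init=False
    )
    maximum_bytes_billed: Optional[int] = None
    maximum_result_rows: Optional[int] = None


a, b = MiniComputeOptions(), MiniComputeOptions(maximum_result_rows=10)
a.extra_query_labels["team"] = "data"
print(b.extra_query_labels)  # {} -- b's dict is not shared with a's
```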

bigframes/_config/display_options.py

Lines changed: 1 addition & 1 deletion

@@ -29,7 +29,7 @@ class DisplayOptions:
     max_columns: int = 20
     max_rows: int = 25
     progress_bar: Optional[str] = "auto"
-    repr_mode: Literal["head", "deferred"] = "head"
+    repr_mode: Literal["head", "deferred", "anywidget"] = "head"
 
     max_info_columns: int = 100
     max_info_rows: Optional[int] = 200000
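The one-line change widens the `Literal` type so `repr_mode` may also be `"anywidget"`. A hedged sketch of how such a Literal can double as a runtime check (the `set_repr_mode` helper is illustrative, not part of bigframes):

```python
from typing import Literal, get_args

# Mirrors the widened annotation in the diff.
ReprMode = Literal["head", "deferred", "anywidget"]


def set_repr_mode(mode: str) -> ReprMode:
    """Validate a repr mode at runtime using the Literal's own values."""
    # get_args extracts ("head", "deferred", "anywidget") from the Literal,
    # so the allowed values live in exactly one place.
    if mode not in get_args(ReprMode):
        raise ValueError(
            f"repr_mode must be one of {get_args(ReprMode)}, got {mode!r}"
        )
    return mode  # type: ignore[return-value]


print(set_repr_mode("anywidget"))  # anywidget
```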

bigframes/core/compile/compiler.py

Lines changed: 1 addition & 0 deletions

@@ -65,6 +65,7 @@ def compile_sql(request: configs.CompileRequest) -> configs.CompileResult:
     ordering: Optional[bf_ordering.RowOrdering] = result_node.order_by
     result_node = dataclasses.replace(result_node, order_by=None)
     result_node = cast(nodes.ResultNode, rewrites.column_pruning(result_node))
+    result_node = cast(nodes.ResultNode, rewrites.defer_selection(result_node))
     sql = compile_result_node(result_node)
     # Return the ordering iff no extra columns are needed to define the row order
     if ordering is not None:
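The added line slots `defer_selection` into the rewrite sequence right after `column_pruning`. As a rough illustration of that pattern, rewrites are node-to-node passes applied in order, each consuming the previous pass's output (the `Node` alias and `apply_rewrites` helper below are invented for this sketch):

```python
from typing import Callable, Sequence

# Stand-in for a query-plan node; the real code rewrites nodes.ResultNode trees.
Node = dict


def apply_rewrites(node: Node, rewrites: Sequence[Callable[[Node], Node]]) -> Node:
    """Apply each rewrite in order, feeding the result into the next pass."""
    for rewrite in rewrites:
        node = rewrite(node)
    return node


def column_pruning(node: Node) -> Node:
    # Toy pass: in reality this drops unreferenced columns from the plan.
    return {**node, "pruned": True}


def defer_selection(node: Node) -> Node:
    # Toy pass: in reality this pushes the final selection later in the plan.
    return {**node, "selection_deferred": True}


plan = apply_rewrites({"op": "result"}, [column_pruning, defer_selection])
print(plan)
```

Ordering matters in such pipelines: a pass added at the end sees the tree as shaped by every earlier pass.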

bigframes/core/compile/googlesql/query.py

Lines changed: 8 additions & 2 deletions

@@ -83,7 +83,13 @@ def _select_field(self, field) -> SelectExpression:
             return SelectExpression(expression=expr.ColumnExpression(name=field))
 
         else:
-            alias = field[1] if (field[0] != field[1]) else None
+            alias = (
+                expr.AliasExpression(field[1])
+                if isinstance(field[1], str)
+                else field[1]
+                if (field[0] != field[1])
+                else None
+            )
             return SelectExpression(
                 expression=expr.ColumnExpression(name=field[0]), alias=alias
             )
@@ -119,7 +125,7 @@ def sql(self) -> str:
         return "\n".join(text)
 
 
-@dataclasses.dataclass
+@dataclasses.dataclass(frozen=True)
 class SelectExpression(abc.SQLSyntax):
     """This class represents `select_expression`."""
 

bigframes/core/compile/sqlglot/compiler.py

Lines changed: 22 additions & 4 deletions

@@ -87,6 +87,9 @@ def _compile_sql(self, request: configs.CompileRequest) -> configs.CompileResult
             nodes.ResultNode, rewrite.column_pruning(result_node)
         )
         result_node = self._remap_variables(result_node)
+        result_node = typing.cast(
+            nodes.ResultNode, rewrite.defer_selection(result_node)
+        )
         sql = self._compile_result_node(result_node)
         return configs.CompileResult(
             sql, result_node.schema.to_bigquery(), result_node.order_by
@@ -97,6 +100,9 @@ def _compile_sql(self, request: configs.CompileRequest) -> configs.CompileResult
         result_node = typing.cast(nodes.ResultNode, rewrite.column_pruning(result_node))
 
         result_node = self._remap_variables(result_node)
+        result_node = typing.cast(
+            nodes.ResultNode, rewrite.defer_selection(result_node)
+        )
         sql = self._compile_result_node(result_node)
         # Return the ordering iff no extra columns are needed to define the row order
         if ordering is not None:
@@ -125,10 +131,7 @@ def _compile_result_node(self, root: nodes.ResultNode) -> str:
             (name, scalar_compiler.compile_scalar_expression(ref))
             for ref, name in root.output_cols
         )
-        # Skip squashing selections to ensure the right ordering and limit keys
-        sqlglot_ir = self.compile_node(root.child).select(
-            selected_cols, squash_selections=False
-        )
+        sqlglot_ir = self.compile_node(root.child).select(selected_cols)
 
         if root.order_by is not None:
             ordering_cols = tuple(
@@ -208,6 +211,13 @@ def compile_projection(
         )
         return child.project(projected_cols)
 
+    @_compile_node.register
+    def compile_filter(
+        self, node: nodes.FilterNode, child: ir.SQLGlotIR
+    ) -> ir.SQLGlotIR:
+        condition = scalar_compiler.compile_scalar_expression(node.predicate)
+        return child.filter(condition)
+
     @_compile_node.register
     def compile_concat(
         self, node: nodes.ConcatNode, *children: ir.SQLGlotIR
@@ -219,6 +229,14 @@ def compile_concat(
             uid_gen=self.uid_gen,
         )
 
+    @_compile_node.register
+    def compile_explode(
+        self, node: nodes.ExplodeNode, child: ir.SQLGlotIR
+    ) -> ir.SQLGlotIR:
+        offsets_col = node.offsets_col.sql if (node.offsets_col is not None) else None
+        columns = tuple(ref.id.sql for ref in node.column_ids)
+        return child.explode(columns, offsets_col)
+
 
 def _replace_unsupported_ops(node: nodes.BigFrameNode):
     node = nodes.bottom_up(node, rewrite.rewrite_slice)
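The new `compile_filter` and `compile_explode` handlers are hooked in via `@_compile_node.register`, i.e. per-node-type dispatch. A self-contained sketch of the same idea using `functools.singledispatch` (the node classes here are stand-ins, not the bigframes ones, and the real compiler may wire dispatch differently):

```python
import functools
from dataclasses import dataclass


@dataclass
class FilterNode:
    predicate: str


@dataclass
class ExplodeNode:
    column: str


@functools.singledispatch
def compile_node(node) -> str:
    # Fallback for node types with no registered handler.
    raise NotImplementedError(f"no compiler registered for {type(node).__name__}")


@compile_node.register
def _(node: FilterNode) -> str:
    # Each handler only knows how to lower its own node type.
    return f"WHERE {node.predicate}"


@compile_node.register
def _(node: ExplodeNode) -> str:
    return f"UNNEST({node.column})"


print(compile_node(FilterNode("x > 1")))  # WHERE x > 1
```

Adding support for a new plan node is then a matter of registering one more handler, with no change to the dispatch site.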
Lines changed: 13 additions & 0 deletions

@@ -0,0 +1,13 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
Lines changed: 44 additions & 0 deletions

@@ -0,0 +1,44 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import annotations
+
+import sqlglot.expressions as sge
+
+from bigframes import dtypes
+from bigframes import operations as ops
+from bigframes.core.compile.sqlglot.expressions.op_registration import OpRegistration
+from bigframes.core.compile.sqlglot.expressions.typed_expr import TypedExpr
+
+BINARY_OP_REGISTRATION = OpRegistration()
+
+
+def compile(op: ops.BinaryOp, left: TypedExpr, right: TypedExpr) -> sge.Expression:
+    return BINARY_OP_REGISTRATION[op](op, left, right)
+
+
+# TODO: add parenthesize for operators
+@BINARY_OP_REGISTRATION.register(ops.add_op)
+def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
+    if left.dtype == dtypes.STRING_DTYPE and right.dtype == dtypes.STRING_DTYPE:
+        # String addition
+        return sge.Concat(expressions=[left.expr, right.expr])
+
+    # Numerical addition
+    return sge.Add(this=left.expr, expression=right.expr)
+
+
+@BINARY_OP_REGISTRATION.register(ops.ge_op)
+def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
+    return sge.GTE(this=left.expr, expression=right.expr)
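The `add_op` registration above branches on operand dtype: two string inputs compile to `CONCAT`, anything else to `+`. A dependency-free sketch of that branch, emitting SQL text instead of sqlglot expression objects (`TypedSQL` is an invented stand-in for `TypedExpr`, and the dtype strings are placeholders):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedSQL:
    # Simplified stand-in for TypedExpr: rendered SQL plus its dtype.
    sql: str
    dtype: str


def compile_add(left: TypedSQL, right: TypedSQL) -> str:
    """Mirror the dtype branch in the diff: strings concatenate, numbers add."""
    if left.dtype == "string" and right.dtype == "string":
        return f"CONCAT({left.sql}, {right.sql})"
    return f"{left.sql} + {right.sql}"


print(compile_add(TypedSQL("a", "string"), TypedSQL("b", "string")))  # CONCAT(a, b)
print(compile_add(TypedSQL("x", "int64"), TypedSQL("y", "int64")))    # x + y
```

The point of carrying the dtype alongside the expression is exactly this: one logical operator (`+`) can lower to different SQL depending on operand types.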
Lines changed: 27 additions & 0 deletions

@@ -0,0 +1,27 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import annotations
+
+import sqlglot.expressions as sge
+
+from bigframes import operations as ops
+from bigframes.core.compile.sqlglot.expressions.op_registration import OpRegistration
+from bigframes.core.compile.sqlglot.expressions.typed_expr import TypedExpr
+
+NARY_OP_REGISTRATION = OpRegistration()
+
+
+def compile(op: ops.NaryOp, *args: TypedExpr) -> sge.Expression:
+    return NARY_OP_REGISTRATION[op](op, *args)
Lines changed: 54 additions & 0 deletions

@@ -0,0 +1,54 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import annotations
+
+import typing
+
+from sqlglot import expressions as sge
+
+from bigframes import operations as ops
+
+# We should've been more specific about input types. Unfortunately,
+# MyPy doesn't support more rigorous checks.
+CompilationFunc = typing.Callable[..., sge.Expression]
+
+
+class OpRegistration:
+    def __init__(self) -> None:
+        self._registered_ops: dict[str, CompilationFunc] = {}
+
+    def register(
+        self, op: ops.ScalarOp | type[ops.ScalarOp]
+    ) -> typing.Callable[[CompilationFunc], CompilationFunc]:
+        def decorator(item: CompilationFunc):
+            def arg_checker(*args, **kwargs):
+                if not isinstance(args[0], ops.ScalarOp):
+                    raise ValueError(
+                        f"The first parameter must be an operator. Got {type(args[0])}"
+                    )
+                return item(*args, **kwargs)
+
+            key = typing.cast(str, op.name)
+            if key in self._registered_ops:
+                raise ValueError(f"{key} is already registered")
+            self._registered_ops[key] = item
+            return arg_checker
+
+        return decorator
+
+    def __getitem__(self, key: str | ops.ScalarOp) -> CompilationFunc:
+        if isinstance(key, ops.ScalarOp):
+            return self._registered_ops[key.name]
+        return self._registered_ops[key]
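`OpRegistration` maps operator names to compile functions, rejects duplicate registrations, and allows lookup by either the name or the operator object itself. The dependency-free `MiniRegistry` below mirrors that behavior without the bigframes types (and skips the `arg_checker` wrapper for brevity):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Op:
    # Stand-in for a bigframes ScalarOp: identified by its name.
    name: str


class MiniRegistry:
    """Same shape as OpRegistration above, minus the bigframes types."""

    def __init__(self) -> None:
        self._ops: dict[str, Callable] = {}

    def register(self, op: Op):
        def decorator(fn: Callable) -> Callable:
            # Duplicate registrations are programming errors: fail loudly.
            if op.name in self._ops:
                raise ValueError(f"{op.name} is already registered")
            self._ops[op.name] = fn
            return fn

        return decorator

    def __getitem__(self, key):
        # Lookup works with either the op object or its name.
        return self._ops[key.name if isinstance(key, Op) else key]


reg = MiniRegistry()
add_op = Op("add")


@reg.register(add_op)
def _(op: Op, left: int, right: int) -> int:
    return left + right


print(reg[add_op](add_op, 2, 3))  # 5
print(reg["add"](add_op, 2, 3))   # 5
```

Keying the registry by name (rather than by object identity) is what lets separate modules such as the binary and n-ary compilers share one registration mechanism.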
