-
Notifications
You must be signed in to change notification settings - Fork 50
feat: add allow_large_results option #1428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -243,11 +244,13 @@ def execute( | |||
*, | |||
ordered: bool = True, | |||
col_id_overrides: Mapping[str, str] = {}, | |||
use_explicit_destination: bool = False, | |||
use_explicit_destination: Optional[bool] = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably add a check, that only one of ordered, use_explicit_destination is allowed at a time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some tests, seems this is against current to_pandas_batches logic? It set both to True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As chatted offline, they may work together, keep it for now.
bigframes/session/executor.py
Outdated
@@ -333,11 +336,13 @@ def export_gcs( | |||
uri: str, | |||
format: Literal["json", "csv", "parquet"], | |||
export_options: Mapping[str, Union[bool, str]], | |||
allow_large_results: Optional[bool] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think export jobs should just assume a large result. Can't remember why we don't just export as a single job anyways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, now it's always use a destination table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit (repeated throughout): I'd like to make sure we only expose allow_large_results
as a keyword argument, not allowing positional access. That'll make sure we prevent breakages if folks are coming from pandas.
|
||
def to_latex( | ||
self, | ||
buf=None, | ||
columns: Sequence | None = None, | ||
header: bool | Sequence[str] = True, | ||
index: bool = True, | ||
allow_large_results=None, | ||
**kwargs, | ||
) -> str | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: I'm a bit surprised this works. I guess we are type checking on Python 3.10 where this syntax was added (https://peps.python.org/pep-0604/), not 3.9?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
columns=None, | ||
header=True, | ||
index=True, | ||
allow_large_results=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we don't want people to access allow_large_results
positionally.
allow_large_results=None, | |
*, | |
allow_large_results=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
return self.to_pandas().to_list() | ||
def tolist( | ||
self, | ||
allow_large_results: Optional[bool] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we don't want people to access allow_large_results positionally.
allow_large_results: Optional[bool] = None, | |
*, | |
allow_large_results: Optional[bool] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
bigframes/series.py
Outdated
@@ -1809,14 +1841,17 @@ def to_markdown( | |||
buf: typing.IO[str] | None = None, | |||
mode: str = "wt", | |||
index: bool = True, | |||
allow_large_results=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we don't want people to access allow_large_results positionally.
allow_large_results=None, | |
*, | |
allow_large_results=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
bigframes/core/indexes/base.py
Outdated
@@ -490,17 +490,28 @@ def __getitem__(self, key: int) -> typing.Any: | |||
else: | |||
raise NotImplementedError(f"Index key not supported {key}") | |||
|
|||
def to_pandas(self) -> pandas.Index: | |||
def to_pandas(self, allow_large_results: Optional[bool] = None) -> pandas.Index: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's less necessary in this context, since we aren't trying to mimic pandas, but I'd still like to avoid using this parameter positionally.
def to_pandas(self, allow_large_results: Optional[bool] = None) -> pandas.Index: | |
def to_pandas(self, *, allow_large_results: Optional[bool] = None) -> pandas.Index: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
bigframes/core/indexes/base.py
Outdated
|
||
def to_numpy(self, dtype=None, **kwargs) -> np.ndarray: | ||
return self.to_pandas().to_numpy(dtype, **kwargs) | ||
def to_numpy(self, dtype=None, allow_large_results=None, **kwargs) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, we are trying to mimic pandas (https://pandas.pydata.org/pandas-docs/version/2.1.2/reference/api/pandas.Index.to_numpy.html), so it is very important to restrict use positionally.
def to_numpy(self, dtype=None, allow_large_results=None, **kwargs) -> np.ndarray: | |
def to_numpy(self, dtype=None, *, allow_large_results=None, **kwargs) -> np.ndarray: |
Otherwise, someone might have some pandas code that does index.to_numpy("int64", True)
where in pandas that "True" means copy=True
, but here it means something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THanks!
Modified behavior:
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕