feat: add `DataFrame.top_k` and `LazyFrame.top_k` #2977

raisadz · 2025-08-12T15:28:45Z

What type of PR is this? (check all applicable)

Related issues

Related issue #<issue number>
Closes enh?: {DataFrame/LazyFrame}.top_k #2947

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below


FBruzzesi

Thanks @raisadz - I left a few comments, only one of which I really care about: input length validation at the narwhals level.

FBruzzesi · 2025-08-16T09:00:36Z

narwhals/dataframe.py

+    def top_k(
+        self, k: int, *, by: str | Iterable[str], reverse: bool | Sequence[bool] = False
+    ) -> Self:
+        flatten_by = flatten([by])


Can we add a check that raises an exception if reverse is a sequence and its length differs from that of flatten_by? This guarantees that zip(by, reverse) at the compliant level is equivalent to zip_strict.

From polars:

import polars as pl

df = pl.DataFrame(
    {
        "a": ["a", "b", "a", "b", "b", "c"],
        "b": [2, 1, 1, 3, 2, 1],
    }
)
df.top_k(4, by=["b", "a"], reverse=[True])

ValueError: the length of reverse (1) does not match the length of by (2)

@raisadz I would still prefer to add a check at this level, also to align the error with polars (note that the output of flatten is a list anyway), but feel free to merge. We can follow up on it.

I think there are some other places where this would be useful (like sort), so we could probably make a validation utility for this and use it in multiple places.
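A shared utility along those lines might look like the sketch below. The name `normalize_order_flags` and its signature are hypothetical, not narwhals API. It broadcasts a single bool to every `by` column and otherwise enforces matching lengths, with an error message modelled on the polars one quoted in this thread, so it could serve both `sort(descending=...)` and `top_k(reverse=...)`.

```python
from __future__ import annotations

from collections.abc import Sequence


def normalize_order_flags(
    flags: bool | Sequence[bool], by: Sequence[str], *, param_name: str
) -> list[bool]:
    """Hypothetical shared validator for `sort(descending=...)` and `top_k(reverse=...)`.

    A single bool is broadcast to every `by` column; a sequence must match `by` in length.
    """
    if isinstance(flags, bool):
        return [flags] * len(by)
    flags_list = list(flags)
    if len(flags_list) != len(by):
        # Wording modelled on the polars error quoted earlier in this thread.
        msg = (
            f"the length of {param_name} ({len(flags_list)}) "
            f"does not match the length of by ({len(by)})"
        )
        raise ValueError(msg)
    return flags_list


# top_k-style call with a too-short `reverse`
try:
    normalize_order_flags([True], ["b", "a"], param_name="reverse")
except ValueError as exc:
    print(exc)  # the length of reverse (1) does not match the length of by (2)

# sort-style call: a single bool is broadcast to every column
print(normalize_order_flags(True, ["b", "a"], param_name="descending"))  # [True, True]
```

Once a caller has passed through such a check, a plain `zip(by, flags)` at the compliant level can no longer truncate.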

FBruzzesi · 2025-08-16T09:07:08Z

narwhals/_duckdb/dataframe.py

@@ -409,6 +409,24 @@ def sort(self, *by: str, descending: bool | Sequence[bool], nulls_last: bool) ->
            )
        return self._with_native(self.native.sort(*it))

+    def top_k(self, k: int, *, by: Iterable[str], reverse: bool | Sequence[bool]) -> Self:
+        df = self.native  # noqa: F841


If you prefix the variable name with an underscore (`_df`), you can avoid the `# noqa: F841` flag. It's hacky, I know.

narwhals/_ibis/dataframe.py

narwhals/_duckdb/dataframe.py

Co-authored-by: Francesco Bruzzesi <[email protected]>

raisadz · 2025-08-17T15:59:21Z

Thanks for the review, @FBruzzesi! I've addressed your comments and will add zip_strict from #3003 once it is merged.

MarcoGorelli

Nice, thanks both @raisadz and @FBruzzesi!

Happy to ship it if there are no further comments.

raisadz added 4 commits August 11, 2025 19:16

feat: add top_k

181a723

add docstrings, flatten by, use nlargest/nsmallest for pandas/dask …

a1fb7c6

…numeric

handle column names with empty space, sort test results

6cd961b

Merge remote-tracking branch 'upstream/main' into feat/top_k

e920e66

raisadz added the pyspark Issue is related to pyspark backend label Aug 12, 2025

raisadz added 4 commits August 12, 2025 16:42

check schema only for columns in by

6cee5de

implement top_k for polars

d69f663

fix dask

32d9708

add no cover for exception

e2b76e1

raisadz marked this pull request as ready for review August 12, 2025 16:34

FBruzzesi reviewed Aug 16, 2025


FBruzzesi added the enhancement New feature or request label Aug 16, 2025

raisadz mentioned this pull request Aug 17, 2025

chore: replace zip with zip_strict #3003

Merged

10 tasks

raisadz and others added 7 commits August 17, 2025 11:30

Update narwhals/_ibis/dataframe.py

5c8ffdb

Co-authored-by: Francesco Bruzzesi <[email protected]>

Update narwhals/_duckdb/dataframe.py

68a505f

Co-authored-by: Francesco Bruzzesi <[email protected]>

Merge remote-tracking branch 'upstream/main' into feat/top_k

69f6bc5

address comments

ad7ea79

Merge remote-tracking branch 'upstream/main' into feat/top_k

cafb18d

add nulls_last for DuckDB, add a test with None

c57a406

skip polars test for Nones

bb1f349

raisadz added 2 commits August 19, 2025 13:53

Merge remote-tracking branch 'upstream/main' into feat/top_k

dbc0c72

replace zip with zip_strict

e45b151

MarcoGorelli approved these changes Aug 19, 2025

