Skip to content

[REVIEW] Match DataFrame.set_index with pandas #6231

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Sep 23, 2020

Conversation

isVoid
Copy link
Contributor

@isVoid isVoid commented Sep 14, 2020

Closes #6198

Implements:

  • inplace, append, verify_integrity fields to the method
  • New test cases for these fields.

Addresses Issue 6198
- Adds support for `inplace` field
- Optimized logics

TODO:
- Add test cases
- Further support `append` and `verify_integirty` field
@isVoid isVoid requested a review from a team as a code owner September 14, 2020 23:21
@GPUtester
Copy link
Collaborator

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

TODO:
- verify_integrity
- write tests
	- input dataframe should include `Index` and `MultiIndex`
	- separate tests for inplace and verify_integrity

PR 6231
TODO:
- write tests
        - input dataframe should include `Index` and `MultiIndex`
        - separate tests for inplace and verify_integrity
- print duplicated keys

PR 6231
TODO:
	- print duplicated keys
	- support for heterogenous index lists

PR 6231
@codecov
Copy link

codecov bot commented Sep 17, 2020

Codecov Report

Merging #6231 into branch-0.16 will increase coverage by 0.27%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.16    #6231      +/-   ##
===============================================
+ Coverage        84.45%   84.73%   +0.27%     
===============================================
  Files               82       82              
  Lines            13846    14219     +373     
===============================================
+ Hits             11694    12048     +354     
- Misses            2152     2171      +19     
Impacted Files Coverage Δ
python/dask_cudf/dask_cudf/io/parquet.py 91.35% <0.00%> (-0.85%) ⬇️
python/cudf/cudf/_version.py 44.80% <0.00%> (-0.72%) ⬇️
python/cudf/cudf/io/parquet.py 91.73% <0.00%> (-0.50%) ⬇️
python/cudf/cudf/core/series.py 90.96% <0.00%> (-0.37%) ⬇️
python/cudf/cudf/utils/applyutils.py 98.74% <0.00%> (-0.01%) ⬇️
python/cudf/cudf/utils/ioutils.py 85.84% <0.00%> (ø)
python/cudf/cudf/utils/docutils.py 100.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/io/csv.py 95.52% <0.00%> (ø)
python/cudf/cudf/core/column/string.py 86.05% <0.00%> (ø)
python/dask_cudf/dask_cudf/io/tests/test_csv.py 100.00% <0.00%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb6fb6f...1262e42. Read the comment docs.

@isVoid isVoid changed the title [WIP] Match DataFrame.set_index with pandas [REVIEW] Match DataFrame.set_index with pandas Sep 17, 2020
- Remove empty line to prevent empty code block
- Remove print statement
- Parameterize `inplace`

Co-authored-by: GALI PREM SAGAR <[email protected]>
@kkraus14 kkraus14 added 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer Python Affects Python cuDF API. labels Sep 17, 2020
@isVoid
Copy link
Contributor Author

isVoid commented Sep 18, 2020

Where's the problem with CI? Can't figure out the error message.

@galipremsagar
Copy link
Contributor

Probably one off issue. Rerun tests might fix it

@galipremsagar
Copy link
Contributor

rerun tests

Copy link
Collaborator

@kkraus14 kkraus14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of minor changes

- ._set_index default to None
- Use MultiIndex.from_frame instead of `source_data`

PR 6231
@kkraus14
Copy link
Collaborator

@galipremsagar can you review again when you get a chance?

@galipremsagar galipremsagar added 0 - Waiting on Author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer labels Sep 22, 2020
isVoid and others added 3 commits September 21, 2020 20:54
- Code document updates
- Code style improvements
- Explicitly raising TypeError when input cannot be converted
- Throwing KeyError instead of ValueError when column is not found

Co-authored-by: GALI PREM SAGAR <[email protected]>
@galipremsagar galipremsagar added 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer and removed 0 - Waiting on Author Waiting for author to respond to review labels Sep 22, 2020
- Minor style and doc update.

Co-authored-by: GALI PREM SAGAR <[email protected]>
@isVoid
Copy link
Contributor Author

isVoid commented Sep 23, 2020

rerun tests

@kkraus14
Copy link
Collaborator

@galipremsagar is this good to go once CI reports green? This is blocking a nice dask-cudf groupby optimization 😄

@galipremsagar
Copy link
Contributor

@galipremsagar is this good to go once CI reports green? This is blocking a nice dask-cudf groupby optimization 😄

Yep, good to go

@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer labels Sep 23, 2020
@galipremsagar
Copy link
Contributor

rerun tests

@kkraus14 kkraus14 merged commit 2088b42 into rapidsai:branch-0.16 Sep 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support "inplace" argument for set_index
4 participants