Skip to content

Add Series.str,Series.dt, and Series.cat accessors to docs #8757

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Mar 11, 2022

Conversation

scharlottej13
Copy link
Contributor

I added the Series.str methods and Series.dt accessors to dataframe-api.rst, but am having trouble generating the autosummary. The specific Sphinx warning is WARNING: autosummary: failed to import Series.str.zfill, e.g., for all newly added lines.

I first tried to mimic the pandas docs, copying the changes made in https://github.com/pandas-dev/pandas/pull/9322. Then I tried using sphinx-autosummary-accessors, which is reflected in this PR. Would very much appreciate ideas/suggestions at this point, I'm not quite sure what else to try!

@github-actions github-actions bot added the documentation Improve or add to documentation label Feb 25, 2022
@keewis
Copy link
Contributor

keewis commented Mar 4, 2022

sphinx-autosummary-accessors requires the attribute to be accessible from the class, so e.g. Series.str.zfill shouldn't raise an AttributeError.

The reason this currently fails is because Series.str uses cache_readonly, which works like the builtin property: if accessed on the class object (__get__ is called with obj=None), it will return itself instead of the object it wraps. See also pydata/xarray#3988 for more discussion.

In summary, you might need to use a custom descriptor like pandas does (this is what we chose in xarray), or get pandas to change their implementation of cache_readonly / vendor it here.

@scharlottej13
Copy link
Contributor Author

Thanks @keewis for taking the time to explain this. After looking through the links you shared, it seems like even with using a custom descriptor (e.g. the pandas example you linked) we still would run into problems, since in Dask the Accessor class reads the methods from the _meta object.

After chatting w/ @ian-r-rose @ncclementi, creating a new auto-documenter class might be an option, where I could get the docstrings from creating a Series instance. Is this something you tried already with xarray?

cc @jsignell

@keewis
Copy link
Contributor

keewis commented Mar 8, 2022

since in Dask the Accessor class reads the methods from the _meta object.

I see what you mean, this does indeed not work with the existing documenters in sphinx-autosummary-accessors.

Is this something you tried already with xarray?

While I did consider to instantiate the class in the accessor documenter I never actually tried it because changing the descriptor seemed easier (and I could follow what pandas did).

In any case, feel free to ping me if you need help, and I would also be happy to include those instantiating Accessor* documenters and their templates in sphinx-autosummary-accessors.

scharlottej13 and others added 3 commits March 9, 2022 11:39
- Refactors the dataframe accessor implementation to define all
methods/properties statically on import. This makes it so that methods
can be accessed on the class itself, rather than only on instances.
- Use a different descriptor for adding the accessor classes to the
`Series` class, which lets the accessor methods be accessed via e.g.
`dd.Series.str.cat`.
- Fix a bug in `dd.Series.str.rsplit`
- General cleanliness improvements for the accessor code
- Update the api docs to include the accessors (still not quite working)
@jcrist jcrist marked this pull request as ready for review March 10, 2022 20:26
@github-actions github-actions bot added dataframe dispatch Related to `Dispatch` extension objects labels Mar 10, 2022
@jcrist
Copy link
Member

jcrist commented Mar 10, 2022

I've pushed a patch up that refactors our accessor implementation to define all methods/properties statically (and did a bunch of other cleanliness fixes while I was at it). Things are almost working now, but I'm a bit confused what's still broken. Current status:

  • Code tests pass locally (so implementation works as intended)
  • Docs are accessible on instances (e.g. some_series.str.capitalize.__doc__ exists and is correct)
  • Docs are accessible on class methods (e.g. dd.Series.str.capitalize.__doc__ exists and is correct)
  • Docs are accessible on class properties (e.g. dd.Series.dt.date.__doc__ exists and is correct)
  • make html runs (with some warnings) and generates docs pages for all accessor methods/properties

However, the accessor methods aren't documenting correctly. The accessor property docs are all working perfectly, but the methods show up with no docstring and as data.

image

I'm not sure what the issue is - @keewis do you have any thoughts?

@scharlottej13 scharlottej13 changed the title [WIP] - Add Series.str,Series.dt accessors to docs Add Series.str,Series.dt, and Series.cat accessors to docs Mar 10, 2022
@scharlottej13
Copy link
Contributor Author

scharlottej13 commented Mar 10, 2022

@jcrist this looks great!!!

However, the accessor methods aren't documenting correctly. The accessor property docs are all working perfectly, but the methods show up with no docstring and as data.

I changed the template from autosummary/accessor_attribute.rst to autosummary/accessor_method.rst and I think this fixed the issue:

Screen Shot 2022-03-10 at 1 56 25 PM

Comment on lines 268 to 270
.. autosummary::
:toctree: generated/
:template: autosummary/accessor_attribute.rst
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might have to split this in two parts: one for properties like Series.dt.day (using the accessor_attribute template), and another for the methods (using the accessor_method template).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice catch. That's a bit unfortunate that this is needed. Knowing nothing about how sphinx works - is this a sphinx limitation, or is it something that this extension could handle with the same template for both?

Copy link
Contributor

@keewis keewis Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's more like that is what pandas did the last time I checked. It would probably be possible to detect whether the attribute is a descriptor or a method and automatically use the appropriate documenter, but I never really tried.

Edit: I'm not entirely sure, but this seems to work for pandas.Series.cat

Also, at least for the projects I work on the split serves as a reminder that those are called differently (I'm not too familiar with dask.dataframe, though)

@jcrist
Copy link
Member

jcrist commented Mar 11, 2022

This should be good to go, will merge once tests pass.

@jcrist jcrist merged commit 9634da1 into dask:main Mar 11, 2022
@keewis
Copy link
Contributor

keewis commented Mar 11, 2022

the missing docstring for Series.dt.freq will have to be fixed in pandas, which also doesn't seem to have a docstring (although it has a description, which is a bit weird)

@scharlottej13 scharlottej13 deleted the docs-series-methods branch October 4, 2022 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dataframe dispatch Related to `Dispatch` extension objects documentation Improve or add to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series.str / Series.dt methods missing from Docs
3 participants