Sync main docs and docstring for median_grouped(). #117214

Merged (3 commits) on Mar 25, 2024.
Doc/library/statistics.rst: 77 changes (39 additions, 38 deletions)
@@ -80,7 +80,7 @@ or sample.
  :func:`median` Median (middle value) of data.
  :func:`median_low` Low median of data.
  :func:`median_high` High median of data.
- :func:`median_grouped` Median, or 50th percentile, of grouped data.
+ :func:`median_grouped` Median (50th percentile) of grouped data.
  :func:`mode` Single mode (most common value) of discrete or nominal data.
  :func:`multimode` List of modes (most common values) of discrete or nominal data.
  :func:`quantiles` Divide data into intervals with equal probability.
@@ -381,55 +381,56 @@ However, for reading convenience, most of the examples show sorted sequences.
  be an actual data point rather than interpolated.


- .. function:: median_grouped(data, interval=1)
+ .. function:: median_grouped(data, interval=1.0)

- Return the median of grouped continuous data, calculated as the 50th
- percentile, using interpolation. If *data* is empty, :exc:`StatisticsError`
- is raised. *data* can be a sequence or iterable.
+ Estimates the median for numeric data that has been `grouped or binned
+ <https://en.wikipedia.org/wiki/Data_binning>`_ around the midpoints
+ of consecutive, fixed-width intervals.

- .. doctest::
+ The *data* can be any iterable of numeric data with each value being
+ exactly the midpoint of a bin. At least one value must be present.

- >>> median_grouped([52, 52, 53, 54])
- 52.5
+ The *interval* is the width of each bin.

- In the following example, the data are rounded, so that each value represents
- the midpoint of data classes, e.g. 1 is the midpoint of the class 0.5--1.5, 2
- is the midpoint of 1.5--2.5, 3 is the midpoint of 2.5--3.5, etc. With the data
- given, the middle value falls somewhere in the class 3.5--4.5, and
- interpolation is used to estimate it:
+ For example, demographic information may have been summarized into
+ consecutive ten-year age groups with each group being represented
+ by the 5-year midpoints of the intervals:

  .. doctest::

- >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
- 3.7
-
- Optional argument *interval* represents the class interval, and defaults
- to 1. Changing the class interval naturally will change the interpolation:
+ >>> from collections import Counter
+ >>> demographics = Counter({
+ ... 25: 172, # 20 to 30 years old
+ ... 35: 484, # 30 to 40 years old
+ ... 45: 387, # 40 to 50 years old
+ ... 55: 22, # 50 to 60 years old
+ ... 65: 6, # 60 to 70 years old
+ ... })
+ ...

+ The 50th percentile (median) is the 536th person out of the 1071
+ member cohort. That person is in the 30 to 40 year old age group.
+
+ The regular :func:`median` function would assume that everyone in the
+ tricenarian age group was exactly 35 years old. A more tenable
+ assumption is that the 484 members of that age group are evenly
+ distributed between 30 and 40. For that, we use
+ :func:`median_grouped`:
+
  .. doctest::

- >>> median_grouped([1, 3, 3, 5, 7], interval=1)
- 3.25
- >>> median_grouped([1, 3, 3, 5, 7], interval=2)
- 3.5
-
- This function does not check whether the data points are at least
- *interval* apart.
-
- .. impl-detail::
-
- Under some circumstances, :func:`median_grouped` may coerce data points to
- floats. This behaviour is likely to change in the future.
-
- .. seealso::
+ >>> data = list(demographics.elements())
+ >>> median(data)
+ 35
+ >>> round(median_grouped(data, interval=10), 1)
+ 37.5

- * "Statistics for the Behavioral Sciences", Frederick J Gravetter and
- Larry B Wallnau (8th Edition).
+ The caller is responsible for making sure the data points are separated
+ by exact multiples of *interval*. This is essential for getting a
+ correct result. The function does not check this precondition.

- * The `SSMEDIAN
- <https://help.gnome.org/users/gnumeric/stable/gnumeric.html#gnumeric-function-SSMEDIAN>`_
- function in the Gnome Gnumeric spreadsheet, including `this discussion
- <https://mail.gnome.org/archives/gnumeric-list/2011-April/msg00018.html>`_.
+ Inputs may be any numeric type that can be coerced to a float during
+ the interpolation step.


  .. function:: mode(data)
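The removed "data classes" paragraph and the new age-group example both rely on the same interpolation. The steps are:

* Find the bin that holds the middle value.
* Assume that bin's members are spread evenly across it.
* Move the appropriate fraction of the way into it.

Let L be the bin's lower boundary, cf the cumulative count of all lower bins, and f the count in the median bin. The estimate is then `L + interval * (n/2 - cf) / f`. The Python below is a minimal sketch of that formula written for illustration, not the `statistics` module's own source. The function name `grouped_median_sketch` is hypothetical. Like `median_grouped()`, it assumes without checking that every value is an exact bin midpoint.

```python
from bisect import bisect_left, bisect_right
from collections import Counter
import statistics


def grouped_median_sketch(data, interval=1.0):
    """Hypothetical re-implementation of grouped-median interpolation.

    Assumes each value in *data* is the exact midpoint of a bin of width
    *interval*; like median_grouped(), it does not verify this.
    """
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise statistics.StatisticsError("no median for empty data")
    x = values[n // 2]                 # midpoint of the bin holding the middle value
    cf = bisect_left(values, x)        # cumulative count of all lower bins
    f = bisect_right(values, x) - cf   # number of values in the median bin
    lower = float(x) - interval / 2    # lower boundary of the median bin
    # With the bin's f members spread evenly, the median lies a fraction
    # (n/2 - cf) / f of the way across the bin.
    return lower + interval * (n / 2 - cf) / f


# Ten-year age groups represented by their midpoints, as in the new docs.
demographics = Counter({25: 172, 35: 484, 45: 387, 55: 22, 65: 6})
data = list(demographics.elements())

print(round(grouped_median_sketch(data, interval=10), 1))      # 37.5
print(round(statistics.median_grouped(data, interval=10), 1))  # 37.5
print(grouped_median_sketch([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]))    # 3.7
```

For the age data, cf = 172 and f = 484. The estimate is therefore 30 + 10 * (535.5 - 172) / 484, about 37.51, which rounds to the 37.5 in the new doctest.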
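The new text makes the caller responsible for ensuring that data points are separated by exact multiples of *interval*, because `median_grouped()` does not check this. One way a caller might enforce the precondition is sketched below. `check_bin_spacing` is a hypothetical helper, not part of the `statistics` module. It compares every value with the smallest one. It allows a small tolerance so that float midpoints such as 0.5, 1.5, 2.5 still pass. It is written with int and float data in mind.

```python
import statistics


def check_bin_spacing(data, interval=1.0, tol=1e-9):
    """Hypothetical helper: True if all values lie on one grid of step *interval*.

    statistics.median_grouped() does not perform this check itself.
    """
    values = sorted(set(data))
    if not values:
        return False  # median_grouped() needs at least one value
    base = values[0]
    for v in values[1:]:
        steps = (v - base) / interval  # offset from the minimum, in intervals
        # Tolerate tiny binary floating-point rounding errors.
        if abs(steps - round(steps)) > tol * max(1.0, abs(steps)):
            return False
    return True


ages = [25, 35, 35, 45, 55]
print(check_bin_spacing(ages, interval=10))          # True
print(check_bin_spacing([25, 31, 35], interval=10))  # False

if check_bin_spacing(ages, interval=10):
    print(statistics.median_grouped(ages, interval=10))  # 37.5
```

Checking offsets from the minimum is enough. If every value is the minimum plus a whole number of intervals, every pairwise gap is also a whole number of intervals.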
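The new closing sentence says inputs may be any numeric type that can be coerced to a float. A quick way to see this is to pass exact `Fraction` and `Decimal` bin midpoints to `statistics.median_grouped()`. The three-value samples are made up for illustration. On the CPython versions we checked, the result comes back as a plain float. The old text flagged this coercion as an implementation detail, so the exact return type is best not relied on.

```python
from decimal import Decimal
from fractions import Fraction
import statistics

# The same bin midpoints, written as exact Fraction and Decimal values.
samples = {
    "Fraction": [Fraction(1), Fraction(2), Fraction(2)],
    "Decimal": [Decimal("1"), Decimal("2"), Decimal("2")],
}

for label, sample in samples.items():
    result = statistics.median_grouped(sample)
    # Expected on recent CPython: the value is coerced to float, 1.75.
    print(label, type(result).__name__, result)
```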