Categorizing IterDataPipes #219

Closed
178 changes: 132 additions & 46 deletions docs/source/torchdata.datapipes.iter.rst
@@ -15,18 +15,109 @@ This is an updated version of ``IterableDataset`` in ``torch``.
.. autoclass:: IterDataPipe


We have three types of Iterable DataPipes:
We have different types of Iterable DataPipes:

1. Load - help you interact with the file systems or online databases (e.g. FileOpener, GDriveReader)
1. Archive - open and decompress archive files of different formats.

2. Transform - transform elements within DataPipes (e.g. batching, shuffling)
2. Augmenting - augment your samples (e.g. adding index, or cycle through indefinitely).

3. Utility - utility functions (e.g. caching, CSV parsing, filtering)
3. Combinatorial - perform combinatorial operations (e.g. sampling, shuffling).

Load DataPipes
4. Combining/Splitting - interact with multiple DataPipes by combining them or splitting one to many.

5. Grouping - group samples within a DataPipe.

6. IO - interact with file systems or remote servers (e.g. downloading, opening,
   saving files, and listing the files in directories).

7. Mapping - apply a given function to each element in the DataPipe.

8. Others - perform a miscellaneous set of operations.

9. Selecting - select specific samples within a DataPipe.

10. Text - parse, read, and transform text files and data.
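
To make a few of these categories concrete, here is a minimal sketch in plain Python that mimics Mapping, Selecting, and Grouping with generators. It uses only the standard library and is an analogy for the behaviour the categories describe, not the torchdata implementation; the helper names ``map_pipe``, ``filter_pipe``, and ``batch_pipe`` are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_pipe(source: Iterable[T], fn: Callable[[T], U]) -> Iterator[U]:
    """Mapping: apply fn to every element (hypothetical stand-in)."""
    for item in source:
        yield fn(item)


def filter_pipe(source: Iterable[T], keep: Callable[[T], bool]) -> Iterator[T]:
    """Selecting: pass through only elements for which keep(item) is true."""
    for item in source:
        if keep(item):
            yield item


def batch_pipe(source: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Grouping: collect consecutive elements into lists of batch_size.

    The final batch may be shorter if the source runs out.
    """
    iterator = iter(source)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Chain the stages lazily: square, keep even values, then batch in pairs.
pipeline = batch_pipe(
    filter_pipe(map_pipe(range(6), lambda x: x * x), lambda x: x % 2 == 0),
    batch_size=2,
)
result = list(pipeline)
print(result)
```

Each stage consumes its input lazily, which is the property that lets operations from different categories be chained without materializing intermediate results.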

Archive DataPipes
-------------------------

These DataPipes help open and decompress archive files of different formats.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: datapipe.rst

    Extractor
    RarArchiveLoader
    TarArchiveReader
    XzFileReader
    ZipArchiveReader

Augmenting DataPipes
-----------------------------
These DataPipes help to augment your samples.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: datapipe.rst

    Cycler
    Enumerator
    IndexAdder

Combinatorial DataPipes
-----------------------------
These DataPipes help to perform combinatorial operations.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: datapipe.rst

    Sampler
    Shuffler

Combining/Splitting DataPipes
-----------------------------
These tend to involve multiple DataPipes, combining them or splitting one to many.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: datapipe.rst

    Concater
    Demultiplexer
    Forker
    IterKeyZipper
    MapKeyZipper
    Multiplexer
    SampleMultiplexer
    UnZipper
    Zipper
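
As a rough illustration of what "combining them or splitting one to many" means, the following plain-Python sketch imitates forking, demultiplexing, zipping, and concatenating with ``itertools``. It is a standard-library analogy only and makes no claim about how the DataPipes listed above are implemented; ``fork_pipe`` and ``demux_pipe`` are hypothetical helper names.

```python
from itertools import chain, tee
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def fork_pipe(source: Iterable[T], num_instances: int = 2) -> Tuple[Iterator[T], ...]:
    """Splitting: produce independent iterators over the same elements."""
    return tee(source, num_instances)


def _select(copy: Iterator[T], index: int, classifier: Callable[[T], int]) -> Iterator[T]:
    # A separate function binds `index` per branch and avoids late-binding bugs.
    return (item for item in copy if classifier(item) == index)


def demux_pipe(
    source: Iterable[T], num_instances: int, classifier: Callable[[T], int]
) -> List[Iterator[T]]:
    """Splitting: route each element to the branch chosen by classifier."""
    copies = tee(source, num_instances)
    return [_select(copy, index, classifier) for index, copy in enumerate(copies)]


# Split one stream into two branches by parity.
evens, odds = demux_pipe(range(6), 2, lambda x: x % 2)
evens_list, odds_list = list(evens), list(odds)

# Combine: pair elements positionally, or append one stream to another.
zipped = list(zip(evens_list, odds_list))
concatenated = list(chain(evens_list, odds_list))

# Fork: two views of the same source yield identical elements.
first, second = fork_pipe("abc")
forked = list(zip(first, second))
```

Note that ``tee`` buffers elements one branch has seen and the other has not, so draining one branch completely before the other holds the whole stream in memory.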

Grouping DataPipes
-----------------------------
These DataPipes help you group samples within a DataPipe.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: datapipe.rst

    Batcher
    BucketBatcher
    Collator
    Grouper
    UnBatcher

IO DataPipes
-------------------------

These DataPipes help you interact with the file systems or online databases (e.g. FileOpener, GDriveReader).
These DataPipes help you interact with file systems or remote servers (e.g. downloading, opening,
saving files, and listing the files in directories).

.. autosummary::
    :nosignatures:
@@ -42,73 +133,68 @@ These DataPipes help you interact with the file systems or online databases (e.g
    HttpReader
    IoPathFileLister
    IoPathFileOpener
    IoPathSaver
    OnlineReader
    ParquetDataFrameLoader
    Saver


Transform DataPipes
Mapping DataPipes
-------------------------

These DataPipes transform elements within DataPipes (e.g. batching, shuffling).
These DataPipes apply a given function to each element in the DataPipe.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: datapipe.rst

    Batcher
    BucketBatcher
    Shuffler
    FlatMapper
    Mapper

Utility DataPipes
Other DataPipes
-------------------------

These DataPipes provide utility functions (e.g. caching, CSV parsing, filtering).
A miscellaneous set of DataPipes with different functionalities.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: datapipe.rst

    CSVDictParser
    CSVParser
    Collator
    Concater
    Cycler
    DataFrameMaker
    Demultiplexer
    EndOnDiskCacheHolder
    Enumerator
    Extractor
    Filter
    FlatMapper
    Forker
    Grouper
    HashChecker
    Header
    InMemoryCacheHolder
    IndexAdder
    IoPathSaver
    IterKeyZipper
    IterableWrapper
    OnDiskCacheHolder
    ShardingFilter

Selecting DataPipes
-------------------------

These DataPipes help you select specific samples within a DataPipe.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: datapipe.rst

    Filter
    Header

Text DataPipes
-----------------------------
These DataPipes help you parse, read, and transform text files and data.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: datapipe.rst

    CSVDictParser
    CSVParser
    JsonParser
    LineReader
    MapKeyZipper
    Mapper
    Multiplexer
    OnDiskCacheHolder
    ParagraphAggregator
    RarArchiveLoader
    RoutedDecoder
    Rows2Columnar
    SampleMultiplexer
    Sampler
    Saver
    ShardingFilter
    StreamReader
    TarArchiveReader
    UnBatcher
    UnZipper
    XzFileReader
    ZipArchiveReader
    Zipper