Commit dfa2676

Adding domain specific examples
ghstack-source-id: 632f255
Pull Request resolved: #216
1 parent 69f9b0f commit dfa2676

3 files changed: 56 additions, 9 deletions

docs/source/examples.rst

Lines changed: 53 additions & 8 deletions
@@ -3,19 +3,64 @@ Examples
 
 .. currentmodule:: examples
 
-Vision
+In this section, you will find the data loading implementations (using DataPipes) of various
+popular datasets across different research domains.
+
+Audio
 -----------
 
+LibriSpeech
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+`LibriSpeech dataset <https://www.openslr.org/12/>`_ is a corpus of approximately 1000 hours of 16kHz read
+English speech. Here is the
+`DataPipe implementation of LibriSpeech <https://github.com/pytorch/data/blob/main/examples/audio/librispeech.py>`_
+to load the data.
+
 Text
 -----------
 
-Audio
+IMDB
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+This is a `large movie review dataset <http://ai.stanford.edu/~amaas/data/sentiment/>`_ for binary sentiment
+classification containing 25,000 highly polar movie reviews for training and 25,000 for testing. Here is the
+`DataPipe implementation to load the data <https://github.com/pytorch/data/blob/main/examples/text/imdb.py>`_.
+
+
+SQuAD
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+`SQuAD (Stanford Question Answering Dataset) <https://rajpurkar.github.io/SQuAD-explorer/>`_ is a dataset for
+reading comprehension. It consists of a list of questions by crowdworkers on a set of Wikipedia articles. Here are the
+DataPipe implementations for `version 1.1 <https://github.com/pytorch/data/blob/main/examples/text/squad1.py>`_
+and `version 2.0 <https://github.com/pytorch/data/blob/main/examples/text/squad2.py>`_.
+
+Additional Datasets in TorchText
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In a separate PyTorch domain library `TorchText <https://github.com/pytorch/text>`_, you will find some of the most
+popular datasets in the NLP field implemented as loadable datasets using DataPipes. You can find
+all of those `NLP datasets here <https://github.com/pytorch/text/tree/main/torchtext/datasets>`_.
+
+
+Vision
 -----------
 
-Module contents
----------------
+Caltech 101
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+The `Caltech 101 dataset <http://www.vision.caltech.edu/Image_Datasets/Caltech101/>`_ contains pictures of objects
+belonging to 101 categories. Here is the
+`DataPipe implementation of Caltech 101 <https://github.com/pytorch/data/blob/main/examples/vision/caltech101.py>`_.
+
+Caltech 256
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+The `Caltech 256 dataset <http://www.vision.caltech.edu/Image_Datasets/Caltech256/>`_ contains 30607 images
+from 256 categories. Here is the
+`DataPipe implementation of Caltech 256 <https://github.com/pytorch/data/blob/main/examples/vision/caltech256.py>`_.
+
+Additional Datasets in TorchVision
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In a separate PyTorch domain library `TorchVision <https://github.com/pytorch/vision>`_, you will find some of the most
+popular datasets in the computer vision field implemented as loadable datasets using DataPipes. You can find all of
+those `vision datasets here <https://github.com/pytorch/vision/tree/main/torchvision/prototype/datasets/_builtin>`_.
 
-.. automodule:: examples
-:members:
-:undoc-members:
-:show-inheritance:
+Note that these implementations are currently in the prototype phase, but they should be fully supported
+in the coming months. Nonetheless, they demonstrate the different ways DataPipes can be used for data loading.

docs/source/tutorial.rst

Lines changed: 2 additions & 0 deletions
@@ -211,3 +211,5 @@ The following statements will be printed to show the shapes of a single batch of
 
 Labels batch shape: 50
 Feature batch shape: torch.Size([50, 20])
+
+You can find more DataPipe implementation examples for various research domains `on this page <torchexamples.html>`_.

torchdata/datapipes/iter/util/plain_text_reader.py

Lines changed: 1 addition & 1 deletion
@@ -190,7 +190,7 @@ class CSVDictParserIterDataPipe(_CSVBaseParserIterDataPipe):
 within the CSV files one row at a time (functional name: ``parse_csv_as_dict``).
 
 Each output is a `Dict` by default, but it depends on ``fmtparams``. The first row of each file, unless skipped,
-will be used as the header; the contents of the header row will be used as keys for the `Dict`s
+will be used as the header; the contents of the header row will be used as keys for the `Dict`\s
 generated from the remaining rows.
 
 Args:
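
The header-as-keys behavior described in the docstring above can be illustrated with a minimal sketch that uses only the Python standard library. This is not torchdata's implementation: the helper name ``rows_as_dicts`` and the sample data are hypothetical, and ``csv.DictReader`` stands in for the DataPipe on the assumption that it treats the first row the same way.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a file on disk.
SAMPLE_CSV = "name,score\nalice,90\nbob,85\n"


def rows_as_dicts(stream, **fmtparams):
    """Yield one dict per data row, keyed by the header row.

    Standard-library analogy only; not the torchdata DataPipe itself.
    """
    # csv.DictReader consumes the first row as the field names and maps
    # every following row onto those names. Extra keyword arguments such
    # as ``delimiter`` are forwarded to the underlying csv reader.
    yield from csv.DictReader(stream, **fmtparams)


rows = list(rows_as_dicts(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row)
```

Each printed row is a dict whose keys come from the ``name,score`` header line, and all values are strings because the ``csv`` module performs no type conversion.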
