Adding domain-specific examples

NivekT · NivekT · commit dfa26765ded2 · 2022-02-11T12:25:32.000-05:00
ghstack-source-id: 632f255 Pull Request resolved: #216
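
Reviewer note on the `CSVDictParserIterDataPipe` docstring touched in this patch: the behavior it describes (the header row supplies the keys for the dicts built from the remaining rows) can be illustrated with a minimal sketch. This uses only the standard library's `csv.DictReader` as a stand-in and does not call torchdata. The helper name `rows_as_dicts` and the sample data are hypothetical, and the sketch assumes the DataPipe follows the same header-as-keys convention the docstring states.

```python
import csv
import io

# Hypothetical stand-in for the contents of one CSV file; the first row is the header.
csv_text = "name,label\nreview_1,pos\nreview_2,neg\n"


def rows_as_dicts(stream, **fmtparams):
    """Yield one dict per data row, keyed by the header row.

    Illustration only: this is plain csv.DictReader, not the torchdata DataPipe.
    Extra keyword arguments are passed through as CSV format parameters.
    """
    yield from csv.DictReader(stream, **fmtparams)


rows = list(rows_as_dicts(io.StringIO(csv_text)))
for row in rows:
    print(row["name"], row["label"])
```

Each yielded row maps the header fields (`name`, `label`) to that row's values, which is the shape the docstring describes for the default case.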
diff --git a/docs/source/examples.rst b/docs/source/examples.rst
@@ -3,19 +3,64 @@ Examples
 
 .. currentmodule:: examples
 
-Vision
+In this section, you will find the data loading implementations (using DataPipes) of various
+popular datasets across different research domains.
+
+Audio
 -----------
 
+LibriSpeech
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The `LibriSpeech dataset <https://www.openslr.org/12/>`_ is a corpus of approximately 1000 hours of 16kHz read
+English speech. Here is the
+`DataPipe implementation of LibriSpeech <https://github.com/pytorch/data/blob/main/examples/audio/librispeech.py>`_
+to load the data.
+
 Text
 -----------
 
-Audio
+IMDB
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+This is a `large movie review dataset <http://ai.stanford.edu/~amaas/data/sentiment/>`_ for binary sentiment
+classification containing 25,000 highly polar movie reviews for training and 25,000 for testing. Here is the
+`DataPipe implementation to load the data <https://github.com/pytorch/data/blob/main/examples/text/imdb.py>`_.
+
+
+SQuAD
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+`SQuAD (Stanford Question Answering Dataset) <https://rajpurkar.github.io/SQuAD-explorer/>`_ is a dataset for
+reading comprehension. It consists of a list of questions posed by crowdworkers on a set of Wikipedia articles. Here are the
+DataPipe implementations for `version 1.1 <https://github.com/pytorch/data/blob/main/examples/text/squad1.py>`_
+and `version 2.0 <https://github.com/pytorch/data/blob/main/examples/text/squad2.py>`_.
+
+Additional Datasets in TorchText
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In a separate PyTorch domain library `TorchText <https://github.com/pytorch/text>`_, you will find some of the most
+popular datasets in the NLP field implemented as loadable datasets using DataPipes. You can find
+all of those `NLP datasets here <https://github.com/pytorch/text/tree/main/torchtext/datasets>`_.
+
+
+Vision
 -----------
 
-Module contents
----------------
+Caltech 101
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+The `Caltech 101 dataset <http://www.vision.caltech.edu/Image_Datasets/Caltech101/>`_ contains pictures of objects
+belonging to 101 categories. Here is the
+`DataPipe implementation of Caltech 101 <https://github.com/pytorch/data/blob/main/examples/vision/caltech101.py>`_.
+
+Caltech 256
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+The `Caltech 256 dataset <http://www.vision.caltech.edu/Image_Datasets/Caltech256/>`_ contains 30607 images
+from 256 categories. Here is the
+`DataPipe implementation of Caltech 256 <https://github.com/pytorch/data/blob/main/examples/vision/caltech256.py>`_.
+
+Additional Datasets in TorchVision
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In a separate PyTorch domain library `TorchVision <https://github.com/pytorch/vision>`_, you will find some of the most
+popular datasets in the computer vision field implemented as loadable datasets using DataPipes. You can find all of
+those `vision datasets here <https://github.com/pytorch/vision/tree/main/torchvision/prototype/datasets/_builtin>`_.
 
-.. automodule:: examples
-   :members:
-   :undoc-members:
-   :show-inheritance:
+Note that these implementations are currently in the prototype phase, but they should be fully supported
+in the coming months. Nonetheless, they demonstrate the different ways DataPipes can be used for data loading.
diff --git a/docs/source/tutorial.rst b/docs/source/tutorial.rst
@@ -211,3 +211,5 @@ The following statements will be printed to show the shapes of a single batch of
 
     Labels batch shape: 50
     Feature batch shape: torch.Size([50, 20])
+
+You can find more DataPipe implementation examples for various research domains `on this page <torchexamples.html>`_.
diff --git a/torchdata/datapipes/iter/util/plain_text_reader.py b/torchdata/datapipes/iter/util/plain_text_reader.py
@@ -190,7 +190,7 @@ class CSVDictParserIterDataPipe(_CSVBaseParserIterDataPipe):
     within the CSV files one row at a time (functional name: ``parse_csv_as_dict``).
 
     Each output is a `Dict` by default, but it depends on ``fmtparams``. The first row of each file, unless skipped,
-    will be used as the header; the contents of the header row will be used as keys for the `Dict`s
+    will be used as the header; the contents of the header row will be used as keys for the `Dict`\s
     generated from the remaining rows.
 
     Args: