diff --git a/docs/source/examples.rst b/docs/source/examples.rst
index cce3ebaaf..3ac5216b5 100644
--- a/docs/source/examples.rst
+++ b/docs/source/examples.rst
@@ -3,19 +3,64 @@ Examples
 
 .. currentmodule:: examples
 
-Vision
+In this section, you will find the data loading implementations (using DataPipes) of various
+popular datasets across different research domains.
+
+Audio
 -----------
 
+LibriSpeech
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+`LibriSpeech dataset `_ is a corpus of approximately 1000 hours of 16kHz read
+English speech. Here is the
+`DataPipe implementation of LibriSpeech `_
+to load the data.
+
 Text
 -----------
 
-Audio
+IMDB
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+This is a `large movie review dataset `_ for binary sentiment
+classification containing 25,000 highly polar movie reviews for training and 25,000 for testing. Here is the
+`DataPipe implementation to load the data `_.
+
+
+SQuAD
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+`SQuAD (Stanford Question Answering Dataset) `_ is a dataset for
+reading comprehension. It consists of a list of questions by crowdworkers on a set of Wikipedia articles. Here are the
+DataPipe implementations for `version 1.1 `_
+and `version 2.0 `_.
+
+Additional Datasets in TorchText
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In a separate PyTorch domain library `TorchText `_, you will find some of the most
+popular datasets in the NLP field implemented as loadable datasets using DataPipes. You can find
+all of those `NLP datasets here `_.
+
+
+Vision
 -----------
 
-Module contents
---------------- 
+Caltech 101
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+The `Caltech 101 dataset `_ contains pictures of objects
+belonging to 101 categories. Here is the
+`DataPipe implementation of Caltech 101 `_.
+
+Caltech 256
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+The `Caltech 256 dataset `_ contains 30607 images
+from 256 categories. Here is the
+`DataPipe implementation of Caltech 256 `_.
+
+Additional Datasets in TorchVision
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In a separate PyTorch domain library `TorchVision `_, you will find some of the most
+popular datasets in the computer vision field implemented as loadable datasets using DataPipes. You can find all of
+those `vision datasets here `_.
 
-.. automodule:: examples
-   :members:
-   :undoc-members:
-   :show-inheritance:
+Note that these implementations are currently in the prototype phase, but they should be fully supported
+in the coming months. Nonetheless, they demonstrate the different ways DataPipes can be used for data loading.
diff --git a/docs/source/tutorial.rst b/docs/source/tutorial.rst
index dfcaf30d4..43b456ad6 100644
--- a/docs/source/tutorial.rst
+++ b/docs/source/tutorial.rst
@@ -211,3 +211,5 @@ The following statements will be printed to show the shapes of a single batch of
 
     Labels batch shape: 50
     Feature batch shape: torch.Size([50, 20])
+
+You can find more DataPipe implementation examples for various research domains `on this page `_.
diff --git a/torchdata/datapipes/iter/util/plain_text_reader.py b/torchdata/datapipes/iter/util/plain_text_reader.py
index 4dfe7d1b1..92b04f467 100644
--- a/torchdata/datapipes/iter/util/plain_text_reader.py
+++ b/torchdata/datapipes/iter/util/plain_text_reader.py
@@ -190,7 +190,7 @@ class CSVDictParserIterDataPipe(_CSVBaseParserIterDataPipe):
     within the CSV files one row at a time (functional name: ``parse_csv_as_dict``).
     Each output is a `Dict` by default, but it depends on ``fmtparams``. The first row of each file, unless skipped,
-    will be used as the header; the contents of the header row will be used as keys for the `Dict`s
+    will be used as the header; the contents of the header row will be used as keys for the `Dict`\s
     generated from the remaining rows.
 
     Args: