TensorFlow I/O is a collection of file systems and file formats that are not
available in TensorFlow's built-in support.

At the moment TensorFlow I/O supports 5 data sources:
- `tensorflow_io.ignite`: Data source for Apache Ignite and Ignite File System (IGFS).
- `tensorflow_io.kafka`: Apache Kafka stream-processing support (see the sketch after this list).
- `tensorflow_io.kinesis`: Amazon Kinesis data streams support.
- `tensorflow_io.hadoop`: Hadoop SequenceFile format support.
- `tensorflow_io.arrow`: Apache Arrow data format support.
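
As a taste of the API, the Kafka module exposes a `KafkaDataset` that yields
messages as string tensors. The sketch below is a minimal illustration, not a
definitive example: it assumes a local broker with a topic `test` (partition 0)
and the `KafkaDataset` signature inherited from `tf.contrib.kafka`; adjust the
`topics`, `servers`, and `group` arguments for your setup:

```python
import tensorflow as tf
from tensorflow_io.kafka import KafkaDataset

# Read all messages from partition 0 of the hypothetical topic `test` on a
# local broker, stopping when the end of the stream is reached (eof=True)
dataset = KafkaDataset(topics=["test:0"], servers="localhost", group="test", eof=True)

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
  while True:
    try:
      print(sess.run(next_element))
    except tf.errors.OutOfRangeError:
      break
```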

## Installation

Note that Python has to be run outside of the repo directory itself; otherwise,
Python may not be able to find the correct path to the module.

## Using TensorFlow I/O

### Apache Arrow Datasets

Apache Arrow is a standard for in-memory columnar data; see [here](https://arrow.apache.org)
for more information on the project. An Arrow dataset makes it easy to bring
structured, columnar data into TensorFlow from the following sources:

#### Pandas DataFrame

An `ArrowDataset` can be made directly from an existing Pandas DataFrame, or
from pyarrow record batches, in a Python process. Tensor types and shapes can
be inferred from the DataFrame, although currently only scalar and vector
values with primitive types are supported. PyArrow must be installed to use
this Dataset. Example usage:

```python
import tensorflow as tf
from tensorflow_io.arrow import ArrowDataset

# Assume `df` is an existing Pandas DataFrame
dataset = ArrowDataset.from_pandas(df)

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
  for i in range(len(df)):
    print(sess.run(next_element))
```

NOTE: The entire DataFrame is serialized into the Dataset, so this is not
recommended for large amounts of data.

#### Arrow Feather Dataset

Feather is a lightweight file format that provides a simple and efficient way
to write a Pandas DataFrame to disk; see [here](https://arrow.apache.org/docs/python/ipc.html#feather-format)
for more information and limitations of the format. An `ArrowFeatherDataset`
can be created to read one or more Feather files. The following example shows
how to write a Feather file from a Pandas DataFrame, then read multiple files
back as an `ArrowFeatherDataset`:

```python
from pyarrow.feather import write_feather

# Assume `df` is an existing Pandas DataFrame with dtypes=(int32, float32)
write_feather(df, '/path/to/a.feather')
```
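
The write example above assumes `df` already exists. For reference, here is
one way such a DataFrame could be constructed (a minimal sketch; the column
names `x` and `y` are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with the dtypes=(int32, float32) layout used in
# these examples
df = pd.DataFrame({
    'x': np.arange(10, dtype=np.int32),
    'y': np.linspace(0.0, 1.0, 10, dtype=np.float32),
})
```

The files can then be read back as an `ArrowFeatherDataset`: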

```python
import tensorflow as tf
from tensorflow_io.arrow import ArrowFeatherDataset

# Each Feather file must have the same column types; here we use the above
# DataFrame, which has 2 columns with dtypes=(int32, float32)
dataset = ArrowFeatherDataset(
    ['/path/to/a.feather', '/path/to/b.feather'],
    columns=(0, 1),
    output_types=(tf.int32, tf.float32),
    output_shapes=([], []))

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

# This will iterate over each row of each file provided
with tf.Session() as sess:
  while True:
    try:
      print(sess.run(next_element))
    except tf.errors.OutOfRangeError:
      break
```

An alternate constructor can also be used to infer output types and shapes from
a given `pyarrow.Schema`, e.g. `dataset = ArrowFeatherDataset.from_schema(filenames, schema)`.
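
For example, a schema matching the two-column layout above can be constructed
with pyarrow and passed to `from_schema` (a minimal sketch; the field names
are placeholders):

```python
import pyarrow as pa
from tensorflow_io.arrow import ArrowFeatherDataset

# Hypothetical schema for the (int32, float32) columns used above; output
# types and shapes are inferred from the schema instead of given explicitly
schema = pa.schema([('x', pa.int32()), ('y', pa.float32())])
dataset = ArrowFeatherDataset.from_schema(
    ['/path/to/a.feather', '/path/to/b.feather'], schema)
```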

#### Arrow Stream Dataset

The `ArrowStreamDataset` provides a Dataset that connects to a host over a
socket serving Arrow record batches in the Arrow stream format. See
[here](https://arrow.apache.org/docs/python/ipc.html#writing-and-reading-streams)
for more on the stream format. The following example creates an
`ArrowStreamDataset` that connects to a host serving an Arrow stream of record
batches with 2 columns of dtypes=(int32, float32):

```python
import tensorflow as tf
from tensorflow_io.arrow import ArrowStreamDataset

# The str `host` should be in the format '<HOSTNAME>:<PORT>'
dataset = ArrowStreamDataset(
    host,
    columns=(0, 1),
    output_types=(tf.int32, tf.float32),
    output_shapes=([], []))

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

# The host connection is made when the Dataset op is run and will iterate over
# each row of each record batch until the Arrow stream is finished
with tf.Session() as sess:
  while True:
    try:
      print(sess.run(next_element))
    except tf.errors.OutOfRangeError:
      break
```
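
To try this locally, the serving host can be emulated with pyarrow and a plain
socket. This is a minimal sketch rather than part of `tensorflow_io`; the
address, port, and column names are placeholders (pair it with
`host = 'localhost:8080'` in the snippet above):

```python
import socket

import pyarrow as pa

# Build one record batch with the (int32, float32) layout used above
batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3], type=pa.int32()),
     pa.array([0.1, 0.2, 0.3], type=pa.float32())],
    ['x', 'y'])

# Serve the batch in the Arrow stream format to the first client that
# connects, then close the connection to end the stream
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('localhost', 8080))
listener.listen(1)
conn, _ = listener.accept()
with conn.makefile(mode='wb') as f:
  writer = pa.RecordBatchStreamWriter(f, batch.schema)
  writer.write_batch(batch)
  writer.close()
conn.close()
listener.close()
```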

An alternate constructor can also be used to infer output types and shapes from
a given `pyarrow.Schema`, e.g. `dataset = ArrowStreamDataset.from_schema(host, schema)`.

## Developing

### Python