Commit 7607058

Update python API stuff in the programming guides and python docs
1 parent 208fbca commit 7607058

4 files changed: 33 additions, 12 deletions

docs/streaming-flume-integration.md

Lines changed: 0 additions & 2 deletions
@@ -5,8 +5,6 @@ title: Spark Streaming + Flume Integration Guide

 [Apache Flume](https://flume.apache.org/) is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Here we explain how to configure Flume and Spark Streaming to receive data from Flume. There are two approaches to this.

-<span class="badge" style="background-color: grey">Python API</span> Flume is not yet available in the Python API.
-
 ## Approach 1: Flume-style Push-based Approach
 Flume is designed to push data between Flume agents. In this approach, Spark Streaming essentially sets up a receiver that acts an Avro agent for Flume, to which Flume can push the data. Here are the configuration steps.

docs/streaming-programming-guide.md

Lines changed: 4 additions & 10 deletions
@@ -50,13 +50,7 @@ all of which are presented in this guide.
 You will find tabs throughout this guide that let you choose between code snippets of
 different languages.

-**Note:** Python API for Spark Streaming has been introduced in Spark 1.2. It has all the DStream
-transformations and almost all the output operations available in Scala and Java interfaces.
-However, it only has support for basic sources like text files and text data over sockets.
-APIs for additional sources, like Kafka and Flume, will be available in the future.
-Further information about available features in the Python API are mentioned throughout this
-document; look out for the tag
-<span class="badge" style="background-color: grey">Python API</span>.
+**Note:** There are a few APIs that are either different or not available in Python. Throughout this guide, you will find the tag <span class="badge" style="background-color: grey">Python API</span> highlighting these differences.

 ***************************************************************************************************

@@ -683,7 +677,7 @@ for Java, and [StreamingContext](api/python/pyspark.streaming.html#pyspark.strea
 {:.no_toc}

 <span class="badge" style="background-color: grey">Python API</span> As of Spark {{site.SPARK_VERSION_SHORT}},
-out of these sources, *only* Kafka, Flume and MQTT are available in the Python API. We will add more advanced sources in the Python API in future.
+out of these sources, Kafka, Kinesis, Flume and MQTT are available in the Python API.

 This category of sources require interfacing with external non-Spark libraries, some of them with
 complex dependencies (e.g., Kafka and Flume). Hence, to minimize issues related to version conflicts
@@ -725,9 +719,9 @@ Some of these advanced sources are as follows.

 - **Kafka:** Spark Streaming {{site.SPARK_VERSION_SHORT}} is compatible with Kafka 0.8.2.1. See the [Kafka Integration Guide](streaming-kafka-integration.html) for more details.

-- **Flume:** Spark Streaming {{site.SPARK_VERSION_SHORT}} is compatible with Flume 1.4.0. See the [Flume Integration Guide](streaming-flume-integration.html) for more details.
+- **Flume:** Spark Streaming {{site.SPARK_VERSION_SHORT}} is compatible with Flume 1.6.0. See the [Flume Integration Guide](streaming-flume-integration.html) for more details.

-- **Kinesis:** See the [Kinesis Integration Guide](streaming-kinesis-integration.html) for more details.
+- **Kinesis:** Spark Streaming {{site.SPARK_VERSION_SHORT}} is compatible with Kinesis Client Library 1.2.1. See the [Kinesis Integration Guide](streaming-kinesis-integration.html) for more details.

 - **Twitter:** Spark Streaming's TwitterUtils uses Twitter4j 3.0.3 to get the public stream of tweets using
 [Twitter's Streaming API](https://dev.twitter.com/docs/streaming-apis). Authentication information

python/docs/index.rst

Lines changed: 8 additions & 0 deletions
@@ -29,6 +29,14 @@ Core classes:

 A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.

+:class:`pyspark.streaming.StreamingContext`
+
+Main entry point for Spark Streaming functionality.
+
+:class:`pyspark.streaming.DStream`
+
+A Discretized Stream (DStream), the basic abstraction in Spark Streaming.
+
 :class:`pyspark.sql.SQLContext`

 Main entry point for DataFrame and SQL functionality.

python/docs/pyspark.streaming.rst

Lines changed: 21 additions & 0 deletions
@@ -15,3 +15,24 @@ pyspark.streaming.kafka module
     :members:
     :undoc-members:
     :show-inheritance:
+
+pyspark.streaming.kinesis module
+--------------------------------
+.. automodule:: pyspark.streaming.kinesis
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+pyspark.streaming.flume.module
+------------------------------
+.. automodule:: pyspark.streaming.flume
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+pyspark.streaming.mqtt module
+-----------------------------
+.. automodule:: pyspark.streaming.mqtt
+    :members:
+    :undoc-members:
+    :show-inheritance:
