[Apache Flume](https://flume.apache.org/) is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Here we explain how to configure Flume and Spark Streaming to receive data from Flume. There are two approaches to this.
-<span class="badge" style="background-color: grey">Python API</span> Flume is not yet available in the Python API.
-
## Approach 1: Flume-style Push-based Approach
Flume is designed to push data between Flume agents. In this approach, Spark Streaming essentially sets up a receiver that acts as an Avro agent for Flume, to which Flume can push the data. Here are the configuration steps.
**Note:** There are a few APIs that are either different or not available in Python. Throughout this guide, you will find the tag <span class="badge" style="background-color: grey">Python API</span> highlighting these differences.
@@ -683,7 +677,7 @@ for Java, and [StreamingContext](api/python/pyspark.streaming.html#pyspark.strea
{:.no_toc}
<span class="badge" style="background-color: grey">Python API</span> As of Spark {{site.SPARK_VERSION_SHORT}},
-out of these sources, *only* Kafka, Flume and MQTT are available in the Python API. We will add more advanced sources in the Python API in future.
+out of these sources, Kafka, Kinesis, Flume and MQTT are available in the Python API.
This category of sources requires interfacing with external non-Spark libraries, some of them with
complex dependencies (e.g., Kafka and Flume). Hence, to minimize issues related to version conflicts
@@ -725,9 +719,9 @@ Some of these advanced sources are as follows.
- **Kafka:** Spark Streaming {{site.SPARK_VERSION_SHORT}} is compatible with Kafka 0.8.2.1. See the [Kafka Integration Guide](streaming-kafka-integration.html) for more details.
-- **Flume:** Spark Streaming {{site.SPARK_VERSION_SHORT}} is compatible with Flume 1.4.0. See the [Flume Integration Guide](streaming-flume-integration.html) for more details.
+- **Flume:** Spark Streaming {{site.SPARK_VERSION_SHORT}} is compatible with Flume 1.6.0. See the [Flume Integration Guide](streaming-flume-integration.html) for more details.
-- **Kinesis:** See the [Kinesis Integration Guide](streaming-kinesis-integration.html) for more details.
+- **Kinesis:** Spark Streaming {{site.SPARK_VERSION_SHORT}} is compatible with Kinesis Client Library 1.2.1. See the [Kinesis Integration Guide](streaming-kinesis-integration.html) for more details.
- **Twitter:** Spark Streaming's TwitterUtils uses Twitter4j 3.0.3 to get the public stream of tweets using
[Twitter's Streaming API](https://dev.twitter.com/docs/streaming-apis). Authentication information