Commit 0447c9f

Removed sample code.
1 parent e9c3761 commit 0447c9f

2 files changed: +10 -123 lines changed

core/pom.xml

Lines changed: 1 addition & 1 deletion

@@ -44,7 +44,7 @@
       </exclusion>
     </exclusions>
   </dependency>
-  <dependency>
+  <dependency>
     <groupId>net.java.dev.jets3t</groupId>
     <artifactId>jets3t</artifactId>
   </dependency>

docs/openstack-integration.md

Lines changed: 9 additions & 122 deletions

@@ -1,6 +1,6 @@
 ---
 layout: global
-title: OpenStack Integration
+title: OpenStack Swift Integration
 ---
 
 * This will become a table of contents (this text will be scraped).
@@ -9,16 +9,12 @@ title: OpenStack Integration
 
 # Accessing OpenStack Swift from Spark
 
-Spark's file interface allows it to process data in OpenStack Swift using the same URI
-formats that are supported for Hadoop. You can specify a path in Swift as input through a
-URI of the form <code>swift://<container.PROVIDER/path</code>. You will also need to set your
+Spark's support for Hadoop InputFormat allows it to process data in OpenStack Swift using the
+same URI formats as in Hadoop. You can specify a path in Swift as input through a
+URI of the form <code>swift://container.PROVIDER/path</code>. You will also need to set your
 Swift security credentials, through <code>core-sites.xml</code> or via
-<code>SparkContext.hadoopConfiguration</code>.
-Openstack Swift driver was merged in Hadoop version 2.3.0
-([Swift driver](https://issues.apache.org/jira/browse/HADOOP-8545)).
-Users that wish to use previous Hadoop versions will need to configure Swift driver manually.
-Current Swift driver requires Swift to use Keystone authentication method. There are recent efforts
-to support temp auth [Hadoop-10420](https://issues.apache.org/jira/browse/HADOOP-10420).
+<code>SparkContext.hadoopConfiguration</code>.
+Current Swift driver requires Swift to use Keystone authentication method.
 
 # Configuring Swift
 Proxy server of Swift should include <code>list_endpoints</code> middleware. More information
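To make the rewritten paragraph above concrete, here is a minimal sketch (not part of the commit) of reading a Swift object from a Spark application, assuming the <code>SparkTest</code> provider and the <code>logs</code> container with object <code>data.log</code> that the examples removed further below also use:

{% highlight scala %}
// Minimal sketch; assumes PROVIDER=SparkTest and a Swift container "logs"
// holding an object "data.log" (names taken from the removed examples).
import org.apache.spark.{SparkConf, SparkContext}

object SwiftReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SwiftReadSketch"))

    // Keystone credentials may be set programmatically instead of core-sites.xml.
    sc.hadoopConfiguration.set("fs.swift.service.SparkTest.tenant", "test")
    sc.hadoopConfiguration.set("fs.swift.service.SparkTest.username", "tester")
    sc.hadoopConfiguration.set("fs.swift.service.SparkTest.password", "testing")

    // Swift paths follow the swift://container.PROVIDER/path URI form.
    val data = sc.textFile("swift://logs.SparkTest/data.log")
    println(s"Total number of lines: ${data.count()}")
    sc.stop()
  }
}
{% endhighlight %}

When testing via <code>spark-shell</code>, the same <code>fs.swift.service.SparkTest.*</code> keys would instead live in <code>core-sites.xml</code>, per the guidance retained later in this document.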
@@ -27,9 +23,9 @@ available
 
 # Dependencies
 
-Spark should be compiled with <code>hadoop-openstack-2.3.0.jar</code> that is distributted with
-Hadoop 2.3.0. For the Maven builds, the <code>dependencyManagement</code> section of Spark's main
-<code>pom.xml</code> should include:
+The Spark application should include <code>hadoop-openstack</code> dependency.
+For example, for Maven support, add the following to the <code>pom.xml</code> file:
+
 {% highlight xml %}
 <dependencyManagement>
   ...
@@ -42,19 +38,6 @@ Hadoop 2.3.0. For the Maven builds, the <code>dependencyManagement</code> section
 </dependencyManagement>
 {% endhighlight %}
 
-In addition, both <code>core</code> and <code>yarn</code> projects should add
-<code>hadoop-openstack</code> to the <code>dependencies</code> section of their
-<code>pom.xml</code>:
-{% highlight xml %}
-<dependencies>
-  ...
-  <dependency>
-    <groupId>org.apache.hadoop</groupId>
-    <artifactId>hadoop-openstack</artifactId>
-  </dependency>
-  ...
-</dependencies>
-{% endhighlight %}
 
 # Configuration Parameters
 
@@ -171,99 +154,3 @@ Notice that
 We suggest to keep those parameters in <code>core-sites.xml</code> for testing purposes when running Spark
 via <code>spark-shell</code>.
 For job submissions they should be provided via <code>sparkContext.hadoopConfiguration</code>.
-
-# Usage examples
-
-Assume Keystone's authentication URL is <code>http://127.0.0.1:5000/v2.0/tokens</code> and Keystone contains tenant <code>test</code>, user <code>tester</code> with password <code>testing</code>. In our example we define <code>PROVIDER=SparkTest</code>. Assume that Swift contains container <code>logs</code> with an object <code>data.log</code>. To access <code>data.log</code> from Spark the <code>swift://</code> scheme should be used.
-
-
-## Running Spark via spark-shell
-
-Make sure that <code>core-sites.xml</code> contains <code>fs.swift.service.SparkTest.tenant</code>, <code>fs.swift.service.SparkTest.username</code>,
-<code>fs.swift.service.SparkTest.password</code>. Run Spark via <code>spark-shell</code> and access Swift via <code>swift://</code> scheme.
-
-{% highlight scala %}
-val sfdata = sc.textFile("swift://logs.SparkTest/data.log")
-sfdata.count()
-{% endhighlight %}
-
-
-## Sample Application
-
-In this case <code>core-sites.xml</code> need not contain <code>fs.swift.service.SparkTest.tenant</code>, <code>fs.swift.service.SparkTest.username</code>,
-<code>fs.swift.service.SparkTest.password</code>. Example of Java usage:
-
-{% highlight java %}
-/* SimpleApp.java */
-import org.apache.spark.api.java.*;
-import org.apache.spark.SparkConf;
-import org.apache.spark.api.java.function.Function;
-
-public class SimpleApp {
-  public static void main(String[] args) {
-    String logFile = "swift://logs.SparkTest/data.log";
-    SparkConf conf = new SparkConf().setAppName("Simple Application");
-    JavaSparkContext sc = new JavaSparkContext(conf);
-    sc.hadoopConfiguration().set("fs.swift.service.SparkTest.tenant", "test");
-    sc.hadoopConfiguration().set("fs.swift.service.SparkTest.password", "testing");
-    sc.hadoopConfiguration().set("fs.swift.service.SparkTest.username", "tester");
-
-    JavaRDD<String> logData = sc.textFile(logFile).cache();
-    long num = logData.count();
-
-    System.out.println("Total number of lines: " + num);
-  }
-}
-{% endhighlight %}
-
-The directory structure is
-{% highlight bash %}
-./src
-./src/main
-./src/main/java
-./src/main/java/SimpleApp.java
-{% endhighlight %}
-
-Maven pom.xml should contain:
-{% highlight xml %}
-<project>
-  <groupId>edu.berkeley</groupId>
-  <artifactId>simple-project</artifactId>
-  <modelVersion>4.0.0</modelVersion>
-  <name>Simple Project</name>
-  <packaging>jar</packaging>
-  <version>1.0</version>
-  <repositories>
-    <repository>
-      <id>Akka repository</id>
-      <url>http://repo.akka.io/releases</url>
-    </repository>
-  </repositories>
-  <build>
-    <plugins>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-compiler-plugin</artifactId>
-        <version>2.3</version>
-        <configuration>
-          <source>1.6</source>
-          <target>1.6</target>
-        </configuration>
-      </plugin>
-    </plugins>
-  </build>
-  <dependencies>
-    <dependency> <!-- Spark dependency -->
-      <groupId>org.apache.spark</groupId>
-      <artifactId>spark-core_2.10</artifactId>
-      <version>1.0.0</version>
-    </dependency>
-  </dependencies>
-</project>
-{% endhighlight %}
-
-Compile and execute
-{% highlight bash %}
-mvn package
-$SPARK_HOME/bin/spark-submit --class SimpleApp --master local[4] target/simple-project-1.0.jar
-{% endhighlight %}
