docs/openstack-integration.md
---
layout: global
title: Accessing Openstack Swift storage from Spark
---

# Accessing Openstack Swift storage from Spark

Spark's file interface allows it to process data in Openstack Swift using the same URI
formats that are supported for Hadoop. You can specify a path in Swift as input through a
URI of the form `swift://<container.service_provider>/path`. You will also need to set your
Swift security credentials through `SparkContext.hadoopConfiguration`.

# Configuring Hadoop to use Openstack Swift
The Openstack Swift driver was merged in Hadoop version 2.3.0
([Swift driver](https://issues.apache.org/jira/browse/HADOOP-8545)). Users who wish to use
previous Hadoop versions will need to configure the Swift driver manually. The current
Swift driver requires Swift to use the Keystone authentication method. There are recent
efforts to also support temp auth ([HADOOP-10420](https://issues.apache.org/jira/browse/HADOOP-10420)).
To configure Hadoop to work with Swift, one needs to modify Hadoop's core-site.xml and set
up the Swift filesystem.

	<configuration>
		<property>
		...
		</property>
	</configuration>
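The body of this block registers the Swift filesystem implementation with Hadoop. A minimal sketch, assuming the standard class name shipped with the Hadoop 2.3.0 Swift driver (`org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem`; treat the exact name as an assumption to verify against your Hadoop distribution):

	<configuration>
		<property>
			<name>fs.swift.impl</name>
			<!-- assumed class name from the Hadoop 2.3.0 Swift driver -->
			<value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value>
		</property>
	</configuration>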

# Configuring Swift
The Swift proxy server should include the `list_endpoints` middleware. More information is
available [here](https://github.com/openstack/swift/blob/master/swift/common/middleware/list_endpoints.py).

# Configuring Spark
To use the Swift driver, Spark needs to be compiled with `hadoop-openstack-2.3.0.jar`,
distributed with Hadoop 2.3.0. For Maven builds, Spark's main pom.xml should include

	<swift.version>2.3.0</swift.version>
	...

In addition, pom.xml of the `core` and `yarn` projects should include

	...
	</dependency>
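The dependency block above is elided in this excerpt; for illustration, a Maven dependency on the Hadoop OpenStack module would plausibly look like the following (the artifact coordinates are an assumption inferred from the `hadoop-openstack-2.3.0.jar` name):

	<dependency>
		<!-- assumed coordinates, inferred from hadoop-openstack-2.3.0.jar -->
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-openstack</artifactId>
		<version>${swift.version}</version>
	</dependency>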

Additional parameters have to be provided to the Swift driver. The Swift driver will use
those parameters to perform authentication in Keystone prior to accessing Swift. The list
of mandatory parameters is: `fs.swift.service.<PROVIDER>.auth.url`,
`fs.swift.service.<PROVIDER>.auth.endpoint.prefix`, `fs.swift.service.<PROVIDER>.tenant`,
`fs.swift.service.<PROVIDER>.username`, `fs.swift.service.<PROVIDER>.password`,
`fs.swift.service.<PROVIDER>.http.port`, `fs.swift.service.<PROVIDER>.public`, where
`PROVIDER` is any name. `fs.swift.service.<PROVIDER>.auth.url` should point to the Keystone
authentication URL.

Create core-site.xml with the mandatory parameters and place it under the /spark/conf
directory. For example:

	<property>
	...
	</property>
	...
	<property>
	...
		<value>true</value>
	</property>
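Each entry in that file follows the standard Hadoop property layout. For illustration, a sketch of two of the mandatory entries (the URL and port values are placeholders, not real endpoints; `<value>true</value>` is assumed to belong to the `public` flag):

	<property>
		<name>fs.swift.service.<PROVIDER>.auth.url</name>
		<!-- placeholder Keystone endpoint -->
		<value>http://127.0.0.1:5000/v2.0/tokens</value>
	</property>
	<property>
		<name>fs.swift.service.<PROVIDER>.public</name>
		<value>true</value>
	</property>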

We are left with `fs.swift.service.<PROVIDER>.tenant`, `fs.swift.service.<PROVIDER>.username`,
and `fs.swift.service.<PROVIDER>.password`. The best way would be to provide those
parameters to SparkContext at run time, which does not seem to be possible yet.
Another approach is to adapt the Swift driver to obtain those values from system
environment variables. For now, we provide them via core-site.xml.
Assuming a tenant `test` with user `tester` was defined in Keystone, the core-site.xml
should include:

	<property>
		<name>fs.swift.service.<PROVIDER>.tenant</name>
		<value>test</value>
	</property>
	...
	<property>
	...
		<value>testing</value>
	</property>
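Putting the three credential properties together, the fragment would look like the following sketch (the `test`/`tester` values come from the example above; the password value `testing` is assumed from the excerpt):

	<property>
		<name>fs.swift.service.<PROVIDER>.tenant</name>
		<value>test</value>
	</property>
	<property>
		<name>fs.swift.service.<PROVIDER>.username</name>
		<value>tester</value>
	</property>
	<property>
		<name>fs.swift.service.<PROVIDER>.password</name>
		<value>testing</value>
	</property>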
# Usage
Assume there exists a Swift container `logs` with an object `data.log`. To access `data.log`
from Spark, the `swift://` scheme should be used. For example:

	val sfdata = sc.textFile("swift://logs.<PROVIDER>/data.log")
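Since the introduction notes that credentials can also be set through `SparkContext.hadoopConfiguration`, the mandatory properties can be assembled programmatically before reading any `swift://` path. A minimal pure-Scala sketch (the provider name `SparkTest` and all values are illustrative placeholders, not a fixed API):

```scala
// Illustrative sketch: assemble the mandatory Swift configuration entries
// for a hypothetical provider "SparkTest". All values are placeholders.
object SwiftCredentials {
  def entries(provider: String): Map[String, String] = {
    val prefix = s"fs.swift.service.$provider"
    Map(
      s"$prefix.auth.url"             -> "http://127.0.0.1:5000/v2.0/tokens",
      s"$prefix.auth.endpoint.prefix" -> "endpoints",
      s"$prefix.tenant"               -> "test",
      s"$prefix.username"             -> "tester",
      s"$prefix.password"             -> "testing",
      s"$prefix.http.port"            -> "8080",
      s"$prefix.public"               -> "true"
    )
  }
}

object Main extends App {
  // In a Spark application these pairs would be applied with
  // sc.hadoopConfiguration.set(key, value) before calling sc.textFile(...).
  SwiftCredentials.entries("SparkTest").foreach { case (k, v) => println(s"$k=$v") }
}
```

In a real job, each pair would be passed to `sc.hadoopConfiguration.set(key, value)`, avoiding the need to ship credentials in core-site.xml.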