
Commit d614967

nchammas authored and rxin committed
[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically
As described in [SPARK-2627](https://issues.apache.org/jira/browse/SPARK-2627), we'd like Python code to automatically be checked for PEP 8 compliance by Jenkins. This pull request aims to do that.

Notes:

* We may need to install [`pep8`](https://pypi.python.org/pypi/pep8) on the build server.
* I'm expecting tests to fail now that PEP 8 compliance is being checked as part of the build. I'm fine with cleaning up any remaining PEP 8 violations as part of this pull request.
* I did not understand why the RAT and scalastyle reports are saved to text files. I did the same for the PEP 8 check, but only so that the console output style can match those for the RAT and scalastyle checks. The PEP 8 report is removed right after the check is complete.
* Updates to the ["Contributing to Spark"](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) guide will be submitted elsewhere, as I don't believe that text is part of the Spark repo.

Author: Nicholas Chammas <[email protected]>
Author: nchammas <[email protected]>

Closes apache#1744 from nchammas/master and squashes the following commits:

274b238 [Nicholas Chammas] [SPARK-2627] [PySpark] minor indentation changes
983d963 [nchammas] Merge pull request #5 from apache/master
1db5314 [nchammas] Merge pull request #4 from apache/master
0e0245f [Nicholas Chammas] [SPARK-2627] undo erroneous whitespace fixes
bf30942 [Nicholas Chammas] [SPARK-2627] PEP8: comment spacing
6db9a44 [nchammas] Merge pull request #3 from apache/master
7b4750e [Nicholas Chammas] merge upstream changes
91b7584 [Nicholas Chammas] [SPARK-2627] undo unnecessary line breaks
44e3e56 [Nicholas Chammas] [SPARK-2627] use tox.ini to exclude files
b09fae2 [Nicholas Chammas] don't wrap comments unnecessarily
bfb9f9f [Nicholas Chammas] [SPARK-2627] keep up with the PEP 8 fixes
9da347f [nchammas] Merge pull request #2 from apache/master
aa5b4b5 [Nicholas Chammas] [SPARK-2627] follow Spark bash style for if blocks
d0a83b9 [Nicholas Chammas] [SPARK-2627] check that pep8 downloaded fine
dffb5dd [Nicholas Chammas] [SPARK-2627] download pep8 at runtime
a1ce7ae [Nicholas Chammas] [SPARK-2627] space out test report sections
21da538 [Nicholas Chammas] [SPARK-2627] it's PEP 8, not PEP8
6f4900b [Nicholas Chammas] [SPARK-2627] more misc PEP 8 fixes
fe57ed0 [Nicholas Chammas] removing merge conflict backups
9c01d4c [nchammas] Merge pull request #1 from apache/master
9a66cb0 [Nicholas Chammas] resolving merge conflicts
a31ccc4 [Nicholas Chammas] [SPARK-2627] miscellaneous PEP 8 fixes
beaa9ac [Nicholas Chammas] [SPARK-2627] fail check on non-zero status
723ed39 [Nicholas Chammas] always delete the report file
0541ebb [Nicholas Chammas] [SPARK-2627] call Python linter from run-tests
12440fa [Nicholas Chammas] [SPARK-2627] add Scala linter
61c07b9 [Nicholas Chammas] [SPARK-2627] add Python linter
75ad552 [Nicholas Chammas] make check output style consistent
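For anyone reproducing the checks locally, a minimal sketch (assumes a POSIX shell at the repository root; `dev/lint-python` fetches its own copy of pep8 1.5.7, so nothing needs to be installed beforehand):

```bash
# Run only the new Python style check; it downloads dev/pep8.py on the fly
# and removes it again when finished.
./dev/lint-python

# Run the Scala style check through the new wrapper script.
./dev/lint-scala

# Or run the full suite, which now invokes both linters before the tests.
./dev/run-tests
```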
1 parent a6cd311 commit d614967

32 files changed (+348, -136 lines)

dev/lint-python

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
+SPARK_ROOT_DIR="$(dirname $SCRIPT_DIR)"
+PEP8_REPORT_PATH="$SPARK_ROOT_DIR/dev/pep8-report.txt"
+
+cd $SPARK_ROOT_DIR
+
+# Get pep8 at runtime so that we don't rely on it being installed on the build server.
+#+ See: https://github.com/apache/spark/pull/1744#issuecomment-50982162
+#+ TODOs:
+#+ - Dynamically determine latest release version of pep8 and use that.
+#+ - Download this from a more reliable source. (GitHub raw can be flaky, apparently. (?))
+PEP8_SCRIPT_PATH="$SPARK_ROOT_DIR/dev/pep8.py"
+PEP8_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/jcrocholl/pep8/1.5.7/pep8.py"
+
+curl --silent -o "$PEP8_SCRIPT_PATH" "$PEP8_SCRIPT_REMOTE_PATH"
+curl_status=$?
+
+if [ $curl_status -ne 0 ]; then
+    echo "Failed to download pep8.py from \"$PEP8_SCRIPT_REMOTE_PATH\"."
+    exit $curl_status
+fi
+
+
+# There is no need to write this output to a file
+#+ first, but we do so so that the check status can
+#+ be output before the report, like with the
+#+ scalastyle and RAT checks.
+python $PEP8_SCRIPT_PATH ./python > "$PEP8_REPORT_PATH"
+pep8_status=${PIPESTATUS[0]} #$?
+
+if [ $pep8_status -ne 0 ]; then
+    echo "PEP 8 checks failed."
+    cat "$PEP8_REPORT_PATH"
+else
+    echo "PEP 8 checks passed."
+fi
+
+rm -f "$PEP8_REPORT_PATH"
+rm "$PEP8_SCRIPT_PATH"
+
+exit $pep8_status
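While cleaning up violations it can help to run the same pinned checker by hand rather than through the wrapper; a rough sketch, assuming the commands are run from the Spark root (the target file below is only an example):

```bash
# Fetch the same pep8 release that dev/lint-python pins; the wrapper deletes
# its copy on exit, so grab one explicitly for local iteration.
curl --silent -o dev/pep8.py \
    https://raw.githubusercontent.com/jcrocholl/pep8/1.5.7/pep8.py

# Check a single module (example path); pep8 prints one line per violation.
python dev/pep8.py python/pyspark/context.py

# A non-zero exit status means violations were found, which is exactly what
# dev/lint-python keys its "passed"/"failed" message off of.
echo $?
```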

dev/lint-scala

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
+SPARK_ROOT_DIR="$(dirname $SCRIPT_DIR)"
+
+"$SCRIPT_DIR/scalastyle"

dev/run-tests

Lines changed: 12 additions & 1 deletion
@@ -66,16 +66,25 @@ fi
 set -e
 set -o pipefail

+echo ""
 echo "========================================================================="
 echo "Running Apache RAT checks"
 echo "========================================================================="
 dev/check-license

+echo ""
 echo "========================================================================="
 echo "Running Scala style checks"
 echo "========================================================================="
-dev/scalastyle
+dev/lint-scala

+echo ""
+echo "========================================================================="
+echo "Running Python style checks"
+echo "========================================================================="
+dev/lint-python
+
+echo ""
 echo "========================================================================="
 echo "Running Spark unit tests"
 echo "========================================================================="

@@ -89,11 +98,13 @@ fi
 echo -e "q\n" | sbt/sbt $SBT_MAVEN_PROFILES_ARGS clean package assembly/assembly test | \
 grep -v -e "info.*Resolving" -e "warn.*Merging" -e "info.*Including"

+echo ""
 echo "========================================================================="
 echo "Running PySpark tests"
 echo "========================================================================="
 ./python/run-tests

+echo ""
 echo "========================================================================="
 echo "Detecting binary incompatibilites with MiMa"
 echo "========================================================================="

dev/scalastyle

Lines changed: 1 addition & 1 deletion
@@ -30,5 +30,5 @@ if test ! -z "$ERRORS"; then
     echo -e "Scalastyle checks failed at following occurrences:\n$ERRORS"
     exit 1
 else
-    echo -e "Scalastyle checks passed.\n"
+    echo -e "Scalastyle checks passed."
 fi

python/pyspark/accumulators.py

Lines changed: 7 additions & 0 deletions
@@ -110,6 +110,7 @@ def _deserialize_accumulator(aid, zero_value, accum_param):


 class Accumulator(object):
+
     """
     A shared variable that can be accumulated, i.e., has a commutative and associative "add"
     operation. Worker tasks on a Spark cluster can add values to an Accumulator with the C{+=}

@@ -166,6 +167,7 @@ def __repr__(self):


 class AccumulatorParam(object):
+
     """
     Helper object that defines how to accumulate values of a given type.
     """

@@ -186,6 +188,7 @@ def addInPlace(self, value1, value2):


 class AddingAccumulatorParam(AccumulatorParam):
+
     """
     An AccumulatorParam that uses the + operators to add values. Designed for simple types
     such as integers, floats, and lists. Requires the zero value for the underlying type

@@ -210,6 +213,7 @@ def addInPlace(self, value1, value2):


 class _UpdateRequestHandler(SocketServer.StreamRequestHandler):
+
     """
     This handler will keep polling updates from the same socket until the
     server is shutdown.

@@ -228,7 +232,9 @@ def handle(self):
         # Write a byte in acknowledgement
         self.wfile.write(struct.pack("!b", 1))

+
 class AccumulatorServer(SocketServer.TCPServer):
+
     """
     A simple TCP server that intercepts shutdown() in order to interrupt
     our continuous polling on the handler.

@@ -239,6 +245,7 @@ def shutdown(self):
         self.server_shutdown = True
         SocketServer.TCPServer.shutdown(self)

+
 def _start_update_server():
     """Start a TCP server to receive accumulator updates in a daemon thread, and returns it"""
     server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)

python/pyspark/broadcast.py

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ def _from_id(bid):


 class Broadcast(object):
+
     """
     A broadcast variable created with
     L{SparkContext.broadcast()<pyspark.context.SparkContext.broadcast>}.

python/pyspark/conf.py

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@


 class SparkConf(object):
+
     """
     Configuration for a Spark application. Used to set various Spark
     parameters as key-value pairs.

python/pyspark/context.py

Lines changed: 14 additions & 11 deletions
@@ -47,6 +47,7 @@


 class SparkContext(object):
+
     """
     Main entry point for Spark functionality. A SparkContext represents the
     connection to a Spark cluster, and can be used to create L{RDD}s and

@@ -213,7 +214,7 @@ def _ensure_initialized(cls, instance=None, gateway=None):

         if instance:
             if (SparkContext._active_spark_context and
-                SparkContext._active_spark_context != instance):
+                    SparkContext._active_spark_context != instance):
                 currentMaster = SparkContext._active_spark_context.master
                 currentAppName = SparkContext._active_spark_context.appName
                 callsite = SparkContext._active_spark_context._callsite

@@ -406,7 +407,7 @@ def sequenceFile(self, path, keyClass=None, valueClass=None, keyConverter=None,
         batchSize = max(1, batchSize or self._default_batch_size_for_serialized_input)
         ser = BatchedSerializer(PickleSerializer()) if (batchSize > 1) else PickleSerializer()
         jrdd = self._jvm.PythonRDD.sequenceFile(self._jsc, path, keyClass, valueClass,
-                keyConverter, valueConverter, minSplits, batchSize)
+                                                keyConverter, valueConverter, minSplits, batchSize)
         return RDD(jrdd, self, ser)

     def newAPIHadoopFile(self, path, inputFormatClass, keyClass, valueClass, keyConverter=None,

@@ -437,7 +438,8 @@ def newAPIHadoopFile(self, path, inputFormatClass, keyClass, valueClass, keyConv
         batchSize = max(1, batchSize or self._default_batch_size_for_serialized_input)
         ser = BatchedSerializer(PickleSerializer()) if (batchSize > 1) else PickleSerializer()
         jrdd = self._jvm.PythonRDD.newAPIHadoopFile(self._jsc, path, inputFormatClass, keyClass,
-                valueClass, keyConverter, valueConverter, jconf, batchSize)
+                                                    valueClass, keyConverter, valueConverter,
+                                                    jconf, batchSize)
         return RDD(jrdd, self, ser)

     def newAPIHadoopRDD(self, inputFormatClass, keyClass, valueClass, keyConverter=None,

@@ -465,7 +467,8 @@ def newAPIHadoopRDD(self, inputFormatClass, keyClass, valueClass, keyConverter=N
         batchSize = max(1, batchSize or self._default_batch_size_for_serialized_input)
         ser = BatchedSerializer(PickleSerializer()) if (batchSize > 1) else PickleSerializer()
         jrdd = self._jvm.PythonRDD.newAPIHadoopRDD(self._jsc, inputFormatClass, keyClass,
-                valueClass, keyConverter, valueConverter, jconf, batchSize)
+                                                   valueClass, keyConverter, valueConverter,
+                                                   jconf, batchSize)
         return RDD(jrdd, self, ser)

     def hadoopFile(self, path, inputFormatClass, keyClass, valueClass, keyConverter=None,

@@ -496,7 +499,8 @@ def hadoopFile(self, path, inputFormatClass, keyClass, valueClass, keyConverter=
         batchSize = max(1, batchSize or self._default_batch_size_for_serialized_input)
         ser = BatchedSerializer(PickleSerializer()) if (batchSize > 1) else PickleSerializer()
         jrdd = self._jvm.PythonRDD.hadoopFile(self._jsc, path, inputFormatClass, keyClass,
-                valueClass, keyConverter, valueConverter, jconf, batchSize)
+                                              valueClass, keyConverter, valueConverter,
+                                              jconf, batchSize)
         return RDD(jrdd, self, ser)

     def hadoopRDD(self, inputFormatClass, keyClass, valueClass, keyConverter=None,

@@ -523,8 +527,9 @@ def hadoopRDD(self, inputFormatClass, keyClass, valueClass, keyConverter=None,
         jconf = self._dictToJavaMap(conf)
         batchSize = max(1, batchSize or self._default_batch_size_for_serialized_input)
         ser = BatchedSerializer(PickleSerializer()) if (batchSize > 1) else PickleSerializer()
-        jrdd = self._jvm.PythonRDD.hadoopRDD(self._jsc, inputFormatClass, keyClass, valueClass,
-                keyConverter, valueConverter, jconf, batchSize)
+        jrdd = self._jvm.PythonRDD.hadoopRDD(self._jsc, inputFormatClass, keyClass,
+                                             valueClass, keyConverter, valueConverter,
+                                             jconf, batchSize)
         return RDD(jrdd, self, ser)

     def _checkpointFile(self, name, input_deserializer):

@@ -555,8 +560,7 @@ def union(self, rdds):
         first = rdds[0]._jrdd
         rest = [x._jrdd for x in rdds[1:]]
         rest = ListConverter().convert(rest, self._gateway._gateway_client)
-        return RDD(self._jsc.union(first, rest), self,
-                   rdds[0]._jrdd_deserializer)
+        return RDD(self._jsc.union(first, rest), self, rdds[0]._jrdd_deserializer)

     def broadcast(self, value):
         """

@@ -568,8 +572,7 @@ def broadcast(self, value):
         pickleSer = PickleSerializer()
         pickled = pickleSer.dumps(value)
         jbroadcast = self._jsc.broadcast(bytearray(pickled))
-        return Broadcast(jbroadcast.id(), value, jbroadcast,
-                         self._pickled_broadcast_vars)
+        return Broadcast(jbroadcast.id(), value, jbroadcast, self._pickled_broadcast_vars)

     def accumulator(self, value, accum_param=None):
         """

python/pyspark/daemon.py

Lines changed: 2 additions & 3 deletions
@@ -43,7 +43,7 @@ def worker(sock):
     """
     # Redirect stdout to stderr
    os.dup2(2, 1)
-    sys.stdout = sys.stderr # The sys.stdout object is different from file descriptor 1
+    sys.stdout = sys.stderr  # The sys.stdout object is different from file descriptor 1

     signal.signal(SIGHUP, SIG_DFL)
     signal.signal(SIGCHLD, SIG_DFL)

@@ -134,8 +134,7 @@ def handle_sigchld(*args):
            try:
                os.kill(worker_pid, signal.SIGKILL)
            except OSError:
-                pass # process already died
-
+                pass  # process already died

        if listen_sock in ready_fds:
            sock, addr = listen_sock.accept()

python/pyspark/files.py

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@


 class SparkFiles(object):
+
     """
     Resolves paths to files added through
     L{SparkContext.addFile()<pyspark.context.SparkContext.addFile>}.
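The remaining PySpark diffs in this commit apply the same handful of mechanical fixes seen above: extra blank lines around top-level definitions, a blank line between each class statement and its docstring, re-indented continuation lines, and two spaces before inline comments. To see which rule codes still fire on a tree, the stock pep8 flags below can be used; this is illustrative and not something the commit itself configures:

```bash
# Summarize violations by rule code instead of listing every occurrence.
# --statistics and -qq are standard pep8 options, nothing Spark-specific.
python dev/pep8.py --statistics -qq python/pyspark
```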
