docs/programming-guide.md
4 additions & 4 deletions
@@ -363,7 +363,7 @@ data <- c(1, 2, 3, 4, 5)
 distData <- parallelize(sc, data)
 {% endhighlight %}

-Once created, the distributed dataset (`distData`) can be operated on in parallel. For example, we can call `reduce(distData, function(a, b) {a + b})` to add up the elements of the list.
+Once created, the distributed dataset (`distData`) can be operated on in parallel. For example, we can call `reduce(distData, "+")` to add up the elements of the list.
 We describe operations on distributed datasets later on.

 </div>
@@ -551,7 +551,7 @@ Text file RDDs can be created using `textFile` method. This method takes an URI
 distFile <- textFile(sc, "data.txt")
 {% endhighlight %}

-Once created, `distFile` can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the `map` and `reduce` operations as follows: `reduce(map(distFile, length), function(a, b) {a + b})`.
+Once created, `distFile` can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the `map` and `reduce` operations as follows: `reduce(map(distFile, length), "+")`.

 Some notes on reading files with Spark:
@@ -667,7 +667,7 @@ To illustrate RDD basics, consider the simple program below:
 {% highlight r %}
 lines <- textFile(sc, "data.txt")
 lineLengths <- map(lines, length)
-totalLength <- reduce(lineLengths, function(a, b) {a + b})
+totalLength <- reduce(lineLengths, "+")
 {% endhighlight %}

 The first line defines a base RDD from an external file. This dataset is not loaded in memory or
@@ -1070,7 +1070,7 @@ many times each line of text occurs in a file:
Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations) and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (string, numeric) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:
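
A note on why the `"+"` form can stand in for the explicit closure: in base R, a function name given as a string is resolved to the function itself by `match.fun`, and `Reduce` does this for its first argument. The sketch below uses only base R (no SparkR, no `sc`), so it illustrates the equivalence of the two reducer forms rather than SparkR's `reduce` itself; that SparkR's `reduce` resolves the string the same way is an assumption this diff relies on, not something the sketch verifies. The variable names `old_style`, `new_style`, and `add` are illustrative only.

```r
# Base R only: compares the two reducer forms that appear in the diff.
data <- c(1, 2, 3, 4, 5)

# Old form: an explicit two-argument closure.
old_style <- Reduce(function(a, b) {a + b}, data)

# New form: the operator passed by name as a string.
new_style <- Reduce("+", data)

# match.fun() is the base R helper that turns the string "+" into
# the addition function.
add <- match.fun("+")

stopifnot(identical(old_style, new_style))
stopifnot(identical(add(2, 3), 5))
print(new_style)
```

Both forms fold the vector with addition, so the change in the diff is a shorthand, not a change in behavior, provided the reducer accepts a function name.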