
Commit 8496b26

Author: Davies Liu

remove the docs related to RDD

1 parent: e23b9d6

File tree: 1 file changed (+0, -83 lines)


docs/sql-programming-guide.md

Lines changed: 0 additions & 83 deletions
```diff
@@ -573,37 +573,6 @@ for teenName in teenNames.collect():
 
 </div>
 
-<div data-lang="r" markdown="1">
-
-Spark SQL can convert an RDD of list of objects to a DataFrame, inferring the datatypes. The keys of this list define the column names of the table, and the types are inferred by looking at the first row. Since we currently only look at the first row, it is important that there is no missing data in the first row of the RDD. In future versions we
-plan to more completely infer the schema by looking at more data, similar to the inference that is
-performed on JSON files.
-
-{% highlight r %}
-# sc is an existing SparkContext.
-sqlContext <- sparkRSQL.init(sc)
-
-# Load a text file and convert each line to a Row.
-lines <- textFile(sc, "examples/src/main/resources/people.txt")
-parts <- map(lines, function(line) {strsplit(line, ",")[[1]] })
-people <- map(parts, function(l) {list(name=l[[1]], age=as.integer(l[[2]]))} )
-
-# Infer the schema, and register the DataFrame as a table.
-schemaPeople <- toDF(people)
-registerTempTable(schemaPeople, "people")
-
-# SQL can be run over DataFrames that have been registered as a table.
-teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")
-
-# The results of SQL queries are RDDs and support all the normal RDD operations.
-teenNames <- map(teenagers, function(p) { paste("Name:", p$name)})
-for (teenName in collect(teenNames)) {
-  cat(teenName, "\n")
-}
-{% endhighlight %}
-
-</div>
-
 </div>
 
 ### Programmatically Specifying the Schema
```
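Editorial note: the hunk above deletes the SparkR passage on converting an RDD to a DataFrame by inference. The rule that passage described (column names from the keys, a column's type taken from the first row's value only, hence the warning about missing data in the first row) can be sketched in plain Python. This is an illustrative sketch, not Spark or SparkR API; `infer_schema` and the sample rows are hypothetical names invented for the example:

```python
def infer_schema(rows):
    """Infer (column name, type name) pairs from the FIRST row only,
    mirroring the first-row inference described in the removed docs."""
    if not rows:
        raise ValueError("cannot infer a schema from zero rows")
    first = rows[0]
    for name, value in first.items():
        # A None here would make the type undecidable -- this is why the
        # removed passage warns against missing data in the first row.
        if value is None:
            raise ValueError(f"missing data in first row for column {name!r}")
    return [(name, type(value).__name__) for name, value in first.items()]

rows = [{"name": "Michael", "age": 29}, {"name": "Andy", "age": 30}]
print(infer_schema(rows))  # [('name', 'str'), ('age', 'int')]
```

Note that only the first row is consulted; a later row with a different type would go undetected, which is exactly the limitation the removed text said future versions planned to address.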
```diff
@@ -786,52 +755,6 @@ for name in names.collect():
 
 </div>
 
-<div data-lang="r" markdown="1">
-
-When it can not figure the schema automatically (for example,
-the structure of records is encoded in a string, or a text dataset will be parsed and
-fields will be projected differently for different users),
-a `DataFrame` can be created programmatically with three steps.
-
-1. Create an RDD of lists from the original RDD;
-2. Create the schema represented by a `StructType` matching the structure of
-lists in the RDD created in the step 1.
-3. Apply the schema to the RDD via `createDataFrame` method provided by `SQLContext`.
-
-For example:
-{% highlight r %}
-# sc is an existing SparkContext.
-sqlContext = sparkRSQL.init(sc)
-
-# Load a text file and convert each line to a tuple.
-lines <- textFile(sc, "examples/src/main/resources/people.txt")
-parts <- map(lines, function(line) {strsplit(line, ",")[[1]] })
-people <- map(parts, function(l) {list(name=l[[1]], age=as.integer(l[[2]]))} )
-
-# The schema is encoded in a string.
-schema <- list(type="struct", fields=list(
-  list(name="name", type="string", nullable=TRUE),
-  list(name="age", type="integer", nullable=TRUE)
-))
-
-# Apply the schema to the RDD.
-schemaPeople <- createDataFrame(sqlContext, people, schema)
-
-# Register the DataFrame as a table.
-registerTempTable(schemaPeople, "people")
-
-# SQL can be run over DataFrames that have been registered as a table.
-results <- sql(sqlContext, "SELECT name FROM people")
-
-# The results of SQL queries are RDDs and support all the normal RDD operations.
-teenNames <- map(teenagers, function(p) { paste("Name:", p$name)})
-for (teenName in collect(teenNames)) {
-  cat(teenName, "\n")
-}
-{% endhighlight %}
-
-</div>
-
 </div>
 
 
```
```diff
@@ -1477,7 +1400,6 @@ Spark SQL can automatically infer the schema of a JSON dataset and load it as a
 This conversion can be done using one of two methods in a `SQLContext`:
 
 * `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
-* `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
 
 Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
 line must contain a separate, self-contained valid JSON object. As a consequence,
```
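Editorial note: the context lines above preserve the `jsonFile` input format, which is newline-delimited JSON rather than a single JSON document. That format can be sketched in plain Python (illustrative only; `parse_json_lines` is an invented helper, not Spark API):

```python
import json

def parse_json_lines(text):
    """Parse newline-delimited JSON: one self-contained JSON object per
    line, as the jsonFile docs require. Blank lines are skipped."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

sample = '{"name":"Yin","age":29}\n{"name":"Andy","age":30}\n'
print(parse_json_lines(sample))
# [{'name': 'Yin', 'age': 29}, {'name': 'Andy', 'age': 30}]
```

A regular multi-line, pretty-printed JSON file would fail here, which is the point of the "not a typical JSON file" caveat in the guide.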
```diff
@@ -1504,11 +1426,6 @@ registerTempTable(people, "people")
 
 # SQL statements can be run by using the sql methods provided by `sqlContext`.
 teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")
-
-# Alternatively, a DataFrame can be created for a JSON dataset represented by
-# an RDD[String] storing one JSON object per string.
-anotherPeopleRDD <- parallelize(sc, list('{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}'))
-anotherPeople <- jsonRDD(sqlContext, anotherPeopleRDD)
 {% endhighlight %}
 </div>
 
```