@@ -146,7 +146,7 @@ describes the various methods for loading data into a SchemaRDD.
Spark SQL supports two different methods for converting existing RDDs into SchemaRDDs. The first
method uses reflection to infer the schema of an RDD that contains specific types of objects. This
- reflection based approach leads to more concise code and works well when you already know the schema
+ reflection based approach leads to more concise code and works well when you already know the schema
while writing your Spark application.
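For illustration, a minimal sketch of the reflection-based approach in Scala (assuming a spark-shell session with an existing `SparkContext` named `sc` and a hypothetical `people.txt` of comma-separated `name,age` records):

```scala
// Sketch only: infer the schema of an RDD of case-class objects via reflection.
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

val sqlContext = new SQLContext(sc)
// Implicitly converts an RDD of case classes into a SchemaRDD.
import sqlContext.createSchemaRDD

val people = sc.textFile("people.txt")          // hypothetical input path
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

// The schema (name: String, age: Int) is inferred from the Person case class.
people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
```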
The second method for creating SchemaRDDs is through a programmatic interface that allows you to
@@ -566,7 +566,7 @@ for teenName in teenNames.collect():
### Configuration
- Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
+ Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
`SET key=value` commands using SQL.
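For example, a short sketch assuming an existing `SQLContext` named `sqlContext` (the option values below are only illustrative):

```scala
// Programmatically, via setConf:
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

// Or equivalently, as a SQL command:
sqlContext.sql("SET spark.sql.parquet.binaryAsString=true")
```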
<table class="table">
@@ -575,8 +575,8 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
<td><code>spark.sql.parquet.binaryAsString</code></td>
<td>false</td>
<td>
- Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
- not differentiate between binary data and strings when writing out the Parquet schema. This
+ Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
+ not differentiate between binary data and strings when writing out the Parquet schema. This
flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
</td>
</tr>
@@ -591,10 +591,20 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
<td><code>spark.sql.parquet.compression.codec</code></td>
<td>gzip</td>
<td>
- Sets the compression codec use when writing Parquet files. Acceptable values include:
+ Sets the compression codec used when writing Parquet files. Acceptable values include:
uncompressed, snappy, gzip, lzo.
</td>
</tr>
+ <tr>
+ <td><code>spark.sql.parquet.filterPushdown</code></td>
+ <td>false</td>
+ <td>
+ Turn on Parquet filter pushdown optimization. This feature is turned off by default because of a known
+ bug in Parquet 1.6.0rc3 (<a href="https://issues.apache.org/jira/browse/PARQUET-136">PARQUET-136</a>).
+ However, if your table doesn't contain any nullable string or binary columns, it's still safe to turn
+ this feature on.
+ </td>
+ </tr>
<tr>
<td><code>spark.sql.hive.convertMetastoreParquet</code></td>
<td>true</td>
@@ -945,7 +955,7 @@ options.
## Migration Guide for Shark User
- ### Scheduling
+ ### Scheduling
To set a [Fair Scheduler](job-scheduling.html#fair-scheduler-pools) pool for a JDBC client session,
users can set the `spark.sql.thriftserver.scheduler.pool` variable:
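For example, a sketch of doing this from a JDBC client session in Scala (the connection URL, credentials, and pool name `accounting` are hypothetical; the Hive JDBC driver must be on the classpath):

```scala
import java.sql.DriverManager

// Connect to the Thrift JDBC server (hypothetical host/port).
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
val stmt = conn.createStatement()

// Subsequent statements in this session run in the "accounting" Fair Scheduler pool.
stmt.execute("SET spark.sql.thriftserver.scheduler.pool=accounting")
```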