
Commit 5db8dca

liancheng authored and marmbrus committed
[SPARK-4258][SQL][DOC] Documents spark.sql.parquet.filterPushdown
Documents `spark.sql.parquet.filterPushdown`, explains why it's turned off by default and when it's safe to be turned on.

Author: Cheng Lian <[email protected]>

Closes #3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits:

2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown
1 parent 2b233f5 commit 5db8dca
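
For context, turning the documented flag on is a one-line change. A minimal sketch against the Spark 1.2-era API, assuming an existing `SparkContext` named `sc`:

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Off by default because of PARQUET-136; per the documentation added in this
// commit, only enable it for tables with no nullable string or binary columns.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
```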

File tree

1 file changed (+16 −6 lines)


docs/sql-programming-guide.md

Lines changed: 16 additions & 6 deletions
```diff
@@ -146,7 +146,7 @@ describes the various methods for loading data into a SchemaRDD.
 
 Spark SQL supports two different methods for converting existing RDDs into SchemaRDDs. The first
 method uses reflection to infer the schema of an RDD that contains specific types of objects. This
-reflection based approach leads to more concise code and works well when you already know the schema 
+reflection based approach leads to more concise code and works well when you already know the schema
 while writing your Spark application.
 
 The second method for creating SchemaRDDs is through a programmatic interface that allows you to
@@ -566,7 +566,7 @@ for teenName in teenNames.collect():
 
 ### Configuration
 
-Configuration of Parquet can be done using the `setConf` method on SQLContext or by running 
+Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
 `SET key=value` commands using SQL.
 
 <table class="table">
```
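
As the hunk above documents, either mechanism works. A minimal sketch of both, assuming an existing `SQLContext` named `sqlContext` and using `spark.sql.parquet.binaryAsString` from the table below as the example key:

```scala
// Programmatic configuration via setConf on the SQLContext.
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

// The equivalent `SET key=value` command issued through SQL.
sqlContext.sql("SET spark.sql.parquet.binaryAsString=true")
```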
```diff
@@ -575,8 +575,8 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   <td><code>spark.sql.parquet.binaryAsString</code></td>
   <td>false</td>
   <td>
-    Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do 
-    not differentiate between binary data and strings when writing out the Parquet schema. This 
+    Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
+    not differentiate between binary data and strings when writing out the Parquet schema. This
     flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
   </td>
 </tr>
@@ -591,10 +591,20 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   <td><code>spark.sql.parquet.compression.codec</code></td>
   <td>gzip</td>
   <td>
-    Sets the compression codec use when writing Parquet files. Acceptable values include: 
+    Sets the compression codec use when writing Parquet files. Acceptable values include:
     uncompressed, snappy, gzip, lzo.
   </td>
 </tr>
+<tr>
+  <td><code>spark.sql.parquet.filterPushdown</code></td>
+  <td>false</td>
+  <td>
+    Turn on Parquet filter pushdown optimization. This feature is turned off by default because of a known
+    bug in Parquet 1.6.0rc3 (<a href="https://issues.apache.org/jira/browse/PARQUET-136">PARQUET-136</a>).
+    However, if your table doesn't contain any nullable string or binary columns, it's still safe to turn
+    this feature on.
+  </td>
+</tr>
 <tr>
   <td><code>spark.sql.hive.convertMetastoreParquet</code></td>
   <td>true</td>
```
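
The safety condition stated in the new table row can be checked mechanically before enabling the flag. A minimal sketch, assuming an existing `sqlContext` and a hypothetical Parquet file `people.parquet`:

```scala
import org.apache.spark.sql._

val table = sqlContext.parquetFile("people.parquet")

// PARQUET-136 affects nullable string/binary columns, so only enable the
// pushdown when the schema contains none of those.
val unsafe = table.schema.fields.exists { f =>
  f.nullable && (f.dataType == StringType || f.dataType == BinaryType)
}
if (!unsafe) {
  sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
}
```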
```diff
@@ -945,7 +955,7 @@ options.
 
 ## Migration Guide for Shark Users
 
-### Scheduling 
+### Scheduling
 To set a [Fair Scheduler](job-scheduling.html#fair-scheduler-pools) pool for a JDBC client session,
 users can set the `spark.sql.thriftserver.scheduler.pool` variable:
 
```
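
The scheduler variable in the last hunk is likewise set with a plain `SET` command. A minimal sketch, assuming a hypothetical Fair Scheduler pool named `accounting`; a JDBC client would send the same statement to the Thrift server, shown here via `sqlContext.sql` for concreteness:

```scala
// Route this session's queries to the "accounting" Fair Scheduler pool.
sqlContext.sql("SET spark.sql.thriftserver.scheduler.pool=accounting")
```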