Commit fd9904a

Merge pull request #6655 from gchq/6618-disable-datafusion-readahead

6618: Disable DataFusion readahead store by default

2 parents c107537 + d03a14b, commit fd9904a

File tree: 7 files changed (+7, -7 lines changed)

docs/usage/properties/instance/user/table_property_defaults.md

Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@ The following instance properties relate to default values used by table propert
 | sleeper.default.table.parquet.dictionary.encoding.value.fields | Whether dictionary encoding should be used for value columns in the Parquet files. | false | false |
 | sleeper.default.table.parquet.columnindex.truncate.length | Used to set parquet.columnindex.truncate.length, see documentation here:<br>https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md<br>The length in bytes to truncate binary values in a column index. | 128 | false |
 | sleeper.default.table.parquet.statistics.truncate.length | Used to set parquet.statistics.truncate.length, see documentation here:<br>https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md<br>The length in bytes to truncate the min/max binary values in row groups. | 2147483647 | false |
-| sleeper.default.table.datafusion.s3.readahead.enabled | Enables a cache of data when reading from S3 with the DataFusion data engine, to hold data in larger blocks than are requested by DataFusion. | true | false |
+| sleeper.default.table.datafusion.s3.readahead.enabled | Enables a cache of data when reading from S3 with the DataFusion data engine, to hold data in larger blocks than are requested by DataFusion. | false | false |
 | sleeper.default.table.parquet.writer.version | Used to set parquet.writer.version, see documentation here:<br>https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md<br>Can be either v1 or v2. The v2 pages store levels uncompressed while v1 pages compress levels with the data. | v2 | false |
 | sleeper.default.table.parquet.rowgroup.rows.max | Maximum number of rows to write in a Parquet row group. | 100000 | false |
 | sleeper.default.table.statestore.transactionlog.add.transaction.max.attempts | The number of attempts to make when applying a transaction to the state store. This default can be overridden by a table property. | 10 | false |

docs/usage/properties/table/data_storage.md

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ The following table properties relate to the storage of data inside a table.
 | sleeper.table.parquet.dictionary.encoding.value.fields | Whether dictionary encoding should be used for value columns in the Parquet files. | false |
 | sleeper.table.parquet.columnindex.truncate.length | Used to set parquet.columnindex.truncate.length, see documentation here:<br>https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md<br>The length in bytes to truncate binary values in a column index. | 128 |
 | sleeper.table.parquet.statistics.truncate.length | Used to set parquet.statistics.truncate.length, see documentation here:<br>https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md<br>The length in bytes to truncate the min/max binary values in row groups. | 2147483647 |
-| sleeper.table.datafusion.s3.readahead.enabled | Enables a cache of data when reading from S3 with the DataFusion data engine, to hold data in larger blocks than are requested by DataFusion. | true |
+| sleeper.table.datafusion.s3.readahead.enabled | Enables a cache of data when reading from S3 with the DataFusion data engine, to hold data in larger blocks than are requested by DataFusion. | false |
 | sleeper.table.parquet.writer.version | Used to set parquet.writer.version, see documentation here:<br>https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md<br>Can be either v1 or v2. The v2 pages store levels uncompressed while v1 pages compress levels with the data. | v2 |
 | sleeper.table.parquet.query.column.index.enabled | Used during Sleeper queries to determine whether the column/offset indexes (also known as page indexes) are read from Parquet files. For some queries, e.g. single/few row lookups this can improve performance by enabling more aggressive pruning. On range queries, especially on large tables this can harm performance, since readers will read the extra index data before returning results, but with little benefit from pruning. | false |
 | sleeper.table.parquet.rowgroup.rows.max | Maximum number of rows to write in a Parquet row group. | 100000 |

example/full/instance.properties

Lines changed: 1 addition & 1 deletion

@@ -1821,7 +1821,7 @@ sleeper.logging.root.level=INFO
 # Enables a cache of data when reading from S3 with the DataFusion data engine, to hold data in larger
 # blocks than are requested by DataFusion.
 # (default value shown below, uncomment to set a value)
-# sleeper.default.table.datafusion.s3.readahead.enabled=true
+# sleeper.default.table.datafusion.s3.readahead.enabled=false

 # Used to set parquet.writer.version, see documentation here:
 # https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md

example/full/table.properties

Lines changed: 1 addition & 1 deletion

@@ -165,7 +165,7 @@ sleeper.table.statestore.classname=DynamoDBTransactionLogStateStore
 # Enables a cache of data when reading from S3 with the DataFusion data engine, to hold data in larger
 # blocks than are requested by DataFusion.
 # (default value shown below, uncomment to set a value)
-# sleeper.table.datafusion.s3.readahead.enabled=true
+# sleeper.table.datafusion.s3.readahead.enabled=false

 # Used to set parquet.writer.version, see documentation here:
 # https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md
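Taken together, these files show the new out-of-the-box behaviour: the DataFusion S3 readahead cache is now disabled unless explicitly enabled. Based only on the properties appearing in this diff, opting back in would mean uncommenting the relevant line and setting it to true, either per table or instance-wide:

```properties
# Per-table override in table.properties: re-enable the DataFusion S3 readahead cache
sleeper.table.datafusion.s3.readahead.enabled=true

# Or restore the old behaviour instance-wide in instance.properties
sleeper.default.table.datafusion.s3.readahead.enabled=true
```

The per-table property takes effect for that table only, while the `sleeper.default.*` form changes the default that tables inherit.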

java/core/src/main/java/sleeper/core/properties/instance/TableDefaultProperty.java

Lines changed: 1 addition & 1 deletion

@@ -89,7 +89,7 @@ public interface TableDefaultProperty {
 UserDefinedInstanceProperty DEFAULT_DATAFUSION_S3_READAHEAD_ENABLED = Index.propertyBuilder("sleeper.default.table.datafusion.s3.readahead.enabled")
         .description("Enables a cache of data when reading from S3 with the DataFusion data engine, to hold data " +
                 "in larger blocks than are requested by DataFusion.")
-        .defaultValue("true")
+        .defaultValue("false")
         .validationPredicate(SleeperPropertyValueUtils::isTrueOrFalse)
         .propertyGroup(InstancePropertyGroup.TABLE_PROPERTY_DEFAULT).build();
 UserDefinedInstanceProperty DEFAULT_PARQUET_WRITER_VERSION = Index.propertyBuilder("sleeper.default.table.parquet.writer.version")
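The Java change is the single source of truth here: only the string passed to `.defaultValue(...)` flips, and the documentation and templates in the other files are regenerated from it. As a rough illustration of how a builder like this keeps such a change safe, here is a hypothetical, much-simplified stand-in (the names `BoolProperty` and `Builder` are invented for this sketch; Sleeper's real `UserDefinedInstanceProperty` API is richer): a predicate equivalent to `SleeperPropertyValueUtils::isTrueOrFalse` rejects any default that is not exactly "true" or "false".

```java
import java.util.function.Predicate;

// Hypothetical, simplified sketch of the property-builder pattern seen in
// TableDefaultProperty.java above. Not Sleeper's actual implementation.
final class BoolProperty {
    private final String name;
    private final String defaultValue;

    private BoolProperty(String name, String defaultValue) {
        this.name = name;
        this.defaultValue = defaultValue;
    }

    static Builder builder(String name) {
        return new Builder(name);
    }

    String name() {
        return name;
    }

    String defaultValue() {
        return defaultValue;
    }

    static final class Builder {
        private final String name;
        private String defaultValue;
        // Stand-in for SleeperPropertyValueUtils::isTrueOrFalse
        private final Predicate<String> isTrueOrFalse =
                v -> "true".equals(v) || "false".equals(v);

        private Builder(String name) {
            this.name = name;
        }

        Builder defaultValue(String value) {
            // Validate eagerly so an invalid default fails at class load,
            // not at read time.
            if (!isTrueOrFalse.test(value)) {
                throw new IllegalArgumentException("not a boolean value: " + value);
            }
            this.defaultValue = value;
            return this;
        }

        BoolProperty build() {
            return new BoolProperty(name, defaultValue);
        }
    }
}
```

Under this sketch, changing `"true"` to `"false"` in one place is all the PR needs, and a typo such as `"flase"` would be caught by the validation predicate rather than silently disabling the feature.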

scripts/templates/instanceproperties.template

Lines changed: 1 addition & 1 deletion

@@ -1822,7 +1822,7 @@ sleeper.subnets=set-automatically
 # Enables a cache of data when reading from S3 with the DataFusion data engine, to hold data in larger
 # blocks than are requested by DataFusion.
 # (default value shown below, uncomment to set a value)
-# sleeper.default.table.datafusion.s3.readahead.enabled=true
+# sleeper.default.table.datafusion.s3.readahead.enabled=false

 # Used to set parquet.writer.version, see documentation here:
 # https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md

scripts/templates/tableproperties.template

Lines changed: 1 addition & 1 deletion

@@ -149,7 +149,7 @@ sleeper.table.name=changeme
 # Enables a cache of data when reading from S3 with the DataFusion data engine, to hold data in larger
 # blocks than are requested by DataFusion.
 # (default value shown below, uncomment to set a value)
-# sleeper.table.datafusion.s3.readahead.enabled=true
+# sleeper.table.datafusion.s3.readahead.enabled=false

 # Used to set parquet.writer.version, see documentation here:
 # https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md
