
Commit 75b3dbc

macvincent authored and meta-codesync[bot] committed
fix: [Nimble][Chunking] Determine Threshold for Chunking Stages Based on Schema Width (#316)
Summary:
Pull Request resolved: #316

There are two chunking stages: a soft stage, where we only chunk streams above a pre-determined max size, and a hard stage, where we chunk all streams above a min size. In D86435350, we introduced an option to determine the max size based on schema width. While we use this new max size value to initialize the chunker, we fail to use it when determining chunking stages. This diff fixes that.

Reviewed By: helfman

Differential Revision: D86924613

fbshipit-source-id: 4c07eefe22d4474aa1c07cfa5227fb92d71a89b4
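
To illustrate the idea, here is a minimal standalone sketch of the threshold selection the fix applies. The field names (largeSchemaThreshold, wideSchemaMaxStreamChunkRawSize, maxStreamChunkRawSize) come from the diff below; the struct name, function name, and default values are illustrative assumptions, not the library's actual definitions.

#include <cstddef>
#include <cstdint>

// Illustrative stand-in for the writer options referenced in the diff;
// the defaults below are placeholders, not the library's real values.
struct ChunkingOptionsSketch {
  size_t largeSchemaThreshold = 1000;        // stream count above which the
                                             // schema counts as "wide"
  uint64_t maxStreamChunkRawSize = 8 << 20;  // soft-chunking threshold for
                                             // normal schemas
  uint64_t wideSchemaMaxStreamChunkRawSize = 1 << 20;  // tighter threshold
                                                       // for wide schemas
};

// Mirrors the selection the diff adds: wide schemas (many streams) use the
// smaller per-stream chunk size, so the soft chunking stage triggers earlier.
uint64_t pickSoftChunkThreshold(
    const ChunkingOptionsSketch& options, size_t streamCount) {
  return streamCount > options.largeSchemaThreshold
      ? options.wideSchemaMaxStreamChunkRawSize
      : options.maxStreamChunkRawSize;
}
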
1 parent 75e73cf commit 75b3dbc

File tree

1 file changed (+14 -3 lines)


dwio/nimble/velox/VeloxWriter.cpp

Lines changed: 14 additions & 3 deletions

@@ -1009,15 +1009,26 @@ bool VeloxWriter::evalauateFlushPolicy() {
   // Relieve memory pressure by chunking streams above max size.
   const auto& streams = context_->streams();
   std::vector<uint32_t> streamIndices;
-  streamIndices.reserve(streams.size());
+  const auto streamCount = streams.size();
+  streamIndices.reserve(streamCount);
+
+  // Determine size threshold for soft chunking based on schema width.
+  const auto& options = context_->options;
+  const auto maxChunkSize = streamCount > options.largeSchemaThreshold
+      ? options.wideSchemaMaxStreamChunkRawSize
+      : options.maxStreamChunkRawSize;
   for (auto streamIndex = 0; streamIndex < streams.size(); ++streamIndex) {
-    if (streams[streamIndex]->memoryUsed() >=
-        context_->options.maxStreamChunkRawSize) {
+    if (streams[streamIndex]->memoryUsed() >= maxChunkSize) {
      streamIndices.push_back(streamIndex);
    }
  }
+
+  // Soft chunking.
  const bool continueChunking =
      batchChunkStreams(streamIndices, /*ensureFullChunks=*/true);
+
+  // Hard chunking when chunking streams above maxChunkSize fails to
+  // relieve memory pressure.
  if (continueChunking) {
    // Relieve memory pressure by chunking small streams.
    // Sort streams for chunking based on raw memory usage.
