Commit 6426223

macvincent authored and meta-codesync[bot] committed
Fix Race Condition in Stats Aggregation During Parallel Encoding with Chunking (facebookincubator#357)

Summary: Pull Request resolved: facebookincubator#357

This diff fixes a data race introduced by a recent refactoring of how column stats objects for individual streams are handled. The creation and access of those per-stream stats objects had been moved inside a barrier as [part of a refactor](https://www.internalfb.com/diff/D88053111?entry_point=19). That change inadvertently allowed multiple threads to create or access these objects concurrently without synchronization. The race was detected by our TSAN (ThreadSanitizer) tests, which failed as a result.

This diff ensures that the creation and access of the column stats objects are properly synchronized with respect to the barrier, eliminating the data race. TSAN tests now pass, confirming that the concurrency issue has been resolved.

Reviewed By: xiaoxmeng

Differential Revision: D88445993

fbshipit-source-id: 9cc8ecf39020fbfc2f1d06c077f1a45195e7bfbe
1 parent: ba36073

File tree: 1 file changed (+7 −4 lines)

dwio/nimble/velox/VeloxWriter.cpp (7 additions, 4 deletions)
@@ -993,15 +993,18 @@ bool VeloxWriter::writeChunks(
   velox::dwio::common::ExecutorBarrier barrier{
       context_->options().encodingExecutor};
   for (auto streamIndex : streamIndices) {
-    barrier.add([&, streamData = streams[streamIndex].get()] {
-      const auto offset = streamData->descriptor().offset();
+    auto& streamData = streams[streamIndex];
+    const auto offset = streamData->descriptor().offset();
+    auto& streamSize = context_->columnStats(offset).physicalSize;
+    auto& encodeStream = encodedStreams_[offset];
+    barrier.add([&] {
       if (encodeStreamChunk(
               *streamData,
               minChunkSize,
               maxChunkSize,
               ensureFullChunks,
-              encodedStreams_[offset],
-              context_->columnStats()[offset].physicalSize,
+              encodeStream,
+              streamSize,
               chunkBytes,
               logicalBytes)) {
         writtenChunk = true;
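The fix hoists the stats and stream lookups out of the deferred task, so any map entries are created on the calling thread before the worker tasks run, and each task only touches a reference it was handed. A minimal standalone sketch of that pattern (hypothetical names, not the Nimble API; `std::thread` stands in for the executor barrier):

```cpp
#include <cassert>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical per-stream stats record, analogous to the column stats
// whose physicalSize the encoder updates.
struct ColumnStats {
  long physicalSize = 0;
};

// Shared map keyed by stream offset. Calling operator[] on it from several
// worker threads at once (as the pre-fix code effectively did) is a data
// race: operator[] may insert and rehash while other threads read.
std::unordered_map<int, ColumnStats> statsByOffset;

long encodeAll(const std::vector<int>& offsets) {
  std::vector<std::thread> workers;
  workers.reserve(offsets.size());
  for (int offset : offsets) {
    // Fix pattern: resolve the map entry on the calling thread, before the
    // task is dispatched. References into an unordered_map stay valid across
    // rehashing, so each worker safely writes to its own disjoint entry.
    ColumnStats& stats = statsByOffset[offset];
    workers.emplace_back([&stats, offset] {
      stats.physicalSize += offset * 10;  // stand-in for encodeStreamChunk
    });
  }
  for (auto& t : workers) {
    t.join();
  }
  long total = 0;
  for (const auto& [offset, stats] : statsByOffset) {
    total += stats.physicalSize;
  }
  return total;
}
```

Because all map insertions happen single-threaded and each lambda captures a stable reference, ThreadSanitizer sees no concurrent mutation of the container, which mirrors why the TSAN failures disappear after this diff.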
