Skip to content

Add CborReaderProxy to support process CBOR stream in chunks #3939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

muhammad-othman
Copy link
Member

@muhammad-othman muhammad-othman commented Jul 30, 2025

Description

The current code loads the whole response stream in memory before start the unmarshalling process. This works for small payloads but may cause issues for large response payloads.
The System.Formats.Cbor.CborReader doesn't accept stream and requires in memory byte array for deserialization.

This PR adds CborReaderProxy that wraps System.Formats.Cbor.CborReader to support processing the stream in chunks without ever holding the whole payload in memory, feeding the chunks into CborReader and process them one by one.

The default CborReader.ReadEndMap()/CborReader.ReadEndArray() methods assume the full map is already present in the input buffer and cannot continue reading if the buffer ends in the middle of a container.
This means the reader will fail to advance if the next token is a container break (0xFF) but the start wasn't read into the current buffer (it was read in a previous part of the stream).

To support maps/arrays across multiple refills I did the following:

  • Implement custom ReadStartMap, ReadEndMap, ReadStartArray, ReadEndArray.
  • Detect and skip container break markers 0xFF as needed.
  • Added _nestingStack to track when reading a nested CBOR structure (e.g., an array of maps), and correctly match calls to ReadStartMap with ReadEndMap, and ReadStartArray with ReadEndArray.

Motivation and Context

#DOTNET-8252

Testing

  • Added CborReaderProxyTests.cs to cover the new changes.
  • Tested various CloudWatch and SecretsManager operations with CBOR.
  • DRY_RUN-4ff92fab-9e40-46cc-88d0-746023de37cf.

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed

License

  • I confirm that this pull request can be released under the Apache 2 license

@muhammad-othman muhammad-othman requested a review from normj July 30, 2025 17:35
@philasmar philasmar requested a review from peterrsongg July 31, 2025 17:26
@GarrettBeatty GarrettBeatty requested a review from Copilot August 1, 2025 00:05
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a streaming CBOR reader to reduce memory usage when processing large payloads. Instead of loading entire response streams into memory before unmarshalling, it introduces CborReaderProxy that processes CBOR data in chunks.

  • Adds CborReaderProxy to wrap System.Formats.Cbor.CborReader with streaming capabilities
  • Implements custom container handling for maps/arrays that span buffer boundaries
  • Adds configuration for initial buffer size through AWSConfigs.CborReaderInitialBufferSize

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
sdk/src/Core/Amazon.Util/Internal/RootConfig.cs Adds CborReaderInitialBufferSize configuration property
sdk/src/Core/AWSConfigs.cs Implements configuration support and helper methods for CBOR buffer size
extensions/test/CborProtocol.Tests/CborReaderProxyTests.cs Comprehensive test suite for the new streaming CBOR reader
extensions/src/AWSSDK.Extensions.CborProtocol/Internal/Transform/CborUnmarshallerContext.cs Updates context to use CborReaderProxy instead of loading full stream
extensions/src/AWSSDK.Extensions.CborProtocol/Internal/CborReaderProxy.cs Core implementation of streaming CBOR reader with buffer management

@muhammad-othman muhammad-othman force-pushed the muhamoth/DOTNET-8252-CBOR-Optimize-loading-stream branch from 6a80340 to 2c0ea9a Compare August 1, 2025 17:11
@muhammad-othman muhammad-othman requested a review from normj August 1, 2025 17:13
@muhammad-othman muhammad-othman force-pushed the muhamoth/DOTNET-8252-CBOR-Optimize-loading-stream branch 2 times, most recently from 0208456 to 6da5cab Compare August 1, 2025 17:52
@muhammad-othman muhammad-othman requested a review from normj August 1, 2025 17:57
private CborReader _internalCborReader;
private int _currentChunkSize;

public CborStreamReader(Stream stream)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class is missing documentation


namespace Amazon.Extensions.CborProtocol.Internal
{
public class CborStreamReader : IDisposable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class is missing documentation

_stream = stream ?? throw new ArgumentNullException(nameof(stream));
_buffer = ArrayPool<byte>.Shared.Rent(AWSConfigs.CborReaderInitialBufferSize);

_currentChunkSize = _stream.Read(_buffer, 0, _buffer.Length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the intention is to fill the whole buffer or fill until the stream has no data left, this isn't guaranteed to happen with one Stream.read call, since the docs state that it could be less than the bytes you requested if the stream doesn't have that data available yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why we are using _currentChunkSize to know how much did we read and we create the ReadOnlyMemory from the start of the buffer to _currentChunkSize not to the buffer size.

From Read documentation:
image

}
catch (CborContentException ex)
{
if (_currentChunkSize == 0 && _internalCborReader.BytesRemaining == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how we would be in a CborContentException is the _currentChunkSize == 0 and the BytesRemaining == 0? Wouldn't this indicate that we successfully read everything?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and after reading everything the code that uses the CborStreamReader might try to read more data which means there is an issue with that code, at this case CborReader will throw this exception and we will throw it back for the other code to handle or fail.

leftoverBytesCount = leftoverBytesCount - bytesToSkip;

// If the leftover data completely fills the buffer, grow it
if (leftoverBytesCount >= _buffer.Length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering.. is there a reason you used >= here? Isn't it impossible for the leftoverBytesCount to be greather than the buffer's length, since the reader will be instantiated with at most the data in the buffer? or am I missing something? Have you seen this case happen before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't happen while testing, but just wanted to be safe incase we had an off by one error or some, at this case I would prefer expanding the buffer rather than go to the next case and lose data when copying to the start of the buffer.

@muhammad-othman muhammad-othman force-pushed the muhamoth/DOTNET-8252-CBOR-Optimize-loading-stream branch from 6da5cab to 8b50922 Compare August 1, 2025 19:45
@muhammad-othman muhammad-othman merged commit 3a661f5 into cbor-protocol Aug 1, 2025
1 check passed
@muhammad-othman muhammad-othman deleted the muhamoth/DOTNET-8252-CBOR-Optimize-loading-stream branch August 1, 2025 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants