Add CborReaderProxy to support processing CBOR streams in chunks #3939

muhammad-othman · 2025-07-30T17:35:37Z

Description

The current code loads the whole response stream into memory before starting the unmarshalling process. This works for small payloads but may cause issues for large response payloads.
System.Formats.Cbor.CborReader does not accept a stream and requires an in-memory byte array for deserialization.

This PR adds CborReaderProxy, which wraps System.Formats.Cbor.CborReader so that the stream is processed in chunks and the whole payload is never held in memory. Each chunk is fed into CborReader and processed in turn.

The default CborReader.ReadEndMap() and CborReader.ReadEndArray() methods assume the whole container is already present in the input buffer and cannot continue reading if the buffer ends in the middle of a container.
As a result, the reader fails to advance when the next token is a container break (0xFF) whose matching start was read from an earlier part of the stream rather than from the current buffer.
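
The limitation can be reproduced with a small sketch. It is not code from this PR: the class and method names are hypothetical, the payload is a hand-encoded indefinite-length map, and it assumes the System.Formats.Cbor package is referenced. The first method reads the map from a single buffer. The second hands a fresh CborReader only the tail of the same payload, as a naive chunked reader might after a refill, and reports whether ReadEndMap() throws.

```csharp
using System;
using System.Formats.Cbor;

// Hypothetical demo type, not part of the PR.
public static class PartialBufferDemo
{
    // Indefinite-length map {_ "a": 1, "b": 2}.
    // 0xBF opens the map and 0xFF is the break marker that closes it.
    public static readonly byte[] Payload =
        { 0xBF, 0x61, 0x61, 0x01, 0x61, 0x62, 0x02, 0xFF };

    // Reads the whole payload from one buffer and sums the values.
    public static int SumFromFullBuffer()
    {
        var reader = new CborReader(Payload, CborConformanceMode.Lax);
        reader.ReadStartMap();
        int sum = 0;
        while (reader.PeekState() != CborReaderState.EndMap)
        {
            reader.ReadTextString();   // key
            sum += reader.ReadInt32(); // value
        }
        reader.ReadEndMap();
        return sum;
    }

    // Gives a new reader only the second entry and the break marker.
    // The reader never saw the map start, so it cannot close the map.
    public static bool ReadEndMapFailsOnTail()
    {
        var tail = Payload.AsMemory(4); // 0x61 0x62 0x02 0xFF
        var reader = new CborReader(
            tail, CborConformanceMode.Lax, allowMultipleRootLevelValues: true);
        reader.ReadTextString();
        reader.ReadInt32();
        try
        {
            reader.ReadEndMap();
            return false;
        }
        catch (Exception)
        {
            return true;
        }
    }
}
```

The key/value pair in the tail still decodes, because each item is self-describing. Only the container boundary is lost, which is why the proxy has to handle container start and end itself.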

To support maps and arrays that span multiple buffer refills, I did the following:

Implemented custom ReadStartMap, ReadEndMap, ReadStartArray, and ReadEndArray methods.
Detected and skipped container break markers (0xFF) as needed.
Added _nestingStack to track nested CBOR structures (e.g., an array of maps) and to correctly match each call to ReadStartMap with ReadEndMap and each call to ReadStartArray with ReadEndArray.
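
The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: ContainerKind and NestingTracker are hypothetical names, and the actual _nestingStack in CborReaderProxy may store different data. It only shows how a stack lets each end call be checked against the most recently opened container.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only.
public enum ContainerKind { Map, Array }

public sealed class NestingTracker
{
    // Innermost open container is on top of the stack.
    private readonly Stack<ContainerKind> _nestingStack = new Stack<ContainerKind>();

    public int Depth => _nestingStack.Count;

    // Called when a map or array start is read.
    public void Start(ContainerKind kind) => _nestingStack.Push(kind);

    // Called when a map or array end is read; rejects unmatched or mismatched ends.
    public void End(ContainerKind kind)
    {
        if (_nestingStack.Count == 0)
            throw new InvalidOperationException($"End of {kind} without a matching start.");

        ContainerKind open = _nestingStack.Peek();
        if (open != kind)
            throw new InvalidOperationException($"Expected end of {open} but got end of {kind}.");

        _nestingStack.Pop();
    }
}
```

Because the stack lives outside any single CborReader instance, it survives buffer refills, so a break marker that arrives in a later chunk can still be matched to the container opened in an earlier one.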

Motivation and Context

#DOTNET-8252

Testing

Added CborReaderProxyTests.cs to cover the new changes.
Tested various CloudWatch and SecretsManager operations with CBOR.
DRY_RUN-4ff92fab-9e40-46cc-88d0-746023de37cf.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist

My code follows the code style of this project
My change requires a change to the documentation
I have updated the documentation accordingly
I have read the README document
I have added tests to cover my changes
All new and existing tests passed

License

I confirm that this pull request can be released under the Apache 2 license

extensions/src/AWSSDK.Extensions.CborProtocol/Internal/CborReaderProxy.cs

Copilot

Pull Request Overview

This PR implements a streaming CBOR reader to reduce memory usage when processing large payloads. Instead of loading entire response streams into memory before unmarshalling, it introduces CborReaderProxy that processes CBOR data in chunks.

Adds CborReaderProxy to wrap System.Formats.Cbor.CborReader with streaming capabilities
Implements custom container handling for maps/arrays that span buffer boundaries
Adds configuration for initial buffer size through AWSConfigs.CborReaderInitialBufferSize

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `sdk/src/Core/Amazon.Util/Internal/RootConfig.cs` | Adds CborReaderInitialBufferSize configuration property |
| `sdk/src/Core/AWSConfigs.cs` | Implements configuration support and helper methods for CBOR buffer size |
| `extensions/test/CborProtocol.Tests/CborReaderProxyTests.cs` | Comprehensive test suite for the new streaming CBOR reader |
| `extensions/src/AWSSDK.Extensions.CborProtocol/Internal/Transform/CborUnmarshallerContext.cs` | Updates context to use CborReaderProxy instead of loading full stream |
| `extensions/src/AWSSDK.Extensions.CborProtocol/Internal/CborReaderProxy.cs` | Core implementation of streaming CBOR reader with buffer management |

sdk/src/Core/AWSConfigs.cs
