feat: add file size-based chunking to JsonOutput (WIP) #650
base: main
🎉 Snyk checks have passed. No issues have been found so far.
✅ security/snyk check is complete. No issues have been found.
✅ license/snyk check is complete. No issues have been found.
✅ code/snyk check is complete. No issues have been found.
📜 Docstring Coverage Report
RESULT: PASSED (minimum: 30.0%, actual: 72.9%)
3ea6815 to 8987cb7
📦 Trivy Vulnerability Scan Results
Scan Result Details: ✅ No vulnerabilities found during the scan.
📦 Trivy Secret Scan Results
Scan Result Details: ✅ No secrets found during the scan.
☂️ Python Coverage
New Files: No new covered files...
Modified Files: No covered modified files...
🛠 Docs available at: https://k.atlan.dev/application-sdk/feat/add-file-size-based-chunking
🛠 Full Test Coverage Report: https://k.atlan.dev/coverage/application-sdk/pr/650
self.buffer.append(chunk)
self.current_buffer_size += len(chunk)
self.current_buffer_size_bytes += chunk_size_bytes
Bug: JsonOutput Buffer Flush Fails for Oversized Initial Chunks
The JsonOutput's size-based flush condition (current_buffer_size > 0) prevents flushing when the buffer is empty, even if the first chunk's estimated size exceeds max_file_size_bytes. This results in oversized files being written, which then fail to upload via ObjectStoreOutput.push_file_to_object_store due to DAPR gRPC message size limits. The current logic does not split an oversized single chunk.
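One possible direction, sketched here under assumptions rather than taken from the PR: decide whether to flush per record instead of per chunk, so that an oversized chunk is spread across several files even when the buffer starts empty. The class SizeBoundedBuffer, the helper record_size_bytes, and the use of plain dict records instead of DataFrames are all hypothetical; only buffer, current_buffer_size_bytes, and max_file_size_bytes mirror names from the diff.

```python
import json
from typing import Dict, List

Record = Dict[str, object]


def record_size_bytes(record: Record) -> int:
    """Size of one record as a JSON line, including the trailing newline."""
    return len(json.dumps(record).encode("utf-8")) + 1


class SizeBoundedBuffer:
    """Hypothetical stand-in for the buffering part of JsonOutput."""

    def __init__(self, max_file_size_bytes: int) -> None:
        self.max_file_size_bytes = max_file_size_bytes
        self.buffer: List[Record] = []
        self.current_buffer_size_bytes = 0
        # Each entry models the contents of one written file.
        self.files: List[List[Record]] = []

    def flush(self) -> None:
        """Write the buffered records out as one file and reset the counters."""
        if self.buffer:
            self.files.append(self.buffer)
            self.buffer = []
            self.current_buffer_size_bytes = 0

    def write(self, chunk: List[Record]) -> None:
        """Add a chunk, splitting it across files when it exceeds the limit."""
        for record in chunk:
            size = record_size_bytes(record)
            would_overflow = (
                self.current_buffer_size_bytes + size > self.max_file_size_bytes
            )
            # Flush before appending, so a file never grows past the limit.
            # A single record larger than the limit is still written alone,
            # because it cannot be split any further.
            if self.buffer and would_overflow:
                self.flush()
            self.buffer.append(record)
            self.current_buffer_size_bytes += size
```

With this shape, an empty buffer no longer blocks the size check: the limit is enforced while the chunk is consumed, not only after the whole chunk has been appended.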
Changelog
- Add estimate_dataframe_json_size() function to estimate JSON size of DataFrames
- Add byte-level buffer tracking alongside existing record count tracking
Additional context (e.g. screenshots, logs, links)
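To illustrate the size-estimation idea from the changelog, here is a minimal sketch of a sampling-based estimator. It is not the PR's estimate_dataframe_json_size(): the function name estimate_records_json_size, the sample_size parameter, and the choice to operate on a list of dict records rather than a DataFrame are assumptions made so the example runs with the standard library only.

```python
import json
from typing import Dict, Sequence


def estimate_records_json_size(
    records: Sequence[Dict[str, object]], sample_size: int = 100
) -> int:
    """Estimate the JSON-lines size in bytes of `records` from a sample.

    Hypothetical helper: serializes an evenly spaced sample and scales the
    average record size up to the full record count.
    """
    total = len(records)
    if total == 0:
        return 0
    # Take roughly `sample_size` evenly spaced records.
    step = max(1, total // sample_size)
    sample = records[::step]
    # +1 accounts for the newline that terminates each JSON line.
    sample_bytes = sum(len(json.dumps(r).encode("utf-8")) + 1 for r in sample)
    return int(sample_bytes / len(sample) * total)
```

Sampling keeps the estimate cheap for large inputs, at the cost of accuracy when record sizes vary widely, so a caller would likely want some headroom below the actual upload limit.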
Checklist
Copyleft License Compliance