
Add async iterator on result #234


Open: wants to merge 4 commits into main

@abelcha commented Jun 19, 2025

Summary

This PR provides a high-level abstraction for result streaming that follows JavaScript language idioms, alongside the existing chunk-based APIs. It lets you iterate over query results with for await loops.

Usage Example

const result = await connection.run('SELECT * FROM large_table');

for await (const row of result) {
  console.log(row);
}

Features Added

  • Async iterator implementation: added a [Symbol.asyncIterator]() method to the DuckDBResult class

Technical Details

  • The async iterator fetches chunks progressively, reducing memory usage for large result sets
  • Maintains compatibility with the existing DuckDBResult API
  • Properly handles edge cases like empty results and null values
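The progressive, chunk-by-chunk iteration described above can be sketched as an async generator. This is a minimal sketch, not the PR's code. The Chunk and ChunkedResult interfaces, and the assumed contract that fetchChunk() resolves to null or an empty chunk at the end, stand in for the real DuckDBResult API. IterableResult is a hypothetical wrapper.

```typescript
// Hypothetical shapes standing in for a chunk-based result API (assumed, not the real DuckDB types).
interface Chunk<Row> {
  readonly rowCount: number;
  getRows(): Row[];
}

interface ChunkedResult<Row> {
  // Assumed contract: resolves to null (or an empty chunk) once the result is exhausted.
  fetchChunk(): Promise<Chunk<Row> | null>;
}

// Yields rows one at a time, fetching the next chunk only after the current one is consumed.
async function* iterateRows<Row>(result: ChunkedResult<Row>): AsyncGenerator<Row, void, undefined> {
  while (true) {
    const chunk = await result.fetchChunk();
    // End of result; this also covers results that are empty from the start.
    if (chunk === null || chunk.rowCount === 0) return;
    for (const row of chunk.getRows()) {
      yield row;
    }
  }
}

// Hypothetical wrapper exposing [Symbol.asyncIterator]() so a result works with for await.
class IterableResult<Row> implements AsyncIterable<Row> {
  constructor(private readonly source: ChunkedResult<Row>) {}

  [Symbol.asyncIterator](): AsyncIterator<Row> {
    return iterateRows(this.source);
  }
}
```

Because only one chunk's rows are held at a time, memory use is bounded by the chunk size rather than the full result. This assumes the underlying result is actually streamed and not already materialized.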

Testing

The tests verify:

  • Correct iteration behavior
  • Memory-efficient chunk fetching
  • Proper handling of edge cases
  • Early termination scenarios
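A test for the early-termination and memory-efficiency points could check two things: that breaking out of a for await loop stops further chunk fetches, and that it runs the iterator's cleanup. The sketch below is self-contained and assumes nothing about the PR's actual test suite. fakeChunkedRows is a hypothetical stand-in for a chunked result, and takeFirst is a hypothetical helper.

```typescript
// Records what the fake result did, so a test can assert on it.
interface FetchLog {
  fetched: number; // chunks "fetched" so far
  closed: boolean; // true once the iterator has been finalized
}

// Hypothetical stand-in for a chunked result: yields rows chunk by chunk and logs each fetch.
async function* fakeChunkedRows(chunks: number[][], log: FetchLog): AsyncGenerator<number> {
  try {
    for (const chunk of chunks) {
      log.fetched++; // simulate lazily fetching one chunk
      for (const row of chunk) {
        yield row;
      }
    }
  } finally {
    log.closed = true; // runs on normal completion and when the consumer breaks early
  }
}

// Collects at most n rows, breaking out of the loop early.
async function takeFirst(rows: AsyncIterable<number>, n: number): Promise<number[]> {
  const out: number[] = [];
  if (n <= 0) return out;
  for await (const row of rows) {
    out.push(row);
    if (out.length >= n) break; // calls the generator's return(), which runs its finally block
  }
  return out;
}
```

With chunks [[1, 2], [3, 4], [5, 6]] and n = 3, only two chunks should be fetched and the iterator should be closed.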

@abelcha changed the title from "Add async iterator" to "Add async iterator on result" (Jun 19, 2025)
@jraymakers (Contributor)

Thanks for the PR! This is a very cool idea.

To make it even better, and to fit in with the rest of the API, it should allow iterating over either row arrays or row objects, and support the raw or converted (to JS, JSON, or custom) variants. To make that maintainable, we'd likely need an async chunk iterator as a building block.
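One way to read the building-block idea: write a single async chunk iterator, then derive the row-array and row-object variants from one generic row loop, with an optional per-value converter for the converted variants. Everything here is hypothetical and not the library's API. That includes the ColumnarChunk shape, the names chunks, rows, asArray and asObject, and the Converter signature.

```typescript
// Hypothetical column-oriented chunk shape (assumed): one array of values per column.
interface ColumnarChunk {
  readonly columnNames: readonly string[];
  readonly columns: readonly (readonly unknown[])[];
  readonly rowCount: number;
}

// Optional per-value converter for the "converted" variants (JS, JSON, custom).
type Converter = (value: unknown, columnName: string) => unknown;
const identity: Converter = (value) => value;

// Building block: an async chunk iterator that pulls chunks until the source signals the end.
async function* chunks(fetchChunk: () => Promise<ColumnarChunk | null>): AsyncGenerator<ColumnarChunk> {
  while (true) {
    const chunk = await fetchChunk();
    if (chunk === null || chunk.rowCount === 0) return; // end of result
    yield chunk;
  }
}

// Row-building strategy: given a chunk and a row index, produce one row in some shape.
type RowMaker<T> = (chunk: ColumnarChunk, rowIndex: number) => T;

// Generic row iterator: every row variant reuses this one loop instead of duplicating it.
async function* rows<T>(source: AsyncIterable<ColumnarChunk>, makeRow: RowMaker<T>): AsyncGenerator<T> {
  for await (const chunk of source) {
    for (let r = 0; r < chunk.rowCount; r++) {
      yield makeRow(chunk, r);
    }
  }
}

// Row-array variant, optionally converting each value.
const asArray = (convert: Converter = identity): RowMaker<unknown[]> =>
  (chunk, r) => chunk.columns.map((column, c) => convert(column[r], chunk.columnNames[c]));

// Row-object variant, keyed by column name, optionally converting each value.
const asObject = (convert: Converter = identity): RowMaker<Record<string, unknown>> =>
  (chunk, r) => {
    const row: Record<string, unknown> = {};
    chunk.columnNames.forEach((name, c) => {
      row[name] = convert(chunk.columns[c][r], name);
    });
    return row;
  };
```

Usage would look like for await (const row of rows(chunks(fetchChunk), asObject())). Adding a new variant means adding one RowMaker, not another copy of the chunk loop.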

If you'd like to give that a shot, go ahead, or I can try to outline the API I have in mind when I get some time.

@abelcha (Author) commented Jun 27, 2025

I tried wiring up support for all the variants, but it adds a lot of code to the codebase, and that feels like a call that's yours to make. This is just a minimal version that could serve as a base.

This binding is already a blessing compared to the first one; I'd rather not mess it up.

Performance-wise, I was surprised by how much per-row object creation adds up. Using a template object plus Object.create for each row, I got a ~10% improvement, though it's hard to benchmark. At this level, it's best to let the consumer choose whether to pay that cost.
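The thread doesn't show the template-object code, so the following is only one possible reading of it. Column accessors live once on a shared template object, and each row is an Object.create() of that template holding only its own values array. makeRowFactory and ROW_VALUES are hypothetical names. One trade-off: the accessors are inherited, not own properties, so Object.keys() and JSON.stringify() do not see the columns on a row.

```typescript
// Private storage key; a symbol avoids clashing with a column that happens to be named "values".
const ROW_VALUES: unique symbol = Symbol("rowValues");

interface RowStorage {
  [ROW_VALUES]: readonly unknown[];
}

// Hypothetical row factory: column accessors are defined once on a shared template object,
// and each row is a lightweight Object.create(template) that only stores its values array.
function makeRowFactory(columnNames: readonly string[]): (values: readonly unknown[]) => Record<string, unknown> {
  const template: object = {};
  columnNames.forEach((name, index) => {
    Object.defineProperty(template, name, {
      // Reads the value for this column from the row's own values array.
      get(this: RowStorage) {
        return this[ROW_VALUES][index];
      },
      enumerable: true,
    });
  });
  return (values) => {
    const row = Object.create(template);
    row[ROW_VALUES] = values;
    return row;
  };
}
```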

I'm working on a more experimental, fully typed, high-level DuckDB TypeScript runtime, and this is the UX I've landed on for the return value of select:


  • I'm mapping BigInt to Number, which simplifies things a lot
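Mapping BigInt to Number is simple but becomes lossy past Number.MAX_SAFE_INTEGER (2^53 - 1). Here is a minimal sketch of a guarded conversion. It assumes you would rather fail loudly than silently round. toSafeNumber and convertRow are hypothetical helpers, not part of DuckDB's API.

```typescript
// Hypothetical helper: converts a BigInt to a Number only when no precision is lost.
function toSafeNumber(value: bigint): number {
  const asNumber = Number(value);
  if (!Number.isSafeInteger(asNumber)) {
    throw new RangeError(`BigInt ${value} is outside the safe integer range`);
  }
  return asNumber;
}

// Converts every bigint in a row array, leaving other values untouched.
function convertRow(row: readonly unknown[]): unknown[] {
  return row.map((v) => (typeof v === "bigint" ? toSafeNumber(v) : v));
}
```

Whether to throw, clamp, or fall back to a string for large values is a design choice. Throwing keeps the simplification honest for typical IDs and counts.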

@jraymakers (Contributor)

Yes, the reason for the variants is to provide a choice between convenience and performance. Generally the column-oriented ones are going to perform better than the row-oriented ones, and raw arrays will perform better than objects, but for small results it doesn't matter, and rows and objects can be convenient at times.

Supporting all the variants without a lot of code duplication that's hard to maintain took some iteration. I think it could be done while also supporting async iterators, but it will take some experimentation, which I haven't had time for yet. (I still hope to, though probably not very soon.)

That library/runtime you're building looks interesting. How are you ensuring the results are correctly typed? I'd like to provide better typing for results, but I haven't discovered a good way yet. (See #140.)

@abelcha (Author) commented Jul 15, 2025

I follow a similar approach to convex.dev, where intermediate schemas are written to a local .buckdb/ directory.

It either inspects .columnTypes() dynamically on first execution or, in a live environment, describes the schema ahead of time (e.g. https://buckdb.pages.dev).

It also codegens phantom types from duckdb_functions() and duckdb_types() to produce full method signatures and static type info for function calls.

Then it uses TS generics to handle joins, CTEs, name aliases, etc., and infer the return type:
src/build.types.ts
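The thread doesn't show those types. Still, the general idea of inferring a row type from a recorded schema can be sketched with a TypeScript mapped type. Everything here is hypothetical and not taken from src/build.types.ts: the TypeMap from DuckDB type names to TS types (including BIGINT as number, matching the BigInt choice mentioned earlier), the RowOf type, and the typedRows helper.

```typescript
// Assumed mapping from a few DuckDB type names to TypeScript types.
interface TypeMap {
  BOOLEAN: boolean;
  INTEGER: number;
  BIGINT: number; // mirrors the BigInt-to-Number mapping mentioned earlier
  DOUBLE: number;
  VARCHAR: string;
}

type Schema = Record<string, keyof TypeMap>;

// Mapped type: turns { id: "INTEGER", name: "VARCHAR" } into { id: number; name: string }.
type RowOf<S extends Schema> = { [K in keyof S]: TypeMap[S[K]] };

// Hypothetical helper: zips raw row arrays with the schema's column order into typed objects.
// Values are trusted, not validated; the schema only drives the static type.
function typedRows<S extends Schema>(schema: S, rows: readonly unknown[][]): RowOf<S>[] {
  const names = Object.keys(schema);
  return rows.map((row) => {
    const obj: Record<string, unknown> = {};
    names.forEach((name, i) => {
      obj[name] = row[i];
    });
    return obj as unknown as RowOf<S>;
  });
}

// Example schema, as it might be recorded by a codegen step (illustrative only).
const usersSchema = { id: "INTEGER", name: "VARCHAR", active: "BOOLEAN" } as const;
// Roughly { readonly id: number; readonly name: string; readonly active: boolean }
type UserRow = RowOf<typeof usersSchema>;
```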

BTW, are you hiring?
I genuinely love DuckDB and would be thrilled to contribute more to it.
