Skip to content

Plan for async ChunkReader? #924

@neverchanje

Description

@neverchanje

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

https://docs.rs/parquet/6.1.0/parquet/file/reader/trait.ChunkReader.html The ChunkReader trait is designed to hide the details of the underlying storage, which however may use the async feature, rusto-s3 for example:

pub async fn get_object_range

https://durch.github.io/rust-s3/s3/bucket/struct.Bucket.html#method.get_object_range

That means every get_read call has to be implemented by blocking on an async function.

impl parquet::file::reader::ChunkReader for ParquetChunkReader {
  type T = SliceableCuror;

  fn get_read(&self, start: u64, length: usize) -> parquet::errors::Result<Self::SliceableCuror> {
    futures::executor::block_on(async move {
      bucket.get_object_range(...)
    })
  }
}

The non-async limitation of the ChunkReader trait will result in some problems, especially when the query execution engines are using async, tokio, specifically, it will have to use tokio::spawn_blocking to perform a parquet read.

Describe the solution you'd like
A clear and concise description of what you want to happen.

We can add async to ChunkReader::get_read.

#[async_trait::async_trait]
pub trait ChunkReader: Length {
    type T: Read;

    async fn get_read(&self, start: u64, length: usize) -> Result<Self::T>;
}

Many of the API will definitely need also to be changed to async. Until now I have no very clear idea and plan
for moving them to the async version with backward compability.

One possible solution is to provide a totally aync API, and the non-async API wraps it via async_std::block_on.

pub trait RowGroupReaderAsync {
  async fn get_column_reader(&self, i: usize) -> Result<ColumnReader> { ... }
}

pub struct RowGroupReader {
  fn get_column_reader() { async_std::block_on(self.get_column_reader_async()) }
}

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementAny new improvement worthy of a entry in the changelogparquetChanges to the parquet crate

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions