Support async Blob sources #37338

@jimmywarting

The term "async Blob source" was mentioned earlier [1]. I think it corresponds to what Chrome calls a "BlobDataItem":

BlobDataItem: This is a primitive element that can basically be a File, Bytes, or another Blob. It also stores an offset and size, so this can be a part of a file. (This can also represent a “future” file and “future” bytes, which is used to signify a bytes or file item that has not been transported yet).
https://chromium.googlesource.com/chromium/src/+/master/storage/browser/blob/README.md

  • In simple terms, it can be a Blob that comes from somewhere else, such as the filesystem, so its data (the source) is not held in memory.
  • It's basically a dummy blob that holds a reference to a file on the hard drive, with offsets for where to start and stop reading.
  • Or it can be a slice of another blob, with a new offset.

// if you create a blob
const blobA = new Blob([new ArrayBuffer(20_000)])
// and slice it
const blobB = blobA.slice(0, 10_000)
  • ...then we shouldn't create a new blob and allocate 30,000 bytes in total for blobA and blobB. It should still be only 20,000 bytes.
  • blobB should be a reference to blobA with its own offset range (0 to 10_000) and shouldn't allocate a new buffer.
    • So when we read blobB, it creates a readable stream on blobA that reads bytes 0-10000.
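
The zero-copy slicing described above can be sketched roughly like this. It is only an illustration: `SharedBlob` and `underlyingBuffer` are hypothetical names, not part of Node.js or of any existing Blob implementation, and the sketch assumes the data lives in a single in-memory `Uint8Array`.

```javascript
// Hypothetical sketch: a blob-like object whose slice() shares memory
// with its parent instead of copying it.
class SharedBlob {
  #view // Uint8Array view over the shared underlying ArrayBuffer

  constructor (view) {
    this.#view = view
  }

  get size () {
    return this.#view.byteLength
  }

  // subarray() creates a new view on the same ArrayBuffer, so no bytes
  // are copied. Only the offset and the length differ.
  slice (start = 0, end = this.size) {
    return new SharedBlob(this.#view.subarray(start, end))
  }

  // Exposed only so the example can show that the memory is shared.
  get underlyingBuffer () {
    return this.#view.buffer
  }
}

const blobA = new SharedBlob(new Uint8Array(20_000))
const blobB = blobA.slice(0, 10_000)

console.log(blobB.size) // 10000
console.log(blobA.underlyingBuffer === blobB.underlyingBuffer) // true
```

Both objects point at the same 20,000-byte buffer, so slicing costs only a small view object.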

And if we create a third blob, blobC = new Blob([blobB, blobB]):

  • Then blobC will just hold two references to blobB (which actually refer to two slices of blobA). Reading blobC would be like creating two readable streams and reading them one after the other:
    blobC.stream() === [blobB.stream(), blobB.stream()]
// This pseudocode may be a bit off, but it illustrates what I mean.
// kSource stands for some internal symbol that holds blobA's bytes.
blobB.stream = () => stream.Readable.from((function* () {
  yield blobA[kSource].slice(0, 10000)
})())

blobC.stream = () => stream.Readable.from((async function* () {
  // async generator, because blobB.stream() is consumed asynchronously
  yield* blobB.stream()
  yield* blobB.stream()
})())

// A better way is if blobC can have direct reference to the internal
// blob parts to blobA so blobB can be garbage collected

In this case, neither blobB nor blobC would have an underlying source that holds the data in memory. They would only be (nested) references to blobA.
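
One way to let blobC refer directly to blobA's parts, so that blobB can be garbage collected, might look like the sketch below. `PartsBlob` and its `entries` list are hypothetical names used only for illustration. The sketch assumes in-memory `Uint8Array` sources and does not handle negative slice indices.

```javascript
// Hypothetical sketch: each blob is a flat list of { source, start, end }
// entries that point into shared Uint8Array sources.
class PartsBlob {
  constructor (parts = []) {
    this.entries = []
    for (const part of parts) {
      if (part instanceof PartsBlob) {
        // Reuse the entries, not the bytes, so the new blob does not
        // keep a reference to the intermediate blob itself.
        this.entries.push(...part.entries)
      } else {
        this.entries.push({ source: part, start: 0, end: part.byteLength })
      }
    }
  }

  get size () {
    return this.entries.reduce((sum, e) => sum + (e.end - e.start), 0)
  }

  // Synchronous slice: only offsets are recomputed, no data is read.
  slice (start = 0, end = this.size) {
    const sliced = new PartsBlob()
    let offset = 0 // position of the current entry within this blob
    for (const e of this.entries) {
      const length = e.end - e.start
      const from = Math.max(start, offset)
      const to = Math.min(end, offset + length)
      if (from < to) {
        sliced.entries.push({
          source: e.source,
          start: e.start + from - offset,
          end: e.start + to - offset
        })
      }
      offset += length
    }
    return sliced
  }
}

const source = new Uint8Array(20_000)
const blobA = new PartsBlob([source])
const blobB = blobA.slice(0, 10_000)
const blobC = new PartsBlob([blobB, blobB])

console.log(blobC.size) // 20000
console.log(blobC.entries.every(e => e.source === source)) // true
```

Here blobC never stores a reference to blobB, only to the original source plus offsets.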

I guess these are important steps towards solving async blob sources and towards mixing blobs that come from the fs with blobs constructed from data in memory.

I created something like these async sources in fetch-blob, along with a kind of BlobDataItem that is backed by the filesystem and doesn't hold anything in memory. I managed to solve slicing, and constructing new blobs from other parts, entirely synchronously. The only thing that is async is how the data is read. My blob is constructed from a sequence of blob parts (with offsets into other parts), not from a single source like one full ArrayBuffer containing all the data.

That is how it can take any other instance of a Blob class that isn't our own, without knowing anything about the other blob's internal source. So you can construct a fetch-blob from anything that looks and behaves like a Blob. It could, for instance, accept both a BlobDataItem from the fs and a buffer.Blob.

Note that a blob from the fs should keep track of the file's last modified time. If it isn't the same when you read the blob, the internal stream should throw an error.
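
A minimal sketch of such a check, assuming a hypothetical `FileBlobPart` class (not an existing API) that records `mtimeMs` when it is created and compares it once before reading. Negative slice indices are not handled.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Hypothetical sketch of a file-backed blob part that holds no data in
// memory, only a path, an offset range and the modification time.
class FileBlobPart {
  constructor (filePath, start, end, mtimeMs) {
    this.filePath = filePath
    this.start = start
    this.end = end
    this.mtimeMs = mtimeMs
  }

  static fromPath (filePath) {
    const { size, mtimeMs } = fs.statSync(filePath)
    return new FileBlobPart(filePath, 0, size, mtimeMs)
  }

  get size () {
    return this.end - this.start
  }

  // Synchronous: only the offsets change, the file is not touched.
  slice (start = 0, end = this.size) {
    const newEnd = Math.min(this.start + end, this.end)
    return new FileBlobPart(this.filePath, this.start + start, newEnd, this.mtimeMs)
  }

  // Asynchronous: checks the modification time, then streams the range.
  async * read () {
    const { mtimeMs } = await fs.promises.stat(this.filePath)
    if (mtimeMs !== this.mtimeMs) {
      throw new Error('The file was modified after the blob was created')
    }
    if (this.size === 0) return
    // The `end` option of createReadStream is inclusive.
    yield * fs.createReadStream(this.filePath, {
      start: this.start,
      end: this.end - 1
    })
  }
}

async function demo () {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'blob-'))
  const file = path.join(dir, 'data.txt')
  fs.writeFileSync(file, 'hello world')
  const part = FileBlobPart.fromPath(file).slice(6, 11)

  const chunks = []
  for await (const chunk of part.read()) chunks.push(chunk)
  const text = Buffer.concat(chunks).toString()

  // Simulate a later modification by changing the mtime.
  fs.utimesSync(file, new Date(), new Date('2000-01-01'))
  let failed = false
  try {
    for await (const chunk of part.read()) chunks.push(chunk)
  } catch {
    failed = true
  }
  return { text, failed }
}

demo().then(console.log) // { text: 'world', failed: true }
```

This sketch only checks the time once, before the read starts. A stricter implementation might also need to handle changes made while the stream is being consumed.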

Labels: buffer (Issues and PRs related to the buffer subsystem.)
