major bottleneck in the usage of write-file-atomic #1785
Description
- Version: master
- Platform: Mac OS X
- Subsystem: unix-fs
I think I have identified the source of the majority of latency (and thus GC) issues in js-ipfs. As noted in ipfs/js-ipfs-repo#183, js-ipfs needs to make sure that files are properly written to disk to avoid missing file issues in case of crashes. Moreover, this is on par with the Go implementation.
The write-file-atomic module on npm is used to implement this behavior. In our latest benchmarks, it is the major bottleneck in both the synchronous and asynchronous paths of ipfs.
On the synchronous side, write-file-atomic uses fs.realpath(). In fs.realpath(), the LOOP function shows up in our flamegraphs because it needs to walk down the path and resolve all links. This function alone can account for 5-10% of the total execution time.
On the asynchronous side, write-file-atomic schedules at least 5 async file system operations and 11 microtask queue operations. Moreover, it does an lstat for every part ('/part/') of a path, plus as many process.nextTick queue operations. For a folder at depth 6, this grows to 10 file system operations, 11 microtask queue operations, and 10 process.nextTick calls.
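The per-component cost described above can be illustrated with a rough sketch. This is not the code of write-file-atomic or of fs.realpath(); `countLstatCalls` is a hypothetical helper that only mimics the access pattern (one awaited lstat per path component), to show why the number of file system operations grows with folder depth.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper (not part of write-file-atomic): walk an absolute path
// one component at a time, issuing one lstat per component, and return how
// many file system calls that took.
async function countLstatCalls (target) {
  const root = path.parse(target).root
  const parts = target.slice(root.length).split(path.sep).filter(Boolean)
  let current = root
  let calls = 0
  for (const part of parts) {
    current = path.join(current, part)
    // One async fs operation (and at least one trip through the
    // scheduling queues) for every level of the path.
    await fs.promises.lstat(current)
    calls++
  }
  return calls
}
```

Called on a directory three levels below another one, it reports three more lstat calls, which is the linear growth the numbers above describe.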
js-ipfs writes a lot of small files to disk, and all of that activity adds up. Note that files are not written in parallel but in sequence, so all of that activity is in the hot path and increases the overall latency of adding and receiving a file. Some of those fs operations can take up to 20ms to complete: when transferring big files, this immediately becomes the major bottleneck of ipfs. As an example, adding a 64MB file with a chunk size of 262144 bytes in a folder that is 5 levels deep from the root will cause 256 * 5 * 5 = 6400 asynchronous operations, plus 256 * 20 = 5120 scheduling (Promise + nextTick) operations.
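The arithmetic in the example can be reproduced with a few lines. The function name `estimateOps` and its parameter names are made up for illustration. Reading the two factors of 5 as "file system operations per level" and "folder depth" is an assumption on my part, as is treating 20 as the number of scheduling operations per write.

```javascript
// Hypothetical estimator for the example above. The per-write factors are
// taken from the figures in this issue, not measured here.
function estimateOps ({ fileSize, chunkSize, depth, fsOpsPerLevel, schedulingOpsPerWrite }) {
  const chunks = Math.ceil(fileSize / chunkSize) // one small file written per chunk
  return {
    chunks,
    fsOps: chunks * fsOpsPerLevel * depth,
    schedulingOps: chunks * schedulingOpsPerWrite
  }
}

const estimate = estimateOps({
  fileSize: 64 * 1024 * 1024, // 64MB
  chunkSize: 262144,
  depth: 5,
  fsOpsPerLevel: 5, // assumed meaning of the first "5"
  schedulingOpsPerWrite: 20 // Promise + nextTick
})

console.log(estimate) // { chunks: 256, fsOps: 6400, schedulingOps: 5120 }
```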
To solve this issue, I recommend:
- clarify the algorithm for “atomic writes” with the Go team - ideally reviewing their code.
- write a new tiny module for doing atomic writes in the context of ipfs, and replace write-file-atomic.
- investigate if the files could be processed in parallel instead of serially, ideally using a sliding-window approach.
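To make the last two points more concrete, here is a minimal sketch of what such a module could look like. It assumes the common "write to a temporary file, fsync, then rename" pattern, which may or may not match what the Go implementation does (that is exactly what the first point should clarify). The names `atomicWrite` and `writeAllWindowed` are hypothetical, and the windowing is a simple bounded pool of workers rather than a tuned sliding window.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

let counter = 0

// Sketch of an atomic write: no realpath, no per-component lstat.
// Assumption: the temp file lives next to the target so that rename stays
// on the same file system. The parent directory is not fsynced here.
async function atomicWrite (target, data) {
  const tmp = `${target}.${process.pid}.${counter++}.tmp`
  try {
    const handle = await fs.promises.open(tmp, 'w')
    try {
      await handle.writeFile(data)
      await handle.sync() // flush file contents to disk before renaming
    } finally {
      await handle.close()
    }
    await fs.promises.rename(tmp, target)
  } catch (err) {
    // Best-effort cleanup of the temp file, then surface the original error.
    await fs.promises.unlink(tmp).catch(() => {})
    throw err
  }
}

// Keep at most `limit` writes in flight; as soon as one finishes, the next
// entry is started.
async function writeAllWindowed (entries, limit) {
  let next = 0
  async function worker () {
    while (next < entries.length) {
      const entry = entries[next++]
      await atomicWrite(entry.file, entry.data)
    }
  }
  const workers = []
  for (let i = 0; i < Math.min(limit, entries.length); i++) {
    workers.push(worker())
  }
  await Promise.all(workers)
}
```

With this shape, each write costs four file system calls (open, write, fsync, rename) plus a close, regardless of folder depth, and the window size becomes a single knob to benchmark.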