I tried to run bcftools merge on thousands of files, but this runs up against the open file descriptor (fd) limit on my Linux machine.
I managed to reproduce it by generating thousands of VCFs (see the sketch after the output below) and then running:
docker run -it --rm -v $(pwd):/work -w /work --ulimit nofile=2048:2048 ubuntu:24.04 bash
apt-get update
apt-get install -y bcftools
bcftools concat -a -O z -f file-list.txt -o /dev/null
The result:
root@b0b48a108195:/work# bcftools concat -a -O z -f file-list.txt -o /dev/null
Checking the headers and starting positions of 10000 files
[E::hts_idx_load3] Could not load local index file 'generated_vcfs/06_02044_chr6.vcf.bgz.csi' : Too many open files
Failed to open generated_vcfs/06_02044_chr6.vcf.bgz: could not load index
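For reference, here is a minimal sketch of how test files like these can be generated (it assumes bgzip from htslib is available alongside bcftools; the filenames, contig length, and variant content are placeholders, not the exact ones from my run):

mkdir -p generated_vcfs
for i in $(seq -w 1 10000); do
    f=generated_vcfs/sample_${i}_chr6.vcf
    # minimal single-record VCF; header and record content are placeholders
    printf '##fileformat=VCFv4.2\n##contig=<ID=chr6,length=170805979>\n' > "$f"
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$f"
    printf 'chr6\t%d\t.\tA\tG\t.\tPASS\t.\n' "$((1000 + 10#$i))" >> "$f"
    bgzip -f "$f"              # produces generated_vcfs/sample_..._chr6.vcf.gz
    bcftools index "$f.gz"     # writes a .csi index (bcftools' default), matching the error above
done
printf '%s\n' generated_vcfs/*.vcf.gz > file-list.txt

Each input is bgzipped and CSI-indexed, since concat -a needs indexed files, which matches the index-loading step in the error above.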
Intuitively, I would expect merge to handle many, many files. I know I can work around this with a recursive merge (sketched below), but does bcftools really need to keep all the files open at the same time?
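To be concrete, by a recursive merge I mean a two-level batching approach roughly like this sketch (shown with concat to match the repro above, though the same idea applies to merge; the batch size of 500 is arbitrary, anything comfortably below ulimit -n should work):

mkdir -p batches
# split the big file list into chunks that stay under the fd limit
split -l 500 file-list.txt batches/list.
i=0
for list in batches/list.*; do
    part=batches/part_$(printf '%03d' "$i").vcf.gz
    bcftools concat -a -O z -f "$list" -o "$part"
    bcftools index "$part"     # intermediate parts need indexes for the next level
    i=$((i + 1))
done
# combine the (much smaller number of) intermediate parts
printf '%s\n' batches/part_*.vcf.gz > batches/part-list.txt
bcftools concat -a -O z -f batches/part-list.txt -o merged.vcf.gz

It works, but it writes all the intermediate parts to disk, which is why I'd rather not have to do it this way.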
Thanks!!