Show HN: Ratarmount 1.0.0 – Rapid access to large archives via a FUSE filesystem

(github.com)

65 points | by mxmlnkn 15 hours ago

5 comments

kenmacd 13 hours ago
I find this project hugely helpful when working with Google Takeout archives. I normally pick a size that's not too large so that downloading them is easier, then it's simply a matter of:
ratarmount ./takeout-20231130T224325Z-0*.tgz ./mnt
sziiiizs 12 hours ago
That is very cool. May I ask, how does the compressed stream seeking work? Does it keep state of the decompressor at certain points so arbitrary access can be faster than reading from the start of the stream?
- mxmlnkn 12 hours ago
  For bzip2, a list of bit offsets in the compressed stream and a corresponding byte offset in the decompressed stream suffices because each bzip2 block is independent.
  For gzip, it is as you say. However, when only seeking to DEFLATE block boundaries, the "state" of the decompressor is as simple as the last 32 KiB of decompressed data in the stream. Compared to the two offsets for bzip2, this is 2048x more data to store, though. Rapidgzip does sparsity analysis to find out which of the decompressed bytes are actually referenced later on and also recompresses those windows to reduce overhead. Ratarmount still uses the full 32 KiB windows, though. One of the larger todos is to define a compressed index format and to switch to it. This will definitely be necessary for LZ4, for which the window size is 64 KiB instead of 32 KiB.
  For zstd and xz, this approach reaches its limits because the Lempel-Ziv back-reference windows are not limited in size in general. However, I am hoping that the sparsity analysis will make it feasible because, in the worst case, the state cannot be longer than the next decompressed chunk. In this worst case, the decompressed block consists only of non-overlapping back-references.
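The seek-point idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ratarmount's or rapidgzip's actual index format: the helper names `build_stream_and_index` and `read_at` are hypothetical, and the stream is written with `Z_SYNC_FLUSH` so that every seek point falls on a byte boundary. Real gzip files have block boundaries at arbitrary bit offsets, which Python's `zlib` module cannot resume from. Each seek point stores a decompressed offset, a compressed offset, and the preceding window of up to 32 KiB, which is handed to the decompressor as a preset dictionary.

```python
import bisect
import random
import zlib

WINDOW_SIZE = 32 * 1024  # DEFLATE back-references reach at most 32 KiB back


def build_stream_and_index(chunks):
    """Compress chunks into one raw DEFLATE stream with a seek point per chunk."""
    compressor = zlib.compressobj(wbits=-15)  # raw DEFLATE, no gzip header
    stream = bytearray()
    plain = bytearray()
    index = []  # entries: (decompressed offset, compressed byte offset, window)
    for chunk in chunks:
        index.append((len(plain), len(stream), bytes(plain[-WINDOW_SIZE:])))
        stream += compressor.compress(chunk)
        # Z_SYNC_FLUSH ends the current block on a byte boundary but keeps
        # the compressor's history, so later blocks may still reference it.
        stream += compressor.flush(zlib.Z_SYNC_FLUSH)
        plain += chunk
    stream += compressor.flush(zlib.Z_FINISH)
    return bytes(stream), index


def read_at(stream, index, offset, size):
    """Return up to `size` decompressed bytes starting at decompressed `offset`."""
    if size <= 0:
        return b""
    # Find the last seek point at or before the requested offset.
    offsets = [entry[0] for entry in index]
    i = bisect.bisect_right(offsets, offset) - 1
    plain_offset, stream_offset, window = index[i]
    if window:
        # The stored window stands in for the decompressor state.
        decompressor = zlib.decompressobj(wbits=-15, zdict=window)
    else:
        decompressor = zlib.decompressobj(wbits=-15)
    skip = offset - plain_offset
    data = decompressor.decompress(stream[stream_offset:], skip + size)
    return data[skip:]


# Example data: later chunks repeat parts of the first one, so the compressor
# is likely to emit back-references that cross the seek points.
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(20000))
chunks = [base, base[5000:15000], base[:8000]]
plain = b"".join(chunks)
stream, index = build_stream_and_index(chunks)
```

With this index, `read_at(stream, index, 22000, 500)` starts decompressing at the second seek point instead of at the beginning of the stream. Storing the full window per seek point is what makes the index large, which is the overhead the sparsity analysis mentioned above is meant to reduce.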
BoingBoomTschak 8 hours ago
Congratulations on your v1.0.0! This is definitely a very nice tool. I'll try to play with it a bit and maybe try to make an ebuild (though the build system seems a bit complicated for proper no-network package managers). The extensive benchmark section is a nice plus.
A small note, archivemount has a living fork here: https://git.sr.ht/~nabijaczleweli/archivemount-ng
lathiat 13 hours ago
This is awesome :)
ranger_danger 10 hours ago
Similar projects:
https://github.com/cybernoid/archivemount
https://github.com/google/fuse-archive
https://github.com/google/mount-zip
https://bitbucket.org/agalanin/fuse-zip