Zip vs TGZ
It’s interesting how long you can go using something without really understanding what’s happening under the hood. I didn’t think anything of it until today, when I was doing my annual download via Google Takeout. I have something like 500 GiB of compressed (zipped) data in Google from last year. Google asked me if I wanted a tgz or a zip file and I found myself stumped. Why does it matter? Aren’t they all just compressed archives?[1]
Summarizing what I found on Stack Overflow:
tar files are uncompressed archives: many files are bunched together into a single file for ease of movement/storage. They are also called tarballs. The imagery of files glued together with tar is pretty evocative.
tgz files are tar files where the tarball is made first, then gzipped as a single unit to save space. Thus tgz (tarred and gzipped).
zip files are created by compressing individual files, then gluing those together.
The consequences of this are:
- tgz files may compress better, since gzip sees the whole tarball as one stream and can take advantage of redundancy across files.
- zip files may compress worse, but you can decompress individual files on demand, meaning random access is faster.
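To make the size tradeoff concrete, here is a minimal Python sketch using the standard library's tarfile and zipfile modules. The sample data (200 small, similar text files) and the helper names make_files, tgz_size and zip_size are made up for illustration, and the result depends heavily on the data: lots of small, similar files is roughly the best case for tgz.

```python
import io
import tarfile
import zipfile


def make_files(n=200):
    """Build n small, similar text files in memory (made-up sample data)."""
    line = "the quick brown fox jumps over the lazy dog\n"
    return {
        f"note_{i:03d}.txt": (f"entry {i}\n" + line * 5).encode()
        for i in range(n)
    }


def tgz_size(files):
    """Tar everything first, then gzip the tarball as one stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


def zip_size(files):
    """Compress each file on its own, then store them side by side."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


if __name__ == "__main__":
    files = make_files()
    print("tgz bytes:", tgz_size(files))
    print("zip bytes:", zip_size(files))
```

With files this small, a good part of the zip's size is per-file bookkeeping rather than compressed data, so the gap should shrink for archives made of a few large files.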
Practical recommendations
- If you’re downloading a huge archive, like one from Google Takeout, get it zipped. You’re unlikely to need all the files at once later, and being able to decompress just the one little file you want to look at is definitely useful.
- If you’re just downloading a file for long-term storage and have no plans to access anything in it, tgz will be slightly more space efficient in theory. I haven’t tested the theory.
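Here is a rough sketch of what "decompress a little file on demand" looks like with Python's standard library. The helper names and the tiny two-file demo archive are hypothetical. The difference is in how the member is found: a zip keeps an index (the central directory) at the end of the file, while a tgz has no index, so the gzip stream has to be read from the start until the member turns up.

```python
import io
import tarfile
import zipfile


def read_one_from_zip(zip_bytes, member):
    """Look the member up in the zip's index and decompress only that file."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(member)


def read_one_from_tgz(tgz_bytes, member):
    """No index: the stream is decompressed sequentially to locate the member."""
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
        extracted = tar.extractfile(member)
        if extracted is None:
            raise ValueError(f"{member} is not a regular file")
        return extracted.read()


def build_demo_archives():
    """Create a tiny zip and a tiny tgz in memory holding the same two files."""
    files = {"a.txt": b"alpha", "b.txt": b"bravo"}
    zbuf = io.BytesIO()
    with zipfile.ZipFile(zbuf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    tbuf = io.BytesIO()
    with tarfile.open(fileobj=tbuf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return zbuf.getvalue(), tbuf.getvalue()


if __name__ == "__main__":
    zip_bytes, tgz_bytes = build_demo_archives()
    print(read_one_from_zip(zip_bytes, "b.txt"))
    print(read_one_from_tgz(tgz_bytes, "b.txt"))
```

Both calls return the same bytes; the cost only becomes visible on a large archive, where the tgz version has to chew through everything that comes before the file you want.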
[1] This is so embarrassing because I’ve been a Linux user since grade school, and my brain was like “tgz is just a free version of zip”.