Zip vs TGZ
I can’t believe I didn’t understand the technical differences between zip files and tgz files until today. This is crazy, because I started using zip files in like 1998. Back then zip support wasn’t built into the OS, and WinZip was one of those shareware programs that would always ask for payment, but you could ignore it and keep using it. And tgz seemed to be used only in Linux land, where I would impatiently wait for tar -xvf <myfile.tgz> to finish, usually to extract an ISO file containing a Linux OS install CD.
Yikes on all counts. My daughter isn’t going to know what a CD is ;_;
It’s interesting how long you can go using something without really understanding what’s happening under the hood. I didn’t think anything of it until today, when I was doing my annual download via Google Takeout. I have something like 500 GiB of compressed (zipped) data in Google from last year. Google asked me whether I wanted a tgz or a zip file, and I found myself stumped. Why does it matter? Aren’t they all just compressed archives?
Re-summarizing the info I found on Stack Overflow:
tar files are uncompressed: many files are bundled together into a single file for ease of movement and storage. They are also called tarballs. The imagery of files glued together with tar is pretty evocative.
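To make the "uncompressed" part concrete, here’s a minimal sketch using Python’s standard tarfile module that builds a tarball in memory from two made-up files (the names and contents are hypothetical). The file contents show up byte-for-byte inside the archive, and the archive is laid out in 512-byte blocks.

```python
import io
import tarfile

def make_tar(files: dict[str, bytes]) -> bytes:
    """Bundle name -> content pairs into a plain (uncompressed) tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:  # "w" = plain tar, no compression
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Hypothetical example files
files = {"notes.txt": b"hello tar\n", "todo.txt": b"buy milk\n"}
tar_bytes = make_tar(files)

# Contents are stored verbatim (not compressed), so they appear as-is in the archive.
print(b"hello tar\n" in tar_bytes)  # True
# Every member gets a 512-byte header and its data is padded to a 512-byte boundary.
print(len(tar_bytes) % 512 == 0)    # True
```

Note the archive ends up bigger than the files it holds: bundling, not shrinking, is the whole job.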
tgz files are tar files where the tarball is made first and then gzipped as a single unit to save space. Hence tgz (tarred and gzipped); you’ll also see the same thing named .tar.gz.
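Here’s a rough sketch of the two-step "tar first, then gzip the whole thing" process, again using only Python’s standard library (tarfile and gzip). The file names and contents are invented for illustration, and everything is built in memory rather than streamed the way the real tar command does it.

```python
import gzip
import io
import tarfile

def make_tgz(files: dict[str, bytes]) -> bytes:
    # Step 1: build the whole tarball first (uncompressed).
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    # Step 2: gzip the entire tarball as one single stream.
    return gzip.compress(tar_buf.getvalue())

# Hypothetical log files with lots of repetition
files = {f"log{i}.txt": b"same old log line\n" * 50 for i in range(20)}
tgz_bytes = make_tgz(files)

# Gunzipping gives back a plain tar, which tarfile reads normally...
with tarfile.open(fileobj=io.BytesIO(gzip.decompress(tgz_bytes)), mode="r") as tar:
    names = tar.getnames()
# ...and tarfile can also undo both layers at once with mode "r:gz".
with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
    first = tar.extractfile("log0.txt").read()

print(len(names), first == files["log0.txt"])  # 20 True
```

In practice tarfile can do both steps for you with mode "w:gz"; splitting them here just makes the layering visible.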
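zip files are created by compressing each file individually, then gluing the compressed pieces together, with a central directory at the end that records where each file lives inside the archive.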
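And the zip side, sketched with Python’s zipfile module (the file names and contents are made up). Each member is deflated on its own, so the archive can report a compressed size per file and hand back one file without decompressing the others.

```python
import io
import zipfile

# Hypothetical files standing in for an export
files = {
    "photo_meta.json": b'{"album": "2023", "count": 12}\n' * 100,
    "readme.txt": b"Takeout export readme\n" * 40,
}

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)  # each member is compressed independently
zip_bytes = buf.getvalue()

with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    # The central directory lists every member with its own sizes and offset...
    infos = zf.infolist()
    for info in infos:
        print(info.filename, info.file_size, "->", info.compress_size)
    # ...so a single member can be read without touching the rest.
    readme = zf.read("readme.txt")

print(readme == files["readme.txt"])  # True
```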
The consequences of this are:
- tgz files may compress better, since the compressor sees one big stream and can take advantage of redundancy across file boundaries (for example, many similar small files).
- zip files may compress worse, but you can decompress individual files on demand, so random access is faster. With a tgz, getting to one file means decompressing everything stored before it.
- If you’re downloading a huge archive, like a Google Takeout export, get it zipped. You’re unlikely to need all the files at once later, and being able to decompress just the one small file you want to look at is definitely useful.
- If you’re just downloading a file for long, long-term storage and have no plans to access anything in it, tgz will in theory be slightly more space-efficient. I haven’t tested the theory.
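If you want to poke at the space-efficiency theory, here’s a small, hypothetical experiment in Python: pack the same set of files both ways in memory and compare the sizes. The synthetic data (300 tiny, nearly identical text files) is deliberately the best case for tgz, so don’t read the numbers as representative. gzip only looks back 32 KiB, and a real Takeout export is mostly photos and videos that are already compressed, where I’d expect the gap to be small.

```python
import io
import tarfile
import zipfile

def tgz_size(files: dict[str, bytes]) -> int:
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:  # tar, then gzip the whole stream
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())

def zip_size(files: dict[str, bytes]) -> int:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)  # each file compressed on its own
    return len(buf.getvalue())

# Hypothetical data: many small, similar files, where cross-file compression helps most.
files = {
    f"notes/{i:04d}.txt": f"entry {i}: ".encode()
    + b"the quick brown fox jumps over the lazy dog\n" * 3
    for i in range(300)
}
raw = sum(len(d) for d in files.values())
print("raw:", raw, "tgz:", tgz_size(files), "zip:", zip_size(files))
```

Swapping in a folder of your own files is the real test; for big, already-compressed media I wouldn’t be surprised if the two came out nearly the same.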