UNIX Tar Problem: File Length Truncation, Unicode Name Support

Xah Lee, 2011-03-17

I discovered that GNU tar has a “--help” option. So, instead of typing “man tar”, you can type “tar --help”. I'm not sure how long it has been there.

Much better. I always hated the “man” fuck. You can never be sure that the man page corresponds to the version you are using, and because the doc is kept separate from the program, it's also a pain for the devs to maintain and tends to get out of sync.

Another thing about tar is that I never figured out why its syntax doesn't use the dash. You type “tar xvf myfile.tar” instead of “tar -xvf myfile.tar”. Many years ago, the form with the dash wouldn't work. I'm not sure whether all tar programs support it today.

Also, you can't talk about tar without talking about the unix line truncation problem. Tar used to truncate your file names if the path was long (e.g. ~120 chars; the name field in the classic tar header is only 100 bytes). See: Unix, RFC, Line Truncation. I'm not sure how good it is today.

Something I still want to test but never got around to: does the current version of tar preserve file names that contain Unicode (e.g. Chinese, math symbols)?

According to the Wikipedia article “tar (file format)”, there seems to be a newer spec in “POSIX.1-2001” (the pax format) that addresses file name length and charset encoding, and it has been implemented by GNU tar since 2004.
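Here's a quick way to try the long-name and Unicode question without touching the disk. This is only a sketch, and it uses Python's standard “tarfile” module as a stand-in, so it shows how that library handles the pax and ustar formats, not how GNU tar itself behaves. The function name “roundtrip_name” and the test path are made up for illustration.

```python
import io
import tarfile


def roundtrip_name(name: str, fmt: int) -> str:
    """Write one empty member called `name` to an in-memory tar, then read the name back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(name))  # zero-length member, header only
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()[0]


# Hypothetical test name: a path of more than 320 characters
# ending in Chinese characters and a math symbol.
long_unicode_name = "dir/" * 80 + "文件-∑.txt"

if __name__ == "__main__":
    # pax stores the full path as UTF-8 in an extended header.
    survived = roundtrip_name(long_unicode_name, tarfile.PAX_FORMAT) == long_unicode_name
    print("pax kept the name intact:", survived)

    # ustar has only a 100-byte name field plus a 155-byte prefix field.
    # Python's tarfile refuses the name instead of truncating it.
    try:
        roundtrip_name(long_unicode_name, tarfile.USTAR_FORMAT)
    except ValueError as err:
        print("ustar refused the name:", err)
```

The same check against a real tar binary would mean creating such a file on disk, archiving it, and comparing the listing, which also depends on the file system and locale.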

The Wikipedia article turns out to be quite informative. One thing it mentions is the “tarbomb”. That is, when you untar, the files get scattered all over your current dir, or even into parent dirs, and OVERWRITE your files. This is an extreme pain in the ass.
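One defensive habit against a tarbomb is to look at the member names before extracting anything. Below is a minimal sketch of that idea using Python's “tarfile” module. The function name “tarbomb_report” and the sample names are hypothetical, and the check is deliberately simple: it looks only at path names and does not follow symlinks or hard links inside the archive.

```python
import io
import tarfile
from pathlib import PurePosixPath


def tarbomb_report(tar: tarfile.TarFile) -> dict:
    """Inspect member names without extracting anything."""
    escaping = []      # absolute paths, or paths that climb out via ".."
    top_level = set()  # distinct first path components
    for name in tar.getnames():
        path = PurePosixPath(name)
        if path.is_absolute() or ".." in path.parts:
            escaping.append(name)
        elif path.parts:
            top_level.add(path.parts[0])
    # More than one top-level entry means files land directly in the current dir.
    return {"escaping": escaping, "scatters": len(top_level) > 1}


def make_archive(names) -> tarfile.TarFile:
    """Build an in-memory archive of empty members, for demonstration only."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")


if __name__ == "__main__":
    print(tarbomb_report(make_archive(["proj/a.txt", "proj/b.txt"])))
    print(tarbomb_report(make_archive(["a.txt", "b.txt", "../evil.txt"])))
```

On the command line, the equivalent habit is to list the archive with “tar tf myfile.tar” first, and extract inside a fresh empty dir if the listing looks messy.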

Another interesting problem is that tar has no table of contents, so there is no random access. If you need to list the files or extract one file, you have to read thru the archive from the beginning.
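To make the no-table-of-contents point concrete, here's a sketch that builds the missing index by hand, assuming an uncompressed archive and Python's “tarfile” module. The function names “build_index” and “read_member” are made up. The point is that the index can only be produced by walking every header from the front, which is exactly the cost the format imposes on each listing.

```python
import io
import tarfile


def build_index(fileobj) -> dict:
    """Scan the whole archive once, recording where each member's data starts."""
    index = {}
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:  # "r:" = no compression
        for member in tar:  # walks header after header, front to back
            index[member.name] = (member.offset_data, member.size)
    return index


def read_member(fileobj, index: dict, name: str) -> bytes:
    """Jump straight to one member's bytes using the index built above."""
    offset, size = index[name]
    fileobj.seek(offset)
    return fileobj.read(size)


if __name__ == "__main__":
    # Build a small in-memory archive with two members (sample data only).
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [("a.txt", b"alpha"), ("b.txt", b"bravo!")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)

    index = build_index(buf)
    print(read_member(buf, index, "b.txt"))
```

Once the archive is compressed (e.g. a .tar.gz), even this trick stops working, since byte offsets into the compressed stream can't be seeked to directly.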

Here's another good resource discussing tar's problems: “New file format?” at duplicity.nongnu.org.

In recent months I read that Google still uses tape drives as one of their backup media. I wonder if they use tar as the file format.

Alright, today I officially deprecate tar. I'll never use it myself.