Unix zip Utility Path Problem

Perm url with updates: http://xahlee.org/comp/unix_zip_problem.html

Xah Lee, 2010-11-30

This page describes a problem with the unix zip utility, which is related to a larger problem with unix environment variables.

Unix has a command line zip utility to compress files and folders. Very convenient. For example, suppose you want to archive this folder: 〔c:/Users/xah/ErgoEmacs_Source/〕. All you have to do is “cd” to the parent folder 〔c:/Users/xah/〕, then type 「zip -r ErgoEmacs_Source.zip ErgoEmacs_Source」. An archive file named “ErgoEmacs_Source.zip” will be created.

But suppose you need to call “zip” in a program. You know the dir you want to archive, and you know the dir you want the archive file to end up in.

Suppose in your program, you have:

zip -r "c:/Users/xah/output/ErgoEmacs_Source.zip" "c:/Users/xah/ErgoEmacs_Source"

This will create the archive, but there's a problem: the archive records the full path of each file. So, when a user tries to unzip 〔ErgoEmacs_Source.zip〕 on her machine with her current dir at 〔c:/Users/mary/Downloads/〕, it'll try to create the files and dirs at 〔c:/Users/xah/ErgoEmacs_Source〕, or create them at 〔c:/Users/mary/Downloads/Users/xah/ErgoEmacs_Source〕. Worse, if you use relative paths in your program, then some unzip software will claim it's an error.

Problem with Unix Environment Variable

There does not seem to be an option in the unix zip command line utility to solve this. The best you can do is, in your script, change the current dir and then call 「zip -r ErgoEmacs_Source.zip ErgoEmacs_Source」, just as if you were doing it manually.

The problem with this is that, once you introduce the “current dir” environment variable in your code, you have to be careful with every line of your code that deals with directories. Env vars are global variables, and a wrong value of “current dir” will affect all functions that take a relative dir as path. This is especially important for build scripts that deal with lots of directories. If you forget to set current dir before a particular function call that takes a relative path, you might end up deleting dirs or files.
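
If the build step is a shell script, one way to keep the directory change contained is to run 「cd」 and 「zip」 together in a subshell, so the change is gone when the subshell exits. Here is a minimal sketch, assuming a POSIX shell and the Info-ZIP 「zip」. The function name 「zip_dir」 and its two arguments are made up for illustration; they are not part of zip.

```shell
# zip_dir SRC_DIR OUT_ZIP   (hypothetical helper, not part of zip)
# Archive SRC_DIR so that stored paths start at its last component.
# The function body is wrapped in ( ... ), so it runs in a subshell:
# the cd and the variables below do not leak into the calling script.
zip_dir() (
    src_dir=$1
    out_zip=$2

    # The output path must be absolute, because cd changes what a
    # relative path means. Also accept c:/... style paths (cygwin).
    case $out_zip in
        /* | [A-Za-z]:/*) ;;
        *) out_zip=$PWD/$out_zip ;;
    esac

    parent=$(dirname -- "$src_dir")
    name=$(basename -- "$src_dir")

    # -r recurses into the directory, -q suppresses the file listing.
    cd -- "$parent" && zip -r -q "$out_zip" "$name"
)
```

Called as 「zip_dir c:/Users/xah/ErgoEmacs_Source c:/Users/xah/output/ErgoEmacs_Source.zip」, this should store paths that begin with 「ErgoEmacs_Source/」 and leave the caller's current dir as it was. It does not remove the current dir from the picture; it only limits how far a wrong value can spread.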

I was working on an emacs lisp script that builds the ErgoEmacs lisp package for public release. I got a bug report that the default unzip utility in Windows 7, and also “7-Zip v9.17”, claim that my zip archive is empty. The problem is caused by relative paths in my zip archive, which in turn come from calling the zip utility with a relative path, like this: 「zip ErgoEmacs_v1.9.zip ../」. When unarchiving with unix unzip, it gives a warning but otherwise works.
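
To catch this kind of archive before release, the stored paths can be listed and scanned for entries that are absolute or that contain 「..」. A rough sketch, assuming Info-ZIP's 「unzip」 is installed (its 「-Z1」 switch runs it in zipinfo mode and prints one stored name per line). The function name 「list_suspect_paths」 is made up for illustration.

```shell
# list_suspect_paths ARCHIVE   (hypothetical helper)
# Print stored names that start with "/", start with a drive letter
# such as "c:", or contain ".." as a path component.
# Exit status is grep's: 0 if any suspect name was found, 1 if none.
list_suspect_paths() {
    unzip -Z1 "$1" | grep -E '^/|^[A-Za-z]:|(^|/)\.\.(/|$)'
}
```

For example, 「list_suspect_paths ergoemacs_1.9.1.zip && echo "archive has odd paths"」 could go at the end of a build script as a sanity check.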

So, i tried to fix my build script. Spent 2 hours and realized there's no option in the zip util to do what i want. So, in the end, i set current dir to the parent dir of the dir i want to zip, fully aware of the danger. Yet, because of this setting of current dir, by mistake i first copied my entire 〔~/〕 dir (a few gigabytes, was wondering why it took so long), then later on i deleted my entire svn checkout dir.

The essence of this problem is unix's concept of current dir as an environment variable. An environment variable is basically a global variable. Env vars and current dir might be unavoidable and useful concepts for operating the command line, but the essential problem of unix is that unixers do not realize what an env var is. Thus we have a problem like the zip util today, which has no options for stating clearly the input path, the output path, and the paths stored in the archive.

What's the Big Deal?

Some unixer might say “what's the big deal?”. The problem is that as a software programmer, you spend hours on seemingly trivial problems, caused by millions of these little things.

If you do not introduce an environment variable into your program, it is impossible to create the correct zip archive using the unix zip program. If you introduce the env var, you have basically introduced a pest into your script: you have to be careful on every line to set the current dir correctly. If you forget (which is normal), it causes disaster. If you do functional programming, or use a functional lang, this causes headaches.

Version Number and Test Files

For testing purposes, here's a zip file with relative paths in it: http://ergoemacs.googlecode.com/files/ergoemacs_1.9.1.zip

The unzip program discussed here is:

c:\Users\xah\web\xahlee_org\comp>unzip --help
unzip --help
UnZip 6.00 of 20 April 2009, by Cygwin. Original by Info-ZIP.

The zip version is:

c:\Users\xah\web\xahlee_org\comp>zip --help
zip --help
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
Zip 3.0 (July 5th 2008). Usage:

They are both from cygwin. I'm running Windows Vista.

Another unix fuckup: just now i tried to find the version of zip and unzip that i'm using, for the record. There does not seem to be any option to print the version number, and it does not seem to be documented in the man pages. In the end, it appears that the version number is printed when you use the “--help” option. (The “--help” option is not even mentioned in unzip's man page.)

Another problem with unix is that its documentation (man page) always starts like this:
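
For what it's worth, Info-ZIP's documentation describes 「-v」, when given as the only argument, as printing a diagnostic screen that includes the version and release date, for both zip and unzip. A small sketch, assuming that behavior holds for the installed builds. The function name 「print_tool_version」 is made up for illustration.

```shell
# print_tool_version TOOL   (hypothetical helper)
# Run "TOOL -v" if TOOL can be found, otherwise report that it is missing.
# Assumption: for Info-ZIP zip and unzip, "-v" alone prints version info.
print_tool_version() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" -v
    else
        echo "$tool: not found on PATH" >&2
        return 1
    fi
}
```

Running 「print_tool_version zip」 and 「print_tool_version unzip」 from a terminal should then show the version screens, if the assumption above is right.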

       zip  [-aABcdDeEfFghjklLmoqrRSTuvVwXyz!@$] [--longoption ...]  [-b path]
       [-n suffixes] [-t date] [-tt date] [zipfile [file ...]]  [-xi list]

Who the fuck understands what gobbledygook it is talking about?

For the record, this is my elisp build script as it currently is.

; -*- coding: utf-8 -*-

;; 2009-10-01, 2010-11-15
;; This elisp script builds a ErgoEmacs elisp package.
;; Effectively, it creates a new zip file, nothing else.

;; This script is experimental. Best to use the make util at
;; ergoemacs/Makefile
;; for now.

;; What does it do:
;; copy the whole “ergoemacs” dir into some dest dir. The “ergoemacs” is the dir from root checked out from svn.
;; remove all .svn dirs.
;; remove other files and dir such as Makefile and win32-setup etc.

;; First, change the version number in variable “zipDirName”.
;; then, just eval-buffer.
;; The result will be a new zip file (and a unzipped dir) at the root of your svn checkout.
;; For example, if your svn checkout path is
;;   c:/Users/xah/ErgoEmacs_Source
;; then the following are created
;;   c:/Users/xah/ErgoEmacs_Source/ergoemacs_1.9.1
;;   c:/Users/xah/ErgoEmacs_Source/ergoemacs_1.9.1.zip

;; This script requires unix “find”, “rm”, “cp”, etc.

(defvar zipDirName nil "the zip file/dir name")
(setq zipDirName "ergoemacs_1.9.1.1")

(defvar sourceDir nil "The ergoemacs source code dir in repository. By default, this is parent dir of the dir this file is in.")
(setq sourceDir (expand-file-name  (concat (file-name-directory buffer-file-name) "../")) ) ; e.g. "c:/Users/xah/ErgoEmacs_Source/ergoemacs/"

(defvar destDirRoot nil "The output dir. Will be created if doesn't exist. By default, this is 2 dir above this file.")
(setq destDirRoot (expand-file-name  (concat (file-name-directory buffer-file-name) "../../"))) ;

(setq destDirWithZipPath (concat destDirRoot zipDirName "/"))

;; set to absolute path if not already
(setq sourceDir (expand-file-name sourceDir ) ) 
(setq destDirRoot (expand-file-name destDirRoot ) )
(setq destDirWithZipPath (expand-file-name destDirWithZipPath ) )

;; main

;; if previous build dir and zip file exist, remove them.
;; destDirWithZipPath ends in "/", so the zip file path is built from destDirRoot and zipDirName.
(let ((zipFilePath (concat destDirRoot zipDirName ".zip")))
  (if (file-exists-p destDirWithZipPath) (delete-directory destDirWithZipPath t))
  (if (file-exists-p zipFilePath) (delete-file zipFilePath)))

;; create the new dest dir
(make-directory destDirWithZipPath t)

;; copy stuff over to dest dir
;; (shell-command (concat "cp -R " sourceDir " " destDirRoot) )
(copy-directory sourceDir destDirWithZipPath )

;; delete “.svn” dir and other files we don't want
(shell-command (concat "find " destDirWithZipPath " -depth -name \".svn\" -type d -exec rm -R {} ';'" ) )

;; (require 'find-lisp)
;; (mapc 'my-process-file
;;  (find-lisp-find-files destDirWithZipPath "\\.svn$")
;;  (find-lisp-find-files "c:/Users/xah/xx2/ergoemacs_1.9.1.1/build-util/" "")
;;  (find-lisp-find-dired-subdirectories "c:/Users/xah/xx2/ergoemacs_1.9.1.1/build-util/")
;; )

;; delete emacs backup files
;; (shell-command (concat "find " destDirWithZipPath " -name \"*~\" -exec rm {} ';'" ) )
(require 'find-lisp)
(mapc 'delete-file (find-lisp-find-files destDirWithZipPath "~$"))

;; delete Windows specific setup dir
;; (shell-command (concat " rm -R " destDirWithZipPath "win32-setup"))
(delete-directory (concat destDirWithZipPath "win32-setup") t)

;; delete misc files we dont need
(delete-file (concat destDirWithZipPath "Makefile"))
(delete-file (concat destDirWithZipPath "build-util/build_ergoemacs_package.el"))

;; byte compile elc files
(load-file (concat destDirWithZipPath "build-util/byte-compile_lisp_files.el"))

;; zip it
(cd destDirRoot)
(shell-command (concat "zip -r " zipDirName ".zip " zipDirName ) )

;; change current dir back
(cd (expand-file-name (file-name-directory buffer-file-name)))

;; ideally, change all shell calls to elisp functions so it's not dependent on shell.
;; using elisp for build is just experimental. We can revert to unix shell in the future.

;; currently, the version number is hard coded. We probably want to make use of svn's tag feature for version stamp, for building both Windows release and elisp package release.
