
10.9 Limitations of Usual Tools

The small set of tools you can expect to find on any machine can still include some limitations you should be aware of.

awk
Don't leave white space before the parentheses in user function calls; GNU awk will reject it:

 
$ gawk 'function die () { print "Aaaaarg!"  }
        BEGIN { die () }'
gawk: cmd. line:2:         BEGIN { die () }
gawk: cmd. line:2:                      ^ parse error
$ gawk 'function die () { print "Aaaaarg!"  }
        BEGIN { die() }'
Aaaaarg!

If you want your program to be deterministic, don't depend on the order in which `for (i in arr)' traverses an array:

 
$ cat for.awk
END {
  arr["foo"] = 1
  arr["bar"] = 1
  for (i in arr)
    print i
}
$ gawk -f for.awk </dev/null
foo
bar
$ nawk -f for.awk </dev/null
bar
foo
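When a stable order matters, one option is to impose it outside awk, for instance by piping the keys through sort. The following is only a minimal sketch, assuming awk and sort are available on the PATH; the variable name `keys' is illustrative and not part of the original example.

```shell
# The traversal order of `for (i in arr)' differs between awk
# implementations, so let sort impose a deterministic order on the keys.
keys=`awk 'END {
  arr["foo"] = 1
  arr["bar"] = 1
  for (i in arr)
    print i
}' </dev/null | sort`
echo "$keys"
```

Sorting after the fact costs an extra process, but it avoids relying on any awk-specific ordering feature.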

Some AWK implementations, such as HP-UX 11.0's native one, have regex engines that mishandle inner anchors:

 
$ echo xfoo | $AWK '/foo|^bar/ { print }'
$ echo bar | $AWK '/foo|^bar/ { print }'
bar
$ echo xfoo | $AWK '/^bar|foo/ { print }'
xfoo
$ echo bar | $AWK '/^bar|foo/ { print }'
bar

Either do not depend on such patterns (i.e., use `/^(.*foo|bar)/'), or use a simple test to reject such AWK implementations.
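Such a test could look like the sketch below. It reruns the failing case from the transcript above and records the outcome. The default of `awk' for AWK and the variable name `awk_inner_anchors' are assumptions made for this illustration.

```shell
# AWK defaults to plain `awk' here; that default is an assumption.
AWK=${AWK-awk}

# A sound awk prints `xfoo' for this pattern; a fragile one prints nothing.
if test "`echo xfoo | $AWK '/foo|^bar/ { print }'`" = xfoo; then
  awk_inner_anchors=ok
else
  awk_inner_anchors=broken
fi
echo "$awk_inner_anchors"
```

A configuration script could then refuse to use $AWK, or look for another implementation, when the result is `broken'.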

cat
Don't rely on any option. The option `-v', which displays non-printing characters, seems portable, though.

cc
When a compilation such as `cc foo.c -o foo' fails, some compilers (such as CDS on Reliant UNIX) leave a `foo.o'.

cmp
cmp performs a raw data comparison of two files, while diff compares two text files. Therefore, if you might compare DOS files, even if only checking whether two files are different, use diff to avoid spurious differences due to differences of newline encoding.

cp
SunOS cp does not support `-f', although its mv does. It's possible to deduce why mv and cp are different with respect to `-f'. mv prompts by default before overwriting a read-only file. cp does not. Therefore, mv requires a `-f' option, but cp does not. mv and cp behave differently with respect to read-only files because the simplest form of cp cannot overwrite a read-only file, but the simplest form of mv can. This is because cp opens the target for write access, whereas mv simply calls link (or, in newer systems, rename).

diff
Option `-u' is nonportable.

Some implementations, such as Tru64's, fail when comparing to `/dev/null'. Use an empty file instead.

dirname
Not all hosts have dirname, but it is reasonably easy to emulate, e.g.:

 
dir=`expr "x$file" : 'x\(.*\)/[^/]*' \|
          '.'      : '\(.\)'`

But there are a few subtleties, e.g., under UN*X, should `//1' give `/'? Paul Eggert answers:

No, under some older flavors of Unix, leading `//' is a special path name: it refers to a "super-root" and is used to access other machines' files. Leading `///', `////', etc. are equivalent to `/'; but leading `//' is special. I think this tradition started with Apollo Domain/OS, an OS that is still in use on some older hosts.

POSIX.2 allows but does not require the special treatment for `//'. It says that the behavior of dirname on path names of the form `//([^/]+/*)?' is implementation defined. In these cases, GNU dirname returns `/', but it's more portable to return `//' as this works even on those older flavors of Unix.

I have heard rumors that this special treatment of `//' may be dropped in future versions of POSIX, but for now it's still the standard.
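As a rough illustration of the expr-based emulation, the sketch below applies it to two sample path names. The paths and the variable names `dir1' and `dir2' are hypothetical. The fallback branch uses `\(.\)' so that expr prints `.' rather than a match count.

```shell
# A path name with a directory part: the longest prefix before the
# last slash is returned.
file=/usr/local/bin/tool
dir1=`expr "x$file" : 'x\(.*\)/[^/]*' \| '.' : '\(.\)'`

# A path name without any slash: the first match fails, so the
# second expression supplies `.'.
file=tool
dir2=`expr "x$file" : 'x\(.*\)/[^/]*' \| '.' : '\(.\)'`

echo "$dir1"
echo "$dir2"
```

This sketch deliberately ignores the subtleties discussed above: a file directly under the root, such as `/tool', yields `.' here instead of `/', and leading `//' gets no special treatment.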

egrep
The empty alternative is not portable; use `?' instead. For instance with Digital Unix v5.0:

 
> printf "foo\n|foo\n" | egrep '^(|foo|bar)$'
|foo
> printf "bar\nbar|\n" | egrep '^(foo|bar|)$'
bar|
> printf "foo\nfoo|\n|bar\nbar\n" | egrep '^(foo||bar)$'
foo
|bar

egrep also suffers the limitations of grep.

expr
No expr keyword starts with `x', so use `expr x"word" : 'xregex'' to keep expr from misinterpreting word.

Don't use length, substr, match and index.
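The `x' prefix can be illustrated with a word that happens to be one of the keywords listed above. This is a minimal sketch; the variable names `word' and `prefix' are made up for the example.

```shell
# `length' is a keyword for some expr implementations, so passing it
# bare as the first operand could be misparsed.
word=length

# Prefixing both the string and the regex with `x' keeps expr from
# seeing a keyword; the group extracts the first three letters.
prefix=`expr "x$word" : 'x\(len\)'`
echo "$prefix"
```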

expr (`|')
You can use `|'. Although POSIX does require that `expr ''' return the empty string, it does not specify the result when you `|' together the empty string (or zero) with the empty string. For example:

 
expr '' \| ''

GNU/Linux and POSIX.2-1992 return the empty string for this case, but traditional Unix returns `0' (Solaris is one such example). In the latest POSIX draft, the specification has been changed to match traditional Unix's behavior (which is bizarre, but it's too late to fix this). Please note that the same problem does arise when the empty string results from a computation, as in:

 
expr bar : foo \| foo : bar

Avoid this portability problem by avoiding the empty string.

expr (`:')
Don't use `\?', `\+' and `\|' in patterns; they are not supported on Solaris.

The POSIX.2-1992 standard is ambiguous as to whether `expr a : b' (and `expr 'a' : '\(b\)'') output `0' or the empty string. In practice, it outputs the empty string on most platforms, but portable scripts should not assume this. For instance, the QNX 4.25 native expr returns `0'.

You may believe that one means to get a uniform behavior would be to use the empty string as a default value:

 
expr a : b \| ''

Unfortunately, this behaves exactly like the original expression; see the `expr (`|')' entry for more information.

Older expr implementations (e.g. SunOS 4 expr and Solaris 8 /usr/ucb/expr) have a silly length limit that causes expr to fail if the matched substring is longer than 120 bytes. In this case, you might want to fall back on `echo|sed' if expr fails.

Don't leave, there is some more!

The QNX 4.25 expr, in addition to preferring `0' to the empty string, has a funny behavior in its exit status: it's always 1 when parentheses are used!

 
$ val=`expr 'a' : 'a'`; echo "$?: $val"
0: 1
$ val=`expr 'a' : 'b'`; echo "$?: $val"
1: 0

$ val=`expr 'a' : '\(a\)'`; echo "$?: $val"
1: a
$ val=`expr 'a' : '\(b\)'`; echo "$?: $val"
1: 0

In practice this can be a big problem if you are prepared to catch failures of expr with some other method (such as sed), since you may get the result twice. For instance:

 
$ expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'

will output `a' on most hosts, but `aa' on QNX 4.25. A simple workaround consists of testing expr and using a variable set to expr or to false according to the result.
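The workaround might be sketched as follows. The variable name `my_expr' is hypothetical, and the probe shown is only one possible way of deciding whether expr can be trusted.

```shell
# Probe expr once: trust it only if a parenthesized match reports
# success through its exit status.
if expr a : '\(a\)' >/dev/null 2>&1; then
  my_expr=expr
else
  my_expr=false
fi

# When my_expr is `false', the left-hand side prints nothing and
# fails, so only the sed fallback contributes output.
val=`$my_expr a : '\(a\)' 2>/dev/null || echo a | sed 's/^\(a\)$/\1/'`
echo "$val"
```

Because a distrusted expr is replaced by `false', the result is printed once on either kind of host.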

find
The option `-maxdepth' seems to be GNU specific. Tru64 v5.1, NetBSD 1.5 and Solaris 2.5 find commands do not understand it.
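One commonly used way to list only the entries directly under a directory without `-maxdepth' relies on `-prune'. The sketch below assumes that `-prune' is supported, which has not been checked here for the find commands named above. The scratch directory and file names are made up for the illustration.

```shell
# Build a small scratch tree: one file at the top, one file one level down.
tmp=${TMPDIR-/tmp}/maxdepth.$$
mkdir "$tmp" "$tmp/sub"
: >"$tmp/top.txt"
: >"$tmp/sub/deep.txt"

# Every entry other than `.' itself is pruned, so find prints it but
# never descends into it.  Unlike `-maxdepth 1', `.' is not printed.
found=`cd "$tmp" && find . ! -name . -prune -print | sort`
rm -rf "$tmp"
echo "$found"
```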

grep
Don't use `grep -s' to suppress output, because `grep -s' on System V does not suppress output, only error messages. Instead, redirect the standard output and standard error (in case the file doesn't exist) of grep to `/dev/null'. Check the exit status of grep to determine whether it found a match.
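The redirection approach can be sketched as below. The sample text, the nonexistent file name and the variable names `found' and `missing' are illustrative only.

```shell
# Discard both output streams and rely on the exit status alone.
if echo 'hello world' | grep world >/dev/null 2>&1; then
  found=yes
else
  found=no
fi

# A nonexistent file stays silent too; only the exit status differs.
if grep world ./no-such-file.$$ >/dev/null 2>&1; then
  missing=yes
else
  missing=no
fi
echo "$found $missing"
```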

Don't use multiple regexps with `-e', as some grep will only honor the last pattern (e.g., IRIX 6.5 and Solaris 2.5.1). Moreover, Stardent Vistra SVR4 grep lacks `-e' altogether. Instead, use alternation and egrep.

ln
Don't rely on ln having a `-f' option. Symbolic links are not available on old systems; use `ln' as a fallback.

For versions of the DJGPP before 2.04, ln emulates soft links for executables by generating a stub that in turn calls the real program. This feature also works with nonexistent files like in the Unix spec. So `ln -s file link' will generate `link.exe', which will attempt to call `file.exe' if run. But this feature only works for executables, so `cp -p' is used instead for these systems. DJGPP versions 2.04 and later have full symlink support.
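The fallback from symbolic to hard links can be sketched as a small probe. The variable name `LN_S' and the scratch directory are assumptions of this example, and the `cp -p' case described for older DJGPP is not handled.

```shell
# Try to create a symbolic link in a scratch directory.
tmp=${TMPDIR-/tmp}/lns.$$
mkdir "$tmp"
: >"$tmp/target"

# Fall back on plain `ln' when `ln -s' does not work.
if (cd "$tmp" && ln -s target link) 2>/dev/null; then
  LN_S='ln -s'
else
  LN_S=ln
fi
rm -rf "$tmp"
echo "$LN_S"
```

Later commands can then use `$LN_S file link' without caring which kind of link the host supports.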

mv
The only portable options are `-f' and `-i'.

Moving individual files between file systems is portable (it was in V6), but it is not always atomic: when doing `mv new existing', there's a critical section where neither the old nor the new version of `existing' actually exists.

Moving directories across mount points is not portable; use cp and rm.

sed
Patterns should not include the separator (unless escaped), even as part of a character class. In conformance with POSIX, the Cray sed will reject `s/[^/]*$//': use `s,[^/]*$,,'.
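For instance, stripping the last component of a path name with a comma as separator might look like this sketch; the sample path and the variable names are made up.

```shell
# The slash appears inside the bracket expression, so a different
# separator (here a comma) is used for the s command.
path=/usr/lib/libfoo.a
stripped=`echo "$path" | sed 's,[^/]*$,,'`
echo "$stripped"
```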

Sed scripts should not use branch labels longer than 8 characters and should not contain comments.

Don't include extra `;', as some sed, such as NetBSD 1.4.2's, try to interpret the second as a command:

 
$ echo a | sed 's/x/x/;;s/x/x/'
sed: 1: "s/x/x/;;s/x/x/": invalid command code ;

Input should have reasonably long lines, since some sed have an input buffer limited to 4000 bytes.

Alternation, `\|', is common but not portable. Anchors (`^' and `$') inside groups are not portable.

Nested groups are extremely portable, but there is at least one sed (System V/68 Base Operating System R3V7.1) that does not support them.

Of course the option `-e' is portable, but it is not needed. No valid Sed program can start with a dash, so it does not help disambiguating. Its sole usefulness is to help enforce indentation, as in:

 
sed -e instruction-1 \
    -e instruction-2

as opposed to

 
sed instruction-1;instruction-2

Contrary to yet another urban legend, you may portably use `&' in the replacement part of the s command to mean "what was matched".
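A minimal illustration of `&' in a replacement, with a made-up input string and variable name:

```shell
# `&' stands for the text matched by the pattern, here `oo'.
marked=`echo foo | sed 's/oo/<&>/'`
echo "$marked"
```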

sed (`t')
Some old systems have sed implementations that "forget" to reset their `t' flag when starting a new cycle. For instance on MIPS RISC/OS, and on IRIX 5.3, if you run the following sed script (the trailing `# a', `# 1', etc. labels are not actually part of the texts):

 
s/keep me/kept/g  # a
t end             # b
s/.*/deleted/g    # c
: end             # d

on

 
delete me         # 1
delete me         # 2
keep me           # 3
delete me         # 4

you get

 
deleted
delete me
kept
deleted

instead of

 
deleted
deleted
kept
deleted

Why? When processing line 1, c matches and therefore sets the t flag, and the output is produced. When processing line 2, the t flag is still set (this is the bug). Command a fails to match, but sed is not supposed to clear the t flag when a substitution fails. Command b sees that the flag is set, therefore it clears it and jumps to d, hence you get `delete me' instead of `deleted'. When processing line 3, t is clear, a matches, so the flag is set, hence b clears the flag and jumps. Finally, since the flag is clear, line 4 is processed properly.

There are two things one should remember about `t' in sed. Firstly, always remember that `t' jumps if some substitution succeeded, not only the immediately preceding substitution; therefore, always use a fake `t clear; : clear' to reset the t flag where needed.

Secondly, you cannot rely on sed to clear the flag at each new cycle.

One portable implementation of the script above is:

 
t clear
: clear
s/keep me/kept/g
t end
s/.*/deleted/g
: end
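To check the portable script on a given host, it can be run against the four input lines used earlier, as in this sketch. The variable name `out' is illustrative, and passing the script as a single multi-line argument is just one convenient way to invoke it.

```shell
# Feed the four sample lines to the portable script.  The leading
# `t clear; : clear' pair resets the t flag at the start of each cycle.
out=`printf 'delete me\ndelete me\nkeep me\ndelete me\n' | sed '
t clear
: clear
s/keep me/kept/g
t end
s/.*/deleted/g
: end
'`
echo "$out"
```

The second input line should come out as `deleted'. With the original script on an affected sed, it would come out as `delete me'.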

touch
On some old BSD systems, touch or any command that results in an empty file does not update the timestamps, so use a command like echo as a workaround.

GNU touch 3.16r (and presumably all before that) fails to work on SunOS 4.1.3 when the empty file is on an NFS-mounted 4.2 volume.



This document was generated by Davide on March, 6 2002 using texi2html