Beware The POSIX Meme (apropos wc)
There’s this meme in some dark places of the Linux community: standards compliance is always greener on the other side of the fence, namely on the BSDs, Plan 9-based/inspired systems, etc. Sometimes I believe people who say those kinds of things have never seriously worked with any of those systems and just like to vent baseless rants.
Today I’m here to debunk some misconceptions I’ve come across repeatedly, some of which I myself believed because I just followed the meme.
Your wc Smells
So, let’s compare two outputs. The first one is GNU:

    $ cat ohno
    This is one line
    Another line
    Hey, a third one!
    $ wc --version
    wc (GNU coreutils) 8.32
    Copyright (C) 2020 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

    Written by Paul Rubin and David MacKenzie.
    $ wc -l ohno
    3 ohno
Now, let’s turn to OpenBSD:

    $ cat ohno
    This is one line
    Another line
    Hey, a third one!
    $ uname -a
    OpenBSD aribsd.localhost 6.8 GENERIC#5 amd64
    $ wc -l ohno
           3 ohno
Have you already spotted the difference? Have you? Leading whitespace in OpenBSD’s output.
Time for a bit of a backstory. I’ve been working on two little scripts these days: phrenamer and wsclean. I use both quite a bit and I may even publish and maintain them as full-fledged projects; still not sure on that. I like my code to be portable, so I’ve been studying the POSIX specs as much as I can, testing my scripts out on a variety of shells, you know the drill. I occasionally use Plan 9 from User Space to test them, knowing though that Plan 9 was never intended to be POSIX-compliant or backwards compatible with UNIX… but I like to see my scripts also working on that.
Oh, let’s try Plan 9 from User Space and see what happens! The 9 command is the way you use the Plan 9 commands instead of the standard ones on your system:

    $ 9 cat ohno
    This is one line
    Another line
    Hey, a third one!
    $ 9 wc -l ohno
           3 ohno

phrenamer relies on wc -l at some point. When I discovered this discrepancy among systems, I immediately thought what any minimalism-loving, POSIX-compliant girl would think at first sight: “GNU is screwing things up again!” Don’t blame me: it usually is the case that GNU sneaks in weird extensions[1] to the standards… and it usually is the case that OpenBSD is pretty conservative about following them, without extending them too much.[2]

So I went on to read the POSIX specs (you may also find them under man 1p wc on your system), because that’s what you should do if you want to follow standards. Read them:

    By default, the standard output shall contain an entry for each input file of the form:

        "%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>

If you haven’t dealt with printf string specifications, just trust me: it doesn’t sanction any leading whitespace as standard. Moreover, the spec goes on to tell the world where the leading whitespace came from, namely System V:

    The output file format pseudo-printf() string differs from the System V version of wc:

        "%7d%7d%7d %s\n"

    which produces possibly ambiguous and unparsable results for very large files, as it assumes no number shall exceed six digits.

Again, if you’re not familiar with the string format, see those 7’s? They mean that each decimal number (d) is printed in a field at least 7 characters wide, padded on the left with space characters.
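To see what that field width does in practice, here is a minimal sketch that mimics both styles with the shell’s own printf. It does not call wc at all; the function names posix_style and sysv_style are made up for the illustration, and the single-count formats ('%d %s\n' and '%7d %s\n') are my reduction of the three-count strings for the wc -l case.

```shell
#!/bin/sh
# Illustrative only: imitate the two output styles with printf(1).
# posix_style and sysv_style are hypothetical names for this sketch.

# POSIX-style entry for a single count: no padding at all.
posix_style() {
    printf '%d %s\n' "$1" "$2"
}

# System V-style entry: the count is right-aligned in a 7-character field.
sysv_style() {
    printf '%7d %s\n' "$1" "$2"
}

posix_style 3 ohno    # prints "3 ohno"
sysv_style 3 ohno     # prints "      3 ohno" (six spaces, then the 3)

# A naive parser that takes "everything before the first space"
# works on the first style and gets an empty string on the second.
posix_style 3 ohno | cut -d ' ' -f 1    # prints "3"
sysv_style 3 ohno | cut -d ' ' -f 1     # prints an empty line
```

The last two lines are the whole problem in a nutshell: any script that splits on a single space quietly breaks as soon as the count is padded.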
So, this means…
- OpenBSD is following the original System V Unix output.
- POSIX is not following that.
- Therefore, this time it is GNU who is POSIX compliant.
- Therefore, I am never going to support OpenBSD’s backwards output in these projects because I’m interested in POSIX compliant code.
Surprised? You shouldn’t be.
Actually, GNU is usually POSIX compliant, even though it might extend things and even though we all consider the GNU coreutils to be terribly bloated… GNU almost[3] never contradicts the standard. Period. The same goes for glibc and GCC.
By the way, the OpenBSD crowd has been very well aware of this since 4.3 (2008!), but considers it a wontfix for a very, very backwards reason that has to do with running wc on more than one file. Jeez…
Are You Sure Your System Is POSIX?
But isn’t POSIX meant to be, like, the way we get compatibility with “Old UNIX”? How come POSIX diverges from System V here?
The history of UNIX is complex. What is UNIX and what isn’t is… hard to answer, because even though there is something called the Single UNIX Specification (SUS)… the truth is that SUS has been strictly equivalent to POSIX only recently, since SUSv4/POSIX 2008. Some earlier versions of SUS differed from POSIX between 1988 and SUSv3/POSIX 2004. So… what may be certified as UNIX now might not have been in the past, depending on the standards situation at that specific time.
In any case, the mythical System V UNIX, which people have also turned into some kind of meme,[4] was a commercial UNIX system released by AT&T. Its birth has a lot to do with the Bell System antitrust case and little to do with the so-called “Research”[5] UNIX developed by our beloved heroes Dennis Ritchie, Ken Thompson, and friends. In fact, Plan 9 came to be as a way for the original developers of “Research” UNIX to keep writing an OS on their own. System V might sound like the “canonical” UNIX just because it was hugely influential, but the truth is that the original BSD line (1.0BSD through 4.3BSD) was truer to the Research line.
I think this becomes very clear when you realize that the direct descendants of System V that are still alive are Solaris (!) and HP-UX.
The fact is that UNIX is a mess, conceptually speaking. If you come across anyone who tells you that they “write code for UNIX systems,” they’re probably clueless. This is precisely why POSIX came to be.
The BSDs are awesome systems, Plan 9 was a beautiful experiment, yet they’re not necessarily “more standards compliant” than your regular ol' GNU userland running on a Linux system.[6] OK, GNU puts lots of extension-traps in place wherever you go, but that’s all. If you want a strictly POSIX userland, your only safe choice among current systems is… Apple Darwin… and the only way to get a functional copy of it is, you know, buying macOS.
The Lesson Here
The lesson here is that if you care about portability, just caring about the standard might not be enough. Yeah, sure, systems in the broad “UNIX-y family” should be striving for POSIX compliance… but here it is you who has to decide. If some feature you rely on is POSIX, but some systems don’t follow it… you’ll have to choose whether to keep strict standards compliance (and therefore break support for those systems; e.g. OpenBSD in the case of wc) or… you’ll have to implement some kind of workaround.
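For the wc -l case, one possible workaround looks like the sketch below. It is not what phrenamer does; count_lines is a hypothetical helper name. It assumes two things: that wc prints no file name when it reads from standard input, and that whatever padding an implementation adds in front of the number is plain blanks.

```shell
#!/bin/sh
# count_lines is a hypothetical helper, not part of phrenamer or wsclean.
# Reading from stdin keeps the file name out of wc's output, so only the
# count (possibly padded with blanks) is left. The arithmetic expansion
# then evaluates that to a bare number, discarding the padding.
count_lines() {
    echo "$(( $(wc -l < "$1") ))"
}

# Usage (assuming a file called ohno exists in the current directory):
#   n=$(count_lines ohno)
#   echo "ohno has $n lines"
```

Another option in the same spirit is piping the output through awk '{print $1}', which splits on runs of blanks instead of on a single space. Either way, the script ends up working with both the padded and the unpadded format, at the price of one more thing to explain to whoever reads it.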
The honest and easiest thing to do, in my opinion, especially when writing shell scripts, is to tell your users what your intended target systems are… and actually be quite conservative about your claims… even when you know your code very probably works on quite a few more. Better safe than sorry… in my view.
[1] Keep this word in mind.
[2] I don’t know anything about the other BSDs. I tried installing FreeBSD once in my life and failed.
[3] Fun fact: GNU wc prints a non-standard format when used without any options, as it inserts one leading space character for some reason I just don’t get!
[4] I believe it’s because of the efforts to revive the SysV init system in some fashion, against, of course, systemd…
[5] That’s a retronym coined now to distinguish their OS from all that happened after AT&T took over the intellectual property of Bell Labs.
[6] There is no such thing as “GNU/Linux.” Fight me.