Ari in the Shell

Posted on Mar 18, 2021

So yesterday I pushed a new git repo online: wsclean. Nothing fancy at all: it’s just a very simple shell script that helps me to clean my source files from useless tabs and spaces that sometimes are left over due to oversight, quick edits, you name it.

It’s a bit messy, I guess? But the gist of it is that it calls sed to find and replace (what I consider) unneeded whitespace characters, using a regexp.[1] Pretty standard UNIX-y stuff, huh? The remaining code just deals with the files themselves and implements a “failsafe mode,” which sort of puts me in a bad light as a developer. Not even I trust my own code? Ohnoes… No, just kidding: I find the failsafe mode very useful if I want to run diff on the results, just in case I’m dealing with some file that isn’t tracked by git.
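To make the idea concrete, here is a minimal sketch of that kind of cleanup. It is not the actual wsclean source: the function name strip_trailing, the .orig backup suffix standing in for the failsafe mode, and the exact regexp are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real wsclean code.
# strip_trailing FILE: remove trailing spaces and tabs from every line.
# The untouched original is kept as FILE.orig (an assumed "failsafe" copy).
strip_trailing() {
    file=$1
    cp -- "$file" "$file.orig" || return 1
    # [[:blank:]] matches a space or a tab; $ anchors the match at end of line.
    sed 's/[[:blank:]]*$//' "$file.orig" > "$file"
}
```

After running strip_trailing on a file, something like diff demo.c.orig demo.c shows exactly which lines were touched, which is the kind of check the failsafe mode is meant to allow.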

My coding and computing experience has improved a lot as I’ve become more and more shell literate. And sometimes I wonder whether any of my projects could be rewritten from C into POSIX sh… hey, most of what I do is some type of filtered I/O on files: both cras and schain read some file, apply some change to its contents, and then print the information to stdout with some makeup to make it look better for each project’s goals. Rewriting them in POSIX sh should be trivial, I think.

Shell scripting languages are Turing complete, so in theory you should be able to solve any computable problem with them. String support is superb. They’re totally integrated into the POSIX userland… precisely because they’re the shell (duh!)… What prevents me from doing those rewrites? I could save myself from the usual boilerplate code that comes from using strtok() to parse a string! The grep+sed+AWK trifecta is able to do the same tasks and possibly in a less cumbersome way! Ari, you’re being such a C fangirl…
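As a rough illustration of that argument, this is how pulling one field out of a delimited string might look with AWK instead of a strtok() loop. The colon-separated record and the helper name nth_field are hypothetical; they do not come from cras or schain.

```shell
#!/bin/sh
# Hypothetical record format for illustration: name:role:shell
# nth_field N STRING: print the Nth colon-separated field of STRING.
nth_field() {
    # -F: sets the field separator; -v passes the field number into AWK.
    printf '%s\n' "$2" | awk -F: -v n="$1" '{ print $n }'
}

nth_field 2 'ari:developer:sh'   # prints "developer"
```

One difference worth keeping in mind: strtok() treats a run of delimiters as a single one, whereas AWK with a single-character separator such as : reports empty fields between adjacent delimiters.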

Hey, maybe I am a C fangirl. Maybe my way of seeing things is different, but I have a very clear feeling as to when C is better suited than POSIX sh to the task, even if it involves lots of I/O and strings.

“Ariadna, are you mixing programming and feelings again? Why??”

Yes, I am. All human endeavors are sparked by irrational intuition. Logic keeps things moving from that initial spark.

The shell, as I see it, is a window to interact with the userland. The strengths and the weaknesses of the shell don’t really come from what it does, but from the strengths and weaknesses of the userland you’re using. POSIX sh doesn’t know anything about regexps (in fact, it uses Kleene stars in a way that is different from all regexp syntaxes), text formatting, dealing with files, etc. Its built-ins are usually very scarce (and implementation-dependent): most of what you use in a shell script are programs installed on your computer, not functions. In some userlands not even test (or its alias [ ) is a shell built-in. The very same script might or might not work on someone else’s system, depending on whether their userland behaves the same way as yours!
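A small sketch of both points, with made-up helper names (matches_glob and matches_regexp are purely illustrative). The first pair shows the star meaning different things in a shell pattern and in a regexp; the last line asks the shell how it resolves test, and that output varies between shells, so none is shown.

```shell
#!/bin/sh
# In a shell pattern, * on its own matches any string, including an empty one.
matches_glob() {
    case $1 in
        a*) return 0 ;;
        *)  return 1 ;;
    esac
}

# In a regexp, * repeats the preceding item: ^a*$ only matches runs of "a".
# Note that grep is an external program here, not part of the shell itself.
matches_regexp() {
    printf '%s\n' "$1" | grep -q '^a*$'
}

matches_glob abc && echo 'pattern a*: abc matches'
matches_regexp abc || echo 'regexp ^a*$: abc does not match'

# Report whether "test" is a built-in or an external program in this shell.
command -V test
```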

Yes, I know: there’s a thing called SUS, the Single UNIX Specification, whose core volumes are shared with POSIX and which defines the standard commands in a POSIX environment. If you write portable shell code, any POSIX-compliant system should run it identically. Vendors target this compliance, but hey, bugs exist. I recently discovered a bug in a friend’s AWK code because his implementation didn’t comply with the standard in one very subtle respect, whereas GNU AWK did follow the specs in that area.

So, in my view of things, I write shell scripts when I see my code as an “automated calling of commands” sort of project. If I see it as an “independent tool,” I write it in C as a standalone program, even if it makes use of I/O routines that do have interfaces at the shell level (i.e. most of unistd.h). I know, I know, this is utterly subjective. I guess the threshold lies in whether the program is defining some kind of novel interface on its own; if it does, I go for C. For example, even though these are trivial interfaces, cras and schain define file formats and a way to deal with those files. I see them as programs that need to be first-class citizens of the whole userland, instead of being a composition of other already existing userland tools. By contrast, wsclean and phrenamer do not define anything of the sort, so I see them as automated execution of other tools.

Again, I’m not trying to set a moral standard here. Who knows what I may end up thinking of all of this in some months or years! I know, I’m the worst.

However, being shell literate is a must on all Linux and Unix-like systems, in my opinion. GUI tools can be powerful, but shell tools open the gates to awesome ways to make your system your own. If you haven’t, look for any guides and tutorials out there (if they cover POSIX shell instead of bash, all the better). It’s fun and teaches a lot about the system you’re running. Don’t be silly, give living in a shell a chance!

  1. I spent too much time deciding whether I’d use regex or regexp in this article! Settled on regexp because I like that it’s got an even number of letters.