What Does 'static' Really Mean?

Posted on Mar 13, 2021

Welcome to the first episode of Ari teaches you why C is awesome, a series which I don’t have any plans for and probably will lead to nowhere in particular. In fact, it’s not even a series. Move along!

Because of covid scalc,1 I’ve been thinking about how and when static variables are a good thing in C. You know, in programming everything is a trade-off, and static variables are, in a way, a cheaty way to get something like a global variable while keeping local scope… if you declare it inside a procedure, that is.

Take a look at this code snippet:

#include <stdio.h>

#define ROUNDS 20

/* Wait, what is this 'static' doing here? More on that later! */
static int
my_proc(int number)
{
	/* 
	 * This variable stores how many times we called the procedure. 
	 * Useless, but illustrative.
	 */
	static int call_times = 0;

	call_times++;

	/* 
	 * call_times obscures the return values for illustrative purposes.
	 * This is an obviously 'evil' usage of a static variable. 
	 */
	if (call_times % 2 == 0)
		return number * 15;
	else if (call_times == 3)
		return number;
	else
		return -1;
}

int
main(void)
{
	int i;

	for (i = 0; i < ROUNDS; ++i)
		printf("%d\n", my_proc(i));

	return 0;
}

In the code above, call_times is a static variable that is declared inside my_proc(). So far, so good. This means the variable doesn’t get “reset” across calls to my_proc(), but keeps its value. That’s why the return values look quite obscure if you only take into account the integer argument that my_proc() takes. That static variable leads to side effects that you’re not able to control by changing how you call the procedure. In that regard, a static variable brings in the same kind of problems a global variable brings: the behavior of a procedure depends on state that is not passed in by the calling code. Yes, I can’t reference call_times outside my_proc() (that would be a compile error), and yet it behaves very much like a global variable.

OK, but what about my_proc() being declared as a static procedure? Isn’t that supposed to be the C way to declare a procedure “private” to a module? Why did the designers of C reuse the same keyword? The effects seem similar enough, but wouldn’t it make more sense if the keywords were different?

And, by the way, static global variables also exist. Have you ever compiled any suckless-inspired project that uses a config.h file to configure the program? If you have, I’m sure you’ve noticed they’re usually full of static global variables. These variables behave like usual global variables that you don’t export to other modules.

Why are all these things declared using the very same keyword? Enter Assembly.

If you suffered a heart attack by reading that, I’m sorry, but I’m a firm believer that knowing something about how the computer works and, therefore, knowing some ASM is crucial for being a decent programmer. I’m in no way an ASM programmer, but I do see how knowing about how microprocessors work has helped me be a much better programmer. And no, please, don’t come to me with the argument that “programming is not about computers.” I know where that BS comes from: from a brutal misreading of the famous SICP book.

Let’s see what the ASM of the code snippet above looks like. I’m generating it using this gcc command, where static1.c is the name I gave to the source file.

$ gcc -S static1.c -masm=intel -Wall -Wextra -Wpedantic -std=c99 -D_POSIX_C_SOURCE=200809L

I’m using Intel ASM syntax because I can’t stand the AT&T syntax gcc insists on using as its default. The rest of the flags are the standard flags I always use with C, and I suggest you use them too if you want your code to be portable across POSIX systems.

I’m reproducing just some parts of the code here, but you’ll find the whole ASM code for you to download here: static1.s

;; [...]

my_proc:
.LFB0:
	.cfi_startproc
	push	rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	mov	rbp, rsp
	.cfi_def_cfa_register 6
	mov	DWORD PTR -4[rbp], edi
	mov	eax, DWORD PTR call_times.0[rip]

;; [...]

.LC0:
	.string	"%d\n"
	.text
	.globl	main
	.type	main, @function
main:
.LFB1:
	.cfi_startproc
	push	rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	mov	rbp, rsp
	.cfi_def_cfa_register 6
	sub	rsp, 16
	mov	DWORD PTR -4[rbp], 0

;; [...]

.LFE1:
	.size	main, .-main
	.local	call_times.0		; OH, looks familiar!
	.comm	call_times.0,4,4
	.ident	"GCC: (GNU) 10.2.0"
	.section	.note.GNU-stack,"",@progbits

If you try looking for our non-static variables number and i, you won’t find them cited literally like that. Ever wondered why stack variables don’t need to be freed, unlike dynamically allocated ones? Because stack variables are created by moving the stack pointer (in 64-bit x86 ASM: rsp), assuming a LIFO order (that’s why it’s a stack!), and then restoring it to where it was before entering the procedure. Stack memory was previously allocated by the kernel itself, so it’s safe to assume all its addresses are valid.

But you can see that there are two pieces of data that are placed directly into our assembly code: the printf() format string and call_times, which is declared at the end of the code, right after main ends. Both are declared literally as fixed memory addresses, as if they were procedures…

That’s why the static variable keeps its value across calls: its data lives outside the stack, in storage that is reserved for the whole life of the program (the .comm line sets aside 4 bytes for it), on the same footing as the memory where procedures are stored. The constant literal string also lives there, but you might notice that it is marked as .rodata (read-only data).

That’s the thing. Computers only know about memory addresses and binary data. Ever seen videos on YouTube on how easy it is to glitch out an old GameBoy handheld? That’s because it lacked memory protection, and if you told it to “run” a memory address where a sprite was stored, it’d try to do so… and sometimes the first byte of that data happened to correspond to a CPU opcode, so… data could get “executed!” For an x86 CPU running through our code, well, the literal string and the integer we hardcoded into the binary are just a bunch of bytes stored at those positions for some reason, exactly like some opcodes are stored. It’s the code execution flow which must ensure that things don’t wreak havoc.

So, now back to why C uses static both for variables and procedures: it’s because for the CPU, it’s all just binary stuff. Semantically speaking, from C’s point of view, static means something along the lines of “This piece of information is only accessible to its declaration context,” but that’s an inexact way of explaining what is really going on.

Now, what about my_proc() being declared as static? By default, all memory addresses within a program’s memory space2 are accessible from any point of that program. The static keyword doesn’t translate into any extra instructions in the object code; it just means the compiler doesn’t emit a .globl directive for my_proc, so the symbol stays local to its object file and the linker won’t let other modules refer to it… As you get the “private” effect that way without any further code, that’s fine. It’s very advisable, though, to mark all non-exported procedures as static in your code, as it also helps the compiler optimize your code. Notice, however, how main(), on the other hand, receives special treatment in these lines:

.LC0:
	.string	"%d\n"
	.text
	.globl	main
	.type	main, @function

It’s marked “global” for a reason… You know, when we’re writing userland programs, our programs are not executed by the kernel, but directly by the CPU. There’s no real hierarchy; process trees are “artificial,” so to speak. Moreover, our binary code could work on any compatible CPU regardless of OS, as long as we don’t refer to any OS-specific procedure. However, the binary is packaged into a binary format, usually ELF in Linux-based and BSD-based systems, Mach-O in Darwin-based systems, PE in Windows, etc. Ever wondered how Wine works on Linux without recompiling anything? It just teaches Linux how to read PE executables and brings in replacement libraries so that references are met, but the actual opcodes are the same for any computer that shares the same CPU instruction set. Nevertheless, something has to kick things off: the kernel loads your program into memory and makes your CPU jump to the entry point recorded in the binary, which normally belongs to the C runtime’s startup code, and that startup code in turn expects to find a procedure called main. It’s a stable reference that has to be reachable from outside your module, and that’s why it must be exported!

Hope this was interesting for you! I know this world favors higher-level stuff that lives in the browser or in some super abstracted container, but I think we do need to get back to the root of things to build better software. In the end, hidden behind layers and layers of abstraction there’s always a physical CPU and motherboard running your stuff. Knowing how that works for sure makes you understand how everything is possible. Don’t you think?

OK, after filling the Internet with all this arcane knowledge on a Saturday, I can now go for some nice coffee and chill out. I deserve it!


  1. Have you noticed how the phrase “because of covid” seems to have crept into our language as some sort of catch-all causal clause for everything that’s going on? BTW, by making that terrible joke, am I implying that scalc is somehow an illness? Funny how language works… I should talk more about linguistics on this site, shouldn’t I? (Spoiler alert: I plan to.)

  2. Remember, we’re dealing with protected mode stuff here… Virtual memory is in force.