C Language Primer

v. 2024-09-05 21:49

Introduction

This primer introduces basic concepts and techniques required in C implementations of the deepcode Stack Machine (DSM). This primer is for someone who has no familiarity with C but has basic programming skills in another language. The purpose of this primer is to empower such a person to ask more informed questions to expedite self-study. Mastery of C requires familiarity with things such as C language standards, undefined behavior, macros, atomics, etc. This primer does not cover such things.

When looking up details about the C language, we recommend a reference such as cppreference, which is modern and often provides examples.

Initial Tooling

C is a compiled language, meaning that it requires a tool called a compiler to transform C source code into a representation your computer can run. You will need a compiler available on your system. Popular options are GCC and clang.

We cover additional tools that will help you develop more efficiently in C in the Tooling Primer.

main

The C standard specifies a special function called main that acts as the entry point of a program in certain cases, and the runtime of the DSM is one of those cases. main has one of two standard signatures. For now, we will use this one:

int main(void);

This signature takes no arguments and returns an integer. The returned integer is the program's exit status. Returning 0 (or EXIT_SUCCESS) indicates success, and returning EXIT_FAILURE indicates failure. Both macros are defined in stdlib.h. Most runtimes treat any non-zero value as failure, but the exact meaning of values other than 0, EXIT_SUCCESS, and EXIT_FAILURE depends on your runtime. (See the EXIT_SUCCESS and EXIT_FAILURE documentation for more.)

A simple program that just indicates program success may look like the following:

int main(void) {
  return 0;
}

Save the code above in a file called simple.c and then compile it. If you use GCC, your compiler invocation may look like:

gcc simple.c

If you use clang, it may look like:

clang simple.c

If there were no errors during compilation, then you will find a file called a.out in your working directory. You can run that:

./a.out

What do you see?

You will see nothing, since your program did not print anything.

If you want to test that you really did control the exit status of the program, then on Unix-like platforms, you can check the previous command's exit status with:

echo $?

Change the number after return, recompile, rerun, and check the exit status to confirm.

Later, we will cover another form of main that allows you to accept command-line arguments, but first we need to learn more theory to understand it. For more on main, see here.

The Preprocessor

If we wanted to produce the typical "hello world" program, it might look like this in C:

#include <stdio.h>

int main(void) {
  printf("hello, world\n");
  return 0;
}

We'll go into detail on what printf is later. For now, let's focus on where it came from. If we remove the following line

#include <stdio.h>

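and attempt to compile the program, the compiler will complain that printf has not been declared (typically with an error or warning about an "implicit declaration of function 'printf'"). printf is declared in a file called stdio.h that exists somewhere on your system, and #include pulls in that file, making printf available to your program. (The compiled code for printf itself lives in the C standard library, which is linked into your program automatically.) Files such as stdio.h that contain code meant to be included via the preprocessor are called header files.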

The mechanism powering #include is called the preprocessor, and #include itself is one of the preprocessor's "directives." In this case, the preprocessor performs literal text replacement, replacing #include <stdio.h> with the contents of the file stdio.h as if you had typed that file's contents above main on your own. This feature is what allows developers to split code across multiple files in a single C program. We'll show how to do this in your own programs later.

The preprocessor has other features commonly used in C development such as support for macros, but we won't go deep into that topic in this primer.

Data Types

C has a variety of types. You can find them all documented here. For our needs, we will focus first on numbers, booleans, and arrays. We will discuss structs, pointers, and strings later.

Numbers

Numeric types include integer, floating-point, and complex types. We'll focus on the integer types.

C provides multiple integer types, and the types differ from each other by their numeric ranges. For example, the unsigned char type is guaranteed to be one byte on your system. If a byte is 8 bits on your system, then what is the maximum value?

255, which is 2^8 - 1. Note that the number of unique representable values is 256 because it includes 0.

unsigned means that this type can only represent non-negative integers, so the minimum value is 0. Another type is unsigned int. If it is 4 bytes on your system, its maximum value is much larger than that of unsigned char, but each unsigned int requires 4 times as much storage as each unsigned char.

Signed versions of the above also exist. Simply use the signed keyword before the type name to get the signed counterpart. The signed version occupies the same number of bytes as the unsigned version, but the range is different because one bit is used to store the sign. Using the byte size mentioned above, what is the maximum value of signed char?

127, which is 2^7 - 1. It differs from an unsigned char in that one bit is reserved to indicate the sign.

What is the minimum value?

-128, which is -1 * 2^7.

The reason the magnitudes are different is due to the way negative numbers are encoded; see two's complement for more.

For convenience, developers usually drop the signed keyword because, for example, signed int and int are the same type. The exception is char. char, signed char, and unsigned char are three distinct types, and the C standard leaves it to each compiler implementation whether plain char is signed or unsigned. You can use plain char, but code that depends on its signedness is not portable across compilers. Write signed char or unsigned char when the sign matters.

Read here for more documentation on C's numeric types and their ranges.

As suggested earlier, the number of bits in a byte and the number of bytes in a type such as int are not guaranteed to be the same on all runtimes. If your program makes an assumption on a specific byte size for a type, you may encounter portability issues when deploying your programs.

C99 introduced exact-width integer types in the stdint.h header file. The following integer types are guaranteed to be exactly the stated number of bits "wide." (The standard makes them optional, but mainstream platforms provide them.)

int8_t
int16_t
int32_t
int64_t

The unsigned counterparts prefix a u to the type name, for example uint8_t.

Declaring and initializing variables of various numeric types looks as follows:

#include <stdint.h>

int main(void) {
  int i1 = -30;
  long i2 = 1234567;
  uint8_t i3 = 7;

  return 0;
}

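Be careful with values outside a type's range, because C will not stop you at runtime. Unsigned integer types wrap around. For an N-bit unsigned type, arithmetic is done modulo 2^N, so 255 + 1 stored in a uint8_t becomes 0. Signed overflow, such as adding 1 to an int that already holds the largest int value, is undefined behavior. Converting an out-of-range value to a signed type produces an implementation-defined result.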

Booleans

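Historically, C had no first-class boolean type. Developers would typically use 1 as true and 0 as false. C99 introduced the _Bool type along with the stdbool.h header, which exposes bool, true, and false for use in your program. (In C23, bool, true, and false are keywords, so the header is no longer required.)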

#include <stdbool.h>

int main(void) {
  bool a = true;
  bool b = false;
  bool c = 1 < 2; // true

  return 0;
}

Although C99 is decades old, you will still encounter lots of C code in the wild that uses integer types as booleans, so the pre-C99 convention is worth knowing.

Arrays

Arrays are C's built-in type for collections. Arrays can only hold elements of a single type. Arrays are contiguous, meaning that all elements of a single array exist side-by-side in memory.

C99 introduced the concept of variable-length arrays, but we will not go into detail on those here. Instead, we will use C's fixed-size arrays. Below are some examples of arrays.

#include <stdbool.h>
#include <stdint.h>

int main(void) {
  // Storage for 5 integer elements (values are uninitialized until assigned).
  int arr1[5];

  // Storage for 3 uint8_t elements, initialized.
  uint8_t arr2[3] = {0, 70, 21};

  // Storage for 3 boolean elements, but the size is set by the initializer.
  bool arr3[] = {true, false, true};

  return 0;
}

Accessing individual array element values is done using the "array subscript operator," [].

int main(void) {
  int a[] = {0, 1, 2, 3, 4};

  int first_element = a[0]; // value: 0

  int third_element = a[2]; // value: 2

  // Change the value of the 4th element.
  a[3] = 5;

  return 0;
}

The array subscript operator takes a single integer and uses its value as an index into the array. The index must be at least 0 and less than the number of elements in the array. What happens if you use a negative index or an index past the end of the array?

Your program may crash, or it may not. The C standard says that using such indices is undefined behavior. This is an example of how we must take care when indexing into arrays in C--invalid indices do not necessarily raise obvious errors.

Array indices start at 0, meaning that the first element is at index 0, not index 1.

Printing

C provides various ways to output information to the console. We will focus on the printf function, which we saw earlier. As we did earlier, to print a hard-coded string, just pass the string to printf:

#include <stdio.h>

int main(void) {
  printf("foo\n");
  return 0;
}

This prints foo to "stdout." stdout stands for standard output and will likely be the terminal you are running your program in. If you run the program, you will notice that there is a newline following your message. This comes from \n which is an escape sequence. An escape sequence is a way to encode special characters (such as newlines) in a string literal. A string literal is a string that is embedded directly into source code. Another example of an escape sequence is \" which you can use to embed a quotation mark in a string literal without ending the string literal in source code.

The f in printf stands for "formatted." The string we provide to printf is the format string. The format string dictates what is printed and how. Until now, we've only printed hard-coded messages and escape sequences, but the format string allows us to print the contents of other data in our program as well.

Here is how we print ints:

#include <stdio.h>

int main(void) {
  int i = 2;
  printf("The number is: %d\n", i);
  return 0;
}

This prints The number is: 2 followed by a newline. %d is one of many "format specifiers." Format specifiers start with % and are followed by one or more characters that describe what kind of data to print. Each format specifier tells printf how to interpret the corresponding argument after the format string. In the above example, we provide 2 arguments to printf: the format string and the integer to print.

You can provide multiple format specifiers to print multiple values at once.

#include <stdio.h>

int main(void) {
  int i = 2;
  char c = 65;
  printf("i is %d, c is %c\n", i, c);
  return 0;
}

This prints i is 2, c is A. c is printed as A because the %c format specifier prints the character whose code is the given value, and in ASCII the code for A is 65. The order of arguments following the format string must match the order of the format specifiers in the format string. For example, in the code above, %d prints data of type int, and %d comes first in the format string, so i must come first in the list of arguments following the format string.

Signed and unsigned integers use different format specifiers.

#include <stdio.h>

int main(void) {
  int i = -2;
  unsigned int ui = 2;
  printf("i is %d, ui is %u\n", i, ui);
  return 0;
}

This prints i is -2, ui is 2.

Printing a short is similar to printing an int but adds the length modifier h:

#include <stdio.h>

int main(void) {
  short s = 1000;
  printf("s is %hd\n", s);
  return 0;
}

This prints s is 1000.

See the format specifier table in the cppreference documentation for printf for the available format specifiers and how to combine them with modifiers to print the type you're interested in. To print a literal %, use %%. Before C23 there is no format specifier for printing a value in binary (C23 adds %b). Finally, not every type has a corresponding format specifier, even among the few types we have learned so far. For example, there is no specifier that prints true or false for a bool value.

What happens if you provide more format specifiers in the format string than you provide arguments for following the format string?

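The behavior of the program is undefined. The program still compiles. Unless your compiler warns you (GCC and clang catch many such mistakes when you pass -Wall), you would only discover the problem by experimenting and seeing inexplicable behavior, by reading a C standard, or by reading the documentation for `printf`.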

What happens if you provide fewer format specifiers in the format string than you provide arguments following the format string?

The extra arguments are evaluated but otherwise ignored.

What happens if the type of an argument you provide does not match the type expected by a format specifier?

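Arguments passed to printf first undergo the "default argument promotions." For example, char and short arguments are promoted to int, and float arguments are promoted to double. That is why %d can print a char and %f can print a float. If the promoted argument still does not have the type the format specifier expects, the behavior of the program is undefined.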

Is printing an int32_t using the %d format specifier portable? Explain why or why not.

It is not portable. %d is guaranteed to format data of type int correctly when printing. The number of bytes, and therefore the number of bits, in int depends on the runtime environment. The number of bits in an int32_t is always the same and may not match the number of bits in an int. See a table of format constants for the fixed-width integers to see how to print them correctly.

Functions

Functions are C's way of organizing code into callable blocks. We've already seen at least two examples of functions, main and printf. printf is provided by our runtime's standard library, and while we implement the code in main freely, the interface to main is specified by the C standard. However, we can introduce our own functions where we control the interface and the implementation.

First, let's discuss some terminology, using main as an example. Until now, we've been writing the interface to main like this:

int main(void);

This interface is called the function signature. Function signatures contain the function name, the names and types of the parameters the function has, and the function's return type. void in the parameter list indicates that the function takes no parameters (such as for the version of main we've been using), and void as the return type means that the function returns no value.

Here is a custom function that accepts an integer, doubles it, and returns the result.

int get_double(int a) {
  int n = a * 2;
  return n;
}

The part after the function signature, between the {}, is called the function body. The signature together with the body forms the function definition, which provides the function's implementation.

Here is how we call the function:

int get_double(int a) {
  int n = a * 2;
  return n;
}

int main(void) {
  int number = 5;
  int doubled = get_double(number);
  return 0;
}

What happens if we move the signature and implementation of get_double after main?

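If we move the signature and implementation below the main function, the compiler will complain that get_double has not been declared at the point where main calls it. Depending on the compiler and its settings, this appears as an "implicit declaration" warning or as an error. Calling an undeclared function has been invalid since C99. This happens because the compiler reads our source code top-down, so it has not yet seen get_double when it reaches the call in main. See below for how to address this.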

If we want to let the compiler know that we will reference a function that we define later, we can provide what is called a forward declaration: the function signature followed by a semicolon, with no body. Parameter names are optional in a declaration, which is why the example below omits a. We must still provide the function definition somewhere.

int get_double(int);

int main(void) {
  int number = 5;
  int doubled = get_double(number);
  return 0;
}

int get_double(int a) {
  int n = a * 2;
  return n;
}

Calling Convention

What will the following program print?

#include <stdio.h>

void add_one(int a) {
  a = a + 1;
}

int main(void) {
  int a = 0;
  add_one(a);
  printf("%d\n", a);
  return 0;
}

It will print 0.

C is a pass by value (or "call by value") language, meaning that the values of arguments provided in a function call are copied into the function's parameters. In other words, for the example above, add_one does not receive main's variable a to mutate but rather has its own integer also called a that takes the value of main's a. main's a is a local variable unique to main's scope. When we call add_one, we enter a new scope that provides its own local variable that is also named a.

Later on, we will discuss how to mutate the caller's arguments instead of working with copies.

The Stack

The stack is a data structure in memory that stores scopes' local variables. When we perform int a = 0; in main, we create space for a in main's stack frame. When we call add_one, we create a new frame to hold information for add_one. add_one's a parameter is given space in add_one's stack frame and the value for the argument we provide in main is assigned to a in add_one's stack frame. Modifications to a in the scope of add_one are made to the a stored in add_one's stack frame. When we return from add_one, we pop add_one's frame off the stack and re-enter main's stack frame, and now a refers to the variable managed in main's stack frame.

It might be easier to visualize what is happening with a diagram. Let's consider the evolution of the stack for the following snippet of code taken from the Calling Convention example above.

int main(void) {
  int a = 0;
  add_one(a);
  // printf(...); etc.

Note that the diagram below is a simplified view of the contents of the stack. The stack contains additional information not shown here. For more on this additional information, study C's calling convention.

     Stack           Stack           Stack           Stack
  +----------+    +----------+    +----------+    +----------+
  |(main)    |    |(main)    |    |(main)    |    |(main)    |
  |          |    | int a    |    | int a    |    | int a    |
  |          |    |          |    +----------+    |          |
  |          | -> |          | -> |(add_one) | -> |          |
  |          |    |          |    | int a    |    |          |
  |          |    |          |    |          |    |          |
  |          |    |          |    |          |    |          |
  +----------+    +----------+    +----------+    +----------+
  Set up space    Reserve space   Set up space    add_one returns
  for main,       for main's      for add_one     and its stack
  main is         variable a.     and its         frame is popped.
  called.                         parameter a.    We are back in
                                  Call add_one.   main.

It's important to understand what the stack is and how it generally works when programming in C because this will help you manage memory correctly. We will discuss memory management more when we talk about memory allocation and the heap later on.

Pointers

We've already learned about arrays and how to access specific values in them using indices. Pointers are simply indices into arbitrary memory. Every variable is stored somewhere in memory at runtime and so has an associated memory address. You can get the address of a variable using the "address-of" operator, &. The address-of operator returns a pointer. A pointer in a type declaration is indicated by *. An example in code makes this clearer.

#include <stdio.h>

int main(void) {
  int a = 0;
  int* b = &a;
  printf("%p\n", (void*)b); // %p expects a void*
  return 0;
}

In the above, b is a variable of type "pointer to int," as indicated by the * after int. (* is independent of int, so the declaration could alternatively be written int * b or even int *b; the placement of * is an age-old style debate.) Since b is a pointer, we need to initialize it with a memory address. We get the memory address of a by applying the address-of operator to a, &a.

Lastly, we print the value of the pointer using the %p format specifier. %p expects a void* (a generic pointer), so we convert b with the cast (void*)b. This may print something like 0x16d43f348, a memory address represented in hexadecimal. If you run the same program over and over and see a different result printed each time, this is likely the effect of Address Space Layout Randomization.

One powerful feature of pointers is that we can use them to modify the value stored at the memory address pointed to.

#include <stdio.h>

int main(void) {
  int a = 0;
  int* b = &a;
  *b = 1;
  printf("%d\n", a);
  return 0;
}

The above prints 1. We did not change the value of a directly with something like a = 1;. Instead, we changed the value of a indirectly by "dereferencing" b, which is a pointer to a. Dereferencing is done using the "dereference operator," *. (When * appears in a type name, it indicates the type is a pointer; when * appears in front of a variable name, it is the dereference operator.)

The dereference operator converts a pointer to its underlying value. We can then use that value as we normally would.

#include <stdio.h>

int main(void) {
  int a = 1;
  int* b = &a;
  int c = *b + 1;
  printf("%d\n", c);
  return 0;
}

The above prints 2. a is assigned 1, b is a pointer to a, and c is a normal int that is assigned the sum of the value b points to (1) and the hard-coded value 1.

When learning about functions, we noted that C is a pass-by-value language, so callers' arguments to functions are copied for use in the called function, and changes to the copies are not reflected back to the caller. We can allow functions to change values outside of the function scope by passing in pointers.

#include <stdio.h>

void add_one(int* a) {
  *a = *a + 1;
}

int main(void) {
  int a = 0;
  add_one(&a);
  printf("%d\n", a);
  return 0;
}

Note that add_one now takes a pointer as a parameter and still returns nothing (void). Also note that add_one uses the dereference operator twice in the same line. It uses it once to read the value that a points to and once to store the new value there. In main, we pass the address of a (a pointer to a) to add_one. This time, the program prints 1.

You can also assign NULL (or nullptr in C23 and onward) to a pointer to indicate that it points to "nothing." Such a pointer is called a "null pointer." NULL is a macro, defined in headers such as stddef.h and stdio.h, that expands to a null pointer constant such as 0 or ((void*)0). On mainstream platforms a null pointer is represented as address 0, and most modern operating systems do not map address 0 into a program's memory space. Storing a null pointer in a variable and comparing against it is fine. Dereferencing a null pointer (that is, accessing the value at the address it holds) is undefined behavior, and on such operating systems you will most likely see a segmentation fault. More generally, it is illegal to dereference a pointer to any memory address that is not mapped into your program's memory space at runtime.

You can also create pointers to functions, but we won't cover that in this primer. See this documentation for more on constructing and using pointers, and see here for more on NULL.

Pointers are one of the most powerful features of C, but they are also involved in many bugs. It is important to understand how to use pointers correctly and how to visualize memory spaces to avoid introducing bugs.

What does the following program do? Are there any bugs?

#include <stdio.h>

int* make_one(void) {
  int one = 1;
  return &one;
}

int main(void) {
  int* a = make_one();
  printf("%d\n", *a);
  return 0;
}

We won't explain in detail what the program does, but there is one bug. make_one returns a pointer to a local variable. Remember that the local variable lives in make_one's stack frame, and when make_one returns, that frame is popped. So main is left holding a pointer into a stack frame that no longer exists (a "dangling pointer"). The program will likely not crash, and it may even print 1 depending on your runtime. However, we cannot rely on the value a points at once make_one's stack frame is gone. According to the C standard, the behavior is undefined. Many compilers warn about returning the address of a local variable.

Pointer Arithmetic

When we printed a pointer, we saw a value such as 0x16d43f348. This is indeed just a number, and the number counts individual bytes. Changing the number changes what the pointer points at. C supports incrementing and decrementing pointers, adding integers to or subtracting integers from pointers, and subtracting two pointers of the same type to determine the number of elements between them. These results are only well-defined while the pointers stay within the same array or one element past its end. A single variable counts as an array of one element for this purpose.

Although memory addresses count individual bytes, pointer arithmetic modifies pointer values using the byte size of the type pointed to. This is easier to understand with an example. Below, sizeof reports the number of bytes required by a specific type.

#include <stdio.h>

int main(void) {
  int a = 0;
  int* b = &a;
  int* c = b + 1; // one int past a
  printf("b, c       : %p, %p\n", (void*)b, (void*)c);
  printf("sizeof(int): %zu\n", sizeof(int)); // sizeof yields a size_t
  printf("c - b      : %td\n", c - b);       // pointer difference is a ptrdiff_t
  return 0;
}

The above might print

b, c       : 0x16f187348, 0x16f18734c
sizeof(int): 4
c - b      : 1

Note the printed values of b and c. b ends in 0x48, and c is b + 1, but c does not end in 0x49. Instead, c ends in 0x4c. 0x4c is 4 away from 0x48. Also, the size of int on the above runtime is 4. So when we added 1 to b, we actually added one int worth of bytes. Then, when we subtract b from c, we get 1, meaning 1 element (in this case, 1 int) difference.

Aside: in the program above, is there any problem with dereferencing c?

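Yes. c points one int past a, at memory that does not hold any variable of ours. C allows us to compute such a one-past-the-end pointer and to compare or subtract it, as above, but dereferencing it is undefined behavior.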

The array subscript operator [] also works on pointers. For a pointer p and an integer n, p[n] is defined as *(p + n). In other words, it dereferences the element n positions after the one p points to. See the example below.

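int main(void) {
  int a[] = {0, 1, 2, 3, 4, 5};
  int* b = a; // same as &a[0]
  int c = b[3];
  return 0;
}

In the example above, b is initialized to point to the first element of a. When the name of an array is used in an expression like this, it is automatically converted ("decays") to a pointer to the array's first element, so int* b = a; is equivalent to int* b = &a[0];. (Writing &a instead would produce a pointer to the whole array, which has a different type, int (*)[6].) *b is the same as b[0]. c is assigned the value 3 because b[3] references the fourth element of a (remember, counting starts at index 0).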

Pointers to Pointers

You can have pointers to pointers. For example, int** is a pointer to a pointer to an int.

Theoretically you can have infinite levels of indirection (int*******<etc>), but practically, compilers usually limit the maximum level of indirection. Many levels of indirection often indicate an overly-complex design in your software.

We will see practical examples of using pointers to pointers later.

Strings

Strings in C are sequences of characters. Here we will cover byte strings that consist of ASCII characters. (For other types of strings, see here.) Strings must be null-terminated in C, meaning the last character in the string must be \0. Such strings are called null-terminated byte strings (NTBS). C provides a string library that works with such strings.

String variables are commonly typed as char* (or const char* when the string must not be modified).

#include <stdio.h>

int main(void) {
  char* str = "Hello world.\n";
  printf("%s", str);
  return 0;
}

The program above defines a string literal and references that literal by a pointer to char called str. We then print the string using the %s specifier. Notice that we pass str, not *str, to printf. This is because the %s format specifier expects a pointer to the first character in the string to print. Also notice that the string literal does not explicitly encode \0 at the end. The compiler automatically adds a null terminator on string literals for us. If we were programmatically creating the string, we would need to take care to add a null terminator ourselves.

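We will cover programmatically creating strings when we talk about memory allocation. String literals are read-only in C, and attempting to modify one is undefined behavior. To modify a string, we need to store its characters in memory we own, such as a char array or dynamically allocated memory.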

As suggested by the type of the string pointer (char*), individual elements of strings are chars. Until now we have assigned values to variables of type char using hard-coded numbers. We can also assign individual characters to such variables using "character literals." A character literal is a single ASCII character or character code enclosed in apostrophes.

int main(void) {
  char a = 'a';
  char null = '\0';
  char new_line = '\n';
  char capital_a = 65;
  char capital_b = capital_a + 1;

  return 0;
}

The Second Form of main

Earlier we said main has one of two standard signatures. The second form allows our entry point to handle command-line arguments. It looks like

int main(int argc, char** argv);

argc is the number of command-line arguments passed to our program. On most popular runtimes (macOS, *nix, Windows), argc is at least 1, because the first argument, argv[0], is typically the name of the program. The c in argc stands for "count."

argv is an array of pointers to the command-line arguments, each represented as a null-terminated byte string. The array itself is terminated by a null pointer, so argv[argc] is NULL. The v in argv stands for "vector."

If we build a program using the form of main that accepts command-line arguments and then invoke that program as

$ ./my_prog 1 "c primer" +-/*

then argc is 4, and the length of argv is 5 (remember it has an extra element for the null terminator). The contents of argv look like

                        sequence of
 char** argv            char*
+---------+            +-------+
| 0x...80 | -> 0x...80 |0x...b0| -> "my_prog\0"
+---------+            +-------+
               0x...88 |0x...b8| -> "1\0"
                       +-------+
               0x...90 |0x...ba| -> "c primer\0"
                       +-------+
               0x...98 |0x...c3| -> "+-/*\0"
                       +-------+
               0x...a0 |0x0    |
                       +-------+

The hypothetical value of argv is 0x...80, the memory address of the first char* in the sequence. The value at 0x...80 is 0x...b0 which is a pointer to the first character in the my_prog\0 string.

argv[1] will be a pointer to the first character in the null-terminated string "1". argv[4] will be a null pointer.

Structs

C provides a way for users to build their own aggregate types, called structures.

struct Position {
  int x;
  int y;
};

struct Player {
  char* name;
  int health_points;
  struct Position position;
};

int main(void) {
  struct Player player;
  return 0;
}

The above code introduces two new aggregate types, Position and Player. Structs are composed of "members." The Position struct has two members, one called x and one called y. Note that C does not let you give members default values in the struct definition. Also note that when referring to our new types, we need to precede our type names with struct. (This can be avoided with typedef, but we won't cover that here.)

When we create space on the stack for player via struct Player player, all members have uninitialized values, so we need to set them manually. We access the members of a struct using the "member access operator" (sometimes called the "dot operator"), which is written as a period (.).

// <Position and Player definitions here.>

int main(void) {
  struct Player player;
  player.name = "default name";
  player.health_points = 0;
  player.position.x = 0;
  player.position.y = 0;
  return 0;
}

Note we chain member access to get at things such as player.position.x.

Setting initial values of structs is often done in functions so the operations can be easily reused.

// <Position and Player definitions here.>

struct Player player_init(char* name, int hp, int pos_x, int pos_y) {
  struct Player player;
  player.name = name;
  player.health_points = hp;
  player.position.x = pos_x;
  player.position.y = pos_y;
  return player;
}

int main(void) {
  struct Player p = player_init("default_name", 0, 20, 30);
  return 0;
}

If you have a handle on a struct via a pointer and want to access the struct's members, you need to dereference the pointer before using the dot operator, or you can use an alternative member access operator (the "arrow operator") -> to dereference the pointer.

#include <stdio.h>

// <Position and Player definitions here.>
// <player_init definition here.>

int main(void) {
  struct Player p = player_init("default_name", 0, 20, 30);
  struct Player* p_ptr = &p;

  printf("%s\n", (*p_ptr).name);
  printf("%d\n", p_ptr->health_points);

  return 0;
}

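Note that we wrap the dereference in parentheses ((*p_ptr)) above because the member access operator has higher precedence than the dereference operator. Without them, *p_ptr.name would be parsed as *(p_ptr.name), which does not compile because p_ptr is a pointer, not a struct.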

Memory Allocation and Management

We have so far discussed the stack and touched on the read-only memory where string literals are stored. Another important region of memory is the "heap." Unlike a stack frame, which is destroyed when its function returns, allocated heap memory stays allocated until the programmer explicitly deallocates it. Using the heap, we can allocate memory for any content we'd like and share that memory with arbitrary scopes until we choose to deallocate it.

The C standard library provides various functions for dynamic memory management. We will focus on malloc, calloc, and free.

#include <stdlib.h>

int main(void) {
  int* a = malloc(4 * sizeof(int));
  free(a);
  a = NULL;
  return 0;
}

The program above uses malloc ("memory allocation") to allocate space on the heap for 4 ints. Note that malloc allocates bytes, which is why we need to multiply the number of ints we want (4) by the byte size of an int (sizeof(int)). We now have something like an array of 4 ints that is stored in memory outside of our stack frame. We can pass this pointer around as we have shown in other examples for use anywhere in our process. We then call free to deallocate the memory. After calling free(a), we set a to NULL to remind ourselves that a does not currently point to any memory allocated for use.

Read the documentation on malloc and free to understand important usage information about those functions. Note that it is an error to dereference a after freeing it, unless we first point a to other memory that is allocated for our use. Also note that malloc only allocates memory; it does not initialize it. There is no guarantee what values we'll find in the allocated memory until we initialize them ourselves. We can instead call calloc, which sets each allocated byte to 0 for us.

If we were to use a as an array, note that a itself does not expose any information to us about how many bytes were allocated. We need to track that information elsewhere in our program.

Using the heap, we can dynamically create strings.

#include <stdlib.h>

int main(void) {
  char* str = calloc(10, sizeof(char));
  str[0] = 'c';
  str[1] = 'a';
  str[2] = 't';
  str[3] = '\0'; // Not necessary, because we used calloc.
  free(str);
  return 0;
}

In the above example, we use calloc to create space for a string of up to 9 characters plus the null terminator (10 elements in total). We manually set the first three elements to letters and the fourth element to the null terminator. In this case, the fourth element is already '\0' because we used calloc, but we set it anyway for illustration.

(Note that we could also create space for strings on the stack using arrays, for example char str[10] or char str[] = "cat".)

We can allocate heap memory to store our custom types as well. You can pass your custom type to sizeof to determine the number of bytes required to store an instance of your custom type.

Manual memory management is the source of many, many bugs in C, so we must take care to understand the lifetime, ownership, and bounds of our allocated memory. See more about this in the C Techniques Primer.

Working with Multiple Source Files

Previously we talked about header files and the preprocessor. As you write more complex programs, you will likely want to split your code across multiple files. To create a header file, create a new file and give it a .h extension. The .h extension is not mandatory but is customary to indicate C header files. Then, define interfaces to your code (function signatures, type definitions, etc.) in the header file and add an include guard to the file. For example, in my_lib.h:

#ifndef MY_LIB_H_
#define MY_LIB_H_

/**
 * @param a The number to add one to.
 * @return The sum of @c a and 1.
 */
int add_one(int a);

#endif // MY_LIB_H_

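The #ifndef, #define, and #endif lines form the include guard (also called a header guard). The first time the header is processed, MY_LIB_H_ is not yet defined, so the preprocessor defines it and keeps the contents. If the same header is included again in the same source file (for example, once directly and once through another header), MY_LIB_H_ is already defined, so the preprocessor skips the contents instead of copying them a second time.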

Next, you also need to provide the implementation for any functions in the header file. This is typically done in a corresponding .c file, for example my_lib.c:

#include "my_lib.h"

int add_one(int a) {
  return a + 1;
}

Note that the implementation file includes the corresponding header file. Among other reasons, this lets the compiler check that the signatures of our implementations match the signatures declared in our header files. Also, the implementation file does not need an include guard because we do not expect anyone to #include our .c file directly.

Finally, to make an executable, you will need a main function. Suppose that main function is stored in a file called main.c, which includes my_lib.h. To compile this program, you need to tell the compiler where the additional files live. In this trivial case, that is easy, and the clang and gcc invocations are similar:

$ clang main.c my_lib.c -I./

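The -I./ portion adds ./ (the current directory) to the list of directories searched for include files. (For headers included with quotes, such as "my_lib.h", gcc and clang search the directory of the including file first, so the flag is not strictly needed here; it becomes necessary once headers live in other directories.)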

As programs grow to include many source files organized in different directories, the compiler invocation grows unwieldy, and we are better off using a build system to manage it for us. For more on that, see the Tooling Primer.


The Second Form of `main`