C Techniques Primer

v. 2024-09-05 21:56

Introduction

This page covers some software engineering techniques used when developing software in C. It assumes that you have at least read the C Language Primer.

Pragmas

In the C Language Primer, we learned about preprocessor directives such as #include. Another directive is #pragma, which stands for "pragmatic information." Pragmas are another way to pass information to the compiler, and what a given pragma does is up to the compiler. #pragma once is a particularly popular pragma that tells the preprocessor to include the contents of a given header file at most once per translation unit (a .c file plus everything it includes). It is used as a shorthand replacement for the typical include guard:

#ifndef MY_MODULE_FILE_H_
#define MY_MODULE_FILE_H_

// ...

#endif // MY_MODULE_FILE_H_

Note that while most popular compilers (GCC, Clang, and MSVC among them) support #pragma once, it is not part of the C standard and so is not strictly portable. The longer #ifndef include guard it replaces is the portable form.

Data-Oriented Programming

Unlike object-oriented languages that feature classes and methods, C has only structures and functions. One way of associating functions with data in C is to prefix function names with the data type they operate on and have the functions take as their first argument an instance (or pointer to an instance) of the data the function is written for. For example:

#pragma once

#include <stdbool.h>
#include <stddef.h>

struct Buffer {
  char* _buffer; // A place to store the data
  size_t _element_size; // The byte size of an individual element in @c _buffer
  size_t _capacity; // The maximum number of elements @c _buffer can hold
  size_t _size; // The current number of elements in @c _buffer
};

struct Buffer buffer_new(size_t num_elements, size_t sizeof_element);
bool buffer_push(struct Buffer* b, char* element);
char* buffer_at(struct Buffer* b, size_t index);
void buffer_free(struct Buffer* b);

(Aside: We would typically use void* where char* is used above, but we have not introduced void* for the Deepcode Stack Machine. If you want to learn more about void*, look up "type erasure in C".)

C also does not have access control, a way to indicate that certain struct fields are "private" implementation details that should not be accessed or modified by users. Such fields are idiomatically prefixed with an underscore as done above, but it is up to the programmers to respect this convention.

Design by Contract

Design by contract is a software construction technique that formalizes software interfaces with so-called contracts. A contract is expressed through a combination of the interface's signature and its documentation. The documentation outlines preconditions, postconditions, and invariants. Preconditions are things that must have happened before the interface is used; invariants, as we use the term here, are things that must be true about the data coming into the interface; and postconditions are promises that the interface makes to the caller, for example about data coming out of the interface or about the state of the program after the function is called.

This design practice is very useful in C, where you manage memory yourself and the language offers no way to express memory ownership (compare Rust's references or C++'s std::unique_ptr). Design by contract helps clarify interface expectations and memory management requirements.

Consider the following example.

/** A struct containing something stateful. */
struct StateThing {
  // ...
};

/**
 * Initialize the @c StateThing library.
 *
 * Preconditions:
 * - The runtime must not have called @c statething_library_init already.
 *
 * Postconditions:
 * - It is invalid to call @c statething_library_init again in the same runtime.
 */
void statething_library_init(void);

/**
 * Perform foo with a @c StateThing instance.
 *
 * Preconditions:
 * - @c statething_library_init has been called in the runtime.
 *
 * @param st The @c StateThing instance to operate on. Cannot be @c NULL .
 * @return Some data resulting from doing foo. The caller owns the memory.
 */
char* statething_foo(struct StateThing* st);

Above we see that preconditions and postconditions are called out in their own sections in the signature documentation. The invariants are mentioned in the @param sections. Memory ownership is called out in the @return section of statething_foo, but it could have also been stated as a postcondition. This is one way of organizing the information; you are free to organize it however you would like.

If an interface is used but its contract is not fulfilled, the contract is said to be broken or violated. In that case, either the program enters an invalid state (invalid relative to the interface's expectations, not necessarily "undefined behavior" as defined in the C standard) or, preferably, the interface reports an error. If the interface exposes error information (for example, through a return value), it can report the violation that way, or the implementation may simply assert and stop the program.

Projects using design by contract may also choose to test their interfaces only within the bounds of their contracts, to confirm that the advertised behavior actually works. Sometimes this is enough; other times it is prudent to also test outside those bounds, for example to ensure graceful failure modes. How much testing you need depends on the context you deploy your software in.

Testing

Tests are written as additional small programs that exercise your code. It is therefore helpful to write as much of your code as possible as libraries, so that test programs can link against it and run it. Most modern build systems have first-class support for registering binaries as tests. For example, Bazel provides the cc_test rule and bazel test //..., and CMake provides add_test (after a call to enable_testing()) along with ctest, or make test if you have CMake generate Makefiles.