C++ offers two main ways to store text: C strings, which are null-terminated arrays of characters, and the C++ std::string class.
These variables serve many purposes. The most visible is printing messages to the console, but strings are also used to read from and write to files, to copy data from one memory buffer to another, and more.
C++ inherits a family of formatted input/output functions from the C standard library for working with such text, including printf and scanf.
Each of these functions requires a format string. The format string can be self-contained text to be printed, copied and so on, or it can pull in other variables to build the final string. Failing to use the format string properly creates vulnerabilities in a C++ application.
What are format specifiers?
A format string can be plain text, but it does not have to be. It may also contain format specifiers, which let functions like printf take additional arguments and use them to build the final string.
Some of the more commonly used format specifiers include %d and %i (printing integers), %f (printing floating point numbers) and %s (printing a C string).
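As a minimal sketch, the common specifiers might be combined as shown below. It uses snprintf, the relative of printf that writes into a buffer, so the output can be inspected rather than printed; the function describe_order and its parameters are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper that builds one line of text using the common specifiers:
// %i and %d (int), %s (C string) and %.2f (double, rounded to 2 decimals).
std::string describe_order(const char* item, int quantity, int order_id, double price) {
    char buffer[128];
    std::snprintf(buffer, sizeof buffer, "Order %i: %d x %s at %.2f",
                  order_id, quantity, item, price);
    return std::string(buffer);
}
```

For example, describe_order("apples", 3, 17, 1.5) produces the text Order 17: 3 x apples at 1.50.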
However, a number of other format string specifiers exist as well. Some of the more useful ones include:
%x: Printing an unsigned integer in hexadecimal
%p: Printing a pointer value (a memory address)
%n: Writing the number of characters printed so far to the int that the corresponding argument points to
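The following sketch exercises %x, %p and %n in their intended, benign roles. The helper names are hypothetical, and the exact text produced by %p is implementation-defined. Some runtimes, such as Microsoft's C runtime, also disable %n by default.

```cpp
#include <cstdio>
#include <string>

// Formats an unsigned value in hexadecimal with %x (255 becomes "ff").
std::string to_hex(unsigned value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%x", value);
    return std::string(buf);
}

// Formats a pointer with %p; the exact output format varies by platform.
std::string pointer_text(const void* p) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%p", p);
    return std::string(buf);
}

// Uses %n to record how many characters were written before the %n:
// for label "total", the text "total: " is 7 characters long.
int chars_before_value(const char* label) {
    int written = 0;
    char buf[128];
    std::snprintf(buf, sizeof buf, "%s: %n%d", label, &written, 42);
    return written;
}
```

Note that %n is the only one of these specifiers that writes to memory; it stores its count through a pointer taken from the argument list.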
Used properly, format strings and format specifiers enable a developer to perform a number of useful operations in an efficient and compact manner. However, these same format specifiers can also be abused as part of an attack.
How the printf function works
Understanding the risk posed by format strings and format string specifiers requires an awareness of how the printf function and similar functions are defined, and how the stack works within an application.
If you look at the declaration of the printf function, you'll notice that it takes a single named parameter (format) followed by an ellipsis (...). The ellipsis indicates that printf accepts a variable number of arguments; at a minimum, it requires the format argument. For example, in the statement printf("Hello World!"); the string Hello World! is the format string used by printf.
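To see what the ellipsis means in practice, here is a hypothetical variadic function, sum_ints, written with the standard <cstdarg> macros (va_start, va_arg, va_end) that give a function access to its unnamed arguments. Like printf, it depends on its first argument to know how many further arguments to read.

```cpp
#include <cstdarg>

// printf is declared as:
//   int printf(const char* format, ...);
// sum_ints (hypothetical) uses the same mechanism: the named parameter
// 'count' tells the function how many extra int arguments to read.
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // fetch the next unnamed argument as an int
    }
    va_end(args);
    return total;
}
```

Nothing checks that count matches the number of arguments actually passed: calling sum_ints(5, 1, 2) is undefined behavior and reads values the caller never supplied. printf has the same blind spot with respect to its format string.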
However, printf’s format string can also include format specifiers. For example, an application may want to greet a particular user. One option is to use string concatenation followed by a printf statement:
string greeting = "Hello ";
greeting += input;
greeting += "!\n";
printf("%s", greeting.c_str());
While this works, it is verbose and inefficient. A format specifier achieves the same result much more cleanly. The following statement also prints a greeting for the user:
printf("Hello %s!\n", input);
In this example, printf receives two arguments: the format string and the C string input. In a stack-based calling convention such as classic 32-bit x86 cdecl, both are pushed onto the stack before the call. printf reads the format string, replaces the %s format specifier with the next argument on the stack and prints the result for the user.
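The argument-consuming behavior described above can be sketched with a toy formatter. mini_format is hypothetical and far simpler than a real printf (it understands only %s, %d and %%), and va_arg hides whether a given argument actually lives on the stack or, as on x86-64, in a register. What it illustrates is that each specifier in the format string triggers one more read from the argument list.

```cpp
#include <cstdarg>
#include <string>

// Hypothetical, heavily simplified printf-like formatter.
// Every %s or %d makes it fetch the next argument with va_arg; it has no way
// to know how many arguments the caller actually supplied.
std::string mini_format(const char* format, ...) {
    va_list args;
    va_start(args, format);
    std::string out;
    for (const char* p = format; *p != '\0'; ++p) {
        if (*p != '%') {
            out += *p;                 // ordinary character: copy it
            continue;
        }
        ++p;                           // move to the conversion character
        if (*p == '\0') {
            break;                     // stray '%' at the end of the string
        } else if (*p == 's') {
            out += va_arg(args, const char*);
        } else if (*p == 'd') {
            out += std::to_string(va_arg(args, int));
        } else if (*p == '%') {
            out += '%';                // "%%" is a literal percent sign
        } else {
            out += '%';                // unsupported specifier: copy it through
            out += *p;
        }
    }
    va_end(args);
    return out;
}
```

For example, mini_format("Hello %s!\n", "Ada") returns the same text as the concatenation approach for the input Ada.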
The variable number of arguments in printf is extremely useful. The added flexibility that it provides removes the need for clunky statements like the first example. However, printf’s structure also creates the potential for exploitation if the function is not used correctly.
Misuse of format strings creates vulnerabilities
printf's design assumes that the developer always supplies the format string and passes any untrusted user input as an additional argument. However, this is not always the case.
A developer wishing to echo a user's input back to the command line may choose a simple statement like printf(input); . Since the user input is likely a C string (or can easily be converted to one), this compiles and runs without errors; at most, the compiler may emit a warning (for example, GCC's -Wformat-security).
However, this design gives the user control over how the printf function works and enables them to use format string specifiers. The image below shows an example of what happens if a malicious user enters an input of %x%x%x%x.
As shown in the image above, two interpretations exist for the purpose of the variables on the stack.
On the left is the calling function's view of the stack (the true one). Since the developer's program is not aware that the user entered a string containing format specifiers, the only argument it pushes onto the stack before the call to printf is the user-provided format string. It expects printf to read only that one argument.
The other interpretation of the stack is on the right. This is printf's interpretation of the stack. After reading its format string, it sees four format specifiers (each %x). It then reads four more values from the stack, whatever happens to be stored there, to fill these specifiers and prints the resulting string.
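printf's side of this picture can be modeled with a small, hypothetical helper that counts how many arguments a format string asks for. It is a simplification (it does not validate specifier syntax), but it shows the mismatch: for the input %x%x%x%x it reports four, while the calling code supplied none beyond the format string itself.

```cpp
#include <cstring>

// Hypothetical helper: counts the variadic arguments a printf-style format
// string would try to read. Each conversion consumes one argument, each '*'
// width or precision consumes one more, and "%%" consumes none.
int count_requested_args(const char* format) {
    const char* conversions = "diouxXfFeEgGaAcspn";
    int count = 0;
    for (const char* p = format; *p != '\0'; ++p) {
        if (*p != '%') continue;
        ++p;
        if (*p == '%') continue;            // literal percent sign
        // Skip flags, width, precision and length modifiers.
        while (*p != '\0' && std::strchr(conversions, *p) == nullptr) {
            if (*p == '*') ++count;         // '*' takes its value from an argument
            ++p;
        }
        if (*p == '\0') break;              // incomplete specifier at the end
        ++count;                            // the conversion itself
    }
    return count;
}
```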
The end result of this attack is printing values off of the stack, which may reveal sensitive data stored there. However, this is not the only way that format strings can be exploited. The %n specifier writes the number of characters printed so far to an address read from the stack, which an attacker can often control, potentially overwriting crucial values or crashing the program.
Secure string operations with C++
Functions like printf take a format string plus a variable number of arguments whose count and types are dictated by that format string. C++ developers should always supply their own format string, ideally a string literal, especially when processing untrusted input: write printf("%s", input); rather than printf(input); . Failing to do so lets a user define their own malicious format string.
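A minimal sketch of the safe pattern, assuming the input arrives as a C string: the developer-owned literal "%s" is the format string and the user's text is only ever data. The function names are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Echo untrusted input to the console. The format string is a literal
// controlled by the developer; the input is passed only as a %s argument.
void echo_input(const char* input) {
    // Vulnerable pattern (do not use):
    //   printf(input);
    std::printf("%s", input);
}

// Same idea, but writing into a std::string so the result can be checked.
// Input longer than 255 characters is truncated by snprintf.
std::string echo_to_string(const char* input) {
    char buffer[256];
    std::snprintf(buffer, sizeof buffer, "%s", input);
    return std::string(buffer);
}
```

With this approach, an input such as %x%x%x%x is echoed back literally instead of being interpreted. When no formatting is needed at all, fputs(input, stdout) avoids format strings entirely.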