Sanitizers – or how I learned to stop worrying and love the compiler.

Hi Everyone,

In what I hope will be the first of many posts, this blog is going to introduce those of you learning C++ to useful bits of information to sharpen your proficiency, along with (in future posts, at least) a bit of insight into the life of a developer.

So, let’s get started.

When writing C++, whether you’re a beginner or a seasoned expert, you’re eventually bound to make a mistake, whether it comes from writing new code or changing existing stuff. Mistakes range from the relatively benign, such as an exception being thrown, to the serious, such as data corruption from a thread race or a crash caused by accessing invalid objects.

Enter the sanitizers: both the Clang and GCC compilers have a rich set of tools that can help you debug your code and figure out what you’re doing wrong, often without even having to step into a debugger. Turning them on is as simple as passing an additional parameter to the compiler (build systems such as CMake typically call these compiler options).

If you wanted to invoke it directly on the command line with a single, simple C++ source file, it would look something like this with GCC:

    g++ -O1 -g -fsanitize=thread -o MyExe mysourcecode.cpp

or, if you use Clang:

    clang++ -O0 -g -fsanitize=address -o MyExe mysourcecode.cpp

Note that for the g++ example I passed -O1; this is because basic optimisation is actually preferred for the thread sanitizer, but this isn’t necessarily true of the others. Always read the documentation!

For example, -fsanitize=address gives you great output if you access invalid memory, overflow a buffer, use an object after it’s been destroyed, or try to destroy an object twice (such as a double delete or free()).

The code below has an error: it accesses an element beyond the current size of the vector (array).
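Here is a minimal sketch of that kind of bug. It is only an illustration and not the exact addrsan.cpp listing, so its line numbers won’t match the report discussed further down. The function names sum_first_unchecked and sum_first_checked are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the first `count` elements of `values`.
// BUG (deliberate): nothing stops `count` from exceeding values.size(),
// in which case values[i] reads past the end of the vector's storage.
int sum_first_unchecked(const std::vector<int>& values, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];  // out of bounds once i >= values.size()
    }
    return total;
}

// Same loop, but clamped to the vector's real size, so it never overruns.
int sum_first_checked(const std::vector<int>& values, std::size_t count) {
    const std::size_t limit = count < values.size() ? count : values.size();
    int total = 0;
    for (std::size_t i = 0; i < limit; ++i) {
        total += values[i];
    }
    return total;
}
```

Calling sum_first_unchecked with a three-element vector and a count of 4 from main() is the sort of access I would expect a -fsanitize=address build to flag, while sum_first_checked simply stops at the last valid element.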

Without sanitizers, we would likely just crash or print out garbage.

But, when we run it with address sanitizer:

Apologies for it being a bit hard to read, but we can see that it reports “heap-buffer-overflow” and the third line says “#0 0x10cde652a in main addrsan.cpp:10” – I named the source file “addrsan.cpp” so that’s why we see that. The “:10” signifies the line number of the source code that caused the error – and indeed, looking at the original source code, we can see it was exactly that line that caused the error within our main() function.

Sadly, not all errors are this simple to debug: the line of code that goes wrong is sometimes only a symptom of a problem earlier on in your code.

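There is also -fsanitize=undefined, which catches many kinds of undefined behaviour, from signed integer overflow and null pointer dereferences to misuse of more complex C++ language features, such as casting an object to an inappropriate type. (As far as I know it has no dedicated check for forgetting a virtual destructor on a base class; compiler warnings such as -Wdelete-non-virtual-dtor are the more reliable way to catch that one.)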
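To make the bad-cast case concrete, here is a small sketch. The Shape, Circle and Square types and both function names are invented for this example. The unchecked version assumes every Shape is a Circle, which is undefined behaviour when it isn’t; the checked version asks the runtime first.

```cpp
struct Shape {
    virtual ~Shape() = default;  // makes the hierarchy polymorphic
};

struct Circle : Shape {
    double radius = 1.0;
};

struct Square : Shape {
    double side = 2.0;
};

// BUG (deliberate): assumes every Shape is a Circle. If `s` is really a
// Square, the result of the static_cast refers to the wrong dynamic type
// and reading `radius` through it is undefined behaviour.
double radius_unchecked(Shape& s) {
    return static_cast<Circle&>(s).radius;
}

// Safe variant: dynamic_cast checks the real type at run time and yields
// nullptr on a mismatch, so we can return a fallback value instead.
double radius_checked(Shape& s, double fallback) {
    if (auto* c = dynamic_cast<Circle*>(&s)) {
        return c->radius;
    }
    return fallback;
}
```

Passing a Square to radius_unchecked is the kind of mistake I would expect a -fsanitize=undefined build to report (the check relies on run-time type information, so it needs RTTI left enabled). Without the sanitizer, it would most likely just quietly read the wrong member.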

If you’re multi-threading, -fsanitize=thread can detect data races for you, such as one thread updating a value while another is reading it (so the reader may get some of the old value and some of the new value, which is likely garbage in the context of your program).
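A rough sketch of a data race, and one way to fix it, is below. It shows a slightly different race from the reader/writer case just described: two threads both write to the same counter. The function names count_racy and count_atomic are made up for this example.

```cpp
#include <atomic>
#include <thread>

// BUG (deliberate): two threads increment a plain int with no
// synchronisation. That is a data race, so the result is unpredictable.
int count_racy(int per_thread) {
    int counter = 0;
    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) ++counter;
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter;
}

// Fixed variant: std::atomic makes each increment indivisible, so the
// two threads can no longer trample on each other's updates.
int count_atomic(int per_thread) {
    std::atomic<int> counter{0};
    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) ++counter;
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter.load();
}
```

In a -fsanitize=thread build I would expect count_racy to produce a data race report pointing at the increment, and count_atomic to run clean. Depending on your platform you may also need to pass -pthread when compiling.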

Finally, it’s worth noting that these sanitizers all have a hefty CPU and memory cost, so you should never use them for builds of software that you intend to release. They’re purely for debugging. I personally validate all of my software by generating a separate build with each of the various sanitizers and running them all to make sure that no errors are encountered before producing the final, fully-optimised release build.

I hope you found this blog post informative and I look forward to seeing you folks at the next meetup!

Cheers,
Oliver.
