"Segmentation violations" or "segfaults" are annoying but are often easy to debug. A segfault happens when your program tries to access memory outside of the range of addresses allocated for your program. Here is an explanation of what a segfault is and some tips for debugging segfaults in C/C++ on Unix-based platforms.

How Does a Segfault Happen?

From a practical standpoint there are two classes of segfaults: those where you try to reference address 0 (null pointer) and those where you try to reference an illegal address that is non-zero. They "tend" to happen for different reasons.

Referencing Address 0

This can happen when a variable is uninitialized on a system in which the OS zeroes out memory before giving it to the user. For example: trying to dereference a null pointer or trying to use a C-style string (char *) without first giving it a value.

For example, rather than:

       treeNode->name;   // value of treeNode not checked for null

Program defensively. Never assume a pointer you have is non-null:

       if (treeNode)
           treeNode->name;
       else
           /* some replacement task */;

In C++ there are smart pointer classes (such as std::unique_ptr and std::shared_ptr) you can use to help protect against this. An uninitialized pointer can cause a similar dereferencing problem:

       {
         char *stringToPrint;   // never initialized

         printf("String: %s\n", stringToPrint);
       }
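
One way to avoid the problem above, sketched under the assumption that you are free to change the declaration: give the pointer a known value (nullptr) when you declare it, and test it before use. The function names formatString and demo are hypothetical and exist only for this example.

```cpp
#include <string>

// Hypothetical helper: builds the message, tolerating a null pointer.
std::string formatString(const char *stringToPrint)
{
    if (stringToPrint == nullptr)
        return "String: (not set)";
    return std::string("String: ") + stringToPrint;
}

// The pointer is initialized at its declaration, so the null check is meaningful.
std::string demo()
{
    const char *stringToPrint = nullptr;   // known value instead of garbage
    return formatString(stringToPrint);
}
```

With an uninitialized pointer the null check would be of no help, because the pointer could hold any leftover value.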

Referencing Illegal Address that is Non-zero

This can happen if you read past the end of an array or overwrite a pointer with a non-pointer value, but 9 times out of 10 it is because you have a pointer to something allocated, you delete that thing, the space gets reused, and then you go back and dereference the stale pointer.
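
A possible way to guard against the stale-pointer case is to let a smart pointer own the allocation, so that releasing the object also resets the pointer to null. The sketch below assumes C++11 or later; the Item struct and valueOrDefault function are hypothetical names for illustration.

```cpp
#include <memory>

// Hypothetical type, used only for this illustration.
struct Item {
    int value;
};

// Returns the item's value, or -1 if the item has already been released.
int valueOrDefault(const std::unique_ptr<Item> &item)
{
    return item ? item->value : -1;
}
```

After item.reset() the unique_ptr compares equal to nullptr, so a later check catches the mistake. A raw pointer would keep its old address after delete and the same check would pass.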

Using gdb to debug a segfault

Let's use this program as an example. It is in file happy.cpp.

  1 #include <stdio.h>
  2 
  3 int strlen(char *s)
  4 {
  5     char *t;
  6     t = s;
  7     while (*s) s++;
  8 
  9     return s-t;
 10 }    
 11 
 12 
 13 int main()
 14 {
 15     char *s;
 16 
 17     s = (char *)"Totoro";
 18     printf("The string %s has length %d\n", s, strlen(s));
 19     s = (char *)0;
 20     printf("The string %s has length %d\n", s, strlen(s));
 21 
 22     return 0;
 23 }

With a name like happy, nothing can go wrong. :-) Let's see. First compile the program with the -g option to retain debug info.

$ g++ -g happy.cpp -o happy

This creates a program called happy. Then run the program.

$ ./happy
The string Totoro has length 6
Segmentation fault

Oh dear, death with a segfault! Where did it happen? Let's run gdb on the executable and find out:

$ gdb happy
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-119.el7

Now the program is loaded into gdb and we can run it. If the program happy took any arguments, you would put those arguments after the "r" below, e.g. "r testpgm.c-" or "r < data.txt". But happy doesn't take any arguments, so just "r" will do.

(gdb) r
Starting program: /home/rs-cs-heckendo/happy
The string Totoro has length 6

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400578 in strlen (s=0x0) at happy.cpp:7
7	    while (*s) s++;

The program stopped in function strlen, which was given an argument of 0x0. The segfault happened at line 7 in happy.cpp. The only variable that can go wrong there is s, so let's print the value of s:

(gdb) p s
$1 = 0x0

Oh! s=0 but I do a *s! There's the problem. I am trying to access memory location 0! But how did it end up being zero? Humor me and let's look at the execution stack at the moment it died:

(gdb) bt
#0  0x0000000000400578 in strlen (s=0x0) at happy.cpp:7
#1  0x00000000004005d7 in main () at happy.cpp:20

bt stands for backtrace. Here we see that main was called, and that it called strlen with argument s=0, and here we are. The p command lets us print any variable we have access to here. For instance, what is the value of t?

(gdb) p t
$1 = 0x0

We can list from the source code right out of the debugger with the l command:

(gdb) l
2
3       int strlen(char *s)
4       {
5           char *t;
6           t = s;
7           while (*s) s++;
8
9           return s-t;
10      }
11

So let's go look in the routine that called strlen! We can go up and down the execution stack:

(gdb) up
#1  0x00000000004005d7 in main () at happy.cpp:20
20      printf("The string %s has length %d\n", s, strlen(s));

So this is the place where strlen was called. What is the value of s here?

(gdb) p s
$2 = 0x0

What does the code look like here?

(gdb) l
15        char *s;
16
17        s = (char *)"Totoro";
18        printf("The string %s has length %d\n", s, strlen(s));
19        s = (char *)0;
20        printf("The string %s has length %d\n", s, strlen(s));
21
22        return 0;
23    }

I think we have the problem solved. strlen was called with an argument of zero, which caused strlen to try to dereference the address 0, and that gave a segfault.
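
One possible fix, sketched here under the assumption that a null argument should count as length zero, is to guard the dereference inside the routine. The name safeStrlen is hypothetical; it is used so the example does not clash with the standard library's strlen.

```cpp
// Hypothetical null-safe variant of the strlen routine from happy.cpp.
int safeStrlen(const char *s)
{
    if (s == nullptr)      // guard: nothing to count for a null pointer
        return 0;

    const char *t = s;     // remember where the string starts
    while (*s) s++;        // advance to the terminating '\0'

    return static_cast<int>(s - t);
}
```

The caller should still check s before handing it to printf, since passing a null pointer for %s is not defined by the C standard.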

Let's set a breakpoint at line 18 and then run the program again:

(gdb) b 18
Breakpoint 1 at 0x40059f: file happy.cpp, line 18.
(gdb) r
Starting program: /home/rs-cs-heckendo/happy

Breakpoint 1, main () at happy.cpp:18
18        printf("The string %s has length %d\n", s, strlen(s));

What is the value of s here?

(gdb) p s
$1 = 0x400690 "Totoro"

Let's continue running:

(gdb) c
Continuing.
The string Totoro has length 6

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400578 in strlen (s=0x0) at happy.cpp:7
7	    while (*s) s++;

Now we can quit the debugger:

(gdb) q

Other useful commands are help, which will give us help with any gdb command. The commands have a lot of complex options, many of which you normally would not need to use. But you can ask and learn.

(gdb) help

and step, which steps through the code one source line at a time.

(gdb) s

Segfaults are always a problem

Segfaults may happen on some machines but not others. If that happens, there is generally a problem with your code anyway, and you got away with it on one machine by luck. For example, it might run on a Windows machine but not a Linux machine. This happens a fair bit. It all depends on how memory is laid out and whether it is zeroed or not. So fix it now so you won't have to fix it later! The most common cause of a program that runs on one machine but not another is an uninitialized variable that accidentally had a good value on one machine and not on the other.

Happy hunting!