Symbol Table

A symbol represents a function, a global variable, or another named entity. A symbol table is a section that maps symbols to their addresses. The two typical symbol tables are the .symtab and .dynsym sections.

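The .symtab section: the static symbol table, used by the link editor and by debuggers.

The .dynsym section: the dynamic symbol table, used by the runtime linker.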


1. Static Symbol Table

The .symtab section is associated with the section header and the .strtab section.


We will examine the static symbol table of the program below.

int g_int = 0xcafebabe;

int sum(int x, int y)
{
    return x + y;
}

int main(int argc, char** argv)
{
    int a = 1, b = 2;
    int s = sum(a, b);
    return 0;
}

Build.

$ gcc main.c -o main

1.1. Section Header

The section header provides a list of sections and their details in the ELF file.

$ readelf --wide --sections main
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  ...
  [ 6] .dynsym           DYNSYM          00000000000003d8 0003d8 000090 18   A  7   1  8
  [ 7] .dynstr           STRTAB          0000000000000468 000468 000088 00   A  0   0  1
  ...
  [12] .plt              PROGBITS        0000000000001020 001020 000010 10  AX  0   0 16
  [13] .plt.got          PROGBITS        0000000000001030 001030 000010 10  AX  0   0 16
  [14] .text             PROGBITS        0000000000001040 001040 00013b 00  AX  0   0 16
  ...
  [22] .got              PROGBITS        0000000000003fc0 002fc0 000040 08  WA  0   0  8
  [23] .data             PROGBITS        0000000000004000 003000 000014 00  WA  0   0  8
  [24] .bss              NOBITS          0000000000004014 003014 000004 00  WA  0   0  1
  ...
  [26] .symtab           SYMTAB          0000000000000000 003048 000378 18     27  18  8
  [27] .strtab           STRTAB          0000000000000000 0033c0 0001d3 00      0   0  1
  [28] .shstrtab         STRTAB          0000000000000000 003593 00010c 00      0   0  1

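Explain the output.

The .symtab row has type SYMTAB. Its ES value 0x18 is the size of one entry (24 bytes), Lk = 27 is the index of the associated string table (.strtab), and Inf = 18 is the index of the first non-local symbol. The .dynsym row is linked to .dynstr (Lk = 7) in the same way.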
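The Size and ES columns together give the number of entries in a table section. The following is a minimal sketch of that arithmetic, using the .symtab and .dynsym rows of the listing above; the helper name entry_count is made up for illustration.

```python
def entry_count(section_size: int, entry_size: int) -> int:
    """Number of fixed-size entries in a table section (Size / ES)."""
    if entry_size == 0:
        raise ValueError("section does not hold fixed-size entries")
    count, remainder = divmod(section_size, entry_size)
    if remainder:
        raise ValueError("section size is not a multiple of the entry size")
    return count

# .symtab row: Size = 0x378, ES = 0x18
print(entry_count(0x378, 0x18))  # 37
# .dynsym row: Size = 0x90, ES = 0x18
print(entry_count(0x90, 0x18))   # 6
```

The first result agrees with the entry count that readelf reports for .symtab.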

1.2. Section .strtab

The .strtab section is a string table that contains null-terminated strings. Each string represents a symbol name.

$ readelf --string-dump=.strtab main
String dump of section '.strtab':
  ...
  [   134]  g_int
  ...
  [   163]  sum
  ...
  [   187]  main
  ...
  [   1cd]  _init

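Explain the output.

The number in brackets is the hexadecimal byte offset of the string inside .strtab. A symbol refers to its name by this offset; for example, g_int starts at offset 0x134.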
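Looking up a name in a string table amounts to reading bytes from a given offset up to the next null byte. The sketch below shows this on a small hand-made table; toy_strtab and strtab_lookup are illustrative names, and the offsets are those of the toy table, not of the real .strtab of main.

```python
def strtab_lookup(strtab: bytes, offset: int) -> str:
    """Return the null-terminated string that starts at `offset`."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")

# Toy string table; offset 0 holds the empty string, as in a real ELF string table.
toy_strtab = b"\x00g_int\x00sum\x00main\x00"

print(strtab_lookup(toy_strtab, 1))   # g_int
print(strtab_lookup(toy_strtab, 7))   # sum
print(strtab_lookup(toy_strtab, 11))  # main
```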

1.3. Section .symtab

The .symtab section contains all symbols in the file.

$ readelf --wide --symbols main
Symbol table '.symtab' contains 37 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
    23: 0000000000004010     4 OBJECT  GLOBAL DEFAULT   23 g_int
    ...
    27: 0000000000001129    24 FUNC    GLOBAL DEFAULT   14 sum
    ...
    32: 0000000000001141    58 FUNC    GLOBAL DEFAULT   14 main
    ...
    36: 0000000000001000     0 FUNC    GLOBAL HIDDEN    11 _init

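Explain the output.

Value is the address of the symbol, Size is its size in bytes, and Ndx is the index of the section that defines it. g_int (4 bytes) lives in section 23 (.data), while sum and main live in section 14 (.text), matching the section header listing.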

2. Dynamic Symbol Table

An external symbol is a symbol whose address cannot be determined when the program is built and must be resolved at runtime.

.dynsym is the section that lists the external symbols. It works together with the .dynstr, .plt, .rela.plt, and .got sections.


We will examine the dynamic symbol table of the program below, together with the .plt and .got sections.

#include <stdio.h>

int main(int argc, char** argv)
{
    puts("hello");    
    return 0;
}

Use ‘-fcf-protection=none’ so that puts@plt is placed in .plt instead of .plt.sec.

$ gcc -fcf-protection=none main.c -o main

2.1. Section .dynstr

The .dynstr section is a string table that contains null-terminated strings. Each string represents an external symbol name that .dynsym refers to.

$ readelf --string-dump=.dynstr main
String dump of section '.dynstr':
  ...  
  [    22]  puts
  ...

2.2. Section .rela.plt

.rela.plt is a relocation table that tells the runtime linker which .got entries to fill with the addresses of external symbols.

$ readelf --relocs main
Relocation section '.rela.plt' at offset 0x600 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000003fd0  000300000007 R_X86_64_JUMP_SLO 0000000000000000 puts@GLIBC_2.2.5 + 0

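Explain the output.

Offset 0x3fd0 is the .got entry to be patched. Info combines the index of the symbol in .dynsym with the relocation type; readelf shortens the type name R_X86_64_JUMP_SLOT to R_X86_64_JUMP_SLO in this view. Sym. Value is 0 because puts is not defined in this file.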
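The Info column packs two fields into one 64-bit number: the symbol index in the upper 32 bits and the relocation type in the lower 32 bits. A minimal sketch of the split, applied to the value in the listing above; decode_r_info is an illustrative helper name.

```python
R_X86_64_JUMP_SLOT = 7  # relocation type number defined by the x86-64 psABI

def decode_r_info(r_info: int):
    """Split a 64-bit r_info into (symbol index, relocation type)."""
    return r_info >> 32, r_info & 0xFFFFFFFF

sym_index, rel_type = decode_r_info(0x000300000007)
print(sym_index)                       # 3
print(rel_type == R_X86_64_JUMP_SLOT)  # True
```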

2.3. Sections .plt and .got

The .plt section contains small executable stubs that look up the addresses of external symbols. The .got section holds those addresses.

                                            ┌──────────────────┐
┌──────────┐    ┌──────────────────────┐    │   ┌───────────┐  │   ┌─────────────────────────────┐    ┌──────────┐
│ main()   │    │.plt                  │    │   │.got       │  │   │ runtime linker              │    │ rela.plt │
│          │    ├──────────────────────┤    │   ├───────────┤  │   ├─────────────────────────────┤    │          │
│   puts()─┼────┼► puts@plt()          │    │   │           │  └───┼─►_dl_runtime_resolve()      │    │          │
│          │    │    find puts in .got─┼────┼──►│           │      │    find puts                │    │          │
│          │    │                      │    │   │           │      │    check rela.plt ──────────┼──► │          │
└──────────┘    │    if not found      │    │   │           │◄─────┼─── update puts addr to .got │    └──────────┘
                │       let linker find┼────┘   └───────────┘      └─────────────────────────────┘
                │    call puts         │
                └──────────────────────┘

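First call. The .got entry of puts does not yet hold the address of puts, so puts@plt hands control to the runtime linker, which looks up puts and writes its address into .got.

Next calls. puts@plt finds the address in .got and jumps straight to puts.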
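The flow in the diagram can be mimicked with a toy model in Python. This is only an analogy under simplifying assumptions, not how the runtime linker is implemented: the dictionary got stands in for the .got section, and puts_plt, resolver_stub, and real_puts are invented names.

```python
calls_to_resolver = 0

def real_puts(text):
    """Stand-in for puts in libc."""
    return f"puts: {text}"

def resolver_stub(text):
    """Stand-in for the runtime linker: resolve puts, patch the GOT, call it."""
    global calls_to_resolver
    calls_to_resolver += 1
    got["puts"] = real_puts          # update the .got entry
    return got["puts"](text)

# The .got slot starts out pointing at the resolver path, not at puts itself.
got = {"puts": resolver_stub}

def puts_plt(text):
    """Stand-in for puts@plt: always jump through the .got slot."""
    return got["puts"](text)

print(puts_plt("hello"))   # first call goes through the resolver
print(puts_plt("again"))   # later calls reach real_puts directly
print(calls_to_resolver)   # 1
```

The caller never changes: it always calls puts_plt, and only the content of the slot decides where the call ends up.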


Set up GDB to stop on shared library events.

(gdb) set stop-on-solib-events 1
(gdb) set auto-solib-add 0

Start program.

(gdb) set environment LD_LIBRARY_PATH .
(gdb) file main
(gdb) start

2.3.1. Before First Call

In the main function, when we call puts, it goes to puts@plt.

(gdb) disassemble main
Dump of assembler code for function main:
   0x0000555555555139 <+0>:     push   %rbp
   0x000055555555513a <+1>:     mov    %rsp,%rbp
   0x000055555555513d <+4>:     sub    $0x10,%rsp
   0x0000555555555141 <+8>:     mov    %edi,-0x4(%rbp)
   0x0000555555555144 <+11>:    mov    %rsi,-0x10(%rbp)
   0x0000555555555148 <+15>:    lea    0xeb5(%rip),%rax        # 0x555555556004
   0x000055555555514f <+22>:    mov    %rax,%rdi
   0x0000555555555152 <+25>:    call   0x555555555030 <puts@plt>
   0x0000555555555157 <+30>:    mov    $0x0,%eax
   0x000055555555515c <+35>:    leave  
   0x000055555555515d <+36>:    ret

puts@plt jumps through 0x555555557fd0, which is the address of the associated entry in .got.

(gdb) disassemble 0x555555555030
Dump of assembler code for function puts@plt:
   0x0000555555555030 <+0>:     jmp    *0x2f9a(%rip)        # 0x555555557fd0 <puts@got.plt>
   0x0000555555555036 <+6>:     push   $0x0
   0x000055555555503b <+11>:    jmp    0x555555555020

(gdb) info symbol 0x555555557fd0
puts@got[plt] in section .got of /root/demo/main

Since the address of puts is not yet resolved, the corresponding .got entry holds 0x1036, which is the second instruction of puts@plt, as shown in the objdump output. The jump therefore falls back into the stub, and the runtime linker is asked to find the address.

(gdb) x/a 0x555555557fd0
0x555555557fd0 <puts@got.plt>:  0x1036

$ objdump --disassemble --section=.plt main
0000000000001030 <puts@plt>:
    1030:       ff 25 9a 2f 00 00       jmp    *0x2f9a(%rip)        # 3fd0 <puts@GLIBC_2.2.5>
    1036:       68 00 00 00 00          push   $0x0
    103b:       e9 e0 ff ff ff          jmp    1020 <_init+0x20>
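The address in the disassembler comment can be reproduced by hand: a RIP-relative operand is computed from the address of the next instruction plus the displacement. A small sketch, assuming the 6-byte jmp encoding shown above; rip_relative_target is an illustrative helper name.

```python
def rip_relative_target(insn_addr: int, insn_len: int, disp: int) -> int:
    """Address referenced by a RIP-relative operand.

    RIP holds the address of the next instruction, so the operand address is
    insn_addr + insn_len + disp.
    """
    return insn_addr + insn_len + disp

# jmp *0x2f9a(%rip) is 6 bytes long (ff 25 9a 2f 00 00).
print(hex(rip_relative_target(0x1030, 6, 0x2F9A)))          # 0x3fd0 (file view)
print(hex(rip_relative_target(0x555555555030, 6, 0x2F9A)))  # 0x555555557fd0 (process view)
```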

Illustrate the call flow.

                 file: main                                    file: main              
              section: .plt                                 section: .got            
             function: puts@plt()                                                    

┌────────────────────┬─────────────────────┐      ┌────────────────┬────────────────┐
│ address            │ value               │      │ address        │     value      │
├────────────────────┼─────────────────────┤      ├────────────────┼────────────────┤
│ 0x0000555555555030 │ jump  0x555555557fd0┼───┐  │                │                │
│                    │                     │   │  │                │                │
│ 0x0000555555555036 │ push  $0x0  ◄───────┼─┐ └──┼►0x555555557fd0 │ 0x1036 ──┐     │
│                    │                     │ │    │                │          │     │
│ 0x000055555555503b │ jmp   0x555555555020│ └────┼────────────────┼──────────┘     │
└────────────────────┴─────────────────────┘      └────────────────┴────────────────┘

Examine .got.

(gdb) info files
0x0000555555557fb8 - 0x0000555555558000 is .got

(gdb) x/9a 0x0000555555557fb8
0x555555557fb8: 0x3dc8  0x0
0x555555557fc8: 0x0     0x1036
0x555555557fd8: 0x0     0x0
0x555555557fe8: 0x0     0x0
0x555555557ff8: 0x0
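Each .got slot is one 8-byte pointer, so the address range reported by info files determines how many slots there are and which one belongs to puts. A short sketch of that arithmetic; got_slot_index is an illustrative helper name.

```python
SLOT_SIZE = 8  # one pointer per .got slot on x86_64

def got_slot_index(got_start: int, slot_addr: int) -> int:
    """Index of the slot at `slot_addr`, counted from the start of .got."""
    offset = slot_addr - got_start
    if offset < 0 or offset % SLOT_SIZE:
        raise ValueError("address is not a slot boundary inside .got")
    return offset // SLOT_SIZE

got_start, got_end = 0x555555557FB8, 0x555555558000
print((got_end - got_start) // SLOT_SIZE)         # 9 slots, hence x/9a
print(got_slot_index(got_start, 0x555555557FD0))  # 3, the slot holding 0x1036
```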

2.3.2. After First Call

Continue the program; it stops when libc.so is loaded.

(gdb) continue
Stopped due to shared library event:
  Inferior loaded /lib/x86_64-linux-gnu/libc.so.6

Examine .got. After the runtime linker resolves puts, it updates the .got entry from 0x1036 to 0x7ffff7e08e50, which is the memory address of puts in libc.so.

(gdb) x/9a 0x0000555555557fb8
0x555555557fb8: 0x3dc8  0x0
0x555555557fc8: 0x0     0x7ffff7e08e50 <__GI__IO_puts>
0x555555557fd8: 0x7ffff7db1dc0 <__libc_start_main_impl> 0x0
0x555555557fe8: 0x0     0x0
0x555555557ff8: 0x7ffff7dcd9a0 <__cxa_finalize>

(gdb) info symbol 0x7ffff7e08e50
puts in section .text of /lib/x86_64-linux-gnu/libc.so.6

Check the call flow of puts@plt. Now, it jumps directly to 0x7ffff7e08e50, the memory address of puts in libc.so.

(gdb) disassemble 0x555555555030
Dump of assembler code for function puts@plt:
   0x0000555555555030 <+0>:     jmp    *0x2f9a(%rip)        # 0x555555557fd0 <puts@got.plt>
   0x0000555555555036 <+6>:     push   $0x0
   0x000055555555503b <+11>:    jmp    0x555555555020

(gdb) x/a 0x555555557fd0
0x555555557fd0 <puts@got.plt>:  0x7ffff7e08e50 <__GI__IO_puts>

Illustrate the call flow.

                 file: main                                    file: main                              file: libc.so
              section: .plt                                 section: .got                           section: .text
             function: puts@plt()                                                                   function: puts

┌────────────────────┬─────────────────────┐      ┌────────────────┬────────────────┐      ┌────────────────┬────────────────┐
│ address            │ value               │      │ address        │     value      │      │ address        │     value      │
├────────────────────┼─────────────────────┤      ├────────────────┼────────────────┤      ├────────────────┼────────────────┤
│ 0x0000555555555030 │ jump  0x555555557fd0┼───┐  │                │                │  ┌───┼►0x7ffff7e08e50 │ endbr64        │
│                    │                     │   │  │                │                │  │   │                │                │
│ 0x0000555555555036 │ push  $0x0          │   └──┼►0x555555557fd0 │ 0x7ffff7e08e50─┼──┘   │ 0x7ffff7e08e54 │ push   %r14    │
│                    │                     │      │                │                │      │                │                │
│ 0x000055555555503b │ jmp   0x555555555020│      │                │                │      │ 0x7ffff7e08e56 │ push   %r13    │
└────────────────────┴─────────────────────┘      └────────────────┴────────────────┘      └────────────────┴────────────────┘

3. Decode Symbol Table

3.1. Struct Elf64_Sym

Each entry in a symbol table is represented by the Elf64_Sym struct on x86_64 or the Elf32_Sym struct on x86.

$ pahole elf64_sym
struct elf64_sym {          // offset size
  Elf64_Word      st_name;  // 0      4
  unsigned char   st_info;  // 4      1
  unsigned char   st_other; // 5      1
  Elf64_Half      st_shndx; // 6      2
  Elf64_Addr      st_value; // 8      8
  Elf64_Xword     st_size;  // 16     8
  /* size: 24 bytes */
};
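The same layout can be cross-checked without pahole by describing the struct with ctypes and printing the offsets it computes. This is a sketch: the class name Elf64Sym and the mapping of Elf64_Word, Elf64_Half, Elf64_Addr, and Elf64_Xword to fixed-width ctypes integers are assumptions based on the sizes listed above.

```python
import ctypes

class Elf64Sym(ctypes.LittleEndianStructure):
    """Mirror of the Elf64_Sym layout printed by pahole."""
    _fields_ = [
        ("st_name", ctypes.c_uint32),
        ("st_info", ctypes.c_uint8),
        ("st_other", ctypes.c_uint8),
        ("st_shndx", ctypes.c_uint16),
        ("st_value", ctypes.c_uint64),
        ("st_size", ctypes.c_uint64),
    ]

# Print the offset and size of every field, then the total size.
for name, _ctype in Elf64Sym._fields_:
    field = getattr(Elf64Sym, name)
    print(f"{name:9} offset={field.offset:2} size={field.size}")
print("total:", ctypes.sizeof(Elf64Sym))
```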

Relationships to other sections.

┌────────────┐       ┌────────────┐        ┌────────────┐
│  .strtab   │       │  .symtab   │        │  section   │
├────────────┤       ├────────────┤        ├────────────┤
│   index ◄──┼───────┼─ st_name   │   ┌────┼► index     │
│            │       │            │   │    │            │
│   value    │       │  st_shndx ─┼───┘    │  .....     │
└────────────┘       └────────────┘        └────────────┘

To decode a struct elf64_sym from raw hex bytes, we use the Python script hex_to_elf64_sym.py.

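#!/usr/bin/env python3
import struct
import sys

# Elf64_Sym layout, little-endian and without padding:
# st_name (u32), st_info (u8), st_other (u8), st_shndx (u16),
# st_value (u64), st_size (u64) = 24 bytes
FMT_ELF64_SYM = "<IBBHQQ"

def hex_to_elf64_sym(hex_str: str):
    # Convert the hex text to bytes, then unpack one 24-byte entry.
    data = bytes.fromhex(hex_str)
    fields = struct.unpack(FMT_ELF64_SYM, data)
    print("  name     info    other  shndx   value     size")
    print([hex(field) for field in fields])

if __name__ == "__main__":
    hex_to_elf64_sym(sys.argv[1])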

3.2. Decode .symtab

We will extract and decode the .symtab section of the program below.

File main.c

int g_int = 0xcafebabe;

int sum(int x, int y)
{
    return x + y;
}

int main(int argc, char** argv)
{
    int a = 1, b = 2;
    int s = sum(a, b);
    return 0;
}

Build.

$ gcc main.c -o main

List sections.

$ readelf --section-headers main
[Nr] Name       Type        Address          Off    Size   ES Flg Lk Inf Al
[14] .text      PROGBITS    0000000000001040 001040 00013b 00  AX  0   0 16
[23] .data      PROGBITS    0000000000004000 003000 000014 00  WA  0   0  8
[26] .symtab    SYMTAB      0000000000000000 003048 000378 18     27  18  8
[27] .strtab    STRTAB      0000000000000000 0033c0 0001d3 00      0   0  1

Print string table .strtab.

$ readelf --string-dump=.strtab main
[   134]  g_int
[   163]  sum
[   187]  main

Extract the .symtab section, which starts at offset 0x003048 and has a size of 0x000378 bytes. The elf64_sym struct is 24 bytes long, so pass ‘-c 24’ to xxd to display one entry (24 bytes) per line.

$ xxd -p -g 1 -c 24 -s 0x003048 -l 0x000378 main
340100001100170010400000000000000400000000000000
6301000012000e0029110000000000001800000000000000
8701000012000e0041110000000000003a00000000000000
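Instead of decoding one row at a time, the whole dump can be walked in 24-byte steps. The sketch below does this for the three rows shown above; decode_symtab is an illustrative name, and the format string assumes a little-endian Elf64_Sym.

```python
import struct

# st_name, st_info, st_other, st_shndx, st_value, st_size
ELF64_SYM = struct.Struct("<IBBHQQ")

def decode_symtab(raw: bytes):
    """Yield one dict per 24-byte Elf64_Sym entry in `raw`."""
    names = ("st_name", "st_info", "st_other", "st_shndx", "st_value", "st_size")
    for fields in ELF64_SYM.iter_unpack(raw):
        yield dict(zip(names, fields))

# The three rows shown above, joined into one byte string.
rows = [
    "340100001100170010400000000000000400000000000000",
    "6301000012000e0029110000000000001800000000000000",
    "8701000012000e0041110000000000003a00000000000000",
]
raw = bytes.fromhex("".join(rows))

for sym in decode_symtab(raw):
    print({key: hex(value) for key, value in sym.items()})
```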

Decode the first line.

$ hex_to_elf64_sym.py 340100001100170010400000000000000400000000000000
  name     info    other  shndx   value     size
['0x134', '0x11', '0x0', '0x17', '0x4010', '0x4']

Decode the second line.

$ hex_to_elf64_sym.py 6301000012000e0029110000000000001800000000000000
  name     info    other  shndx   value     size
['0x163', '0x12', '0x0', '0xe', '0x1129', '0x18']

Explain the outputs.

First line (g_int)

  Field     Value    Meaning
  st_name   0x134    Index 0x134 in .strtab: g_int
  st_info   0x11     Type object, global
  st_other  0x0      Visibility default
  st_shndx  0x17     Section 0x17 = 23 = .data
  st_value  0x4010   Offset 0x4010
  st_size   0x4      Size of 4 bytes

Second line (sum)

  Field     Value    Meaning
  st_name   0x163    Index 0x163 in .strtab: sum
  st_info   0x12     Type func, global
  st_other  0x0      Visibility default
  st_shndx  0xe      Section 0xe = 14 = .text
  st_value  0x1129   Offset 0x1129
  st_size   0x18     Size of 24 bytes
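The st_info byte holds two values: the binding in the high four bits and the type in the low four bits, while the visibility sits in the low two bits of st_other. A minimal sketch of the decoding; the helper names are illustrative, and the lookup tables list only the common values from the ELF specification.

```python
ST_BIND = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
ST_TYPE = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}
ST_VISIBILITY = {0: "DEFAULT", 1: "INTERNAL", 2: "HIDDEN", 3: "PROTECTED"}

def decode_st_info(st_info: int):
    """Split st_info into (binding, type): high nibble and low nibble."""
    bind, sym_type = st_info >> 4, st_info & 0xF
    return ST_BIND.get(bind, hex(bind)), ST_TYPE.get(sym_type, hex(sym_type))

def decode_st_other(st_other: int) -> str:
    """Visibility lives in the low two bits of st_other."""
    return ST_VISIBILITY[st_other & 0x3]

print(decode_st_info(0x11), decode_st_other(0x0))  # ('GLOBAL', 'OBJECT') DEFAULT
print(decode_st_info(0x12), decode_st_other(0x0))  # ('GLOBAL', 'FUNC') DEFAULT
```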

4. GDB

4.1. Add Symbols

GDB supports adding a symbol table from a file, which is useful when debugging code loaded by mmap.

To add the symbols.

  add-symbol-file filename
    [-readnow|-readnever]
    [-o offset]
    [textaddr]
    [-s section addr…]

To remove symbols.

  remove-symbol-file filename
  remove-symbol-file -a addr

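Parameters:

For .text section: textaddr is the memory address at which the .text section of the file is loaded.

For other sections, such as: .bss, .data,… use -s section addr to give the memory address of each section.

Other options: -o offset adds an offset to the start address of every section whose address is not given explicitly; -readnow reads the full symbol table immediately, and -readnever skips reading symbolic debug information.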

4.2. Add Symbols to mmap Regions

In the example below, we will add symbols for a memory region that is loaded by mmap.

File main.c

#include <sys/mman.h>
#include <stddef.h>
#include <fcntl.h>

int main()
{
    int fd = open("math.o", O_RDONLY);

    // load math.o to memory
    mmap(
        NULL, // let kernel choose start addr of region
        4096, // length of the region
        PROT_READ|PROT_EXEC, // protection mode
        MAP_PRIVATE,         // visible mode
        fd,                  // file to load
        0                    // offset in file
    );

    return 0;
}

File math.c

int number1 = 0xCAFEBABE;
int number2 = 0xC1A0C1A0;

int sum(int x, int y)
{
    return x + y;
}

int sub(int x, int y)
{
    return x - y;
}

Build.

$ gcc -c math.c -o math.o
$ gcc -g main.c -o main

Sections in math.o.

$ readelf --section-headers --wide math.o
[Nr] Name     Type        Address          Off    Size   ES Flg Lk Inf Al
[ 1] .text    PROGBITS    0000000000000000 000040 00002e 00  AX  0   0  1
[ 2] .data    PROGBITS    0000000000000000 000070 000008 00  WA  0   0  4

Symbols in math.o.

$ readelf --symbols math.o
Num:    Value          Size Type    Bind   Vis      Ndx Name
  3: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    2 number1
  4: 0000000000000004     4 OBJECT  GLOBAL DEFAULT    2 number2
  5: 0000000000000000    24 FUNC    GLOBAL DEFAULT    1 sum
  6: 0000000000000018    22 FUNC    GLOBAL DEFAULT    1 sub

Illustrate math.o.

+---------------------+
| ELF File Header     | 0x00
+---------------------+
| .text Section       | 0x40
| ├── sum()           |      + 0x00
| └── sub()           |      + 0x18
|                     |
+---------------------+
| .data Section       | 0x70
| ├── number1         |      + 0x00
| └── number2         |      + 0x04
|                     |
+---------------------+
| Other Sections...   |
+---------------------+
| Section Headers     |
+---------------------+

Start program.

(gdb) file main
(gdb) break main.c:19
(gdb) run

Print memory mappings.

(gdb) info proc mappings
    Start Addr         End Addr   Size     Offset  Perms  objfile
0x7ffff7ffa000   0x7ffff7ffb000   0x1000   0x0     r-xp   math.o

Because mmap maps math.o as is, starting at file offset 0, the offsets of sections and symbols in memory are the same as in the ELF file. Once the start address of the mapping is known, the addresses of sections and symbols can be calculated by adding their ELF offsets to it.

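In memory, math.o is loaded at address 0x7ffff7ffa000. So, within math.o:

.text starts at 0x7ffff7ffa000 + 0x40 = 0x7ffff7ffa040.

.data starts at 0x7ffff7ffa000 + 0x70 = 0x7ffff7ffa070.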

Illustrate memory.

+---------------------+
| Other Regions...    |
+---------------------+
| math.o              | 0x7ffff7ffa000
|                     |
|   .text             | 0x7ffff7ffa040
|   ├── sum()         |         + 0x00
|   └── sub()         |         + 0x18
|                     |
|   .data             | 0x7ffff7ffa070
|   ├── number1       |         + 0x00
|   └── number2       |         + 0x04
+---------------------+
| Other Regions...    |
+---------------------+
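The address of each symbol follows from three numbers: the start of the mapping, the file offset of its section, and the symbol value within that section. A small sketch using the values from the readelf and info proc mappings listings above; the dictionary and function names are illustrative.

```python
MAP_BASE = 0x7FFFF7FFA000                          # Start Addr from info proc mappings
SECTION_OFFSETS = {".text": 0x40, ".data": 0x70}   # Off column of math.o
SYMBOLS = {                                        # name: (section, Value column)
    "sum": (".text", 0x00),
    "sub": (".text", 0x18),
    "number1": (".data", 0x00),
    "number2": (".data", 0x04),
}

def symbol_address(name: str) -> int:
    """Memory address = mapping base + section file offset + symbol value."""
    section, value = SYMBOLS[name]
    return MAP_BASE + SECTION_OFFSETS[section] + value

for name in SYMBOLS:
    print(f"{name:8} {symbol_address(name):#x}")
```

These are the addresses GDB is expected to report once the symbols of math.o have been added at the section addresses computed above.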

We need to add the symbols in .text and .data of math.o, so we provide the memory addresses of these sections to GDB.

(gdb) add-symbol-file math.o 0x7ffff7ffa040 -s .data 0x7ffff7ffa070

Check the loaded files and sections.

(gdb) info files
0x00007ffff7ffa040 - 0x00007ffff7ffa06e is .text in /root/demo/math.o
0x00007ffff7ffa070 - 0x00007ffff7ffa078 is .data in /root/demo/math.o

Check the symbol addresses.

(gdb) info function sum
0x00007ffff7ffa040  sum

(gdb) info function sub
0x00007ffff7ffa058  sub

(gdb) info variable number1
0x00007ffff7ffa070  number1

(gdb) info variable number2
0x00007ffff7ffa074  number2

After the symbols are loaded, we can print the variables and call the functions by name.

(gdb) call (int) sum(4, 5)
$3 = 9

(gdb) call (int) sub(4, 5)
$5 = -1

(gdb) print/x (int)number1
$1 = 0xcafebabe

(gdb) print/x (int)number2
$2 = 0xc1a0c1a0

References