Thursday, January 27, 2011

Quick checks on VxWorks images - part 2

Following the first post on VxWorks, let's look at some more internals of binary system images. I will focus on VxWorks version 5, which is based on a proprietary binary format, whereas VxWorks version 6 uses ELF (Executable and Linkable Format). VxWorks 5 images are monolithic: everything (system and applications) is often built and linked into a single executable file. And luckily (or by mistake?), most of the VxWorks 5 images I have seen include debugging symbols! Manufacturers seem to forget to strip their firmware...

My goal for now is to retrieve the symbol table, which holds each function's code start address together with the corresponding symbol's name and type. First, I will look for suspicious areas where function or variable names are packed together, separated by padding bytes.
For example:

[...]
00137E80   74 65 6C 6E  65 74 64 54  61 73 6B 44  65 6C 65 74  telnetdTaskDelet
00137E90   65 00 74 65  6C 6E 65 74  64 53 65 73  73 69 6F 6E  e.telnetdSession
00137EA0   44 69 73 63  6F 6E 6E 65  63 74 46 72  6F 6D 53 68  DisconnectFromSh
00137EB0   65 6C 6C 4A  6F 62 00 74  65 6C 6E 65  74 64 53 65  ellJob.telnetdSe
00137EC0   73 73 69 6F  6E 44 69 73  63 6F 6E 6E  65 63 74 46  ssionDisconnectF
00137ED0   72 6F 6D 53  68 65 6C 6C  00 74 65 6C  6E 65 74 64  romShell.telnetd
00137EE0   53 65 73 73  69 6F 6E 44  69 73 63 6F  6E 6E 65 63  SessionDisconnec
00137EF0   74 46 72 6F  6D 52 65 6D  6F 74 65 00  74 65 6C 6E  tFromRemote.teln
00137F00   65 74 64 50  61 72 73 65  72 43 6F 6E  74 72 6F 6C  etdParserControl
[...]

After some more checks, I have a good idea of how these names are concatenated: printable characters only, separated by one or a few null bytes. This makes the search easy to automate. Furthermore, it can help retrieve the OS loading address used by the bootloader: the addresses of these symbols in the static file are not the ones used at execution time; however, they are all shifted by the same fixed offset: ...the loading address.
The idea is to use the differences between the addresses of the symbol strings instead of their static addresses in the file. Could these relative offsets help retrieve the symbol table?
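A quick way to see why the relative offsets are useful: in the minimal Python 3 sketch below (a made-up buffer, not taken from either firmware; `LOAD_ADDR` is an arbitrary hypothetical value), the start of each printable run is located, and adding the same load address to every start leaves the consecutive differences unchanged.

```python
import re

# Made-up buffer mimicking the dump above: printable names separated
# by one or a few NUL bytes.
buf = b"\x00\x00telnetdTaskDelete\x00telnetdParserControl\x00\x00\x00ioctl\x00"

# File offset at which each run of printable bytes (0x21..0x7E) starts.
file_offsets = [m.start() for m in re.finditer(rb"[\x21-\x7e]+", buf)]

# Hypothetical load address, chosen only for illustration.
LOAD_ADDR = 0x10000
runtime_addrs = [off + LOAD_ADDR for off in file_offsets]

def deltas(addrs):
    """Differences between consecutive addresses."""
    return [b - a for a, b in zip(addrs, addrs[1:])]

print(file_offsets)           # [2, 20, 43]
print(deltas(file_offsets))   # [18, 23]
print(deltas(runtime_addrs))  # [18, 23]: the load address cancels out
```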
Let's open Python again, find sequences of symbols and return the list of offsets between them:

def scan_for_symbols(img):
    # printable chars:  0x21 to 0x7E
    # scan the file for printable chars,
    # spaced with \x00 repeated 1 to maxp times
    
    # constants:
    # char for padding between symbols
    pad_char = '\x00'
    # maximum padding bytes admitted between symbols strings
    maxp = 8
    # limit to determine possible sequence of symbols names
    min_pattern = 0x100 
    
    # initialized:
    num_pattern = 0 # count possible symbols during scan
    start_addr = 0 # store the possible sequence start address
    addr_cur_sym, addr_prev_sym = 0, 0 # start address of possible symbols
    p, pad, acc = 0, 0, 0 # address pointer, and pad and char counters
    addr_diff = [] # list with address offsets between symbols
    ret = [] # list with addr_diff lists found
    
    while p < len(img):
        if 0x21 <= ord(img[p]) <= 0x7E: # printable, bounds included
            if start_addr == 0:
                start_addr = p
            if pad > 0:
                addr_cur_sym = p
                if addr_prev_sym > 0:
                    addr_diff.append(addr_cur_sym - \
                     addr_prev_sym)
                addr_prev_sym = addr_cur_sym
            pad = 0
            acc += 1
        elif img[p] == pad_char and pad < maxp: # at most maxp pad bytes
            if acc > 0:
                num_pattern += 1
            acc = 0
            pad += 1
        else:
            if num_pattern > min_pattern:
                print '[+] possible symbols starting' \
                      ' at address: 0x%x' \
                      % start_addr
                ret.append(addr_diff)
            addr_diff = []
            addr_cur_sym, addr_prev_sym = 0, 0
            pad, acc = 0, 0
            num_pattern = 0
            start_addr = 0
        p += 1
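    # flush a sequence that runs up to the end of the image
    if num_pattern > min_pattern:
        print '[+] possible symbols starting' \
              ' at address: 0x%x' \
              % start_addr
        ret.append(addr_diff)
    return ret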

And here is the result in the Python interpreter (with the firmware image of the IP-enabled fridge):

>>> s = scan_for_symbols(img)
[+] possible symbols starting at address: 0x10f90e0
[+] possible symbols starting at address: 0x138fdbc
[+] possible symbols starting at address: 0x14de5f8
>>> [(len(l), l[:15]) for l in s]
[(481, [24, 24, 36, 32, 24, 24, 28, 28, 24, 28, 24, 20, 24, 24, 24]), 
(519, [8, 8, 8, 8, 8, 4, 8, 4, 4, 4, 4, 4, 8, 32, 20]), 
(45700, [12, 12, 12, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 4, 8])]

It seems we have a winner at 0x14de5f8, with 45700 debugging symbols! My hex editor confirms it.
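For comparison, the same scan could be sketched in Python 3 with a regular expression. This is a hypothetical variant (`scan_for_symbols_re` is not from the post): a name is one or more bytes in 0x21..0x7E followed by 1 to `maxp` NUL bytes, and a run must contain at least `min_names` names, mirroring `maxp` and `min_pattern` in `scan_for_symbols()`. Unlike the byte loop, it also keeps the offset from the first name, so its lists may be one entry longer.

```python
import re

def scan_for_symbols_re(img, min_names=0x100, maxp=8):
    """Return (start_offset, offsets) for every run of at least min_names
    names, where a name is 1+ printable bytes (0x21..0x7E) followed by
    1..maxp NUL bytes; offsets are distances between consecutive names."""
    name = rb"[\x21-\x7e]+\x00{1,%d}" % maxp
    run = re.compile(rb"(?:%s){%d,}" % (name, min_names))
    results = []
    for m in run.finditer(img):
        # start of each name inside the matched run
        starts = [n.start() for n in re.finditer(rb"[\x21-\x7e]+", m.group())]
        offsets = [b - a for a, b in zip(starts, starts[1:])]
        results.append((m.start(), offsets))
    return results
```

With the image read as bytes (for example `img = open(path, 'rb').read()`, `path` being wherever the firmware lives), `scan_for_symbols_re(img)` returns `(start_offset, offsets)` pairs. It favours brevity over speed: runs just below the threshold get re-scanned from each starting byte.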
The next step is to use these offsets to retrieve the symbol table containing the function pointers: scan the file, apply a guessed size for one entry (pattern) of the table, and compare the relative offsets between successive candidate symbol addresses with what I got from scan_for_symbols().
Let's do a simple search in Python:

import struct

def search_symbol_table(img, offset_list):
    endian = '>' # big-endian DWORDs
    pat_size = 16 # guessed size of an entry (pattern) of the symbol table
    
    # truncate the list to avoid possible dummy symbols
    # at the beginning and end
    offset_list = offset_list[50:-50]
    
    # scan at every alignment offset 0, 1, ..., pat_size-1 bytes
    for o in range(0, pat_size):
        i, start_addr = 0, 0 # restart matching for each alignment
        for p in range(0, len(img)-(2*pat_size)-o, pat_size):
            sym_addr_1 = struct.unpack(endian+'I', \
                          img[p+o:p+o+4])[0]
            sym_addr_2 = struct.unpack(endian+'I', \
                          img[p+o+pat_size:p+o+pat_size+4])[0]
            # check against the symbols offset list
            if abs(sym_addr_2 - sym_addr_1) == offset_list[i]:
                i += 1
                if i == 1:
                    start_addr = p+o
                elif i == len(offset_list):
                    print '[+] found symbol table starting' \
                          ' just before address: 0x%x' \
                          % start_addr
                    i = 0
            else:
                i = 0
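A possible Python 3 restructuring of the same idea, offered as a sketch under the same assumptions (16-byte entries, big-endian pointers in the first dword): it computes the pointer differences once per alignment, then looks for the offset list as a contiguous sub-list, so a match can restart at any entry and never carries over from one alignment to the next. `find_table_by_deltas` is a hypothetical name; it returns the file offset of the first matching entry instead of printing.

```python
import struct

def find_table_by_deltas(img, offset_list, pat_size=16, endian=">"):
    """Return the file offsets where a run of pat_size-byte entries starts
    whose first-dword differences (absolute values) equal offset_list."""
    if not offset_list:
        return []
    n = len(offset_list)
    entry = struct.Struct("%sI%dx" % (endian, pat_size - 4))  # keep dword 0 only
    hits = []
    for align in range(pat_size):
        view = img[align:]
        view = view[: len(view) - len(view) % pat_size]  # whole entries only
        ptrs = [t[0] for t in entry.iter_unpack(view)]
        deltas = [abs(b - a) for a, b in zip(ptrs, ptrs[1:])]
        for k in range(len(deltas) - n + 1):
            # cheap first-element check before comparing the whole slice
            if deltas[k] == offset_list[0] and deltas[k:k + n] == offset_list:
                hits.append(align + k * pat_size)
    return sorted(hits)
```

To mimic the truncation above, one would pass `s[2][50:-50]`. Because absolute differences are compared, the reversed-order case discussed further down could be tried by passing the list reversed (`[::-1]`).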

And let's try it on our fridge's firmware image with the 45700 debugging symbols:

>>> s = scan_for_symbols(img)
[+] possible symbols starting at address: 0x10f90e0
[+] possible symbols starting at address: 0x138fdbc
[+] possible symbols starting at address: 0x14de5f8
>>> search_symbol_table(img, s[2])
[+] found symbol table starting just before address: 0x158d958

After checking with the hex editor, the match is confirmed: bingo! Actually, the table starts exactly at address 0x158d6b8, a bit before the reported address. This is because we truncated the beginning of the offset list a bit too much.
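The format that seems to be used by VxWorks 5 for each 16-byte entry is:

struct symtable_pattern {
    dword symbol_addr; // address of the symbol name string
    dword code_addr;   // start address of the code (or data)
    dword symbol_type; // 0x500: function, 0x700: data, 0x900: ?
    dword null;
};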
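To decode entries with this layout, a minimal Python 3 sketch could look like the following. It assumes the big-endian byte order used by the search functions, and the type names only cover the two values identified so far; `parse_entry` and `SymEntry` are illustrative names, and the example entry bytes are made up.

```python
import struct
from collections import namedtuple

# One table entry as guessed above: four big-endian dwords (16 bytes).
SymEntry = namedtuple("SymEntry", "symbol_addr code_addr symbol_type null")
ENTRY = struct.Struct(">4I")

# Type codes observed in this post; 0x900 is still unidentified.
SYM_TYPES = {0x500: "function", 0x700: "data"}

def parse_entry(img, off):
    """Decode the entry at file offset off; return (entry, type_name)."""
    e = SymEntry(*ENTRY.unpack_from(img, off))
    return e, SYM_TYPES.get(e.symbol_type, "unknown (0x%x)" % e.symbol_type)

# Made-up entry bytes, for illustration only.
raw = struct.pack(">4I", 0x4A2B0, 0x12340, 0x500, 0)
print(parse_entry(raw, 0))
```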
I was lucky to find a match right away. The offset list deduced from the symbol strings might have needed to be reversed. It could also happen (who knows how obscure debuggers work...) that the symbols in the string list are not sorted in the same order as the entries in the symbol table. For these reasons, here is another, somewhat more statistical, approach to finding the symbol table.
Scan the firmware image, extracting 2 consecutive candidate symbol addresses (again guessing the size of a table entry), and check whether the difference between those 2 addresses is less than the total size of the symbol string area. If this holds for as many consecutive entries as there are symbols in the string area, we have most likely found the symbol table... or a large area of padding :(
Anyway, let's test it:

def search_symbol_table_stat(img, sym_number, max_offset):
    endian = '>' # big-endian DWORDs
    pat_size = 16 # guessed size of an entry of the symbol table
    # scan at every alignment offset 0, 1, ..., pat_size-1 bytes
    for o in range(0, pat_size):
        i, start_addr = 0, 0 # restart counting for each alignment
        for p in range(0, len(img)-(2*pat_size)-o, pat_size):
            sym_addr_1 = struct.unpack(endian+'I', \
                          img[p+o:p+o+4])[0]
            sym_addr_2 = struct.unpack(endian+'I', \
                          img[p+o+pat_size:p+o+pat_size+4])[0]
            # successive pointers must stay within the string area size
            if abs(sym_addr_2 - sym_addr_1) < max_offset:
                i += 1
                if i == 1:
                    start_addr = p+o
                elif i == sym_number:
                    print '[+] possible symbol table' \
                          ' starting at address: 0x%x' \
                          % start_addr
            else:
                i = 0

And let's try it again on the fridge's firmware:

>>> search_symbol_table_stat(img, len(s[2]), sum(s[2]))
[+] possible symbol table starting at address: 0x158d580
[+] possible symbol table starting at address: 0x1652d80
[+] possible symbol table starting at address: 0x158d591
[...]
[+] possible symbol table starting at address: 0x1652d8f

Surprisingly, the result is not bad at all... The statistical search finds the area starting at 0x158d580, which is close to the exact start address 0x158d6b8. The area starting at 0x1652d80 turns out to be padding bytes.
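The padding false positive is easy to explain: a run of zero dwords gives differences of 0, which trivially pass the `< max_offset` test. One possible refinement, not part of the original approach, is to also require non-zero differences and to keep only the longest qualifying run. The Python 3 sketch below assumes that no two consecutive table entries point to the same name string; `longest_plausible_run` is a hypothetical helper.

```python
import struct

def longest_plausible_run(img, max_offset, pat_size=16, endian=">"):
    """Return (start_offset, n_deltas) of the longest run of consecutive
    pat_size-byte entries whose first-dword differences are non-zero and
    below max_offset; (None, 0) if there is none."""
    entry = struct.Struct("%sI%dx" % (endian, pat_size - 4))
    best = (None, 0)
    for align in range(pat_size):
        view = img[align:]
        view = view[: len(view) - len(view) % pat_size]
        ptrs = [t[0] for t in entry.iter_unpack(view)]
        run_start, run_len = 0, 0
        for k in range(1, len(ptrs)):
            d = abs(ptrs[k] - ptrs[k - 1])
            if 0 < d < max_offset:  # zero deltas (e.g. padding) break the run
                if run_len == 0:
                    run_start = align + (k - 1) * pat_size
                run_len += 1
                if run_len > best[1]:
                    best = (run_start, run_len)
            else:
                run_len = 0
    return best
```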

Knowing the symbol table and the list of symbol strings, I now have to retrieve the loading address of the firmware. The idea is to take one of the extreme addresses (lowest or highest) among the symbol strings in the file, and its counterpart among the pointers in the symbol table: the difference between the two is the loading address. Based on the last example with 45700 symbols, I get the loading address 0x10000. Looks good!
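In other words, the loading address is the lowest pointer found in the symbol table minus the file offset of the lowest symbol string. A minimal Python 3 sketch of that arithmetic follows; `lowest_symbol_ptr` is a hypothetical stand-in for the `get_lowest_addr_from_symtable()` helper used in the balloon session, whose code is not shown in the post, and it assumes 16-byte, big-endian entries.

```python
import struct

def lowest_symbol_ptr(img, table_start, table_end, pat_size=16, endian=">"):
    """Smallest symbol-name pointer (first dword) over the entries in
    [table_start, table_end). Hypothetical helper, not the post's code."""
    fmt = endian + "I"
    return min(struct.unpack_from(fmt, img, off)[0]
               for off in range(table_start, table_end, pat_size))

def load_address(lowest_ptr, lowest_string_offset):
    """Run-time address of the first name string minus its file offset."""
    return lowest_ptr - lowest_string_offset

# Figures reported for the helium balloon image in the session that follows.
print(hex(load_address(1617425240, 0x67E358)))  # 0x60001000
```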
Let's now try it on the firmware of the helium balloon:

>>> len(img)
9128872
>>> s = scan_for_symbols(img)
[+] possible symbols starting at address: 0x67e358
>>> len(s[0])
15874
>>> search_symbol_table(img, s[0])
>>> search_symbol_table_stat(img, len(s[0]), sum(s[0]))
[+] possible symbol table starting at address: 0x71cd40
[+] possible symbol table starting at address: 0x75ad70
[+] possible symbol table starting at address: 0x71cd41
[+] possible symbol table starting at address: 0x75ad71
[...]
[+] possible symbol table starting at address: 0x75ad7f
>>> get_lowest_addr_from_symtable(0x71cd40, 0x75ab8c)
1617425240
>>> hex(_ - 0x67e358)
'0x60001000'

So this image has almost 16000 symbols. The exact search finds nothing this time, but the statistical one does. Checking with the hex editor, I can confirm the start address of the symbol table, 0x71cd40, and at the same time I note the end of the table: 0x75ab8c. By comparing the lowest address among the symbol strings with the lowest pointer in the symbol table, I deduce the loading address: 0x60001000. So nice...
From this point, it is possible to extract the list of symbols with their corresponding code start addresses and types. This is left as an exercise for the reader, and it concludes this 2nd session on VxWorks image analysis.
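As a starting point for that exercise, here is a minimal Python 3 sketch, assuming the 16-byte big-endian entry layout above, table bounds found with the hex editor, and the deduced loading address. `extract_symbols` and `read_cstring` are hypothetical names. Name pointers are run-time addresses, so the loading address is subtracted to find each string in the file.

```python
import struct

ENTRY = struct.Struct(">4I")  # symbol_addr, code_addr, symbol_type, null
SYM_TYPES = {0x500: "function", 0x700: "data"}  # 0x900 still unknown

def read_cstring(img, off):
    """NUL-terminated string at file offset off."""
    return img[off:img.index(b"\x00", off)].decode("ascii", "replace")

def extract_symbols(img, table_start, table_end, load_addr):
    """Return (name, code_addr, type) for each entry of the symbol table.
    Name pointers are run-time addresses, so load_addr is subtracted to
    get back to file offsets; code_addr is left as a run-time address."""
    symbols = []
    for off in range(table_start, table_end - ENTRY.size + 1, ENTRY.size):
        name_ptr, code_addr, sym_type, _ = ENTRY.unpack_from(img, off)
        name = read_cstring(img, name_ptr - load_addr)
        symbols.append((name, code_addr, SYM_TYPES.get(sym_type, hex(sym_type))))
    return symbols
```

For the balloon image, the call would be `extract_symbols(img, 0x71cd40, 0x75ab8c, 0x60001000)` (untested on the real image).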
In the next session, blind_key will use the loading address and debugging symbols retrieved here to resolve cross-references in the image executable with IDA. This will give us a logical view of the binary, instead of the austere hex view we have had up to now.
