Following the first post on VxWorks, let's look at some more internals of binary system images. I will focus on VxWorks version 5, which is based on a proprietary binary format, whereas VxWorks version 6 makes use of the ELF executable and linkable format. VxWorks 5 images are monolithic: everything (system and applications) is usually built and linked into a single executable file. And luckily (or by mistake?), most of the VxWorks 5 images I have seen include debugging symbols! Manufacturers seem to forget to strip their firmware...
My goal for now is to retrieve the symbol table, which holds each function's code start address along with the corresponding symbol name and type. First, I will look for suspicious areas where function or variable names are packed together, separated by padding bytes.
For example:
[...]
00137E80  74 65 6C 6E 65 74 64 54 61 73 6B 44 65 6C 65 74  telnetdTaskDelet
00137E90  65 00 74 65 6C 6E 65 74 64 53 65 73 73 69 6F 6E  e.telnetdSession
00137EA0  44 69 73 63 6F 6E 6E 65 63 74 46 72 6F 6D 53 68  DisconnectFromSh
00137EB0  65 6C 6C 4A 6F 62 00 74 65 6C 6E 65 74 64 53 65  ellJob.telnetdSe
00137EC0  73 73 69 6F 6E 44 69 73 63 6F 6E 6E 65 63 74 46  ssionDisconnectF
00137ED0  72 6F 6D 53 68 65 6C 6C 00 74 65 6C 6E 65 74 64  romShell.telnetd
00137EE0  53 65 73 73 69 6F 6E 44 69 73 63 6F 6E 6E 65 63  SessionDisconnec
00137EF0  74 46 72 6F 6D 52 65 6D 6F 74 65 00 74 65 6C 6E  tFromRemote.teln
00137F00  65 74 64 50 61 72 73 65 72 43 6F 6E 74 72 6F 6C  etdParserControl
[...]
After some more checks, I have a good view of how those names are packed together: runs of printable characters separated by one or a few null bytes. This makes the search easy to automate. Furthermore, it can help in retrieving the OS loading address used by the bootloader: the addresses of those symbols in the static file are not the ones used at execution time; however, they are all shifted by the same fixed offset: the loading address.
One idea is to use the differences between the addresses of consecutive symbol strings instead of their static addresses in the file. Could these relative offsets help in locating the symbol table?
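To see why the differences are worth using, here is a minimal Python 3 sketch (the listings in this post are Python 2). The byte string, the names in it and the 0x10000 loading address are made up for the illustration; the point is only that the gaps between consecutive strings stay the same wherever the image is loaded.

```python
# Toy data: three NUL-separated names, as they might sit in an image.
blob = b"taskSpawn\x00\x00\x00taskDelete\x00\x00semTake\x00"

def string_starts(data):
    """Return the offset of every byte that begins a run of non-NUL bytes."""
    starts = []
    prev_nul = True
    for pos, byte in enumerate(data):
        if byte != 0 and prev_nul:
            starts.append(pos)
        prev_nul = (byte == 0)
    return starts

def deltas(addrs):
    """Differences between consecutive addresses."""
    return [b - a for a, b in zip(addrs, addrs[1:])]

file_offsets = string_starts(blob)
load_address = 0x10000  # arbitrary value chosen for the illustration
runtime_addrs = [load_address + off for off in file_offsets]

print(file_offsets)
print(deltas(file_offsets) == deltas(runtime_addrs))
```

The same list of differences is obtained from the file offsets and from the shifted addresses, so it can be searched for in the image without knowing the loading address.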
Let's open Python again, find sequences of symbols and return the list of offsets between them:
def scan_for_symbols(img):
    # printable chars: 0x21 to 0x7E
    # scan the file for printable chars,
    # spaced with \x00 repeated 1 to maxp times
    # constants:
    # char for padding between symbols
    pad_char = '\x00'
    # maximum padding bytes admitted between symbols strings
    maxp = 8
    # limit to determine possible sequence of symbols names
    min_pattern = 0x100
    # initialized:
    num_pattern = 0  # count possible symbols during scan
    start_addr = 0  # store the possible sequence start address
    addr_cur_sym, addr_prev_sym = 0, 0  # start address of possible symbols
    p, pad, acc = 0, 0, 0  # address pointer, and pad and char counters
    addr_diff = []  # list with address offsets between symbols
    ret = []  # list with addr_diff lists found
    while p < len(img):
        if 0x21 < ord(img[p]) < 0x7E:
            if start_addr == 0:
                start_addr = p
            if pad > 0:
                addr_cur_sym = p
                if addr_prev_sym > 0:
                    addr_diff.append(addr_cur_sym - \
                                     addr_prev_sym)
                addr_prev_sym = addr_cur_sym
                pad = 0
            acc += 1
        elif img[p] == pad_char and pad <= maxp:
            if acc > 0:
                num_pattern += 1
                acc = 0
            pad += 1
        else:
            if num_pattern > min_pattern:
                print '[+] possible symbols starting' \
                      ' at address: 0x%x' \
                      % start_addr
                ret.append(addr_diff)
            addr_diff = []
            addr_cur_sym, addr_prev_sym = 0, 0
            pad, acc = 0, 0
            num_pattern = 0
            start_addr = 0
        p += 1
    return ret
And the result in the Python interpreter (with the firmware image of the IP-enabled fridge):
>>> s = scan_for_symbols(img)
[+] possible symbols starting at address: 0x10f90e0
[+] possible symbols starting at address: 0x138fdbc
[+] possible symbols starting at address: 0x14de5f8
>>> [(len(l), l[:15]) for l in s]
[(481, [24, 24, 36, 32, 24, 24, 28, 28, 24, 28, 24, 20, 24, 24, 24]),
 (519, [8, 8, 8, 8, 8, 4, 8, 4, 4, 4, 4, 4, 8, 32, 20]),
 (45700, [12, 12, 12, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 4, 8])]
It seems we have a winner at 0x14de5f8 with 45700 debugging symbols! I can confirm this with my hex editor.
The next step is to use those offsets to locate the symbol table containing the function pointers: scan the file, assume a size for each table entry, and compare the relative offsets between successive candidate symbol addresses with what scan_for_symbols() returned.
Let's do a simple search with Python:
import struct

def search_symbol_table(img, offset_list):
    endian='>'  # handle endianness for DWORD
    pat_size=16  # guessed size of a pattern of the symbol table
    # truncate the list to avoid possible dummy symbols
    # at the beginning and end
    offset_list = offset_list[50:-50]
    i, start_addr = 0, 0
    # scan with word alignment at 0, 1, 2, ... pat_size bytes
    for o in range(0, pat_size):
        for p in range(0, len(img)-(2*pat_size), pat_size):
            sym_addr_1 = struct.unpack(endian+'I', \
                                       img[p+o:p+o+4])[0]
            sym_addr_2 = struct.unpack(endian+'I', \
                                       img[p+o+pat_size:p+o+pat_size+4])[0]
            # check against the symbols offset list
            if (sym_addr_2 - sym_addr_1).__abs__() == \
               offset_list[i]:
                i += 1
                if i == 1:
                    start_addr = p+o
                elif i == len(offset_list):
                    print '[+] found symbol table starting' \
                          ' just before address: 0x%x' \
                          % start_addr
                    i = 0
            else:
                i = 0
And let's try it on our fridge's firmware image with the 45700 debugging symbols:
>>> s = scan_for_symbols(img)
[+] possible symbols starting at address: 0x10f90e0
[+] possible symbols starting at address: 0x138fdbc
[+] possible symbols starting at address: 0x14de5f8
>>> search_symbol_table(img, s[2])
[+] found symbol table starting just before address: 0x158d958
After checking with the hex editor, the match is confirmed: Bingo! Actually, the table starts exactly at address 0x158d6b8. The reported address is a little further on because we truncated the beginning of the offset list a bit too much.
The format that seems to be used by VxWorks 5 is:
struct symtable_pattern {
    dword symbol_addr;
    dword code_addr;
    dword symbol_type; // 0x500: function, 0x700: data, 0x900: ?
    dword null;
};
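As a minimal Python 3 sketch of reading one such 16-byte entry, assuming the big-endian layout used by the search function above. The names SymEntry, ENTRY and parse_entry are mine, and the sample entry is synthetic, with made-up values.

```python
import struct
from collections import namedtuple

# Field names follow the struct above.
SymEntry = namedtuple("SymEntry", "symbol_addr code_addr symbol_type null")
ENTRY = struct.Struct(">IIII")  # big-endian, four 32-bit fields = 16 bytes

def parse_entry(img, offset):
    """Decode the table entry located at the given file offset."""
    return SymEntry(*ENTRY.unpack_from(img, offset))

# Synthetic entry built for the demo; the values are made up.
raw = ENTRY.pack(0x014EE5F8, 0x00123450, 0x500, 0)
entry = parse_entry(raw, 0)
print(hex(entry.symbol_addr), hex(entry.code_addr), hex(entry.symbol_type))
```

For a little-endian target, the format string would start with "<" instead of ">".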
I was lucky to find a match directly. The offset list deduced from the symbol strings might have needed to be reversed. It could also happen (who knows how obscure debuggers work...) that the symbols in the string list are not sorted in the same order as the entries in the symbol table. For those reasons, we can try another approach to find the symbol table, a bit more statistical...
So, one can scan the firmware image, extract two consecutive candidate symbol addresses (again guessing the size of a table entry), and check whether the difference between those two addresses is less than the total memory space taken by the symbol strings. If this holds for as many consecutive entries as there are symbols in the string area, then we have almost certainly found the symbol table... or a large area of padding :(
Anyway, let's test it:
def search_symbol_table_stat(img, sym_number, max_offset):
    endian='>'
    pat_size = 16
    i, start_addr = 0, 0
    # scan with word alignment at 0, 1, ..., pat_size bytes
    for o in range(0, pat_size):
        for p in range(0, len(img)-(2*pat_size), pat_size):
            sym_addr_1 = struct.unpack(endian+'I', \
                                       img[p+o:p+o+4])[0]
            sym_addr_2 = struct.unpack(endian+'I', \
                                       img[p+o+pat_size:p+o+pat_size+4])[0]
            if (sym_addr_2 - sym_addr_1).__abs__() < \
               max_offset:
                i += 1
                if i == 1:
                    start_addr = p+o
                elif i == sym_number:
                    print '[+] possible symbol table'\
                          ' starting at address: 0x%x' \
                          % start_addr
            else:
                i = 0
And let's try it again on the fridge's firmware:
>>> search_symbol_table_stat(img, len(s[2]), sum(s[2]))
[+] possible symbol table starting at address: 0x158d580
[+] possible symbol table starting at address: 0x1652d80
[+] possible symbol table starting at address: 0x158d591
[...]
[+] possible symbol table starting at address: 0x1652d8f
Surprisingly, the result is not so bad... The statistical search finds the area starting at 0x158d580, which is close to the exact start address 0x158d6b8. The area starting at 0x1652d80 is actually padding bytes.
Knowing the symbol table and the list of symbol strings, I now have to retrieve the loading address of the firmware. The idea is to take one of the extreme addresses (lowest or highest) from the symbol strings, and its counterpart in the symbol table. Based on the last example with 45700 symbols, I get the loading address: 0x10000. Looks good!
Let's try it now on the firmware of the helium balloon:
>>> len(img)
9128872
>>> s = scan_for_symbols(img)
[+] possible symbols starting at address: 0x67e358
>>> len(s[0])
15874
>>> search_symbol_table(img, s[0])
>>> search_symbol_table_stat(img, len(s[0]), sum(s[0]))
[+] possible symbol table starting at address: 0x71cd40
[+] possible symbol table starting at address: 0x75ad70
[+] possible symbol table starting at address: 0x71cd41
[+] possible symbol table starting at address: 0x75ad71
[...]
[+] possible symbol table starting at address: 0x75ad7f
>>> get_lowest_addr_from_symtable(0x71cd40, 0x75ab8c)
1617425240
>>> hex(_ - 0x67e358)
'0x60001000'
So, this image has almost 16000 symbols. Checking with the hex editor, I can confirm the starting address of the symbol table: 0x71cd40. I note at the same time the end of the table: 0x75ab8c. And comparing the lowest addresses in the symbol strings and in the symbol table, I deduce the loading address: 0x60001000. So nice...
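The helper get_lowest_addr_from_symtable() used in the session above is not listed in this post. A possible Python 3 sketch of the same idea follows, assuming 16-byte big-endian entries whose first dword is the symbol string address. The function names are mine, it takes the image as an explicit argument, and the small image it runs on is synthetic.

```python
import struct

def lowest_symbol_addr(img, table_start, table_end, entry_size=16, endian=">"):
    """Return the smallest symbol_addr field found between two file offsets."""
    return min(
        struct.unpack_from(endian + "I", img, off)[0]
        for off in range(table_start, table_end, entry_size)
    )

def guess_load_address(img, table_start, table_end, first_string_offset):
    """Loading address = lowest runtime string address - its offset in the file."""
    return lowest_symbol_addr(img, table_start, table_end) - first_string_offset

# Synthetic 128-byte image: three names at 0x20, three table entries at 0x40.
LOAD = 0x10000  # made-up loading address for the demo
img = bytearray(0x80)
img[0x20:0x30] = b"aaa\x00bbbbbbb\x00cc\x00\x00"
for n, name_off in enumerate((0x2C, 0x20, 0x24)):  # deliberately unsorted
    struct.pack_into(">IIII", img, 0x40 + 16 * n,
                     LOAD + name_off, LOAD + 0x100 * n, 0x500, 0)

print(hex(guess_load_address(img, 0x40, 0x70, 0x20)))
# Arithmetic check on the balloon session quoted above:
print(hex(1617425240 - 0x67e358))
```

This relies on the first string of the scanned area also being the lowest one referenced by the table, which is worth checking in the hex editor before trusting the result.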
From this point, it is possible to extract the list of symbols with their corresponding code start address and type. This is left to the reader, and it ends this second session on VxWorks image analysis.
In the next session, blind_key will use the loading address and debugging symbols retrieved here to resolve the cross-references of the executable image in IDA. This will give us a logical view of the binary, instead of the austere hex view we have had up to now.
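For readers who want a starting point, here is a minimal Python 3 sketch of that extraction, again assuming the 16-byte big-endian entry layout shown earlier and a known loading address. The function names are hypothetical, and the image, names and addresses in the demo are synthetic.

```python
import struct

def read_cstring(img, offset):
    """Read the NUL-terminated ASCII string that starts at a file offset."""
    end = img.find(b"\x00", offset)
    if end == -1:
        end = len(img)
    return bytes(img[offset:end]).decode("ascii", errors="replace")

def extract_symbols(img, table_start, table_end, load_address,
                    entry_size=16, endian=">"):
    """Return a list of (name, code_addr, symbol_type) tuples."""
    symbols = []
    for off in range(table_start, table_end - entry_size + 1, entry_size):
        sym_addr, code_addr, sym_type, _ = struct.unpack_from(
            endian + "IIII", img, off)
        name_off = sym_addr - load_address  # runtime address -> file offset
        if 0 <= name_off < len(img):  # skip entries pointing outside the image
            symbols.append((read_cstring(img, name_off), code_addr, sym_type))
    return symbols

# Synthetic image: two names at 0x20, two table entries at 0x40.
LOAD = 0x10000  # made-up loading address for the demo
img = bytearray(0x60)
img[0x20:0x34] = b"taskSpawn\x00\x00\x00semTake\x00"
struct.pack_into(">IIII", img, 0x40, LOAD + 0x20, 0x12340, 0x500, 0)
struct.pack_into(">IIII", img, 0x50, LOAD + 0x2C, 0x2A000, 0x700, 0)

for name, code_addr, sym_type in extract_symbols(img, 0x40, 0x60, LOAD):
    print("%-12s code=0x%08x type=0x%x" % (name, code_addr, sym_type))
```

On a real image, the table bounds and the loading address would be the values found with the searches above.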