Interactive deobfuscation

Interactive deobfuscation: a thrift shop for static deobfuscation

whoami

• Security researcher
• Break stuff, reverse, make them better and break again
• Part of nullsec non profit group

How it all started

blame this person =>

• Presumably a simple crackme
  – Eventually discovered to be wb aes
• I wanted to solve it statically
  – Since running things is cheating
  – Goal was to solve it in less than a month
    • A race I didn’t manage to win when working statically

• Name is md5’ed
• Serial is transformed / permuted using an unknown function

Challenge archeology

• Overall the crackme was divided into 2 main parts
• Deobfuscation
  – Opaque predicates, lookup tables, value tables and “spaghetti” code
• Cryptanalysis
  – The original cipher was whitebox’ed

Deobfuscation

Deobfuscation – Layer 0

• Found some jmps, decided to map them all
  – find_lookuptables("mov , dword ptr [addr*4]")
  – Add xrefs, define locs
    • IDA can’t map them all into graph views (due to size, more RAM == bigger graph)
• After looking a bit, there seems to be some logic and different operations inside them
• However, they all lead to the same path eventually
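A minimal sketch of what a find_lookuptables-style scan could look like, assuming the disassembly is available as (address, text) pairs. The operand syntax in the regex, the listing format and the addresses are assumptions made for illustration; this is not the original script and it does not use IDA's API.

```python
import re

# Assumed operand shape: mov <reg>, dword ptr [<table base> + <index reg>*4]
TABLE_READ = re.compile(
    r"^mov\s+(\w+),\s*dword ptr \[(0x[0-9a-fA-F]+)\s*\+\s*(\w+)\*4\]$"
)

def find_lookuptables(listing):
    """Map each table base address to the addresses of the instructions reading it."""
    tables = {}
    for addr, text in listing:
        match = TABLE_READ.match(text.strip())
        if match:
            base = int(match.group(2), 16)
            tables.setdefault(base, []).append(addr)
    return tables

# Hypothetical listing, not taken from the crackme
listing = [
    (0x401000, "mov eax, dword ptr [0x405000 + ecx*4]"),
    (0x401007, "jmp eax"),
    (0x401010, "mov edx, dword ptr [0x405000 + ebx*4]"),
    (0x401017, "mov esi, dword ptr [0x406400 + eax*4]"),
    (0x40101E, "add eax, 4"),
]
tables = find_lookuptables(listing)
```

Each key in the result is a candidate table base; the list of readers is what xrefs would then be added from.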

Deobfuscation – Layer 1

• Removal of jmps and basic block identification
  – All the obfuscation was done in a manner that affects only the bb itself; after a jmp to another table occurred, everything was restored
• Follow_jmps_by_addr(addr) to find bb boundaries
  – Follow jcc until a jmp / push + ret sequence is found
  – Compress it, remove jccs and make one BB
  – In case of xrefs, patch them together
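The bb-walking idea can be sketched on a toy instruction model. Everything here is an assumption made for illustration: the `code` dictionary layout, the mnemonic names, and above all the simplification that every jcc inside a block is opaque and always taken. The real tool worked on the IDA database instead.

```python
def follow_jmps_by_addr(code, addr):
    """Collect one logical basic block starting at addr.

    code maps address -> (mnemonic, operand, fallthrough address).
    Returns (body, terminator); terminator is None if the walk runs off the map.
    """
    body, seen = [], set()
    while addr in code and addr not in seen:
        seen.add(addr)
        mnem, operand, nxt = code[addr]
        if mnem == "jcc":
            addr = operand          # assumed opaque and always taken: drop it
            continue
        if mnem == "jmp":
            return body, ("jmp", operand)
        if mnem == "push" and code.get(nxt, ("",))[0] == "ret":
            return body, ("push+ret", operand)
        body.append((mnem, operand))
        addr = nxt
    return body, None

# Hypothetical spaghetti: three real instructions scattered between jccs
code = {
    0x10: ("mov", "eax, 1", 0x12),
    0x12: ("jcc", 0x40, 0x14),
    0x14: ("nop", "", 0x15),        # never reached
    0x40: ("add", "eax, 2", 0x42),
    0x42: ("jcc", 0x20, 0x44),
    0x20: ("xor", "eax, ebx", 0x22),
    0x22: ("jmp", "table[eax*4]", None),
}
body, end = follow_jmps_by_addr(code, 0x10)
```

The returned body is the compressed block with the jccs removed, and the terminator tells where the next table dispatch happens.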

Deobfuscation – Layer 2

• Opaque predicates
• Ops which are used to make the bb bigger
  – Simple rule: operations are per bb and do not exceed it
  – Wrote a simple emulator to emulate each bb and optimize it into simple instructions
    • 1 exception: do not touch lookup table values
  – More on this later
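A rough sketch of per-bb emulation as constant folding, under loud assumptions: the three-field instruction tuples, the tiny ALU set, the "load" pseudo-instruction standing in for a lookup-table read, and ignoring flags are all made up for illustration. The original emulator is not shown on the slides.

```python
MASK = 0xFFFFFFFF
ALU = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "xor": lambda a, b: a ^ b,
}

def fold_block(block):
    """Emulate one bb; block is a list of (mnemonic, dst_reg, src) tuples.

    src is an int immediate or a register name. "load" (a table read indexed
    by src) is never folded: lookup table values are left untouched.
    """
    known = {}   # register -> constant value not yet written to the output
    out = []

    def materialize(reg):
        if reg in known:
            out.append(("mov", reg, known.pop(reg)))

    for mnem, dst, src in block:
        val = src if isinstance(src, int) else known.get(src)
        if mnem == "load":
            materialize(src)            # the index register must really exist
            known.pop(dst, None)
            out.append((mnem, dst, src))
        elif mnem == "mov":
            if val is not None:
                known[dst] = val
            else:
                known.pop(dst, None)
                out.append((mnem, dst, src))
        elif val is not None and dst in known:
            known[dst] = ALU[mnem](known[dst], val)   # fully constant: fold
        else:
            materialize(dst)
            out.append((mnem, dst, src if val is None else val))
    out.extend(("mov", reg, value) for reg, value in known.items())
    return out

junk = [
    ("mov", "eax", 0x1234), ("xor", "eax", 0xFFFF), ("add", "eax", 0x10),
    ("sub", "eax", 0x10), ("xor", "eax", 0xFFFF), ("mov", "ebx", "eax"),
    ("load", "ecx", "ebx"), ("add", "ecx", "eax"),
]
folded = fold_block(junk)
```

Eight instructions shrink to four here, and the table read survives unchanged.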

Deobfuscation – Layer 3

• Tables, and lots of them
  – Apart from the jmptables which lead the way
• Tables are used as part of the cipher itself
• Key is dismantled inside them (more on this later)
• Each table has a different role and some are doubled for obfuscation
• FindTables to the rescue

Deobfuscation – Layer 3

• FindTables basically taints memory and looks for reads of 16b tables
• Once it finds one, it defines an array of 0xFF at that addr
• All value tables are mapped this way; their usage, however, varies

Deobfuscation – Layer 4

• Once we have all the code cleaned we get several consecutive lookup tables
• Loops are unrolled and become normal repetitive ops (per round and state)
• All deobfuscated code was written into a new section called “deobf” to make code reading easier
• It is now time to move on to the cryptanalysis stage

Cryptanal

• Automating every process is infeasible and far too time consuming
• I decided to split the work into two main stages:
  – Operation identification
  – Key extraction
• Both are used interactively
  – Thus the name interactive deobfuscation

Cryptanal archeology

• Discovered BGE attacks from academia
  – Chow, Xiao
• sysk’s phrack article
• Eventually said FUCK YOU ALL gonna do it myself w/o cryptic math
  – Lack of algebra lessons and focus

Cryptanal – Layer 0

• Actual wb code to encrypt a text
• Loops 9 times, which made me quite frustrated
  – Before discovering it was wb’ed
  – After counting the loops by hand I thought it might be AES
  – But where’s the key?
    • LOLWTF? md5(user) == wbaes.dec(serial, user_as_key)
  – No, key must be *embedded*
    • LOLWUT? md5(user) == wbaes.d/enc(serial, key) ??
  – Output isn’t ascii so it could be both enc/dec

Cryptanal – Rijndael on a toe

• Several simple operations
  – AddRoundKey, SubBytes, ShiftRows, MixColumns
• Some operations are linear and can swap places with the op before them
• The key to understanding the attack is to sniff the first round and extract the key
  – Later I found that Eloi had made my life harder

rijndael

evolves into =>

whitebox(rijndael)

• 1st transformation:
  – ShiftRows is linear, and thus can swap position with AddRoundKey (with the round key shifted the same way)
  – SubBytes and ShiftRows can swap position, as SubBytes does the same op on every byte
• Let “linear” aka lin be
  – lin(x) ^ lin(y) == lin(x ^ y)
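Both swaps can be checked numerically. A small sketch, assuming the usual column-major 16-byte AES state; the S-box is a shuffled stand-in (an assumption, not the real AES S-box), because the commuting argument only needs SubBytes to act byte by byte.

```python
import random

def shift_rows(state):
    # Byte at row r, column c lives at state[r + 4*c]; row r rotates left by r
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]

def xor_state(a, b):
    return [x ^ y for x, y in zip(a, b)]

def sub_bytes(state, sbox):
    return [sbox[b] for b in state]

rng = random.Random(1)
x = [rng.randrange(256) for _ in range(16)]   # arbitrary state
k = [rng.randrange(256) for _ in range(16)]   # arbitrary round key
stand_in_sbox = list(range(256))              # stand-in byte substitution
rng.shuffle(stand_in_sbox)

# lin(x) ^ lin(y) == lin(x ^ y) with lin = ShiftRows
lin_holds = shift_rows(xor_state(x, k)) == xor_state(shift_rows(x), shift_rows(k))

# SubBytes and ShiftRows can be applied in either order
commutes = (sub_bytes(shift_rows(x), stand_in_sbox)
            == shift_rows(sub_bytes(x, stand_in_sbox)))
```

Both flags come out True for any state, key and byte substitution, which is why the whitebox construction is free to reorder these ops.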

• 2nd transformation
  – It is possible to transform and “compress” several ops into one
• By using XORtables and T/Tyi boxes
  – T/Tyi box
    • Combine AddRoundKey and SubBytes into one operation (lookup table) to emit 1 byte
    • SubBytes(x ^ k[i])
• XORtable
  – Transform MixColumns into a series of lookup tables; in particular, these tables are created by XORing one input byte at a time through the MixColumns vector
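To make the tables concrete, here is a sketch that builds a T-box, the four per-byte MixColumns tables and a nibble XOR table without any encodings. The S-box is computed rather than pasted; the names (`make_tbox`, `TY`, `XOR4`) and the key byte 0x2B are choices made for this example, not taken from the crackme.

```python
def gmul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox_entry(a):
    inv = 1
    for _ in range(254):          # a^254 is the inverse; 0 maps to 0
        inv = gmul(inv, a)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63

SBOX = [sbox_entry(a) for a in range(256)]

def make_tbox(key_byte):
    """T-box: AddRoundKey and SubBytes merged, T[x] = SubBytes(x ^ k[i])."""
    return [SBOX[x ^ key_byte] for x in range(256)]

MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
# TY[r][x]: the 4-byte contribution of input byte r (column r of MC times x)
TY = [[[gmul(MC[row][r], x) for row in range(4)] for x in range(256)]
      for r in range(4)]

XOR4 = [[a ^ b for b in range(16)] for a in range(16)]   # nibble XOR table

def xor_byte(a, b):
    return (XOR4[a >> 4][b >> 4] << 4) | XOR4[a & 15][b & 15]

def mix_column(col):
    """MixColumns on one column using only table lookups."""
    out = [0, 0, 0, 0]
    for r in range(4):
        out = [xor_byte(o, t) for o, t in zip(out, TY[r][col[r]])]
    return out

tbox = make_tbox(0x2B)
mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

After this step every operation in a round is a table read, which is what the deobfuscated code looked like: long runs of consecutive lookups.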

• 3rd transformation
  – Append external encoding to the keys and lookup tables
  – Replace table values with random ones at each stage
  – 41 => 32, 21 => 56, 12 => 4
  – Let G & F be the encodings
  – G() o AES() o F()
    • Such that G & F cancel each other out eventually
  – The external encoding is what makes the whitebox variant “attack resistant”
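How the encodings cancel can be shown with two chained byte tables. This is a sketch under assumptions: F, G and H are random byte bijections from a seeded PRNG, and T1/T2 are arbitrary stand-in tables rather than real whitebox AES tables.

```python
import random

def random_bijection(rng):
    """Return a random byte permutation and its inverse."""
    perm = list(range(256))
    rng.shuffle(perm)
    inverse = [0] * 256
    for i, v in enumerate(perm):
        inverse[v] = i
    return perm, inverse

def encode_table(table, in_inv, out_enc):
    """Wrap a table: undo the input encoding, look up, apply the output encoding."""
    return [out_enc[table[in_inv[x]]] for x in range(256)]

rng = random.Random(2013)
F, F_inv = random_bijection(rng)   # external input encoding
G, G_inv = random_bijection(rng)   # internal encoding between the two tables
H, H_inv = random_bijection(rng)   # external output encoding

T1 = [(7 * x + 3) & 0xFF for x in range(256)]   # stand-in tables
T2 = [x ^ 0x5A for x in range(256)]

T1_enc = encode_table(T1, F_inv, G)   # G o T1 o F^-1
T2_enc = encode_table(T2, G_inv, H)   # H o T2 o G^-1

def run_encoded(x):
    return T2_enc[T1_enc[F[x]]]

def run_plain(x):
    return T2[T1[x]]

cancels = all(H_inv[run_encoded(x)] == run_plain(x) for x in range(256))
```

G vanishes in the composition, so only F and H remain visible from outside; recovering those two is what strips the implementation "naked".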

Attaq

Attaq 101

• Chow stated that his implementation doesn’t leak any information
  – In reality the XORtables and T/Tyi tables still leak one nibble each time
  – Not very helpful but still something
• Since the external encodings cancel each other out, it might be worth understanding them
  – Hint hint

Attaq!

• If we look at the input encoding and output encoding, we know that they cancel each other out
• Thus if we manage to find the values of the encodings, we’re left with a “naked” implementation of wbaes
• And then just sniff the first round key and extract the key

Cryptbox

• Let’s look at MixColumns in the T/Tyi table transformations
• In general it maps 32b values to 32b values
• Let P be the input encoding and Q the output encoding

• Now let’s try to approximate the encoding values
• Billet suggests zeroing out two bits out of the 4, building a new lookup table and performing the transformation
• Once we have that, we construct a new lookup table for the reversed operation

whitebox^whitebox

• We get 256 possible bijections which can be used to build up approximations of the output encoding
• The same operation is done for the input encoding, using the approximation we acquired for Q
• Once we have the external encoding values we can just sniff the first round key and extract the keys

FIN

• @shiftreduce

• Thanks to Eloi for making this challenge
• greetz @ #ecl, #nullsec, inbarr, nirizr, skier_, emdel, over, Mikae, l_inc,