Decompilation of .NET bytecode Stephen Horne Trinity Hall

Computer Science Part II Project Progress Report
Decompilation of .NET bytecode
Stephen Horne
Trinity Hall
http://hal.trinhall.cam.ac.uk/~srh38/project
10th February 2004
The .NET framework
.NET and the Common Language Runtime
• Microsoft’s answer to Java
• The CLR is the .NET equivalent of the JVM
[Figure: C#, J#, Managed C++ and VB .NET source code is compiled by the corresponding compiler (C# compiler, J# compiler, Managed C++ compiler, VB .NET compiler) into CIL and metadata, which run on the Common Language Runtime]
• Lots of useful metadata provided in assemblies
What about reversing the compilation process?
• Sometimes we want to recover source from a binary
– Language translation
– Lost source recovery
– Checking for malicious code
• Obvious legal and ethical ramifications
Structure of a decompiler
Executable → Front end → Low-level intermediate code (unstructured control-flow graph) → UDM → High-level intermediate code (structured control-flow graph) → Back end → Source
• Front end
– Reads in bytecode
– Divides into basic blocks
• UDM
– Data-flow analysis
– Control-flow analysis
• Back end
– Code generation
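The front end's split into basic blocks can be sketched with the standard "leader" rule. A new block starts at the first instruction, at every branch target, and at every instruction that follows a branch or return. The Python below is a minimal illustration, not the project's code. It assumes instructions have already been decoded into hypothetical `(offset, opcode, branch_target)` tuples, and it handles only the opcodes used by the example method in this report.

```python
# Sketch: split decoded CIL into basic blocks with the "leader" rule and
# link them into a control-flow graph. Instructions are hypothetical
# (offset, opcode, branch_target) tuples; only the example's opcodes appear.

BRANCHES = {"br.s", "bge.s", "blt.s"}   # opcodes with a branch target
NO_FALL_THROUGH = {"br.s", "ret"}       # control never reaches the next instruction

# The example method ControlExample, with the operand-less call simplified.
CODE = [
    (0x00, "ldc.i4.0", None), (0x01, "stloc.0", None),
    (0x02, "ldc.i4.0", None), (0x03, "stloc.1", None), (0x04, "br.s", 0x23),
    (0x06, "ldc.i4.3", None), (0x07, "ldloc.1", None), (0x08, "mul", None),
    (0x09, "ldarg.0", None), (0x0A, "bge.s", 0x12),
    (0x0C, "ldloc.0", None), (0x0D, "ldc.i4.1", None), (0x0E, "sub", None),
    (0x0F, "stloc.0", None), (0x10, "br.s", 0x16),
    (0x12, "ldloc.0", None), (0x13, "ldc.i4.1", None), (0x14, "add", None),
    (0x15, "stloc.0", None),
    (0x16, "ldloc.0", None), (0x17, "call", None), (0x1C, "ldloc.1", None),
    (0x1D, "blt.s", 0x06),
    (0x1F, "ldloc.1", None), (0x20, "ldc.i4.1", None), (0x21, "add", None),
    (0x22, "stloc.1", None),
    (0x23, "ldloc.1", None), (0x24, "ldarg.0", None), (0x25, "blt.s", 0x06),
    (0x27, "ldloc.0", None), (0x28, "stloc.2", None), (0x29, "br.s", 0x2B),
    (0x2B, "ldloc.2", None), (0x2C, "ret", None),
]


def basic_blocks(code):
    """Return {leader_offset: [instructions]} in offset order."""
    leaders = {code[0][0]}                        # the first instruction
    for i, (_, op, target) in enumerate(code):
        if op in BRANCHES:
            leaders.add(target)                   # every branch target
        if (op in BRANCHES or op == "ret") and i + 1 < len(code):
            leaders.add(code[i + 1][0])           # instruction after a branch/return
    blocks = {}
    for ins in code:
        if ins[0] in leaders:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)
    return blocks


def build_cfg(blocks):
    """Return {leader: [successor leaders]}: branch edge first, then fall-through."""
    order = sorted(blocks)
    cfg = {}
    for idx, leader in enumerate(order):
        _, op, target = blocks[leader][-1]        # the last instruction decides
        succs = [target] if op in BRANCHES else []
        if op not in NO_FALL_THROUGH and idx + 1 < len(order):
            succs.append(order[idx + 1])          # fall-through edge
        cfg[leader] = succs
    return cfg


if __name__ == "__main__":
    for leader, succs in build_cfg(basic_blocks(CODE)).items():
        print(f"IL_{leader:04x} ->", [f"IL_{s:04x}" for s in succs])
```

On the example method this gives nine blocks, starting at IL_0000, IL_0006, IL_000c, IL_0012, IL_0016, IL_001f, IL_0023, IL_0027 and IL_002b. That matches the nine numbered nodes of the control-flow graph.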
Example decompilation
CIL bytecode
IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: ldc.i4.0
IL_0003: stloc.1
IL_0004: br.s IL_0023

IL_0006: ldc.i4.3
IL_0007: ldloc.1
IL_0008: mul
IL_0009: ldarg.0
IL_000a: bge.s IL_0012

IL_000c: ldloc.0
IL_000d: ldc.i4.1
IL_000e: sub
IL_000f: stloc.0
IL_0010: br.s IL_0016

IL_0012: ldloc.0
IL_0013: ldc.i4.1
IL_0014: add
IL_0015: stloc.0

IL_0016: ldloc.0
IL_0017: call Math::Abs(int32)
IL_001c: ldloc.1
IL_001d: blt.s IL_0006

IL_001f: ldloc.1
IL_0020: ldc.i4.1
IL_0021: add
IL_0022: stloc.1

IL_0023: ldloc.1
IL_0024: ldarg.0
IL_0025: blt.s IL_0006

IL_0027: ldloc.0
IL_0028: stloc.2
IL_0029: br.s IL_002b

IL_002b: ldloc.2
IL_002c: ret
Control-flow graph
[Figure: control-flow graph of the method, with its basic blocks numbered 1 to 9 between Entry and Exit nodes]
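Before control-flow analysis can work with expressions rather than stack operations, each block's stack code has to become statements. One common way to do this is to run the block on a symbolic evaluation stack that holds expression strings. The Python below is a hedged sketch of that idea, not the author's implementation. It assumes the stack is empty at block boundaries (true for this method) and covers only the opcodes in the listing. It also assumes the callee of `call` is given as a plain string and takes one argument, and that argument 0 is named `x`.

```python
# Sketch: symbolic stack simulation of one CIL basic block.
# Handles only the opcodes used in the example method; assumes the
# evaluation stack is empty at block entry and exit.

BINARY = {"add": "+", "sub": "-", "mul": "*"}
# Conditional branches, mapped to the comparison they make between the two
# topmost stack values before jumping.
COMPARE = {"blt.s": "<", "bge.s": ">="}


def short_form(op, prefix):
    """Return N for short-form opcodes such as ldloc.N, else None."""
    if op.startswith(prefix) and op[len(prefix):].isdigit():
        return int(op[len(prefix):])
    return None


def translate_block(block, arg_names):
    """Turn [(opcode, operand), ...] into a list of C#-like statements."""
    stack, stmts = [], []
    for op, operand in block:
        if (n := short_form(op, "ldc.i4.")) is not None:
            stack.append(str(n))                      # push constant N
        elif (n := short_form(op, "ldloc.")) is not None:
            stack.append(f"local{n}")                 # push local N
        elif (n := short_form(op, "ldarg.")) is not None:
            stack.append(arg_names[n])                # push argument N
        elif (n := short_form(op, "stloc.")) is not None:
            stmts.append(f"local{n} = {stack.pop()};")
        elif op in BINARY:
            right, left = stack.pop(), stack.pop()    # right operand is on top
            stack.append(f"({left} {BINARY[op]} {right})")
        elif op == "call":
            arg = stack.pop()                         # assumes a one-argument callee
            stack.append(f"{operand}({arg})")
        elif op in COMPARE:
            right, left = stack.pop(), stack.pop()
            stmts.append(f"if ({left} {COMPARE[op]} {right}) goto {operand};")
        elif op == "br.s":
            stmts.append(f"goto {operand};")
        elif op == "ret":
            stmts.append(f"return {stack.pop()};")
        else:
            raise NotImplementedError(op)
    if stack:
        raise ValueError("value left on the stack at the end of the block")
    return stmts


if __name__ == "__main__":
    block_0006 = [("ldc.i4.3", None), ("ldloc.1", None), ("mul", None),
                  ("ldarg.0", None), ("bge.s", "IL_0012")]
    print(translate_block(block_0006, ["x"]))
```

For the block at IL_0006 this yields `if ((3 * local1) >= x) goto IL_0012;`. Structuring this jump-to-else as an if/else negates the condition. That is consistent with the decompiled source testing `(3 * local1) < x`.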
Process
• Divide code into basic blocks and create CFG
• Data-flow analysis
– Register copy propagation
• Control-flow analysis
– Divide graph into intervals
– Loops induced by back-edges within intervals
– Nesting of intervals → nesting of loops
– Conditionals found by common follow nodes
– Order of nodes → nesting of conditionals
• Generate code from structured CFG
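The interval-based loop analysis listed above proceeds in three steps. First, partition the graph into intervals: each interval is a header plus every node whose predecessors all lie inside it. Second, treat a back-edge to an interval's header from inside that interval as a loop. Third, collapse each interval into one node and repeat on the derived graph. The Python below is an illustrative sketch of this classic construction due to Allen and Cocke, not the project's code. The `CFG` literal is assumed to be the example method's graph, keyed by basic-block start offset.

```python
# Sketch: interval partitioning and back-edge loop detection over the
# derived sequence of graphs. Graphs are {node: [successors]} dicts.

# CFG of the example method, keyed by basic-block start offset.
CFG = {
    0x00: [0x23],
    0x06: [0x12, 0x0C],
    0x0C: [0x16],
    0x12: [0x16],
    0x16: [0x06, 0x1F],
    0x1F: [0x23],
    0x23: [0x06, 0x27],
    0x27: [0x2B],
    0x2B: [],
}


def predecessors(graph):
    preds = {n: set() for n in graph}
    for n, succs in graph.items():
        for s in succs:
            preds[s].add(n)
    return preds


def intervals(graph, entry):
    """Partition graph into intervals; returns {header: [members]}."""
    preds = predecessors(graph)
    headers, queued, owner, result = [entry], {entry}, {}, {}
    for h in headers:                      # headers grows while we iterate
        members = [h]
        owner[h] = h
        changed = True
        while changed:                     # absorb nodes whose preds are all inside
            changed = False
            for n in graph:
                if n not in owner and preds[n] and preds[n] <= set(members):
                    members.append(n)
                    owner[n] = h
                    changed = True
        result[h] = members
        for n in graph:                    # unclaimed successors become new headers
            if n not in owner and n not in queued and preds[n] & set(members):
                headers.append(n)
                queued.add(n)
    return result


def derived_graph(graph, ivals):
    """Collapse each interval into one node named after its header."""
    owner = {n: h for h, members in ivals.items() for n in members}
    derived = {h: [] for h in ivals}
    for n, succs in graph.items():
        for s in succs:
            a, b = owner[n], owner[s]
            if a != b and b not in derived[a]:
                derived[a].append(b)
    return derived


def back_edges(graph, ivals):
    """Edges n -> h from inside the interval headed by h: each induces a loop."""
    return [(n, h) for h, members in ivals.items()
            for n in members if h in graph[n]]


def find_loops(graph, entry):
    """Return the loops found at each level of the derived sequence."""
    levels = []
    while True:
        ivals = intervals(graph, entry)
        levels.append(back_edges(graph, ivals))
        if len(ivals) == 1 or len(ivals) == len(graph):
            return levels                  # limit graph reached (or irreducible)
        graph = derived_graph(graph, ivals)


if __name__ == "__main__":
    for level, loops in enumerate(find_loops(CFG, 0x00), start=1):
        print(level, [(f"IL_{n:04x}", f"IL_{h:04x}") for n, h in loops])
```

On the example graph, the first level finds the do-while back-edge from IL_0016 to IL_0006. The second level works on the derived graph and finds the edge from the interval headed by IL_0006 back to IL_0023, which is the test of the outer `for` loop. The inner loop shows up one level before the outer one, which is the "nesting of intervals → nesting of loops" relationship. Irreducible graphs stop early here; handling them would need extra work such as node splitting.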
Current status
Original
public static int ControlExample(int x) {
    int y = 0;
    for(int i = 0; i < x; i++) {
        do {
            if(3 * i < x)
                y--;
            else
                y++;
        } while(Math.Abs(y) < i);
    }
    return y;
}
Decompiled
public static Int32 ControlExample(Int32 x) {
    Int32 local0;
    Int32 local1;
    Int32 local2;
    local0 = 0;
    local1 = 0;
    while (local1 < x) {
        do {
            if (((3 * local1) < x)) {
                local0 = (local0 - 1);
            } else {
                local0 = (local0 + 1);
            }
        } while (Math.Abs(local0) < local1);
        local1 = (local1 + 1);
    }
    local2 = local0;
    return local2;
}
Features implemented:
• Analysis for basic conditional and looping structures
• Control flow graph generation
• C# code generation
• Almost half the CIL instruction set
• Decompiles very basic applications
Remaining tasks (lots!):
• Local variable names
• Basic language features (arrays, switching, breaks etc.)
• Advanced features (custom indexers, operator overloading, properties)
• Object oriented features
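The decompiled method above still carries the temporary `local2`. It comes from the compiler storing the return value in an extra local before `ret` (IL_0027 to IL_002c). A copy-propagation pass of the kind listed under data-flow analysis could fold it into `return local0;`. The Python below is a hypothetical, simplified sketch, not the project's pass. It works on one straight-line statement list, such as the blocks at IL_0027 and IL_002b concatenated, since the first simply jumps to the second. It also assumes no variable assigned in the list is live afterwards, which holds here because the list ends in `return`. The statement tuples are illustrative.

```python
# Sketch: copy propagation over one straight-line list of statements.
# A statement is (target, expr); target None stands for "return expr".
# An expr is a variable name (str), an int constant, or (op, left, right).
# Assumption: nothing assigned here is live after the list (it ends in return).


def substitute(expr, var, replacement):
    """Replace every use of var in expr by replacement."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return (op, substitute(left, var, replacement),
                substitute(right, var, replacement))
    return replacement if expr == var else expr


def propagate_copies(stmts):
    """Forward-substitute copies 'dst = src' and delete them when safe."""
    out = list(stmts)
    i = 0
    while i < len(out):
        dst, src = out[i]
        if dst is None or not isinstance(src, str):
            i += 1                                 # not a variable-to-variable copy
            continue
        # The copy reaches up to the next redefinition of dst (whose right-hand
        # side still reads the copied value), or the end of the list.
        end = next((k for k in range(i + 1, len(out)) if out[k][0] == dst),
                   len(out))
        if any(out[k][0] == src for k in range(i + 1, end)):
            i += 1                                 # src changes first: unsafe
            continue
        for k in range(i + 1, min(end + 1, len(out))):
            target, expr = out[k]
            out[k] = (target, substitute(expr, dst, src))
        del out[i]                                 # the copy is now dead
    return out


if __name__ == "__main__":
    # Tail of the decompiled method: local2 = local0; return local2;
    print(propagate_copies([("local2", "local0"), (None, "local2")]))
```

The sketch is conservative. It leaves a copy in place whenever the source variable is reassigned before the copy's reach ends, even if the copy is no longer used by then.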
Extensions:
• Decompilation for other stack-based architectures (e.g. Java)
• Code generation for other languages (e.g. VB .NET)
• Graphical user interface
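Retargeting the back end to another language mostly changes the concrete syntax printed for each structured statement. As a hedged illustration, the structured form of the example method can be printed as either C# or VB .NET from the same tree. The statement classes and the `SYNTAX` template table are hypothetical, not the project's design. Expressions are kept as pre-rendered strings because the operators used here (`+`, `-`, `*`, `<`) are spelled the same in both languages, and variable declarations are omitted.

```python
# Sketch: one structured tree, printed as C# or VB .NET via syntax templates.
# The statement classes and SYNTAX table are hypothetical; expressions are
# pre-rendered strings and declarations are omitted to keep it short.
from dataclasses import dataclass


@dataclass
class Assign:
    target: str
    expr: str


@dataclass
class Return:
    expr: str


@dataclass
class If:
    cond: str
    then: list
    orelse: list


@dataclass
class DoWhile:
    body: list
    cond: str


@dataclass
class While:
    cond: str
    body: list


# str.format templates; "{{" and "}}" produce literal braces.
SYNTAX = {
    "csharp": {"assign": "{t} = {e};", "return": "return {e};",
               "if": "if ({c}) {{", "else": "}} else {{", "endif": "}}",
               "do": "do {{", "loop": "}} while ({c});",
               "while": "while ({c}) {{", "endwhile": "}}"},
    "vb": {"assign": "{t} = {e}", "return": "Return {e}",
           "if": "If {c} Then", "else": "Else", "endif": "End If",
           "do": "Do", "loop": "Loop While {c}",
           "while": "While {c}", "endwhile": "End While"},
}


def emit(stmts, lang, depth=0):
    """Return source lines for stmts in the given language."""
    syntax, pad, out = SYNTAX[lang], "    " * depth, []

    def line(key, **fields):
        out.append(pad + syntax[key].format(**fields))

    for s in stmts:
        if isinstance(s, Assign):
            line("assign", t=s.target, e=s.expr)
        elif isinstance(s, Return):
            line("return", e=s.expr)
        elif isinstance(s, If):
            line("if", c=s.cond)
            out.extend(emit(s.then, lang, depth + 1))
            line("else")
            out.extend(emit(s.orelse, lang, depth + 1))
            line("endif")
        elif isinstance(s, DoWhile):
            line("do")
            out.extend(emit(s.body, lang, depth + 1))
            line("loop", c=s.cond)
        elif isinstance(s, While):
            line("while", c=s.cond)
            out.extend(emit(s.body, lang, depth + 1))
            line("endwhile")
        else:
            raise TypeError(f"unsupported statement: {s!r}")
    return out


# Structured body of the example method, mirroring the decompiled C# above.
BODY = [
    Assign("local0", "0"),
    Assign("local1", "0"),
    While("local1 < x", [
        DoWhile([If("((3 * local1) < x)",
                    [Assign("local0", "(local0 - 1)")],
                    [Assign("local0", "(local0 + 1)")])],
                "Math.Abs(local0) < local1"),
        Assign("local1", "(local1 + 1)"),
    ]),
    Assign("local2", "local0"),
    Return("local2"),
]

if __name__ == "__main__":
    print("\n".join(emit(BODY, "vb")))
```

A real VB .NET back end would also need declarations such as `Dim local0 As Integer`, along with language-specific operator spellings such as `=` for equality.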