Assembly Language - National Institute of Technology

C Programming and Assembly Language
Janakiraman V
NITK Surathkal
2nd August 2014
Motivation
Do you know how all this is implemented in assembly?
Agenda
•Brief introduction to the 8086 processor architecture
•Describe commonly used assembly instructions
•Use of stack and related instructions
•Translate high level function calls into low level assembly language
•Become familiar with the calling conventions
•Explain how variables are passed and accessed
8086 Architecture
•ALU – Arithmetic and Logic Unit – The heart of the processor
•Control Unit – Decodes instructions and controls the execution flow
•Registers – Implicit memory locations within the processor
•Registers – Serve as operands to most operations
•Flags – ALU operations set particular flag bits (Carry, Zero, etc.) after execution
Registers
•The E-prefixed registers are the 32 bit extensions of the 16 bit 8086 registers (AX, CX, SP, ...)
•EAX – Stores integer return values
•ECX – Stores the counters for loops and also holds the "this" pointer in C++ member function calls
•EIP – Instruction pointer. Stores the address of the next instruction to be executed
•ESP – The stack pointer. Implicitly changed by PUSH, POP, CALL and RET instructions
•EBP – Base pointer. Used to access local variables and function parameters
Registers
Contd…
•EBX – A general purpose register
•ESI – The source index register for string instructions
•EDI – The destination index register for string instructions
•EDX – Stores the high 32 bits of 64 bit values
•EFL (EFLAGS) – Flag register. Stores the flag bits such as Carry, Zero, etc.
•Segment registers point to a segment of memory: DS, SS, ES, CS (these remain 16 bit and have no E prefix)
Instruction Set
•Data transfer
•Arithmetic and logical
•Stack Operations
•Branching and Looping
•Function calls
•String Instructions
•Prefix to instructions
Data transfer instructions
MOV Destination, Source – Format
» Data transfer is always from RIGHT to LEFT (source into destination).
» The source operand is unaffected.
LEA – Load effective address.
» Loads the offset address of the specified variable into the destination.
» Equivalent of int* y = &x;
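The difference between copying a value (MOV) and taking an address (LEA) can be sketched in plain C. This is only an analogy, not what a compiler necessarily emits, and the function name `mov_and_lea_demo` is made up for the illustration.

```c
#include <stddef.h>

/* Hypothetical helper: mimics "MOV y, x" followed by "LEA p, x".
   Returns 1 when the copy kept the old value and the pointer
   really refers to x. */
int mov_and_lea_demo(void)
{
    int x = 7;
    int y = x;      /* like MOV: copy the value, x itself is unchanged */
    int *p = &x;    /* like LEA: take the address, not the value       */

    *p = 9;         /* writing through the address changes x, not y    */
    return (x == 9) && (y == 7) && (p != NULL);
}
```

After the write through `p`, `x` holds 9 while the earlier copy `y` still holds 7.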
Arithmetic and Logical instructions
•Operation destination, source – Format
»ADD AX, BX
»SUB AX, [BX]
»OR AX, [BX+4]
»XOR AX, AX – A fast, compact way to clear a register
Exercise 1
Write an assembly program to evaluate the following expressions. (All variables are 32 bit integers)
» EAX = x*y + a – b
» EBX = (x^y) | (a&b)
int x=4, y=6, a=3, b=2;
__asm
{
    MOV EAX, x
    MUL y          ; EDX:EAX = EAX * y (EDX is overwritten)
    ADD EAX, a
    SUB EAX, b     ; EAX = x*y + a - b
    MOV EBX, x
    XOR EBX, y
    MOV ECX, a
    AND ECX, b
    OR EBX, ECX    ; EBX = (x^y) | (a&b)
}
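To check the result of Exercise 1 without an assembler, the two expressions can be written directly in C. The function names `expr_arith` and `expr_logic` are assumptions made for this sketch; unsigned 32 bit arithmetic is used because MUL is an unsigned multiply and only the low 32 bits (EAX) are kept.

```c
#include <stdint.h>

/* EAX = x*y + a - b, keeping only the low 32 bits of the product */
uint32_t expr_arith(uint32_t x, uint32_t y, uint32_t a, uint32_t b)
{
    return x * y + a - b;
}

/* EBX = (x^y) | (a&b) */
uint32_t expr_logic(uint32_t x, uint32_t y, uint32_t a, uint32_t b)
{
    return (x ^ y) | (a & b);
}
```

With the values from the exercise (x=4, y=6, a=3, b=2) these give 25 and 2 respectively.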
Branching and Looping
•JMP Addr – Loads EIP with Addr
•Conditional Jumps
» Transfers control based on a condition
» Based on state of one or more flags
» ALU operation sets flags
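A minimal C model of how a comparison sets flags and how a conditional jump consumes them is sketched below. The `Flags` struct and both function names are invented for this illustration; only the Zero and Carry flags are modelled.

```c
#include <stdint.h>

typedef struct {
    int zf;   /* Zero flag: result was zero          */
    int cf;   /* Carry flag: unsigned borrow occurred */
} Flags;

/* CMP a, b: computes a - b, discards the result, keeps the flags */
Flags cmp_u32(uint32_t a, uint32_t b)
{
    Flags f;
    f.zf = ((uint32_t)(a - b) == 0);
    f.cf = (a < b);
    return f;
}

/* Counts how often a DEC/JNZ loop body runs when started with n > 0 */
uint32_t count_loop(uint32_t n)
{
    uint32_t iterations = 0;
    int zf;
loop:
    iterations++;
    n--;                   /* DEC n: sets ZF when n reaches 0 */
    zf = (n == 0);
    if (!zf) goto loop;    /* JNZ loop: jump while ZF is clear */
    return iterations;
}
```

The `goto` stands in for the jump: control transfers only because of the flag left behind by the preceding ALU operation.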
Exercise 2
Multiplication by repeated addition.
Write an assembly program to evaluate the expression "z = x * y" using
» Repeated addition
» MUL instruction
int x=9, y=10, z=0;
__asm
{
    XOR EAX, EAX
    MOV EBX, y
MULT:
    ADD EAX, x
    DEC EBX
    JNZ MULT       ; repeat until EBX reaches 0 (assumes y > 0)
    MOV z, EAX
}
String length of a constant string.
Write an assembly program to calculate the string length of a constant string.
char* pChar = "Test data";
int len = 0;
__asm
{
    MOV EDI, pChar
    XOR ECX, ECX
COMPARE:
    CMP BYTE PTR [EDI], 0
    JNZ INCREASE
    JMP DONE
INCREASE:
    INC ECX
    INC EDI
    JMP COMPARE
DONE:
    MOV len, ECX
}
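The two loops of Exercise 2 can be mirrored line by line in C, with `goto` standing in for the jumps and local variables named after the registers. The function names are assumptions for this sketch, and the guard for `y == 0` is an addition that the assembly version does not have.

```c
#include <stdint.h>

/* z = x * y by repeated addition */
uint32_t mul_by_addition(uint32_t x, uint32_t y)
{
    uint32_t eax = 0;          /* XOR EAX, EAX */
    uint32_t ebx = y;          /* MOV EBX, y   */
    if (ebx == 0) return 0;    /* guard added for this sketch */
mult:
    eax += x;                  /* ADD EAX, x   */
    ebx--;                     /* DEC EBX      */
    if (ebx != 0) goto mult;   /* JNZ MULT     */
    return eax;                /* MOV z, EAX   */
}

/* Length of a zero-terminated string */
uint32_t string_length(const char *p_char)
{
    const char *edi = p_char;  /* MOV EDI, pChar */
    uint32_t ecx = 0;          /* XOR ECX, ECX   */
compare:
    if (*edi != 0) goto increase;  /* CMP BYTE PTR [EDI], 0 / JNZ INCREASE */
    goto done;                     /* JMP DONE */
increase:
    ecx++;                     /* INC ECX */
    edi++;                     /* INC EDI */
    goto compare;              /* JMP COMPARE */
done:
    return ecx;                /* MOV len, ECX */
}
```

For the values in the exercise, `mul_by_addition(9, 10)` yields 90 and `string_length("Test data")` yields 9.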
Stack Operations
PUSH: PUSH EAX
» ESP decreases by 4 (or by 2 for a 16 bit operand)
» Data is copied on to the top of the stack
» Used extensively to pass parameters to functions.
POP: POP EAX
» ESP increases by 4 (or by 2 for a 16 bit operand)
» Data is copied from the top of the stack to the destination
» Complement of PUSH
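The effect of PUSH and POP on ESP can be modelled with a small byte array in C. The `Stack` type and the function names are hypothetical, the stack size is arbitrary, and there is no overflow checking in this sketch.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;              /* byte offset of the top of the stack */
} Stack;

/* An empty stack: ESP starts just past the highest address */
void stack_init(Stack *s) { s->esp = STACK_BYTES; }

/* PUSH: ESP decreases by 4, then the value is stored at [ESP] */
void push32(Stack *s, uint32_t value)
{
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* POP: the value at [ESP] is read, then ESP increases by 4 */
uint32_t pop32(Stack *s)
{
    uint32_t value;
    memcpy(&value, &s->mem[s->esp], 4);
    s->esp += 4;
    return value;
}
```

Pushing two values and popping them again returns them in reverse order, which is exactly the property the next exercise relies on.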
Exercise 3
Swap two integers.
Write an assembly program to swap two integers x and y.
int x=4, y=5;
__asm
{
    PUSH x
    PUSH y
    POP x          ; x receives the old y
    POP y          ; y receives the old x
}
Function to swap variables.
Write a C program to swap two numbers using a function swap(int* pX, int* pY). Implement the swap function directly in assembly language.
void swap(int* pX, int* pY)
{
    __asm
    {
        MOV EAX, pX
        MOV EBX, pY
        PUSH DWORD PTR [EAX]
        PUSH DWORD PTR [EBX]
        POP DWORD PTR [EAX]
        POP DWORD PTR [EBX]
    }
}
Function calls
CALL – CALL ADDR
» Used for function calls.
» Implicitly pushes the return address (the EIP of the next instruction) on to the stack.
» Loads EIP with the specified address (ADDR).
RET – RET n
» Used to return to the calling function.
» Implicitly pops the DWORD on the top of the stack (TOS) into EIP.
» 'n' specifies the number of bytes to be added to ESP after returning. Used for stack clean up.
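What CALL and RET n do to EIP and ESP can be sketched with a toy CPU model in C. The `Cpu` struct and all function names are assumptions for this illustration, and the stack is stored as 32 bit words for simplicity.

```c
#include <stdint.h>

typedef struct {
    uint32_t eip;          /* address of the next instruction   */
    uint32_t esp;          /* byte offset of the top of stack   */
    uint32_t stack[16];    /* stack memory, one word per slot   */
} Cpu;

void cpu_init(Cpu *c, uint32_t start)
{
    c->eip = start;
    c->esp = sizeof c->stack;   /* empty stack */
}

void cpu_push(Cpu *c, uint32_t v)
{
    c->esp -= 4;
    c->stack[c->esp / 4] = v;
}

uint32_t cpu_pop(Cpu *c)
{
    uint32_t v = c->stack[c->esp / 4];
    c->esp += 4;
    return v;
}

/* CALL addr: push the address of the next instruction, then jump */
void cpu_call(Cpu *c, uint32_t addr, uint32_t next_instruction)
{
    cpu_push(c, next_instruction);
    c->eip = addr;
}

/* RET n: pop the return address into EIP, then add n bytes to ESP */
void cpu_ret(Cpu *c, uint32_t n)
{
    c->eip = cpu_pop(c);
    c->esp += n;
}
```

Pushing two 4 byte parameters, calling, and returning with `cpu_ret(&c, 8)` leaves ESP exactly where it was before the parameters were pushed.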
Compile the C program!!
int g_iVar = 5;
int Fn(int x, int y)
{
    int z=0;
    z = x+ y;
    return z;
}
void main()
{
    int z=0;
    z = Fn(2,4);
    g_iVar = z;
}
C and assembly language - FAQ
•How are function calls in 'C' translated into assembly?
•How are parameters passed to the function?
•What does it mean to say local variables are stored on stack? Scope of local variables!
•How are global variables accessed?
C and Assembly language
Contd….
•Cannot pass many parameters in registers
•Scope – Desirable feature
•Stack – Ideal to store local variables
•ESP changes with every PUSH and POP, so it is not a stable base for accessing the local variables
•EBP is used to access them!!!
Parameters, Local and Global variables
•Before a function is called, parameters are pushed onto the stack
•Parameters are accessed by [EBP + n]
•Local variables are accessed by [EBP – n]
•Integers are returned in EAX
•Global variables are accessed by direct address values
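The addressing rules above can be tried out in a small C simulation of a stack frame. Everything here is a sketch under assumptions: the register variables, helper names and the return address value 0xC104 are illustrative, a standard PUSH EBP / MOV EBP, ESP prologue is assumed (so the first parameter sits at [EBP+8]), and the callee is assumed to remove the 8 bytes of parameters.

```c
#include <stdint.h>

enum { STACK_WORDS = 64 };

static uint32_t stack_mem[STACK_WORDS];
static uint32_t esp, ebp, eax;           /* byte offsets into stack_mem */

static void push(uint32_t v) { esp -= 4; stack_mem[esp / 4] = v; }
static uint32_t pop(void)    { uint32_t v = stack_mem[esp / 4]; esp += 4; return v; }
static uint32_t *at(uint32_t addr) { return &stack_mem[addr / 4]; }

/* Body of "int Fn(int x, int y) { int z = 0; z = x + y; return z; }" */
static void fn_body(void)
{
    push(ebp);                 /* PUSH EBP      : save the caller's EBP  */
    ebp = esp;                 /* MOV EBP, ESP  : set up the new frame   */
    esp -= 4;                  /* SUB ESP, 4    : room for local z       */
    *at(ebp - 4) = 0;          /* z = 0         : local at [EBP-4]       */
    eax = *at(ebp + 8);        /* x             : parameter at [EBP+8]   */
    eax += *at(ebp + 12);      /* y             : parameter at [EBP+12]  */
    *at(ebp - 4) = eax;        /* z = x + y                              */
    esp += 4;                  /* ADD ESP, 4    : release the local      */
    ebp = pop();               /* POP EBP       : restore caller's EBP   */
}

uint32_t simulate_call(uint32_t x, uint32_t y)
{
    esp = ebp = STACK_WORDS * 4;   /* empty stack */
    push(y);                       /* parameters are pushed right to left */
    push(x);
    push(0xC104);                  /* CALL pushes the return address */
    fn_body();
    (void)pop();                   /* RET pops the return address ...    */
    esp += 8;                      /* ... and 8 bytes of parameters go   */
    return eax;                    /* integers are returned in EAX       */
}

uint32_t final_esp(void) { return esp; }
```

`simulate_call(2, 4)` returns 6 and leaves the simulated ESP back at its starting value.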
Compile the C program
Contd…
void main()
{
    int z=0;
        MOV z, 0
    z = Fn(2,4);
        PUSH 0x00000004
        PUSH 0x00000002
        CALL Fn
        MOV z, EAX
    g_iVar = z;
        MOV [g_iVar], EAX
}
int Fn(int x, int y)
{
    int z=0;
        MOV z, 0
    z = x+ y;
        MOV EAX, x
        ADD EAX, y
        MOV z, EAX
    return z;
        RET
}
Compile the C Program
Contd….
CODE SEGMENT – Function – main()
    int z = 0;
C100    MOV [EBP-4], 0
    z = Fn(2,4);
C101    PUSH 0x00000004
C102    PUSH 0x00000002
C103    Call C200
C104    MOV [EBP-4], EAX
    g_iVar = z;
C105    MOV [g_iVar], EAX
STACK SEGMENT (just after the Call at C103, top of stack first)
ESP →   C104          return address
        0x00000002    parameter x
        0x00000004    parameter y
        0x00000000    local var Z ([EBP-4])
EBP →   base of main()'s frame
Compile the C Program
Contd….
CODE SEGMENT – Function – Fn()
C200    MOV EBP, ESP
C201    SUB ESP, 0x40
    int z=0;
C202    MOV [EBP-4], 0
    z = x+ y
C203    MOV EAX, [EBP+4]
C204    ADD EAX, [EBP+8]
C205    MOV [EBP-4], EAX
    return z;
C206    ADD ESP, 0x40
        RET
STACK SEGMENT (top of stack first)
ESP →   Local variable space (0x40 bytes)
        0x00000000, then 0x00000006    Z of Fn() ([EBP-4])
EBP →   C104          return address
        0x00000002    x ([EBP+4])
        0x00000004    y ([EBP+8])
        0x00000000    local var Z of main()
CODE SEGMENT – Function – main()
    int z = 0;
C100    MOV [EBP-4], 0
    z = Fn(2,4);
C101    PUSH 0x00000004
C102    PUSH 0x00000002
C103    Call C200
C104    MOV [EBP-4], EAX
    g_iVar = z;
C105    MOV [g_iVar], EAX
C106    RET
Stack corruption!!!!! You have accessed the stack of the function "Fn()". Your computer will now REBOOT!!!!!
Fn() changed EBP without restoring it, so [EBP-4] at C104 no longer refers to main()'s local var Z.
STACK SEGMENT (top of stack first)
        0x00000006    [EBP-4], where C104 stores EAX
EBP →   C104
ESP →   0x00000002
        0x00000004
        0x00000000    Local var Z
Compile the C Program
Contd….
CODE SEGMENT – Function – Fn()
C200    PUSH EBP
C202    MOV EBP, ESP
C203    SUB ESP, 0x40
    int z=0;
C204    MOV [EBP-4], 0
    z = x+ y
C205    MOV EAX, [EBP+8]
C206    ADD EAX, [EBP+12]
C207    MOV [EBP-4], EAX
    return z;
C208    ADD ESP, 0x40
C209    POP EBP
C20A    RET 8
STACK SEGMENT (top of stack first)
ESP →   Local variable space (0x40 bytes)
        0x00000000, then 0x00000006    Z of Fn() ([EBP-4])
EBP →   EBP - main()  saved by PUSH EBP
        C104          return address ([EBP+4])
        0x00000002    x ([EBP+8])
        0x00000004    y ([EBP+12])
        0x00000000    local var Z of main()
CODE SEGMENT – Function – main()
    int z = 0;
C100    MOV [EBP-4], 0
    z = Fn(2,4);
C101    PUSH 0x00000004
C102    PUSH 0x00000002
C103    Call C200
C104    MOV [EBP-4], EAX
    g_iVar = z;
C105    MOV [g_iVar], EAX
C106    Epilogue
STACK SEGMENT (after Fn() returns with RET 8, top of stack first)
        0x00000006, C104, 0x00000002, 0x00000004    left over from the call, no longer part of the stack
ESP →   0x00000000, then 0x00000006    Local var Z ([EBP-4])
EBP →   base of main()'s frame
Function calls in C - Summary
Function call gets translated to CALL addr
Prologue
» Store the current EBP on stack
» Set up the stack - Initialize the EBP
» Allocate space for local variables.
Execute the function accordingly
Epilogue
» Set the ESP to its original value
» Set the EBP back to its original value
Stack clean up
•When?
» Happens while returning from a function (RET N) or right after the call returns (ADD ESP, N)
•Why?
» Undo the effect of pushing parameters
•How?
» RET N or ADD ESP, N
C Program                     Assembly
void main()                   Prologue
{
    int z = 0;                MOV [EBP-4], 0
    z = Function(2, 4);       PUSH 0x00000004
                              PUSH 0x00000002
                              CALL Function
                              MOV [EBP-4], EAX
}                             Epilogue
Contd…
C Program                     Assembly
int Function(int a, int b)    PUSH EBP
{                             MOV EBP, ESP         --- Prologue
                              SUB ESP, N
    int c=0;                  MOV [EBP-4], 0
    c = a + b;                MOV EAX, [EBP+8]     --- Body
                              ADD EAX, [EBP+12]
                              MOV [EBP-4], EAX
    return c;                 ADD ESP, N
}                             POP EBP              --- Epilogue
                              RET 8
Calling conventions
__cdecl
» Default calling convention of C functions
» Needed for variable argument lists
» Caller cleans the stack – ADD ESP, N instruction
__stdcall
» Slightly more compact than the __cdecl call: the clean up is encoded once in the callee rather than after every call
» Callee cleans the stack – RET N instruction
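Why a variable argument list needs caller clean up can be seen from a small variadic function in standard C. Only the caller knows how many arguments it pushed for a given call, so the callee cannot end with a fixed RET N. The function `sum_ints` is a made-up example.

```c
#include <stdarg.h>

/* Hypothetical variadic function: adds up `count` int arguments.
   The number of bytes on the stack differs from call to call. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* fetch the next int argument */
    va_end(args);

    return total;
}
```

`sum_ints(2, 2, 4)` and `sum_ints(4, 1, 2, 3, 4)` pass different amounts of data to the same function, which is why the clean up amount has to be decided at each call site.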
Back to Exercise 3
Function to swap variables
Write a C program to swap two numbers using a function swap(int* pX, int* pY). Implement the swap function directly in assembly language.
Attempt 1 – Double indirection is not a valid instruction
void swap(int* pX, int* pY)
{
    __asm
    {
        PUSH DWORD PTR [[EBP+8]]
        PUSH DWORD PTR [[EBP+12]]
        POP DWORD PTR [[EBP+8]]
        POP DWORD PTR [[EBP+12]]
    }
}
Attempt 2 – Exchanges only the pointer parameters pX and pY, not the integers they point to
void swap(int* pX, int* pY)
{
    __asm
    {
        PUSH DWORD PTR [pX]
        PUSH DWORD PTR [pY]
        POP DWORD PTR [pX]
        POP DWORD PTR [pY]
    }
}
Working version – Load the pointers into registers first
void swap(int* pX, int* pY)
{
    __asm
    {
        MOV EAX, DWORD PTR [EBP+8]     ; EAX = pX
        MOV EBX, DWORD PTR [EBP+12]    ; EBX = pY
        PUSH DWORD PTR [EAX]
        PUSH DWORD PTR [EBX]
        POP DWORD PTR [EAX]
        POP DWORD PTR [EBX]
    }
}
What about C++?
struct stTest
{
    int x;
    int y;
};
void FnTest(stTest* pSt)
{
    pSt->x = 0;
    pSt->y = 1;
}
void main()
{
    stTest obj;
    FnTest(&obj);
}
class clsTest
{
    int x;
    int y;
public:
    void FnTest()
    {
        x = 0;
        y=1;
    }
};
void main()
{
    clsTest obj;
    obj.FnTest();
}
Calling convention
Contd…
thiscall – The C++ calling convention (__thiscall)
» Behaves like the __stdcall call in most ways (the callee cleans the stack)
» The this pointer is passed in the ECX register
» The callee typically stores the this pointer in the [EBP-4] location on the stack
String Instructions
•Use ESI and EDI as implicit operands.
•After the operation, ESI and EDI are automatically incremented/decremented depending on the direction flag.
•Usually used with the prefix instructions.
•Very efficient for standard looping operations such as copying or scanning memory.
Prefix to instructions
REP – REP MOVSB
» Used to repeat instructions unconditionally
» Implicitly decrements ECX by 1 after each execution
» Stops once ECX = 0
REPNE/ REPE – REPE SCASB
» Used to repeat instructions conditionally
» Implicitly decrements ECX by 1 after each execution
» Stops once ECX = 0, or once the ZERO flag is set (REPNE) or reset (REPE)
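The behaviour of the prefixes can be modelled in C with a struct holding the registers involved. This is a simplified sketch that assumes the direction flag is clear (pointers move upward); the `StringRegs` type and the function names are invented for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *esi;   /* source index          */
    uint8_t       *edi;   /* destination index     */
    uint32_t       ecx;   /* repeat counter        */
    int            zf;    /* Zero flag             */
} StringRegs;

/* REP MOVSB: copy bytes from [ESI] to [EDI] until ECX reaches 0 */
void rep_movsb(StringRegs *r)
{
    while (r->ecx != 0) {
        *r->edi++ = *r->esi++;   /* MOVSB, then advance both indexes */
        r->ecx--;
    }
}

/* REPNE SCASB: compare AL with [EDI], advance EDI, decrement ECX;
   stop when ECX reaches 0 or a match sets the Zero flag */
void repne_scasb(StringRegs *r, uint8_t al)
{
    while (r->ecx != 0) {
        r->zf = (*r->edi == al);
        r->edi++;
        r->ecx--;
        if (r->zf) break;
    }
}
```

Scanning for a zero byte with a large starting ECX gives the string length as (initial ECX) minus (final ECX) minus 1, since the terminator itself is also counted by the scan.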
Optimized C functions
•memcpy
•strlen
•memset
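As one illustration of how such a function maps onto a string instruction, the sketch below writes a memset-style fill as a C loop that mirrors REP STOSB (store AL at [EDI], advance EDI, decrement ECX). This is an assumption about one classic approach, not a description of any particular C library, whose implementations vary; the function name `memset_rep_stosb` is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `count` bytes at `dest` with `value`, in the style of REP STOSB */
void *memset_rep_stosb(void *dest, int value, size_t count)
{
    uint8_t *edi = dest;             /* EDI = destination   */
    uint8_t  al  = (uint8_t)value;   /* AL  = byte to store */
    size_t   ecx = count;            /* ECX = repeat count  */

    while (ecx != 0) {               /* REP: repeat until ECX = 0 */
        *edi++ = al;                 /* STOSB: store AL, advance EDI */
        ecx--;
    }
    return dest;
}
```

Like the standard memset, it returns the destination pointer and leaves bytes beyond `count` untouched.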