Optimizing Squeak - Georgia Institute of Technology

Optimizing Squeak
Measuring the Speed of Squeak
MessageTally and TimeProfileBrowser
Changes to improve speed
Choose operations appropriately
Choose collections to improve speed
How collections work
Build a primitive
When building a primitive is useful/necessary
Coming soon: How the VM works and how to
build primitives...
7/17/2015
Copyright 2000, Georgia Tech
MessageTally
MessageTally provides a variety of tools for
analyzing your code.
time: - Returns the time in milliseconds that it
took to do some operation
MessageTally time: [100000 timesRepeat: [4 * 4]] "44"
MessageTally time: [100000 timesRepeat: [4.0 * 4.0]] "80"
MessageTally time: [100000 timesRepeat: [4 * 4.0]] "76"
MessageTally time: [100000 timesRepeat: [4.0 * 4]] "79"
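Millisecond timings like these depend on the machine and the image, and they vary from run to run. A minimal workspace sketch, assuming Time millisecondsToRun: is available in your image, that repeats one measurement and keeps the smallest result (the variable name timings is only illustrative):

```smalltalk
"Repeat the same timing five times and answer the best (smallest) result."
| timings |
timings := (1 to: 5) collect: [:run |
    Time millisecondsToRun: [100000 timesRepeat: [4 * 4]]].
timings inject: timings first into: [:best :each | best min: each]
```

Taking the minimum filters out runs that were disturbed by garbage collection or other processes; it is one reasonable choice, not the only one.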
What’s eating up the time?
Part of it: Floats are slower
But the bigger part of it is the unpacking of
Object -> Class (to figure out type) -> NativeFormat
MessageTally time: [10000 timesRepeat: [432432432 * 4324324324.0]] "9"
MessageTally time: [10000 timesRepeat: [4324324324.0 * 432432432]] "8"
MessageTally time: [10000 timesRepeat: [4 * 4324324324.0]] "9"
MessageTally time: [10000 timesRepeat: [4324324324.0 * 4]] "7"
A Different Way to Look at
Executing Code
At regular intervals, interrupt the executing
process with a “spy” process
Figure out which method is executing
at that moment
Reports
The “tree” of which methods called which
other methods
The percentage of time spent (over the whole
tree) in each “leaf”
MessageTally spyOn:
Does a “spy” on a process
Reports percentages of time
A “primitive” leaf is attributed to its method
(one-level up)
To trim tree, <2% is not shown, but can be
added into leaves
Example: MessageTally spyOn: [10000
timesRepeat: [3.14159 printString]]
- 139 tallies, 2407 msec.
**Tree**
100.0 Float(Object)>>printString
74.1 Float(Number)>>printOn:
|74.1 Float>>printOn:base:
| 74.1 Float>>absPrintOn:base:
| 18.7 Character class>>digitValue:
| 16.5 primitives
| 16.5 False>>|
| 15.1 Float(Number)>>ceiling
| |12.9 Float(Number)>>floor
| 4.3 LimitedWriteStream(WriteStream)>>nextPut:
25.9 String class(SequenceableCollection class)>>streamContents:limitedTo:
17.3 LimitedWriteStream(WriteStream)>>contents
|15.1 String(SequenceableCollection)>>copyFrom:to:
| 15.1 String(Object)>>species
7.2 LimitedWriteStream class(PositionableStream class)>>on:
5.8 LimitedWriteStream(WriteStream)>>on:
3.6 LimitedWriteStream(PositionableStream)>>on:
**Leaves**
18.7 Character class>>digitValue:
18.0 False>>|
16.5 Float>>absPrintOn:base:
15.1 String(Object)>>species
12.9 Float(Number)>>floor
4.3 LimitedWriteStream(WriteStream)>>nextPut:
2.9 SmallInteger(Magnitude)>>max:
Where does the time go?
Notice in that example
The biggest piece of the printString execution
is the conversion of each individual digit to
character
Character class>>digitValue:
But second biggest is a logical Or.
18.0 False>>|
Where is that happening?
TimeProfileBrowser
TimeProfileBrowser also spies, like
MessageTally
TimeProfileBrowser onBlock: [10000
timesRepeat: [3.14159 printString]]
But it also acts as a code browser so that
you can see each piece of code!
The Problem of Spying
Spying is inaccurate
Run the same test several times: Different results
each time!
Have to run something often enough (e.g., 1000
timesRepeat:…) to catch the right methods
Alternative: accurate counts with tallySends:
Uses the fact that Squeak’s VM is generated from a working
simulation of the VM
Actually simulates the VM to get perfectly accurate counts of
how often each method is called.
Can also be useful for debugging: It’s a trace!
MessageTally tallySends:
[3.14159 printString]
This simulation took 0.0 seconds.
**Tree**
2 Float(Object)>>printString
1 Float(Number)>>printOn:
|1 Float>>printOn:base:
| 1 Float>>absPrintOn:base:
| |7 SmallInteger>>*
| | |7 SmallInteger(Integer)>>*
| | | 7 Float>>adaptToInteger:andSend:
| |7 LimitedWriteStream(WriteStream)>>nextPut:
| |6 Character class>>digitValue:
Measuring Squeak’s Speed
Now that we have tools for measuring
Squeak, let’s start figuring out what’s slow
and what’s fast.
What’s fast:
Integer arithmetic is faster than floating point
(expected)
Special messages, coded into the bytecode
+ - > < at: at:put: bitOr: bitAnd: class = == new
value do: size
The VM and Bytecodes
The VM (e.g., squeak.exe) interprets
bytecodes
Bytecodes are the machine language of a
virtual machine
The “VM” is, strictly speaking, a “VM
simulator” or “interpreter”
You can see bytecodes for a method by
doing “show bytecodes” from code pane
Special Messages are fast
lookups
 Special messages, like +, actually map to a single
bytecode
One memory access, no lookup
Non-special messages involve passing a pointer to a memory
location where the message selector is stored
A Word on Primitives
Primitives are the bottommost layer of the
method hierarchy
They are not defined in terms of
bytecodes, but in terms of the native code
Think of them as subroutine calls into the VM
You can make up your own primitives!
In latest versions of Squeak, they can even
be dynamically loaded
But much of speed is Squeak-level choices
Integers vs. floats, Squeak-code vs.
primitives are low-level VM decisions
Most of what determines fast or slow code
is at the level of your Squeak code
Choices in collections
Algorithm coding
Brief review of Collections
Dictionary: Takes a key and a value, e.g.,
aDict at: 'dog' put: 'Rufus'.
Array: Just like any language
OrderedCollection: Like a Java vector
Bag: You can add to it, and it remembers
the number of identical elements
Set: You can add to it, and it remembers
only one copy of each element
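The difference between Set and Bag is easiest to see by adding the same elements to both. A small workspace sketch (the variable names are illustrative):

```smalltalk
| s b |
s := Set new.
s add: 'dog'; add: 'dog'; add: 'cat'.
s size.                  "2: the duplicate 'dog' is kept only once"

b := Bag new.
b add: 'dog'; add: 'dog'; add: 'cat'.
b size.                  "3: every add is counted"
b occurrencesOf: 'dog'.  "2"
```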
Speed of Adding
Dictionaries are the most general indexed
collection, but they’re also slow to add to.
d := Dictionary new.
MessageTally time: [1 to: 10000 do: [:i | d at: i put: i]]. "152"
a := Array new: 10000.
MessageTally time: [1 to: 10000 do: [:i | a at: i put: i]]. "2"
OrderedCollections are only slow
to grow
oc := OrderedCollection new: 10000.
MessageTally time: [1 to: 10000 do: [:i | oc add: i]]. "17"
MessageTally time: [1 to: 10000 do: [:i | oc at: i put: i]]. "11"
Once an OrderedCollection is the right
size, at:put: is within a factor of six of
an Array’s speed (2 ms on the previous slide)
It’s slower because Array’s at:put: is a
primitive, while OC’s checks bounds first
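To see the cost of growing on your own machine, you can time the same adds into a collection that starts at the default size and into one that was presized. This is only a sketch; the variable names are illustrative and no particular timings are implied:

```smalltalk
| grown presized n |
n := 10000.
grown := OrderedCollection new.        "starts small and must grow as it fills"
presized := OrderedCollection new: n.  "room for n elements up front"
Transcript
    show: 'default size: ',
        (Time millisecondsToRun: [1 to: n do: [:i | grown add: i]]) printString; cr;
    show: 'presized: ',
        (Time millisecondsToRun: [1 to: n do: [:i | presized add: i]]) printString; cr.
```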
Why are OrderedCollections slow to
grow?
add: newObject
^self addLast: newObject
addLast: newObject
"Add newObject to the end of the receiver. Answer
newObject."
lastIndex = array size ifTrue: [self makeRoomAtLast].
lastIndex := lastIndex + 1.
array at: lastIndex put: newObject.
^ newObject
"makeRoomAtLast calls self grow…"
OrderedCollections double in size
on each grow!
grow
"Become larger. Typically, a subclass has to override this if the
subclass adds instance variables."
| newArray |
newArray := Array new: self size + self growSize.
newArray replaceFrom: 1 to: array size with: array startingAt: 1.
array := newArray
growSize
^ array size max: 2 "answers the larger of the array size and 2"
Is that bad?
Think about the average case of adding to
an OrderedCollection
Most of the time it won’t need to grow
Doubling in size means that you’ll not do it
very often!
You’re trading off space for time, a classic
tradeoff
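The "not very often" claim can be checked with a little arithmetic. The sketch below does not touch OrderedCollection itself; it models a buffer that doubles whenever it is full, assuming a starting capacity of 10 (an assumption for illustration), and counts the grows and element copies caused by 10000 adds:

```smalltalk
"A model of doubling growth, not the real OrderedCollection code."
| capacity grows copied |
capacity := 10.   "assumed starting capacity"
grows := 0.
copied := 0.
1 to: 10000 do: [:i |
    i > capacity ifTrue: [
        copied := copied + capacity.   "a grow copies every existing element"
        capacity := capacity * 2.
        grows := grows + 1]].
Array with: grows with: copied   "10 grows, 10230 element copies"
```

Ten grows for 10000 adds works out to roughly one extra element copy per add on average, which is why doubling keeps the average cost of add: low.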
Speed of Access
MessageTally time: [1 to: 10000 do: [:i | d at: i]]. "Dictionary: 60"
MessageTally time: [1 to: 10000 do: [:i | a at: i]]. "Array: 2"
MessageTally time: [1 to: 10000 do: [:i | oc at: i]]. "OrderedCollection: 9"
SortedCollections are great but
slow
SortedCollections keep their elements
sorted, but that has a cost (note that the
loops below run an order of magnitude fewer
iterations than the previous ones)
sc := SortedCollection new.
MessageTally time: [1 to: 1000 do: [:i | sc add: i]]. "12"
MessageTally time: [1 to: 1000 do: [:i | d at: i]]. "4"
Adding to Non-Sequenced
Collections
o := OrderedCollection new.
MessageTally time: [1 to: 10000 do: [:i | o add: i]]. "14"
s := Set new.
MessageTally time: [1 to: 10000 do: [:i | s add: i]]. "113"
b := Bag new.
MessageTally time: [1 to: 10000 do: [:i | b add: i]]. "265"
Let’s find an element!
MessageTally time: [10 timesRepeat: [o detect: [:n | n >= 5000]]]. "45"
MessageTally time: [10 timesRepeat: [s detect: [:n | n >= 5000]]]. "48"
MessageTally time: [10 timesRepeat: [b detect: [:n | n >= 5000]]]. "256"
Bags look unbearably slow! Why would
you ever use one?
Iteration is the wrong way to find
an element!
MessageTally time: [100 timesRepeat: [o includes: 5000]]. "444"
MessageTally time: [100 timesRepeat: [s includes: 5000]]. "0"
MessageTally time: [100 timesRepeat: [b includes: 5000]]. "0"
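When the data already lives in an OrderedCollection but you need many membership tests, one option is to build a hashed copy once with asSet and query that instead. A sketch, with illustrative variable names and no particular timings implied:

```smalltalk
| o lookup |
o := OrderedCollection new.
1 to: 10000 do: [:i | o add: i].
lookup := o asSet.   "one pass over o to build the hashed copy"
Transcript
    show: 'OrderedCollection: ',
        (Time millisecondsToRun: [100 timesRepeat: [o includes: 5000]]) printString; cr;
    show: 'Set: ',
        (Time millisecondsToRun: [100 timesRepeat: [lookup includes: 5000]]) printString; cr.
```

The copy costs one pass and extra memory, so it only pays off when the lookups are repeated.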
How are Bags so fast?
Dictionaries!
Bags are so fast because their
implementation is actually a Dictionary (a
hashtable)!
Dictionaries are not slow!
They’re slow if you use them as arrays, and
they’re slow to iterate across
But for finding a specific element, they are
blindingly fast!
Implementation of Bags
Bags have one instance variable, a
Dictionary named contents
add: newObject
^self add: newObject withOccurrences: 1
add: newObject withOccurrences: anInteger
"Add the element newObject to the receiver. Do so as though the
element were added anInteger number of times. Answer
newObject."
contents at: newObject put: (contents at: newObject ifAbsent: [0])
+ anInteger.
^ newObject
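The same at:put: with at:ifAbsent: idiom can be used directly on a Dictionary to count things. A workspace sketch with made-up sample data:

```smalltalk
"Count occurrences the way Bag does internally: read the old count (0 if absent), add 1, store it back."
| counts |
counts := Dictionary new.
#('dog' 'cat' 'dog' 'bird' 'dog') do: [:word |
    counts at: word put: (counts at: word ifAbsent: [0]) + 1].
counts at: 'dog'.   "3"
counts at: 'cat'.   "1"
```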
Dictionaries are key to fast
lookups
Dictionaries are used heavily in Squeak
E.g., Smalltalk is a kind of Dictionary
Everything in Smalltalk (or Squeak) knows
its own hash
Hash functions need to be
Fast
Unique for unique objects
Able to capture how objects differ in actual practice
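For your own classes to work as Dictionary keys or Set elements, = and hash have to agree: objects that are equal must answer the same hash. A minimal sketch for a hypothetical class Account with one instance variable, id, and an accessor of the same name (the class and its accessor are assumptions made up for this example):

```smalltalk
"Methods of the hypothetical class Account (instance variable: id)."

= anObject
    "Two accounts are equal when their ids are equal."
    ^ (anObject isKindOf: Account) and: [id = anObject id]

hash
    "Hash on the same field that = compares, so equal accounts hash alike."
    ^ id hash
```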
Some Sample Hash Functions
“Integer” hash
^(self lastDigit bitShift: 8) + (self digitAt: 1)
“Float” hash
"Both words of the float are used; 8 bits are removed from each end
to clear most of the exponent regardless of the byte ordering. (The
bitAnd:'s ensure that the intermediate results do not become a large
integer.) Slower than the original version in the ratios 12:5 to 2:1
depending on values. (DNS, 11 May, 1997)"
^ (((self basicAt: 1) bitAnd: 16r00FFFF00) +
((self basicAt: 2) bitAnd: 16r00FFFF00)) bitShift: -8
More hash functions
“Character” hash
^value
“Point” hash
^(x hash bitShift: 2) bitXor: y hash
“String” hash
| l m |
(l _ m _ self size) <= 2
    ifTrue: [l = 2
        ifTrue: [m _ 3]
        ifFalse: [l = 1
            ifTrue: [^((self at: 1) asciiValue bitAnd: 127) * 106].
            ^21845]].
^(self at: 1) asciiValue * 48 + ((self at: (m - 1)) asciiValue + l)
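The Point formula above is easy to work through by hand. The sketch below applies it to plain integers in a block, assuming that small integers hash to themselves (so x hash is just x); it does not call Point's own hash method:

```smalltalk
"Apply the slide's Point formula directly to integer coordinates."
| pointHash |
pointHash := [:x :y | (x bitShift: 2) bitXor: y].
pointHash value: 3 value: 4.   "8: 3 shifted left by 2 is 12, and 12 bitXor: 4 is 8"
pointHash value: 1 value: 0.   "4"
pointHash value: 0 value: 4.   "4, the same as for 1 and 0: a collision"
```

Collisions like the last one are harmless as long as they are rare for the points a program actually uses; a fast hash that spreads real data well is the practical goal.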
Summary
Lots of ways to time/trace in Squeak
MessageTally and TimeProfileBrowser
Making things fast in Squeak
Choose data types wisely
Use primitives
Code wisely
Arrays vs. hashing - for iteration, arrays; for
finding, hashing