Optimizing Squeak - Georgia Institute of Technology
Optimizing Squeak
Measuring the Speed of Squeak
MessageTally and TimeProfileBrowser
Changes to improve speed
Choose operations appropriately
Choose collections to improve speed
How collections work
Build a primitive
When building a primitive is useful/necessary
Coming soon: How the VM works and how to build primitives...
7/17/2015
Copyright 2000, Georgia Tech
1
MessageTally
MessageTally provides a variety of tools for analyzing your code.
time: returns the time in milliseconds that it took to evaluate a block
MessageTally time: [100000 timesRepeat: [4 * 4]] "44"
MessageTally time: [100000 timesRepeat: [4.0 * 4.0]] "80"
MessageTally time: [100000 timesRepeat: [4 * 4.0]] "76"
MessageTally time: [100000 timesRepeat: [4.0 * 4]] "79"
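A small sketch, assuming a Squeak workspace with a Transcript open, that runs two of the timings above side by side and prints both. The variable names are illustrative, and the absolute numbers will depend on the machine and VM.

```smalltalk
"Time integer and float multiplication with the same loop count as above
 and print both results (in milliseconds) to the Transcript."
| intTime floatTime |
intTime := MessageTally time: [100000 timesRepeat: [4 * 4]].
floatTime := MessageTally time: [100000 timesRepeat: [4.0 * 4.0]].
Transcript
	show: 'integer: ', intTime printString, ' ms';
	cr;
	show: 'float: ', floatTime printString, ' ms';
	cr.
```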
What’s eating up the time?
Part of it: Floats are slower
But the bigger part of it is unpacking each operand:
Object -> Class (to figure out its type) -> native format
MessageTally time: [10000 timesRepeat: [432432432 * 4324324324.0]] "9"
MessageTally time: [10000 timesRepeat: [4324324324.0 * 432432432]] "8"
MessageTally time: [10000 timesRepeat: [4 * 4324324324.0]] "9"
MessageTally time: [10000 timesRepeat: [4324324324.0 * 4]] "7"
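When a mixed-type product sits inside a loop, the Integer operand has to be converted on every iteration. A sketch of converting it once, outside the loop; the variable names are illustrative, and whether this pays off on a given VM should be measured rather than assumed.

```smalltalk
"Mixed Integer * Float: the Integer operand is coerced to a Float on every
 iteration (in the Squeak version traced later, via Float>>adaptToInteger:andSend:).
 Converting once, outside the loop, leaves Float * Float inside it."
| n f mixed converted |
n := 4.
f := 4324324324.0.
mixed := MessageTally time: [10000 timesRepeat: [n * f]].
n := n asFloat.    "convert once, before the second loop"
converted := MessageTally time: [10000 timesRepeat: [n * f]].
Transcript
	show: 'Integer * Float: ', mixed printString, ' ms'; cr;
	show: 'Float * Float: ', converted printString, ' ms'; cr.
```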
A Different Way to Look at Executing Code
At regular intervals, interrupt the executing process with a “spy” process
Figure out which method is executing at that moment
Reports
The “tree” of which methods called which other methods
The percentage of time spent (over the whole tree) in each “leaf”
MessageTally spyOn:
Spies on a process
Reports percentages of time
A “primitive” leaf is attributed to its method (one level up)
To trim the tree, entries under 2% are not shown, but their time can still be added into the leaves
Example: MessageTally spyOn: [10000 timesRepeat: [3.14159 printString]]
- 139 tallies, 2407 msec.
**Tree**
100.0 Float(Object)>>printString
74.1 Float(Number)>>printOn:
|74.1 Float>>printOn:base:
| 74.1 Float>>absPrintOn:base:
| 18.7 Character class>>digitValue:
| 16.5 primitives
| 16.5 False>>|
| 15.1 Float(Number)>>ceiling
| |12.9 Float(Number)>>floor
| 4.3 LimitedWriteStream(WriteStream)>>nextPut:
25.9 String class(SequenceableCollection class)>>streamContents:limitedTo:
17.3 LimitedWriteStream(WriteStream)>>contents
|15.1 String(SequenceableCollection)>>copyFrom:to:
| 15.1 String(Object)>>species
7.2 LimitedWriteStream class(PositionableStream class)>>on:
5.8 LimitedWriteStream(WriteStream)>>on:
3.6 LimitedWriteStream(PositionableStream)>>on:
**Leaves**
18.7 Character class>>digitValue:
18.0 False>>|
16.5 Float>>absPrintOn:base:
15.1 String(Object)>>species
12.9 Float(Number)>>floor
4.3 LimitedWriteStream(WriteStream)>>nextPut:
2.9 SmallInteger(Magnitude)>>max:
Where does the time go?
Notice in that example:
The biggest piece of the printString execution is converting each individual digit to a character
Character class>>digitValue:
But the second biggest is a logical Or:
18.0 False>>|
Where is that happening?
TimeProfileBrowser
TimeProfileBrowser spies on a block, just like MessageTally
TimeProfileBrowser onBlock: [10000 timesRepeat: [3.14159 printString]]
But it also acts as a code browser, so you can see the code of each method in the report!
The Problem of Spying
Spying only samples the running process, so it is inaccurate
Run the same test several times: different results each time!
You have to run something often enough (e.g., 1000 timesRepeat:…) for the samples to catch the right methods
Alternative: accurate counts with tallySends:
Uses the fact that Squeak’s VM is generated from a working simulation of the VM
Actually simulates the VM to get perfectly accurate counts of how often each method is called
Can also be useful for debugging: It’s a trace!
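Because timings vary from run to run, one common workaround (a general benchmarking habit, not something MessageTally does for you) is to repeat a measurement and keep the fastest run. A sketch, with illustrative variable names:

```smalltalk
"Run the same measurement several times and keep the fastest run,
 which is the one least disturbed by garbage collection or other processes."
| block times best |
block := [10000 timesRepeat: [3.14159 printString]].
times := (1 to: 5) collect: [:trial | MessageTally time: block].
best := times inject: times first into: [:soFar :each | soFar min: each].
Transcript
	show: 'all runs: ', times printString; cr;
	show: 'fastest: ', best printString, ' ms'; cr.
```

This reduces noise in a single number; it does not give the exact per-method counts that tallySends: does.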
MessageTally tallySends: [3.14159 printString]
This simulation took 0.0 seconds.
**Tree**
2 Float(Object)>>printString
1 Float(Number)>>printOn:
|1 Float>>printOn:base:
| 1 Float>>absPrintOn:base:
| |7 SmallInteger>>*
| | |7 SmallInteger(Integer)>>*
| | | 7 Float>>adaptToInteger:andSend:
| |7 LimitedWriteStream(WriteStream)>>nextPut:
| |6 Character class>>digitValue:
Measuring Squeak’s Speed
Now that we have tools for measuring Squeak, let’s start figuring out what’s slow and what’s fast.
What’s fast:
Integer arithmetic is faster than floating point (as expected)
Special messages, coded directly into the bytecodes:
+ - > < at: at:put: bitOr: bitAnd: class = == new value do: size
The VM and Bytecodes
The VM (e.g., squeak.exe) interprets bytecodes
Bytecodes are the machine language of a virtual machine
The “VM” is, strictly speaking, a “VM simulator” or “interpreter”
You can see the bytecodes for a method by choosing “show bytecodes” from the code pane menu
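The bytecodes can also be listed from a workspace. This sketch assumes CompiledMethod>>symbolic, which answers a textual bytecode listing in the Squeak versions we know of; Object>>printString is just a convenient method to inspect.

```smalltalk
"Print the bytecodes of Object>>printString as text.
 Object >> #printString answers the CompiledMethod for that selector."
| method |
method := Object >> #printString.
Transcript show: method symbolic; cr.
```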
Special Messages are fast lookups
Special messages, like +, map to a single bytecode
In the common case (e.g., + on two SmallIntegers): one memory access, no method lookup
Non-special messages refer to their selector through a pointer to the memory location where it is stored, and then need a method lookup
A Word on Primitives
Primitives are the bottommost layer of the method hierarchy
They are not defined in terms of bytecodes, but in native code
Think of them as subroutine calls into the VM
You can make up your own primitives!
In recent versions of Squeak, they can even be loaded dynamically
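A primitive method names its primitive with a <primitive: N> declaration, and the Squeak code after it runs only if the primitive fails. A sketch that asks a compiled method for its primitive number, using Object>>basicAt: as the example; the selector #myBasicAt: in the comment is hypothetical, and primitive numbers may differ between VM versions.

```smalltalk
"Ask a compiled method which primitive it uses (0 means none).
 A primitive method is written like this hypothetical one:
	myBasicAt: index
		<primitive: 60>
		^ self error: 'primitive failed'
 where the code after <primitive: 60> runs only if the primitive fails."
Transcript
	show: 'Object>>basicAt: uses primitive ';
	show: (Object >> #basicAt:) primitive printString;
	cr.
```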
But much of speed comes from Squeak-level choices
Integers vs. floats and Squeak code vs. primitives are low-level VM decisions
Most of what makes code fast or slow is at the level of your Squeak code:
Choices in collections
Algorithm coding
Brief review of Collections
Dictionary: Stores a value under a key, e.g., aDict at: 'dog' put: 'Rufus'.
Array: Fixed-size and indexed, just like in any language
OrderedCollection: Like a Java Vector; grows as needed
Bag: You can add to it, and it remembers how many times each (equal) element was added
Set: You can add to it, and it remembers each distinct element only once
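One small instance of each collection from the review above, as a workspace sketch (the variable names are illustrative):

```smalltalk
"One small instance of each collection type."
| dict arr oc bag set |
dict := Dictionary new.
dict at: 'dog' put: 'Rufus'.          "value stored under a key"
arr := Array new: 3.                  "fixed size; slots start as nil"
arr at: 1 put: 'a'.
oc := OrderedCollection new.          "grows as needed"
oc add: 'a'; add: 'b'.
bag := Bag new.
bag add: 'cat'; add: 'cat'; add: 'dog'.
set := Set new.
set add: 'cat'; add: 'cat'; add: 'dog'.
Transcript
	show: (dict at: 'dog'); cr;                             "Rufus"
	show: (bag occurrencesOf: 'cat') printString; cr;       "2: the Bag counts duplicates"
	show: set size printString; cr.                         "2: the Set keeps 'cat' once"
```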
Speed of Adding
Dictionaries are the most general indexed collection, but they’re also slow to add to.
d := Dictionary new.
MessageTally time: [1 to: 10000 do: [:i | d at: i put: i]]. "152"
a := Array new: 10000.
MessageTally time: [1 to: 10000 do: [:i | a at: i put: i]]. "2"
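A sketch for checking whether pre-sizing helps, under the assumption that Dictionary new: 10000 reserves enough room to skip the grow-and-rehash steps a default-sized Dictionary goes through as it fills. Compare the two times on your own machine rather than taking the improvement for granted.

```smalltalk
"Compare a default-sized Dictionary with one created with room for
 10000 entries up front (assumption: this avoids growing while filling)."
| d1 d2 t1 t2 |
d1 := Dictionary new.
d2 := Dictionary new: 10000.
t1 := MessageTally time: [1 to: 10000 do: [:i | d1 at: i put: i]].
t2 := MessageTally time: [1 to: 10000 do: [:i | d2 at: i put: i]].
Transcript
	show: 'default size: ', t1 printString, ' ms'; cr;
	show: 'pre-sized: ', t2 printString, ' ms'; cr.
```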
OrderedCollections are only slow to grow
oc := OrderedCollection new: 10000.
MessageTally time: [1 to: 10000 do: [:i | oc add: i]]. "17"
MessageTally time: [1 to: 10000 do: [:i | oc at: i put: i]]. "11"
Note: new: 10000 only reserves space; at:put: works here because the add: loop has already filled oc
Once an OrderedCollection is the right size, at:put: is within a factor of six of an Array’s speed (11 ms vs. 2 ms on the previous slide)
It’s slower because Array’s at:put: is a primitive, while OrderedCollection’s at:put: first checks the index against its own bounds in Squeak code
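A quick workspace check of the difference between capacity and size; the result symbols #stored and #failed are just illustrative markers.

```smalltalk
"new: reserves room but the collection starts empty, so at:put:
 fails until elements have actually been added."
| oc result |
oc := OrderedCollection new: 10000.
Transcript show: oc size printString; cr.                    "0"
result := [oc at: 1 put: 1. #stored] on: Error do: [:e | #failed].
Transcript show: result printString; cr.                     "#failed"
oc add: 1.
Transcript show: (oc at: 1 put: 99) printString; cr.         "99: index 1 now exists"
```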
Why are OrderedCollections slow to grow?
add: newObject
	^self addLast: newObject
addLast: newObject
	"Add newObject to the end of the receiver. Answer newObject."
	lastIndex = array size ifTrue: [self makeRoomAtLast].
	lastIndex := lastIndex + 1.
	array at: lastIndex put: newObject.
	^ newObject
"makeRoomAtLast calls self grow…"
OrderedCollections double in size on each grow!
grow
	"Become larger. Typically, a subclass has to override this if the
	subclass adds instance variables."
	| newArray |
	newArray := Array new: self size + self growSize.
	newArray replaceFrom: 1 to: array size with: array startingAt: 1.
	array := newArray
growSize
	^ array size max: 2 "returns the maximum of the array size or 2"
Is that bad?
Think about the average case of adding to an OrderedCollection
Most of the time it won’t need to grow
Doubling in size means that you won’t have to grow very often!
You’re trading space for time, a classic tradeoff
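Why doubling keeps the average cost low, as a worked sum. Assume the capacity exactly doubles from an initial capacity c (Squeak's grow adds growSize = array size max: 2, which is close to doubling) and that n elements are added in total. Grows happen when the collection is full, at sizes c, 2c, ..., 2^(k-1)c, each smaller than n, and each grow copies that many elements:

```latex
c + 2c + 4c + \cdots + 2^{k-1}c \;=\; (2^{k}-1)\,c \;<\; 2 \cdot 2^{k-1}c \;<\; 2n
\qquad \text{since } 2^{k-1}c < n
```

So, under that assumption, all the copying done by grow adds up to fewer than two element copies per add:, on average.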
Speed of Access
MessageTally time: [1 to: 10000 do: [:i | d at: i]]. "Dictionary: 60"
MessageTally time: [1 to: 10000 do: [:i | a at: i]]. "Array: 2"
MessageTally time: [1 to: 10000 do: [:i | oc at: i]]. "OrderedCollection: 9"
SortedCollections are great but slow
SortedCollections keep their elements sorted, but that has a cost (note that the loops below run 1,000 times, an order of magnitude fewer than before)
sc := SortedCollection new.
MessageTally time: [1 to: 1000 do: [:i | sc add: i]]. "12"
MessageTally time: [1 to: 1000 do: [:i | d at: i]]. "4"
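If all the elements are known up front, one alternative is to collect them in an OrderedCollection and sort once at the end with asSortedCollection, rather than paying for a sorted insert on every add:. This is a sketch, not a measured result; it assumes, as in the Squeak versions we know of, that SortedCollection>>addAll: appends a large batch and re-sorts it once.

```smalltalk
"Build first, sort once: add: to an OrderedCollection is cheap, and
 asSortedCollection sorts the whole batch at the end (assumption above)."
| oc sc |
oc := OrderedCollection new.
1000 to: 1 by: -1 do: [:i | oc add: i].    "deliberately unsorted input"
sc := oc asSortedCollection.
Transcript show: sc first printString; cr.  "1"
Transcript show: sc last printString; cr.   "1000"
```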
Adding to Non-Sequenced Collections
o := OrderedCollection new.
MessageTally time: [1 to: 10000 do: [:i | o add: i]]. "14"
s := Set new.
MessageTally time: [1 to: 10000 do: [:i | s add: i]]. "113"
b := Bag new.
MessageTally time: [1 to: 10000 do: [:i | b add: i]]. "265"
Let’s find an element!
MessageTally time: [10 timesRepeat: [o detect: [:n | n >= 5000]]]. "45"
MessageTally time: [10 timesRepeat: [s detect: [:n | n >= 5000]]]. "48"
MessageTally time: [10 timesRepeat: [b detect: [:n | n >= 5000]]]. "256"
Bags look unbearably slow! Why would you ever use one?
Iteration is the wrong way to find an element!
MessageTally time: [100 timesRepeat: [o includes: 5000]]. "444"
MessageTally time: [100 timesRepeat: [s includes: 5000]]. "0"
MessageTally time: [100 timesRepeat: [b includes: 5000]]. "0"
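The difference lies in how each lookup works: detect: must try elements one at a time in any collection, while includes: on a Set or Bag hashes straight to the right slot, and includes: on an OrderedCollection still scans from the front. A sketch that builds the same data in both (variable names illustrative):

```smalltalk
"Same 10000 integers in both collections. includes: on the Set hashes
 directly to the slot for 5000; on the OrderedCollection it compares
 elements one at a time from the front."
| o s |
o := OrderedCollection new.
s := Set new.
1 to: 10000 do: [:i | o add: i. s add: i].
Transcript
	show: (o includes: 5000) printString; cr;      "true"
	show: (s includes: 5000) printString; cr;      "true"
	show: (s includes: 20000) printString; cr.     "false, also without a scan"
```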
How are Bags so fast? Dictionaries!
Bags are so fast because their implementation is actually a Dictionary (a hash table)!
Dictionaries are not slow!
They’re slow if you use them as arrays, and they’re slow to iterate across
But for finding a specific key, they are blindingly fast!
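That speed is for lookup by key. A sketch contrasting includesKey:, which hashes the key, with includes:, which on a Dictionary compares against the values one by one:

```smalltalk
"includesKey: hashes the key; includes: on a Dictionary checks the
 values one by one, much like iterating over an Array."
| d |
d := Dictionary new.
1 to: 10000 do: [:i | d at: i put: i * 2].
Transcript
	show: 'includesKey: ', (MessageTally time: [1000 timesRepeat: [d includesKey: 5000]]) printString, ' ms'; cr;
	show: 'includes: ', (MessageTally time: [1000 timesRepeat: [d includes: 10000]]) printString, ' ms'; cr.
```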
Implementation of Bags
Bags have one instance variable, a Dictionary named contents
add: newObject
	^self add: newObject withOccurrences: 1
add: newObject withOccurrences: anInteger
	"Add the element newObject to the receiver. Do so as though the
	element were added anInteger number of times. Answer newObject."
	contents at: newObject put: (contents at: newObject ifAbsent: [0]) + anInteger.
	^ newObject
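Since contents maps each element to its count, a Bag answers "how many?" with a single hashed lookup. A short sketch:

```smalltalk
"contents maps each element to a count, so occurrencesOf: is one
 hashed lookup and size adds up the counts."
| b |
b := Bag new.
b add: #dog; add: #cat; add: #dog.
b add: #dog withOccurrences: 3.
Transcript
	show: (b occurrencesOf: #dog) printString; cr;   "5"
	show: (b occurrencesOf: #cat) printString; cr;   "1"
	show: b size printString; cr.                      "6"
```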
Dictionaries are key to fast lookups
Dictionaries are used heavily in Squeak
E.g., Smalltalk (the table of globals) is a kind of Dictionary
Every object in Squeak knows its own hash
Hash functions need to be
Fast
Different for different objects, as far as possible (and always the same for equal objects)
Based on how objects actually differ in practice
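The property Set and Dictionary rely on is that objects that are = answer the same hash. A sketch using Point: two separately created Points are equal but not identical, and they still hash alike.

```smalltalk
"Two Points built separately: equal, not identical, same hash.
 A Set finds the second one because it starts looking at the same slot."
| p q |
p := 3 @ 4.
q := 3 @ 4.
Transcript
	show: (p = q) printString; cr;                                   "true"
	show: (p == q) printString; cr;                                  "false"
	show: (p hash = q hash) printString; cr;                         "true"
	show: ((Set new add: p; yourself) includes: q) printString; cr.  "true"
```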
Some Sample Hash Functions
Integer>>hash
	^(self lastDigit bitShift: 8) + (self digitAt: 1)
Float>>hash
	"Both words of the float are used; 8 bits are removed from each end
	to clear most of the exponent regardless of the byte ordering. (The
	bitAnd:'s ensure that the intermediate results do not become a large
	integer.) Slower than the original version in the ratios 12:5 to 2:1
	depending on values. (DNS, 11 May, 1997)"
	^ (((self basicAt: 1) bitAnd: 16r00FFFF00) +
		((self basicAt: 2) bitAnd: 16r00FFFF00)) bitShift: -8
More hash functions
Character>>hash
	^value
Point>>hash
	^(x hash bitShift: 2) bitXor: y hash
String>>hash
	| l m |
	(l := m := self size) <= 2
		ifTrue: [l = 2
			ifTrue: [m := 3]
			ifFalse: [l = 1
				ifTrue: [^((self at: 1) asciiValue bitAnd: 127) * 106].
				^21845]].
	^(self at: 1) asciiValue * 48 + ((self at: (m - 1)) asciiValue + l)
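The same equal-objects-equal-hashes property, checked for String: two equal strings that are different objects answer the same hash, whichever hash algorithm a given Squeak version uses, so a string-keyed Dictionary finds the entry with either one. A sketch with illustrative variable names:

```smalltalk
"Two equal strings that are different objects answer the same hash,
 so a Dictionary keyed by strings finds the entry with either one."
| s1 s2 d |
s1 := 'dog'.
s2 := 'dog' copy.
d := Dictionary new.
d at: s1 put: 'Rufus'.
Transcript
	show: (s1 == s2) printString; cr;               "false"
	show: (s1 hash = s2 hash) printString; cr;      "true"
	show: (d at: s2); cr.                            "Rufus"
```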
Summary
Lots of ways to time/trace in Squeak
MessageTally and TimeProfileBrowser
Making things fast in Squeak
Choose data types wisely
Use primitives
Code wisely
Arrays vs. hashing: arrays for iteration, hashing for finding