Spreadsheet Design Concepts

Download Report

Transcript Spreadsheet Design Concepts

Spreadsheet Design Concepts
Simon Murphy
[email protected]
Developer – Codematic Ltd
XL User
Conference 2005
Patrick
• Very disappointed not to be here
• Has made his session available for EUC
attendees only at:
• www.sysmod.com/pobssauditaug05.pdf
• Be quick I guess he’ll remove it soon
XL User
Conference 2005
Spreadsheet background
•
•
•
•
•
•
5-60 Mb size
20-200,000 formulas
1-10,000 unique formulas
5-10,000 lines of VBA
£1M to Billions in values
Often linked to other technologies such as OLAP, ADO,
COM or .net etc.
• Finance, Banking and Sales and Marketing areas
• Development cost up to $1M
• Active member of Eusprig – European Spreadsheet Risk
group – dedicated to raising awareness of dangers and
error rates in commercial spreadsheets www.eusprig.org
XL User
Conference 2005
Agenda
•
•
•
•
Fundamental Imperative
Security
Software Development Lifecycle
Design
–
–
–
–
–
Technology choice
Inputs
Data
Logic/Formulas
Documentation
• Summary
• Any Questions
XL User
Conference 2005
Fundamental Imperative
• Manage complexity (McConnell)
• Solution complexity grows at 4 x the rate of
problem complexity. (Glass)
• [Things] should be as simple as they can be, but
no simpler (Einstein)
• K.I.S.S.
• This principle should drive all other work.
• Easier to build, easier to test, easier to document
etc…
XL User
Conference 2005
Security
•
•
•
Is everybody’s concern
Spreadsheets can be used as a staging board for privilege escalation (with
your login details!)
Consider SD3 +C
– Secure by
•
•
•
•
Design
Default
Deployment
Communication
•
•
Threat Modelling – Assets, Threats
Threat Types – STRIDE
•
Threats – rate with DREAD
•
Spreadsheets (all flavours) are fairly insecure
•
•
Set macro security to high and use code signing certificates.
See Microsoft MOC 2840A – Implementing security for more info.
– (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service,
Elevation of Privilege)
– (Damage potential, Reproducibility, Exploitability, Affected Users, Discoverability)
– Compiled UDFs (.net, COM, XLL) and Database servers can help
XL User
Conference 2005
Software Development Lifecycle
• Systems Development lifecycle
–
–
–
–
–
–
–
–
–
–
Requirements,
Analysis,
Logical Design,
<Technology Choice>,
Physical Design,
Construction,
Test,
Release,
Maintain.
In some shape or form.
• Spreadsheet Development lifecycle – “Oh! I
need a model” – clickety-click, reasonableness check,
release, (Test & Maintain in live environment).
XL User
Conference 2005
Big Design Issues
•
•
•
•
•
•
•
•
•
•
•
Project Scope
Business needs
Security
Performance
Maintainability
Extensibility
Availability
Scalability
Human factors
Integration
Methodologies
XL User
Conference 2005
Design – Excel or not
• Excel/VBA is often not technically the best
solution
– Databases better for large volumes of data
– Compiled languages better for security
– Spreadsheets are a 2 dimensional tool where most
business problems are 5+ dimensions
• It is good when considering
–
–
–
–
Initial speed of development
Cost of initial development
Current skill sets
Simple initial deployment
Note: If you need to restrict Excel functionality you may be better with .net and a grid control
XL User
Conference 2005
Assuming Spreadsheet
• Design
–
–
–
–
–
–
–
–
–
N-Tier
Defensive Designs
Inputs
Data
Layout
Formulas
VBA
Extending Excel
Documentation
XL User
Conference 2005
N-Tier
• Basic version is 3 tier
– Data layer
– Logic (or analysis) layer
– Presentation layer
• Each of these may be further broken out
(into N tiers)
• Easily implemented in worksheets
• Suitable for most non trivial workbooks
• Powerful and flexible but adds weight
XL User
Conference 2005
Basic design (N-tier)
Model
Support structure
Inputs
Assumptions
Reference Data
WorkbookStructure
Logic
Accountability
Reports
Revision history
XL User
Conference 2005
Demo
Defensive Designs
•
•
•
•
•
•
•
•
•
•
•
How will I test this?
Clearly identify Input areas
Group items that need updating with similar frequency
Place formulas where they are safe from accidents
Don’t use sheet protection, its weak, annoying, counter productive,
and reduces peoples ability to check and understand your logic (it
creates more problems than it solves) Use a compiled component if
security is a genuine concern. Demo
Use cross checks
Use a (simple) consistent convention to separate items that should
be treated differently (inputs/formulas or actual/forecast)
Use blanks cells around different blocks (enables ‘Current Region’)
Consider using a status sheet with summary error reports on it
Consider support structure – Index, assumptions, accountability.
Consider personal and tool comfort zone
XL User
Conference 2005
Inputs
• Never trust inputs
• Always test for correctness as soon as possible
• Limit inputs wherever possible (eg option buttons, or drop
downs)
• Use data validation but be aware of its limits demo
• Help users by clearly identifying what is needed
• Give clear feedback on errors or problems
• Minimise input – Let Excel do the work
• Have validation formulas nearby
• If test is ok say “ok” (or “Row/Column check ok”) not “”
XL User
Conference 2005
Database (Relational)
• Useful for flexible and efficient storage and updating
• Identify Entities (nouns), Attributes (adjectives) and
relationships (verbs) in problem statement
• 2 approaches
– Top down (identify entities then attributes)
– Bottom up (group attributes to describe entities)
• Data normalisation
– Basically keep related items together
– Provides design flexibility
– 3rd Normal Form (TNF/3NF)
• All items depend on the key (a unique identifier), the whole key
and nothing but the key.
• More useful at the transaction processing end rather
than analysis and reporting. (eg sales recording)
• Analysis performance can be poor
• PL demo
XL User
Conference 2005
Database (OLAP)
• Useful for flexible/powerful reporting.
• The most useful concept of Dimensionality – the number of ways to
describe something
• In a spreadsheet it’s the row, column, worksheet, workbook, and
maybe directory.
• Very useful for things that may ordinarily be missed
• How to describe the numbers (dimensionality)
– Eg Inflation <Time?, Country?, Cost Type?>
– Eg. P&L value <P&L line description, Time, Business unit, value type
(actual, budget etc), currency, Company etc>
– Useful for layout – time in cols or time in sheets?
• Hierarchies are ways of adding up dimension elements
–
–
–
–
There may be more than one hierarchy for any given dimension
Eg time
day > week > quarter > financial year
day > calendar month > calendar year
• Reporting performance can be excellent as many calculations preaggregated
XL User
Conference 2005
Spreadsheet eg. of Dimensionality
XL User
Conference 2005
Spreadsheet eg. of Dimensionality 2
338 is
Period 8 2005
Actual Figure
For
Employment
Costs
For the North
For Large Co
XL User
Conference 2005
Pivot Tables
• Probably the most important feature in
spreadsheets
• A superb way to manage complexity
• Most users can’t access them because their data
is already half pivoted.
• Repeated blocks are a strong hint to use pivots
• Pivot source data should have 1 one column
with numbers in (slightly simplified)
• Demo
XL User
Conference 2005
Software Development Principles
•
•
•
•
Modularisation
Cohesion
Coupling
Fan in / Fan out
XL User
Conference 2005
Modularisation
•
•
Basic idea – to break down complexity into understandable chunks (note
Millers Theorem (7+/-2))
Advantages
•
Disadvantages/Limitations
•
Use of a block of cells for one (single) task
–
–
–
–
–
Simplifies and adds analysis layers
Adds flexibility
Improves robustness
Reduces dependencies
Improves testability
– All Cells can be read from everywhere
– Can add redundancy
– Not really applicable for very small models
Volume Calcs
(eg Geographic)
Total Volumes
XL User
Conference 2005
Volume Summary
Sales Price
Total Revenue
Worksheet modularisation
XL User
Conference 2005
Worksheet modularisation 2
XL User
Conference 2005
Modular blocks
A ‘block’ is a area of cells surrounded by empty cells, that performs some analysis
XL User
Conference 2005
Cohesion
• How interrelated a unit is
• High cohesion means all elements are highly
interrelated – this is good, it aids understanding
and reduces range of influence
• Low cohesion makes things harder to
understand like random letters
• Cohesion is like well normalised data – but also
considers what the unit does
• Cohesion test…
XL User
Conference 2005
Cohesion 2
• A<kfj\s =]’#\kdjw487
• Give me all your chocolate
XL User
Conference 2005
Cohesion 3
• Who can remember the first string?
• The second?
XL User
Conference 2005
Cohesion 4
• Keep different shaped data blocks apart
• Either diagonally or on separate sheets
• Try to make all formulas in a block similar
XL User
Conference 2005
Poor Cohesion
XL User
Conference 2005
Coupling
• How strongly 2 separate elements depend
on each other
• Low coupling is better, especially through
clearly defined interfaces
• High coupling often means hidden
dependencies which generally leads to
incorrect modifications (side effects)
• Example: hard coded cell addresses in
VBA code
XL User
Conference 2005
Fan in / fan out
• Low fan in – a ‘unit’ (cell/worksheet/VBA
routine) depends on only a few other units
– Good because it minimises dependencies and reduces
complexity
• High fan out – a ‘unit’ (cell/worksheet/VBA
routine) is used by many others
– Good because it minimises duplication
• Example: putting VAT (Sales tax) rates in their
own cells, and referring there in calculations
• Example: calculating an offset once and using
the result many times
XL User
Conference 2005
Logic/Formulas
•
•
•
•
•
•
•
•
•
•
•
Sketch design on a whiteboard first
Don’t use IF(ISERROR(, be specific demo
Put expected part first in Ifs
Don’t start in A1 – Give yourself room to manoeuvre. Try D10 – hide
the unused.
Line sheets up on first data cell rather than headers
Don’t hide rows and columns, use a separate sheet if possible
Use Goal seek and or VBA rather than circular references
Avoid more than 2 or 3 levels of formula nesting – break it out
across several cells.
Use tools – many pay for themselves on first use. But don’t totally
rely on them.
Place totals above and to the left of details (more flexible and robust
eg links, filtering)
Build for testability
XL User
Conference 2005
VBA Connection
•
•
•
VBA UDFs should get all range info via parameters
Where (non UDF) VBA uses worksheet ranges, these should almost certainly
be named ranges.
– Sheet1.[Inflation].value = 0.02, rather than
– Sheet1.[C5].value = 0.02, which may become invalid if rows or columns
are inserted or deleted
– strInflationRange = “C5” is just as bad – it creates a hidden dependency
that must be manually updated when worksheet changes (poor coupling)
Variable and routine names:
•
•
•
•
•
•
Don’t use code comments demo make code ultra clear instead
Option Explicit on
Avoid Application.Run (non VBA – breaks error management)
routines 1 screen long max
3-4 levels of nesting max
Use source control
–
–
–
–
–
Use very meaningful names (8-30 chars length)
Use a simple naming convention (matched pairs)
Use scope prefixes (g, m)
Don’t use data type variable prefixes (s, str, l, lng etc) (see .net advice)
avoid abbreviations
XL User
Conference 2005
Extending Excel
• Excel is powerful not perfect
• Leverage benefits whilst managing weaknesses
by using complimentary technologies
• Data: VBA, ADO, ODBC, OLAP, .net, COM, Info
Bridge, XML, DDE, Web Queries, SOX Solution
Accelerator
• Logic: VBA, COM, .net, xll, Pivots, Filters
• Search the web for vast array of free or cheap
tips and tools.
XL User
Conference 2005
Documentation
• If you design for simplicity only very minor additional
documentation is needed
• Design and build the user Interface with the Users needs
primary, the documentation will automatically be there
• Design and build the business logic parts with the
maintainers needs primary, the main documentation will
automatically be there, expand as required.
• Integral tests should explicitly clarify intent.
• Reports should contain enough description to be
meaningful.
• External documentation is almost always so out of date its
worse than useless.
• Excessive documentation is too hard to plod through
• Poorly targeted documentation is pointless
• Working software is more useful than documentation
• Sometimes documentation is important
XL User
Conference 2005
Summary
•
•
•
•
•
•
•
•
Manage complexity
Consider security carefully
Excel/VBA not the best tool for everything
Be defensive, especially with inputs
Understand your data
Aim for cohesive models with low coupling
Use names to connect VBA to worksheets
Use complimentary technologies where
appropriate
• Manage documentation
XL User
Conference 2005
Discussion points
• VBA comments – Good or Bad?
– My view - bad
• Test before release – Good or Bad?
– My view - bad
XL User
Conference 2005
Questions?
• [email protected]
– Spreadsheet consulting, reviewing,
maintaining, rescuing, migrating, add-in
development etc.
• Websites
– www.codematic.net
– www.xlanalyst.co.uk
XL User
Conference 2005