Getting Unstuck: Working with Legacy Code and Data
Cory Foy – http://www.cornetdesign.com
Goals
What is Legacy Code?
How do we change Legacy Code?
Common patterns for code bases
Does Legacy Code have to be code, or can it be something else like a really long bullet on a PowerPoint slide, or perhaps a database?
Next Steps
Legacy Code
How do you define Legacy Code?
Several definitions possible
Code we’ve gotten from somewhere else
Code you have to change, but don’t understand
Demoralizing code (Big ball of mud)
Code without unit tests
Legacy Code
Code that needs to have behavior preserved
What is behavior?
The way in which someone behaves
The way in which a person, organism, or group
responds to a specific set of conditions
The way that a machine operates or a substance reacts
under a specific set of conditions
Legacy Code
What’s the behavior of the following code?
Legacy Code
Does the following code add behavior?
Legacy Code
Now, have we changed the behavior?
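As a hypothetical illustration of the distinction (the class and functions are invented for this example and are not the code from the talk): adding a method that nothing calls adds code but leaves the system's behavior unchanged. Behavior changes only once a caller starts using the new method.

```python
class InvoiceCalculator:
    """Hypothetical legacy class used only to illustrate 'behavior'."""

    def total(self, amounts):
        # Existing behavior: the sum of the line amounts.
        return sum(amounts)

    # Added later. Defining this method adds code, but no existing caller
    # invokes it, so the observable behavior of the system is unchanged.
    def total_with_tax(self, amounts, rate):
        return round(self.total(amounts) * (1 + rate), 2)


def checkout(calc, amounts):
    # Original call site: still uses total(), so behavior is preserved.
    return calc.total(amounts)


def checkout_v2(calc, amounts):
    # Once a caller switches to the new method, behavior has changed.
    return calc.total_with_tax(amounts, rate=0.10)


calc = InvoiceCalculator()
print(checkout(calc, [10.0, 5.0]))     # 15.0
print(checkout_v2(calc, [10.0, 5.0]))  # 16.5
```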
How do we change Legacy Code?
Why would we want to change the code?
Four reasons to change software
Adding a feature
Fixing a bug
Improving the design
Optimizing resource usage
Each has unique attributes
Adding a feature / Fixing a bug
Causes the following changes:
Structure
Functionality (adding or replacing)
Need to be able to know the new functionality works
Need to be able to know that the system as a whole is still functioning appropriately
Improving the Design
Causes the following changes:
Structure
Note that functionality is not listed above
Important to be able to know that all functionality works before and after the change
Optimizing Resource Usage
Changes:
Resource usage
May cause structure change
Again, note that functionality is ideally not in the above list
Need a way to make sure functionality was not changed
Need a way to verify the optimization goals have been met (and stay met)
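The two checks above can be sketched in a few lines. This is a minimal sketch assuming a hypothetical slow function and its optimized replacement: first compare their outputs over a set of inputs to show functionality did not change, then time both to check the optimization goal. The function names and inputs are illustrative assumptions.

```python
import time


def unique_slow(items):
    # Original version: `x not in result` scans a list, so this is O(n^2).
    result = []
    for x in items:
        if x not in result:
            result.append(x)
    return result


def unique_fast(items):
    # Optimized version: a set makes membership checks O(1) on average.
    seen = set()
    result = []
    for x in items:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result


def same_behavior(old, new, cases):
    # Functionality check: the optimized code must return what the old code did.
    return all(old(list(case)) == new(list(case)) for case in cases)


def best_time(fn, arg, repeat=3):
    # Resource check: best-of-N wall-clock time in seconds.
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


cases = [[], [1], [3, 1, 3, 2, 1], list(range(50)) * 2]
assert same_behavior(unique_slow, unique_fast, cases)

big = list(range(2000)) * 2
print(f"slow: {best_time(unique_slow, big):.4f}s  fast: {best_time(unique_fast, big):.4f}s")
```

In a real suite the timing would be asserted against an agreed budget and kept in the regular test run, so the goal "stays met". The sketch only prints the timings so that it does not become flaky on slow machines.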
Edit and Pray
Carefully plan the changes you are going to make
Make sure you understand the code to be modified
Make the changes
Run the system to make sure the change was made
Do some additional smoke testing to check that everything seems to be functioning
Pray you don’t get a call at 2am saying the system doesn’t work anymore
Cover and Modify
Verify that the system is working by running the tests
Write tests to expose the behavior you want to add
or change
Write code to make the test pass
Refactor duplication
Wash, rinse, repeat
Verify the system is still working by running the tests
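A minimal sketch of one Cover and Modify pass, using Python's unittest and a hypothetical `format_name` function. One test covers the existing behavior. The other test was written first to describe the new behavior (an optional middle initial), and the code was then changed until both tests pass.

```python
import unittest


def format_name(first, last, middle=None):
    # Legacy behavior: "Last, First". The optional middle initial is the
    # new feature, added only after a failing test described it.
    name = f"{last}, {first}"
    if middle:
        name += f" {middle[0]}."
    return name


class FormatNameTests(unittest.TestCase):
    def test_existing_behavior_is_preserved(self):
        # Covers what already exists, so we know if the change breaks it.
        self.assertEqual(format_name("Ada", "Lovelace"), "Lovelace, Ada")

    def test_new_middle_initial(self):
        # Written first to expose the behavior we want to add.
        self.assertEqual(format_name("Grace", "Hopper", "Brewster"),
                         "Hopper, Grace B.")


if __name__ == "__main__":
    # exit=False keeps the runner from calling sys.exit when it finishes.
    unittest.main(argv=["cover-and-modify"], exit=False)
```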
Feathers’ Legacy Code Change Algorithm
Michael Feathers describes a Legacy Code Change Algorithm in Working Effectively with Legacy Code
Five steps
Identify change points
Find test points
Break dependencies
Write tests
Make changes and refactor
Each step has its own common patterns and scenarios
Patterns for the Change Algorithm
Identify Change Points
One of the key areas where architects and architecture come into play
If you aren’t sure where a change belongs, put it in; you can refactor later (with unit test support)
Patterns for the Change Algorithm
Identify Change Points
Scenarios
I don’t understand the code well enough to change it
Notes / Sketching
Listing Markup
Separate Responsibilities
Understand method structure
Extract Methods
Effect Sketch
Scratch Refactoring
Delete Unused Code
Patterns for the Change Algorithm
Identify Change Points
Scenarios
My application has no structure
Tell the story of the system
Naked CRC (Class, Responsibility, and Collaborations)
Conversation Scrutiny
Patterns for the Change Algorithm
Find Test Points
Where can you write tests to exercise the behavior you want to add or change?
Important to have team standards for where unit tests should go
Patterns for the Change Algorithm
Find Test Points
Scenarios
I need to make a change. What methods should I test?
Reason about effects (Effect Sketch)
Reasoning Forward (TDD)
Effect propagation
Effect reasoning
Effect analysis
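Feathers draws effect sketches by hand. As a rough sketch, the same forward reasoning can be mechanized: record which elements each field or method can affect, then follow the arrows from the thing you plan to change. Everything reached is a candidate place to write tests. The invoicing names below are hypothetical.

```python
from collections import deque


def affected_by(change, effects):
    """Return every element reachable from `change` in an effect graph.

    `effects` maps an element (field, method, ...) to the elements whose
    results it can influence. Reachable elements are candidate test points.
    """
    seen = set()
    queue = deque([change])
    while queue:
        node = queue.popleft()
        for nxt in effects.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical effect sketch for an invoicing class.
effects = {
    "tax_rate": ["compute_tax"],
    "compute_tax": ["invoice_total"],
    "invoice_total": ["print_invoice", "monthly_report"],
    "discount": ["invoice_total"],
}

print(sorted(affected_by("tax_rate", effects)))
# ['compute_tax', 'invoice_total', 'monthly_report', 'print_invoice']
```

Here both `tax_rate` and `discount` flow through `invoice_total`. A node like that is often the cheapest single place to sense a change.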
Patterns for the Change Algorithm
Find Test Points
Scenarios
I need to make many changes in one area. Do I have to break all dependencies?
Interception Points
Higher-Level interception points
Pinch Points (encapsulation boundary)
Pinch Point Traps
Patterns for the Change Algorithm
Break Dependencies
Generally the most difficult part of the process
Usually there are no tests to tell you whether breaking dependencies will cause problems
Patterns for the Change Algorithm
Break Dependencies
Scenarios
How do I know I’m not breaking anything?
Hyperaware editing
Single-goal editing
Preserve Signatures
Lean on the compiler
Pair Programming (aka Real-Time Code Reviews)
Patterns for the Change Algorithm
Break Dependencies
Scenarios
I can’t get this class into a test harness
Irritating Parameters
Hidden Dependencies
Construction Blob
Irritating Global Dependency
Horrible Include Dependencies
Onion Parameter
Aliased Parameter
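One common fix for a Hidden Dependency is Parameterize Constructor, from Feathers' dependency-breaking catalog. The class keeps working for existing callers, but a test can now pass in the collaborator instead of the class creating it internally. A minimal sketch with hypothetical classes:

```python
class SmtpMailer:
    """Stands in for a real collaborator that is hard to use in a test."""

    def send(self, to, body):
        raise RuntimeError("would talk to a real mail server")


class OrderNotifier:
    def __init__(self, mailer=None):
        # Before: self.mailer = SmtpMailer() was hard-coded (hidden dependency).
        # After: production callers still get SmtpMailer by default,
        # while tests can pass in a fake.
        self.mailer = mailer if mailer is not None else SmtpMailer()

    def order_shipped(self, email, order_id):
        self.mailer.send(email, f"Order {order_id} has shipped")


class FakeMailer:
    """Test double that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


fake = FakeMailer()
OrderNotifier(fake).order_shipped("a@example.com", 42)
print(fake.sent)  # [('a@example.com', 'Order 42 has shipped')]
```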
Patterns for the Change Algorithm
Break Dependencies
Scenarios
I can’t run this method in a test harness
Hidden Methods
“Helpful” language features
Undetectable Side Effect
Sensing variables
Command/Query Separation
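A sensing variable is a field added temporarily so a test can observe something the method's result hides. In this hypothetical parser, the inputs "a,b" and "a,b," return the same list, but they go down different branches. Only the sensing variable lets a test tell the two cases apart.

```python
class LineParser:
    def __init__(self):
        # Sensing variable: added only so tests can see which branch ran.
        # Remove it once the method has been refactored and is testable.
        self.last_branch = None

    def parse(self, line):
        fields = line.split(",")
        if len(fields) < 3:
            self.last_branch = "short"
            fields += [""] * (3 - len(fields))
        else:
            self.last_branch = "full"
        return fields[:3]


p = LineParser()
print(p.parse("a,b"), p.last_branch)   # ['a', 'b', ''] short
print(p.parse("a,b,"), p.last_branch)  # ['a', 'b', ''] full
```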
Patterns for the Change Algorithm
Break Dependencies
Scenarios
I need to change a monster method and can’t write tests for it
Introduce sensing variables
Extract what you know
Break out a method object
Skeletonize Methods
Find Sequences
Extract to the current class first
Extract small pieces
Be prepared to redo extractions
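A minimal sketch of Break Out Method Object, assuming a hypothetical long payroll function whose local variables get in the way of extraction. The locals become fields on a new class, so small pieces can be extracted one at a time. The original function keeps its signature and simply delegates.

```python
class PayrollCalculation:
    """Method object: holds what used to be locals of a long function."""

    def __init__(self, hours, rate, overtime_threshold=40):
        self.hours = hours
        self.rate = rate
        self.overtime_threshold = overtime_threshold

    def run(self):
        # The body of the former monster method, now split into small steps.
        return self.regular_pay() + self.overtime_pay()

    def regular_pay(self):
        return min(self.hours, self.overtime_threshold) * self.rate

    def overtime_pay(self):
        extra = max(0, self.hours - self.overtime_threshold)
        return extra * self.rate * 1.5


def calculate_pay(hours, rate):
    # Preserve Signatures: the original entry point stays and delegates.
    return PayrollCalculation(hours, rate).run()


print(calculate_pay(45, 20))  # 950.0
```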
Patterns for the Change Algorithm
Break Dependencies
Scenarios
It takes forever to make a change
Understanding
Lag Time
Breaking Dependencies
Build Dependencies
Patterns for the Change Algorithm
Write Tests
Tests may be more difficult to write than normal unit tests
May have less-than-ideal scenarios
Patterns for the Change Algorithm
Write Tests
Scenarios
I need to make a change, but don’t know what tests to write
Characterization Tests
Characterizing Classes
Targeted Testing
Writing Characterization Tests
Write tests for the area where you’ll be making the change. Write as many as you need to understand the code.
Then write tests for the things you need to change
If converting or moving functionality, write tests to verify the behavior on a case-by-case basis
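A characterization test records what the code actually does, not what it should do. The usual loop is to call the code, let the assertion fail, and then paste the observed value into the test. This sketch uses a hypothetical legacy function. Its tests deliberately pin down a truncation quirk, so the quirk is preserved until someone decides to change it on purpose.

```python
import unittest


def legacy_discount(total):
    # Hypothetical legacy code. int() truncates, which may or may not be a
    # bug; a characterization test preserves it until a deliberate decision.
    if total > 100:
        return int(total * 0.9)
    return total


class LegacyDiscountCharacterization(unittest.TestCase):
    def test_small_totals_are_unchanged(self):
        self.assertEqual(legacy_discount(50), 50)

    def test_large_totals_are_truncated_not_rounded(self):
        # Observed: 115 * 0.9 = 103.5 becomes 103 (round() would give 104).
        self.assertEqual(legacy_discount(115), 103)

    def test_boundary(self):
        # Exactly 100 gets no discount because the comparison is strict.
        self.assertEqual(legacy_discount(100), 100)


if __name__ == "__main__":
    unittest.main(argv=["characterization"], exit=False)
```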
DEMO: Change Algorithm at Work
Step through a common scenario, implementing the tests as we go
Legacy Code isn’t just Code
Most applications aren’t just simple console apps
They deal with many dependencies
File Systems
Registries
Databases
Hardware
Legacy Code isn’t just Code
These dependencies can cause legacy problems of their own
Database schemas
Existing data in the tables
Business logic in the database
No access to development data that mirrors production
In other words, Legacy Data
Legacy Data
So where does this Legacy Data come from?
Flat Files
XML Documents
RDBs
Object DBs
Other DBs
Application Wrappers
Your DB
Many, many sources
Legacy Data
Legacy data produces its own unique set of challenges
Data quality
Data architecture problems
Database design problems
Process-related challenges
Data Quality
Common Data Quality problems
•A single column is used for several purposes
•Determining the purpose of a column by the value of one or more other columns
•Inconsistent data values / formatting
•Missing data / columns
•Additional columns
•Important attributes and relationships are hidden in text fields
•Data values that stray from their field descriptions and business rules
•Various key strategies for the same type of entity
•Unrealized relationships between data records
•One attribute is stored in several fields
•Inconsistent use of special characters
•Different data types for similar columns
•Different levels of detail
•Different modes of operation
•Varying timeliness of data
•Varying default values
•Various representations
http://www.agiledata.org/essays/legacyDatabases.html#DataProblems
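Before changing code that reads legacy data, it can help to profile a column to see which of these problems you actually have. This minimal sketch reduces each value to a "shape" (digits become 9, letters become A), so inconsistent formatting and missing values stand out. The phone-number sample values are invented.

```python
import re
from collections import Counter


def shape(value):
    # Normalize a value to its "shape": digits -> 9, letters -> A.
    if value is None or value == "":
        return "<missing>"
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile(values):
    # Count how often each shape occurs, most common first.
    return Counter(shape(v) for v in values).most_common()


phone_column = ["919-555-0100", "(919) 555-0101", "9195550102",
                "919-555-0103", None, "ext 12"]
for pattern, count in profile(phone_column):
    print(f"{count:3}  {pattern}")
```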
Data Architecture Problems
Common Architectural Problems may include:
Applications responsible for data cleansing (instead of DB)
Different database paradigms
Different hardware platforms / storage
Fragmented / Redundant / Inaccessible data sources
Inconsistent semantics
Inflexible architecture
Lack of event notification
No or inefficient security
Varying timeliness of data sources
Design Problems
There may be key design issues with the database
Database encapsulation scheme exists, but it’s difficult to use
Ineffective (or no) naming conventions
Inadequate documentation
Original design goals at odds with current project needs
Inconsistent key strategy
Design goals at odds with data storage (treating relational DBs as object DBs, etc.)
Design Problems
Example
An application that presented custom forms to users
Implementers could create custom forms with custom questions and validations
Beautiful OO architecture: Forms had Groups, which had Items
Everything was rendered dynamically and could be updated on the fly
Design Problems
Example
The Form, Group, Item and other “objects” were all stored as individual records in one database table
A user in the system had on average 74 forms with an average of 30 questions each. With a target of 20,000 users, the questions alone come to about 44 million rows (74 × 30 × 20,000). Adding the Form, Group and other records would push the one table past 50 million rows.
We identified one stored proc as one of the main culprits. It contained something like the following:
Design Problems
Example
```sql
-- @tmpTable is a table variable declared earlier in the proc (not shown)
INSERT INTO @tmpTable
SELECT ot.myCol FROM OtherTable ot
WHERE ot.bitMask & (144567 | 99435) = 0
```
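Because the WHERE clause applies a bitwise operation to the column, the database cannot seek on an index over bitMask. This led to a full table scan in one of their most heavily used procs, degrading performance significantly (average page load time of over 7 seconds)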
Working with Legacy Data
So how do you deal with legacy data?
Strategies
Avoid it
Develop an Error Handling Strategy
Work Iteratively and Incrementally
Prefer Read-Only Legacy Access
Encapsulate Legacy Data Access
Introduce Data Adapters for Simple Data Access
Introduce a staging database for complex access
Adopt Existing Tools
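Encapsulating legacy data access can be as small as one read-only adapter. The adapter knows the legacy layout, so the rest of the application never has to. This sketch uses Python's built-in sqlite3 as a stand-in for the legacy database. The `objects` table and its `kind` and `parent_id` columns are hypothetical, loosely modeled on the one-table Form/Group/Item design described above.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Form:
    id: int
    title: str
    questions: list = field(default_factory=list)


class LegacyFormAdapter:
    """Read-only adapter over a hypothetical one-table legacy schema."""

    def __init__(self, conn):
        self.conn = conn

    def load_form(self, form_id):
        # Only SELECTs here: the adapter never writes to the legacy table.
        row = self.conn.execute(
            "SELECT id, label FROM objects WHERE id = ? AND kind = 'FORM'",
            (form_id,),
        ).fetchone()
        if row is None:
            return None
        # Items hang off Groups, which hang off the Form.
        items = self.conn.execute(
            "SELECT i.label FROM objects AS i "
            "JOIN objects AS g ON i.parent_id = g.id "
            "WHERE g.parent_id = ? AND g.kind = 'GROUP' AND i.kind = 'ITEM' "
            "ORDER BY g.id, i.id",
            (form_id,),
        )
        return Form(id=row[0], title=row[1],
                    questions=[label for (label,) in items])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, parent_id INTEGER, "
             "kind TEXT, label TEXT)")
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", [
    (1, None, "FORM", "Intake"),
    (2, 1, "GROUP", "Contact"),
    (3, 2, "ITEM", "Name?"),
    (4, 2, "ITEM", "Phone?"),
])
print(LegacyFormAdapter(conn).load_form(1))
# Form(id=1, title='Intake', questions=['Name?', 'Phone?'])
```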
Working with Legacy Data
We couldn’t avoid the data: the proc had to be changed
So we developed an incremental five-step plan
Add an IsValidRecord column to the table
Update the column based on the bitmask for each row
Change the proc to use the column instead of the bitmask
Make sure all tests are still passing
Introduce update and insert triggers to automatically populate the column
Working with Legacy Data
Advantages
Required no change to application code
We could rapidly test the application
We could make incremental changes to see improvements
What made it work
Testing/QA
Database with production-like data
Regression tests to ensure functionality
Timing tests to show performance improvement
Process Problems
Not all the issues are technical
Working with legacy data when you don’t have to
Data design drives your object model
Legacy data issues overshadow everything else
App developers ignore legacy issues
You choose not to refactor the legacy data sources
Politics
You are too focused on the data to see the software
Refactoring Databases
Databases should not be left out of the refactoring process
“An interesting observation is that when you take a big design up front (BDUF) approach to development where your database schema is created early in the life of your project you are effectively inflicting a legacy schema on yourself. Don’t do this.”
Scott Ambler maintains a catalog of database refactorings
How do you refactor a database?
Refactoring Databases
Implementing Database Refactoring in your organization
Start simple
Accept that iterative and incremental development is the norm
Accept that there is no magic solution to get you out of your existing mess
Adopt a 100% regression testing policy
Try it
Next Steps
Dealing with legacy code is hard
Integration issues
Code Issues
Political Issues
There are ways out
Important to address pain points first
Next Steps
So where can you go from here?
Working Effectively with Legacy Code by Michael Feathers
Agile Database Techniques by Scott Ambler
Refactoring Databases by Scott Ambler and Pramod Sadalage
http://www.agiledata.org
NUnit, JUnit, CppUnit, CppUnitLite, DbFit, FitNesse
http://www.cornetdesign.com