Getting Unstuck: Working with Legacy Code and Data


Cory Foy – http://www.cornetdesign.com
Goals





What is Legacy Code?
How do we change Legacy Code?
Common patterns for code bases
Does Legacy Code have to be code, or can it be something else, like a really long bullet on a PowerPoint slide, or perhaps a database?
Next Steps
Legacy Code


How do you define Legacy Code?
Several definitions possible
 Code we’ve gotten from somewhere else
 Code you have to change, but don’t understand
 Demoralizing code (Big ball of mud)
 Code without unit tests
Legacy Code


Code that needs to have behavior preserved
What is behavior?
 The way in which someone behaves
 The way in which a person, organism, or group responds to a specific set of conditions
 The way that a machine operates or a substance reacts under a specific set of conditions
Legacy Code
 What’s the behavior of the following code?
Legacy Code
 Does the following code add behavior?
Legacy Code
 Now have we changed the behavior?
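
As a stand-in for the code these slides refer to, here is a hypothetical sketch of the distinction being drawn. The `CDPlayer` class and its method names are assumptions made for illustration, not the original example. Adding a new method adds behavior without altering what existing callers observe; editing an existing method changes behavior for every caller.

```python
class CDPlayer:
    """Hypothetical class used only to illustrate the three questions."""

    def __init__(self):
        self.tracks = []

    def add_track_listing(self, track):
        # Existing behavior: every track is stored, in call order.
        self.tracks.append(track)

    def replace_track_listing(self, name, track):
        # A new method adds behavior, but code that only calls
        # add_track_listing() cannot observe any difference.
        self.tracks = [track if t == name else t for t in self.tracks]


class ChangedCDPlayer(CDPlayer):
    """Stands in for an edited version of the same class."""

    def add_track_listing(self, track):
        # The existing method now silently drops duplicates, so every
        # existing caller sees different results: behavior has changed.
        if track not in self.tracks:
            self.tracks.append(track)


def load(player):
    # An "existing caller" that was written against the original class.
    for t in ["intro", "intro", "outro"]:
        player.add_track_listing(t)
    return player.tracks
```

Calling `load(CDPlayer())` and `load(ChangedCDPlayer())` gives different lists, which is the kind of difference a test needs to be able to detect.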
How do we change Legacy Code?


Why would we want to change the code?
Four reasons to change software
 Adding a feature
 Fixing a bug
 Improving the design
 Optimizing resource usage

Each has unique attributes
Adding a feature / Fixing a bug

Causes the following changes:
 Structure
 Functionality (adding or replacing)

Need to be able to know that the new functionality works
Need to be able to know that the system as a whole is still functioning appropriately
Improving the Design

Causes the following changes:
 Structure


Note that functionality is not listed above
Important to be able to know that all functionality works before and after the change
Optimizing Resource Usage

Changes:
 Resource usage
 May cause structure change



Again note that functionality is ideally not in the
above list
Need to have a way to make sure functionality was
not changed
Need to have a way to verify the optimization
goals have been met (and stay met)
Edit and Pray






Carefully plan the changes you are going to make
Make sure you understand the code to be modified
Make the changes
Run the system to make sure the change was made
Do some additional testing to smoke test that
everything seems to be functioning
Pray you don’t get a call at 2am that the system
doesn’t work anymore
Cover and Modify






Verify that the system is working by running the tests
Write tests to expose the behavior you want to add
or change
Write code to make the test pass
Refactor duplication
Wash, rinse, repeat
Verify the system is still working by running the tests
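
A minimal sketch of one pass through this cycle, using Python's `unittest`. The `shipping_cost` function and its rates are hypothetical; the point is only the order of work: cover what exists, add a test for the new behavior, then write the code.

```python
import unittest


def shipping_cost(weight_kg, express=False):
    """Hypothetical legacy function, shown after the change was made."""
    cost = 5.0 + 1.5 * weight_kg   # original behavior, left untouched
    if express:                    # new behavior, driven by the new test
        cost *= 2
    return cost


class ShippingCostTests(unittest.TestCase):
    # Cover: pin down the existing behavior before touching the code.
    def test_existing_rate_is_preserved(self):
        self.assertEqual(shipping_cost(2), 8.0)

    # Modify: this test exposes the behavior being added. It fails
    # until the `express` branch above is written.
    def test_express_doubles_the_cost(self):
        self.assertEqual(shipping_cost(2, express=True), 16.0)


def run_tests():
    # Running the whole suite before and after each change is the
    # "verify the system is working" step.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShippingCostTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

`run_tests().wasSuccessful()` is the signal to move on to the next small change.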
Feathers’ Legacy Change Algorithm

Michael Feathers discusses a Legacy Code Change Algorithm in Working Effectively with Legacy Code
Five steps
 Identify change points
 Find test points
 Break dependencies
 Write tests
 Make changes and refactor

Each step has common patterns and scenarios
Patterns for the Change Algorithm

Identify Change Points
 One of the key areas where architects and architecture come into play
 If you aren’t sure where a change belongs, put it in anyway; you can refactor later (with unit test support)
Patterns for the Change Algorithm

Identify Change Points
 Scenarios
I don’t understand the code well enough to change it
Notes / Sketching
Listing Markup
 Separate Responsibilities
 Understand method structure
 Extract Methods
 Effect Sketch
Scratch Refactoring
Delete Unused Code
Patterns for the Change Algorithm

Identify Change Points
 Scenarios
 My application has no structure
Tell the story of the system
Naked CRC (Class, Responsibility, and Collaborations)
Conversation Scrutiny
Patterns for the Change Algorithm

Find Test Points
 Where can you write tests to exercise the behavior you want to add/change?
 Important to have team standards for where unit tests
should go
Patterns for the Change Algorithm

Find Test Points
 Scenarios
I need to make a change. What methods should I test?
Reason about effects (Effect Sketch)
Reasoning Forward (TDD)
Effect propagation
Effect reasoning
Effect analysis
Patterns for the Change Algorithm

Find Test Points
 Scenarios
I need to make many changes in one area. Do I have to break all dependencies?
Interception Points
Higher-Level interception points
Pinch Points (encapsulation boundary)
Pinch Point Traps
Patterns for the Change Algorithm

Break Dependencies
 Generally the most difficult part of the process
 Usually don’t have tests to tell if breaking dependencies will cause problems
Patterns for the Change Algorithm

Break Dependencies
 Scenarios
 How do I know I’m not breaking anything?
Hyperaware editing
Single-goal editing
Preserve Signatures
Lean on the compiler
Pair Programming (aka Real-Time Code Reviews)
Patterns for the Change Algorithm

Break Dependencies
 Scenarios
I can’t get this class into a test harness
Irritating Parameters
Hidden Dependencies
Construction Blob
Irritating Global Dependency
Horrible Include Dependencies
Onion Parameter
Aliased Parameter
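
To make one of these concrete, here is a hedged sketch of the Hidden Dependencies case: a constructor that builds its own collaborator, fixed by passing the collaborator in (Feathers calls this Parameterize Constructor). All class names below are hypothetical.

```python
class SmtpMailer:
    """Stands in for a collaborator that is hard to use in a test."""

    def send(self, to, body):
        raise RuntimeError("would talk to a real mail server")


class OverdueNotifier:
    # Before: __init__ called SmtpMailer() itself, so the dependency was
    # hidden and the class could not be constructed safely in a test.
    # After: the mailer is a parameter. The old default is preserved,
    # so existing callers keep working without changes.
    def __init__(self, mailer=None):
        self.mailer = mailer if mailer is not None else SmtpMailer()

    def notify(self, customers):
        for c in customers:
            if c["days_overdue"] > 30:
                self.mailer.send(c["email"], "Your account is overdue")


class FakeMailer:
    """Test double that records recipients instead of sending mail."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append(to)
```

A test can now construct `OverdueNotifier(FakeMailer())` and inspect `sent` afterwards, with no mail server involved.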
Patterns for the Change Algorithm

Break Dependencies
 Scenarios
I can’t run this method in a test harness
Hidden Methods
“Helpful” language features
Undetectable Side Effect
 Sensing variables
 Command/Query Separation
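
A small sketch of the last two ideas, assuming a hypothetical `Account` class. The sensing variable makes an otherwise invisible side effect observable to a test, and the command and query are kept as separate methods.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance
        # Sensing variable: added only so a test can see what happened.
        self.audit_requested = False

    # Command: changes state and returns nothing.
    def withdraw(self, amount):
        if amount > self.balance:
            # Previously this branch had no visible effect; the sensing
            # variable records that it was taken.
            self.audit_requested = True
            return
        self.balance -= amount

    # Query: returns a value and changes nothing.
    def can_withdraw(self, amount):
        return amount <= self.balance
```

Once the method is covered and refactored, a sensing variable like this can usually be removed again.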
Patterns for the Change Algorithm

Break Dependencies
 Scenarios
I need to change a monster method and can’t write tests
Introduce sensing variables
Extract what you know
Break out a method object
Skeletonize Methods
Find Sequences
Extract to the current class first
Extract small pieces
Be prepared to redo extractions
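
As an illustration of "Break out a method object", here is a minimal sketch under assumed names (`monthly_report`, `MonthlyReport`). The local variables of the long method become fields of a new class, which makes it possible to extract and test small pieces one at a time.

```python
def monthly_report(orders):
    # The original long function keeps its signature and only delegates,
    # so existing callers are unaffected.
    return MonthlyReport(orders).run()


class MonthlyReport:
    """Method object: former local variables are now instance fields."""

    def __init__(self, orders):
        self.orders = orders
        self.total = 0.0
        self.lines = []

    def run(self):
        for order in self.orders:
            self._add_line(order)
        self.lines.append(f"TOTAL {self.total:.2f}")
        return "\n".join(self.lines)

    def _add_line(self, order):
        # A small extracted piece that shares state through the fields
        # instead of through a long parameter list.
        amount = order["qty"] * order["price"]
        self.total += amount
        self.lines.append(f'{order["sku"]} {amount:.2f}')
```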
Patterns for the Change Algorithm

Break Dependencies
 Scenarios
 It takes forever to make a change
Understanding
Lag Time
Breaking Dependencies
Build Dependencies
Patterns for the Change Algorithm

Write Tests
 Tests may be more difficult to write than normal unit tests
 May have less-than-ideal scenarios
Patterns for the Change Algorithm

Write Tests
 Scenarios
I need to make a change, but don’t know what tests to write
Characterization Tests
Characterizing Classes
Targeted Testing
 Writing Characterization Tests
Write tests for the area where you’ll be making the change. Write as many as you need to understand the code.
Then write tests for the things you need to change.
If converting or moving functionality, write tests to verify the behavior on a case-by-case basis.
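
A minimal sketch of a characterization test, assuming a hypothetical legacy function `format_name`. The expected values are whatever the code actually produces today, recorded first and then pinned down, rather than what a specification says it should produce.

```python
def format_name(first, middle, last):
    """Hypothetical legacy function whose rules nobody remembers."""
    parts = [last.upper() + ",", first]
    if middle:
        parts.append(middle[0] + ".")
    return " ".join(parts)


def characterize():
    # Step 1: call the code with interesting inputs and record what it
    # actually does.
    return {
        "plain": format_name("Ada", "", "Lovelace"),
        "with_middle": format_name("Grace", "Brewster", "Hopper"),
    }


# Step 2: pin the observed output down so any later change that alters
# it is caught, whether or not the current output is "right".
EXPECTED = {
    "plain": "LOVELACE, Ada",
    "with_middle": "HOPPER, Grace B.",
}
```

Comparing `characterize()` against `EXPECTED` before and after each edit is what preserves behavior during the change.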
DEMO: Change Algorithm at Work

Step through a common scenario, implementing the
tests as we go
Legacy Code isn’t just Code


Most applications aren’t just simple console apps
They deal with many dependencies
 File Systems
 Registries
 Databases
 Hardware
Legacy Code isn’t just Code

These dependencies can cause legacy problems of
their own
 Database schemas
 Existing data in the tables
 Business logic in the database
 No access to development data that mirrors production

In other words, Legacy Data
Legacy Data

So where does this Legacy Data come from?
 Flat Files
 XML Documents
 RDBs
 Object DBs
 Other DBs
 Application Wrappers
 Your DB
 Many, many sources
Legacy Data

Legacy data produces its own unique set of
challenges
 Data quality
 Data architecture problems
 Database design problems
 Process-related challenges
Data Quality

Common Data Quality problems
• A single column is used for several purposes
• Determining the purpose of a column by the value of one or more other columns
• Inconsistent data values / formatting
• Missing data / columns
• Additional columns
• Important attributes and relationships are hidden in text fields
• Data values that stray from their field descriptions and business rules
• Various key strategies for the same type of entity
• Unrealized relationships between data records
• One attribute is stored in several fields
• Inconsistent use of special characters
• Different data types for similar columns
• Different levels of detail
• Different modes of operation
• Varying timeliness of data
• Varying default values
• Various representations
http://www.agiledata.org/essays/legacyDatabases.html#DataProblems
Data Architecture Problems

Common Architectural Problems may include:
 Applications responsible for data cleansing (instead of DB)
 Different database paradigms
 Different hardware platforms / storage
 Fragmented / Redundant / Inaccessible data sources
 Inconsistent semantics
 Inflexible architecture
 Lack of event notification
 No or inefficient security
 Varying timeliness of data sources

Design Problems

There may be key design issues with the database
 Database encapsulation scheme exists, but it’s difficult to use
 Ineffective (or no) naming conventions
 Inadequate documentation
 Original design goals at odds with current project needs
 Inconsistent key strategy
 Design goals at odds with data storage (treating relational DBs as object DBs, etc.)
Design Problems

Example
 An application that presented custom forms to users
 Implementers could create custom forms with custom questions and validations
 Beautiful OO architecture: Forms had Groups, which had Items
 Everything was rendered dynamically and could be updated on the fly
Design Problems

Example
 The Form, Group, Item and other “objects” were all stored as individual records in one database table
 A user in the system had on average 74 forms with an average of 30 questions. With a target of 20,000 users in the database, this would lead to over 50 million rows in the one table.
 We identified one stored proc as one of the main culprits. It had something like the following:
Design Problems

Example
 INSERT INTO @tmpTable
SELECT ot.myCol FROM OtherTable ot
WHERE ot.bitMask & (144567 | 99435) = 0
 This led to a full table scan for one of their most heavily used procs, degrading performance significantly (average page load time of over 7 seconds)
Working with Legacy Data


So how do you deal with legacy data?
Strategies
 Avoid it
 Develop Error Handling Strategy
 Work Iteratively and Incrementally
 Prefer Read-Only Legacy Access
 Encapsulate Legacy Data Access
 Introduce Data Adapters for Simple Data Access
 Introduce a staging database for complex access
 Adopt Existing Tools
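
One way to picture "Prefer Read-Only Legacy Access" together with "Encapsulate Legacy Data Access" is a small adapter that reads the legacy table and hands clean objects to the rest of the application. The sketch below uses Python's `sqlite3` with an in-memory database; the table `CUST_MSTR`, its columns, and the cleanup rules are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    active: bool


class LegacyCustomerAdapter:
    """Read-only access to a hypothetical legacy table, CUST_MSTR."""

    def __init__(self, conn):
        self._conn = conn

    def all_customers(self):
        rows = self._conn.execute(
            "SELECT CUST_NO, CUST_NM, STAT FROM CUST_MSTR ORDER BY CUST_NO"
        )
        # Inconsistent formatting is cleaned up here, in one place,
        # instead of leaking into every caller.
        return [
            Customer(int(no), name.strip().title(),
                     stat.strip().upper() in ("A", "Y"))
            for no, name, stat in rows
        ]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CUST_MSTR (CUST_NO TEXT, CUST_NM TEXT, STAT TEXT)")
    conn.executemany(
        "INSERT INTO CUST_MSTR VALUES (?, ?, ?)",
        [("0001", "  SMITH ", "a"), ("0002", "jones", "I")],
    )
    return LegacyCustomerAdapter(conn).all_customers()
```

The adapter exposes no write methods, so application code cannot modify the legacy data through it.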
Working with Legacy Data


We couldn’t avoid the data; the proc had to be changed
So we developed an incremental five-step plan
 Add an IsValidRecord column to the table
 Update the column based on the bitmask for each row
 Change the proc to use the column instead of the bitmask
 Make sure all tests are still passing
 Introduce Update and Insert triggers to automatically populate the column
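
The five steps above can be sketched end to end with Python's `sqlite3` and an in-memory database. This is an illustration under assumptions, not the original T-SQL: the sample rows and trigger names are invented, and a record is assumed to be "valid" exactly when the original predicate `bitMask & (144567 | 99435) = 0` held.

```python
import sqlite3

MASK = 144567 | 99435  # the two flag values from the original WHERE clause

OLD_QUERY = "SELECT myCol FROM OtherTable WHERE (bitMask & ?) = 0 ORDER BY myCol"
NEW_QUERY = "SELECT myCol FROM OtherTable WHERE IsValidRecord = 1 ORDER BY myCol"
IS_VALID = "CASE WHEN ({col} & {mask}) = 0 THEN 1 ELSE 0 END"


def build_legacy_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OtherTable (myCol TEXT, bitMask INTEGER)")
    conn.executemany(
        "INSERT INTO OtherTable VALUES (?, ?)",
        [("keep-1", 0), ("skip-1", 1), ("keep-2", 256), ("skip-2", 99435)],
    )
    return conn


def migrate(conn):
    # Step 1: add the column.
    conn.execute(
        "ALTER TABLE OtherTable ADD COLUMN IsValidRecord INTEGER NOT NULL DEFAULT 0"
    )
    # Step 2: populate it from the bitmask for every existing row.
    conn.execute(
        "UPDATE OtherTable SET IsValidRecord = "
        + IS_VALID.format(col="bitMask", mask=MASK)
    )
    # Step 3 is switching callers from OLD_QUERY to NEW_QUERY.
    # Step 4 is the comparison done in demo() below.
    # Step 5: triggers keep the column in sync for new and changed rows.
    for event in ("INSERT", "UPDATE OF bitMask"):
        name = "trg_valid_" + event.split()[0].lower()
        conn.execute(f"""
            CREATE TRIGGER {name} AFTER {event} ON OtherTable
            BEGIN
                UPDATE OtherTable
                SET IsValidRecord = {IS_VALID.format(col="NEW.bitMask", mask=MASK)}
                WHERE rowid = NEW.rowid;
            END""")


def demo():
    conn = build_legacy_db()
    before = [r[0] for r in conn.execute(OLD_QUERY, (MASK,))]
    migrate(conn)
    same = [r[0] for r in conn.execute(NEW_QUERY)]
    # Exercise the triggers: one new valid row, one row that becomes invalid.
    conn.execute("INSERT INTO OtherTable (myCol, bitMask) VALUES ('keep-3', 2048)")
    conn.execute("UPDATE OtherTable SET bitMask = 4 WHERE myCol = 'keep-1'")
    after = [r[0] for r in conn.execute(NEW_QUERY)]
    return before, same, after
```

The sketch only checks that the old and new queries agree and that the triggers keep the column current; it does not measure performance.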
Working with Legacy Data

Advantages
 Required no change to application code
 We could rapidly test the application
 We could make incremental changes to see improvements

What made it work
 Testing/QA database with production-like data
 Regression tests to ensure functionality
 Timing tests to show performance improvement
Process Problems

Not all the issues are technical
 Working with legacy data when you don’t have to
 Data design drives your object model
 Legacy data issues overshadow everything else
 App developers ignore legacy issues
 You choose not to refactor the legacy data sources
 Politics
 You are too focused on the data to see the software
Refactoring Databases

Databases should not be left out of the refactoring
process
 “An interesting observation is that when you take a big design up front (BDUF) approach to development where your database schema is created early in the life of your project you are effectively inflicting a legacy schema on yourself. Don’t do this.”

Scott Ambler maintains a catalog of database refactorings
How do you refactor a database?
Refactoring Databases

Implementing Database Refactoring in your
organization
 Start simple
 Accept that iterative and incremental development is the norm
 Accept that there is no magic solution to get you out of your existing mess
 Adopt a 100% regression testing policy
 Try it
Next Steps

Dealing with legacy code is hard
 Integration issues
 Code Issues
 Political Issues


There are ways out
Important to address pain points first
Next Steps

So where can you go from here?
 Working Effectively with Legacy Code by Michael Feathers
 Agile Database Techniques by Scott Ambler
 Refactoring Databases by Scott Ambler
 http://www.agiledata.org
 NUnit, JUnit, CppUnit, CppUnitLite, DbFit, FitNesse
 http://www.cornetdesign.com