Transcript Slide 1
Processing and Analyzing Electronic Data
Arizona Paralegal Association, Phoenix, September 12, 2006
Cliff Shnier, JD, Director, Business Development, Cataphora Inc., Scottsdale, AZ, 480-661-6183, [email protected]
The New Rules: they’re He-e-e-e-ere!
– The Supreme Court approved the changes and transmitted them to Congress on April 12, 2006.
– Unless Congress acts to reject or modify them, they take effect on December 1, 2006.
– These rule changes affect Rules 16, 26, 33, 34, 37, 45 and Form 35.
FRCP 26(a) [amended]
Rule 26. General Provisions Governing Discovery; Duty of Disclosure
(a) REQUIRED DISCLOSURES… a party must, without awaiting a discovery request, provide a copy of all documents, electronically stored information, and tangible things… in its control that it may use to support its claim or defense.
This is as far as many countries go!
Quebec Code of Civil Procedure, Art. 331.1 and 402 – 403; French Code of Civil Procedure, Article 753.
English Translation of Art. 753 of French Code of Civil Procedure:
• Pleadings shall set out expressly the claims of the parties as well as the issues of law and fact which are the basis of each claim.
A memorandum listing the documents in support of these claims shall be annexed to the pleadings.
– And that’s all you have to produce, mon ami (my friend).
FRCP 26(b)
(b) DISCOVERY SCOPE AND LIMITS. … the scope of discovery is as follows:
• (1) In General. Parties may obtain discovery regarding any matter, not privileged, ... relevant to the claim or defense of any party, including… any books, documents...
This is much broader than 26(a), and it is where the U.S. goes much further than most other jurisdictions.
The view from “over there”
• “In the United Kingdom, extensive American-style discovery is viewed [as] a cultural anomaly and a wasteful extravagance. Computer-based discovery is viewed as particularly obtrusive.”
– Ken Withers, address at the University of Edinburgh, April 2001.
“exponentially greater volume”
From page 22 of the Commentary by the Rules Committee, Sept 2005
Electronic Data Volumes
• 1 MB = roughly 75 pages
• 1 GB = roughly 75,000 pages
• Therefore, 30 GB = roughly 2.25M pages
  = 1,000 boxes
  = 250 lineal feet of five-tier shelves
  = 50 file cabinets
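The slide’s paper-equivalent figures can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the slide’s rules of thumb (1 MB is roughly 75 pages, and 1 GB is taken as 1,000 MB). The per-box, per-shelf-foot and per-cabinet ratios are derived from the 30 GB example, not from any records-management standard, and the function name `paper_equivalent` is hypothetical.

```python
# Paper-equivalent arithmetic using the slide's rules of thumb (not exact conversions).
PAGES_PER_MB = 75            # slide: 1 MB = roughly 75 pages
MB_PER_GB = 1_000            # assumption: decimal gigabytes, so 1 GB = roughly 75,000 pages
PAGES_PER_BOX = 2_250        # derived: 2,250,000 pages / 1,000 boxes
BOXES_PER_SHELF_FOOT = 4     # derived: 1,000 boxes / 250 lineal feet of five-tier shelves
BOXES_PER_CABINET = 20       # derived: 1,000 boxes / 50 file cabinets


def paper_equivalent(gigabytes: float) -> dict:
    """Estimate pages, boxes, shelf feet and file cabinets for a data volume in GB."""
    pages = gigabytes * MB_PER_GB * PAGES_PER_MB
    boxes = pages / PAGES_PER_BOX
    return {
        "pages": pages,
        "boxes": boxes,
        "shelf_feet": boxes / BOXES_PER_SHELF_FOOT,
        "cabinets": boxes / BOXES_PER_CABINET,
    }


if __name__ == "__main__":
    # 30 GB -> 2,250,000 pages, 1,000 boxes, 250 shelf feet, 50 cabinets
    print(paper_equivalent(30))
```

Because every ratio is linear, doubling the volume doubles each figure; actual page counts per megabyte vary a great deal by file type.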
• QUIZ: A company with 10,000 employees generates 2.5 million e-mail messages per: ___ Year? ___ Month? ___ Week?
“On a single ten square inch hard drive, more data can be stored than would fit on the entire floor of a building.”
Arkfeld, Electronic Discovery and Evidence, p. 1-9, quoting a 1999 article by Kimberly Richard in 21 Whittier L. Rev. 463
We have had PCs since the early ’80s. So why didn’t e-discovery show up until the mid-1990s?
The answer, of course, is…
Connectivity: the Internet!
• Until the mid-’90s, computers were just tools to create paper documents.
• Then, very quickly, business switched to written communication without paper.
– E-mail replaced paper (and fax).
– By 2000, paper was “a superfluous by-product.”
– E-mail even replaced the telephone.
“Many informal messages that were previously relayed by telephone or at the water cooler are now sent via email.”
Byers v. Illinois State Police (N.D. Ill. 2002)
The explosion of electronic data
• Over 95% of corporate documents are now electronic
• Email has become indispensable
• All electronic documents are discoverable
• No more “I won’t ask if you won’t ask”. They’re asking.
[Chart: U.S. Corporate E-mail Volume Growth, 1999 to 2001, in trillions (annotated value: 3.400 trillion). Source: Wall Street Journal, January 10, 2000; IDC]
The path not taken…
• The committee might easily have decided that broad scope was no longer tenable.
• Instead, it largely preserved modern U.S.-style broad discovery, and recognized that technology, the source of the problem, is also the source of the solution…
FRCP 26(f): “Meet and Confer”
Rule 26. General Provisions Governing Discovery; Duty of Disclosure
• (f) … to discuss any issues relating to preserving discoverable information … and to develop a proposed discovery plan concerning…
– (3) discovery of electronically stored information, including the form(s) in which it should be produced…
– (4) relating to privilege or protection as trial-preparation material, including asserting such claims after [inadvertent] production
The Stages of Discovery when it was Paper
1. Collect: Physical collection (or delivery) of documents
2. Organize: Photocopy or scan boxes, Bates number, track documents, or, by the ’90s, code them into a database
3. Review: Evaluate for production; decide relevance; decide privilege
4. Produce: Ultimate physical delivery of documents; receiving them from the other side
Except with electronic data, there’s also an earlier step: Preservation
0. Preserve: Ensure the electronic data you need is kept intact
1. Collect: How to find/copy/compile responsive ESI?
2. Organize: How to process ESI (and its much greater volume) so you can review it and utilize it?
3. Review: How to review ESI (and its much greater volume)?
Obligation to Preserve (1)
Obligation to Preserve (2)
The Stages of Discovery: the challenges when the information is electronic
1. Collect: How to find/copy/compile responsive EDD?
2. Organize: How to process EDD (and its much greater volume) so you can review it and utilize it?
3. Review: How to review EDD (and its much greater volume)?
4. Produce: What is the best method for producing EDD? (And how would you like to receive it?)
The Stages of Discovery: Moving on from “Step Zero”, Preservation, to “Step 1”, Collection
1. Collect: How to find/copy/compile responsive EDD?
2. Organize: How to process EDD (and its much greater volume) so you can review it and utilize it?
3. Review: How to review EDD (and its much greater volume)?
4. Produce: What is the best method for producing EDD?
After you’ve collected the electronic data…
“…remember, that’s all you’ve got at that point. A whole lot of messy electronic data.”
William Cwiklo, Panelist on Electronic Data Discovery, Glasser LegalWorks, Fairmont Hotel, San Francisco, February 1999.
Discovery Stage 2: Organize, meaning somehow process that electronic data to make it usable
1. Collect: How to find/copy/compile responsive EDD?
2. Organize: How to process e-data (and its much greater volume) so you can review it and use it?
3. Review: How to review EDD (and its much greater volume)?
4. Produce: What is the best method for producing EDD?
The Options for Processing Electronic Data (1-2)
1. Print Everything: Print out the entire collection (from native app) and review paper for relevancy.
2. Print->Scan->Code: The “1997” model
“In the shift to a new medium, the content reflects the previous medium.”
– Marshall McLuhan
Example: the first ten years of television were visual radio. (Acknowledgment to Michelle Ostrom of Attenex.) http://www.mcluhan.utoronto.ca/mcluhanprojekt/allen2.htm
1997 processing: Print-Scan-Code; Electronic to Paper to Electronic
Paralegal/Word Processing (print out all files) -> Paper -> Scanner (OCR) -> Coder -> Responsive review -> Results -> Litigation Database -> Production
The Options for Processing Electronic Data (3): Why “process” electronic data at all?
1. Print Everything: Print out the entire collection (from native app) and review paper for relevancy.
2. Print->Scan->Code: The “1997” model
3. “Do Nothing”: Review each custodian’s files in their native format, using the native application software itself.
• So what’s wrong with “doing nothing”?
The No-Process “Do Nothing” approach: Using Outlook to review Outlook
• No tagging, no annotating
• No redacting
• Merely moving the data to another machine changes its appearance.
Using Outlook to review Outlook: “Advanced Find”
• Slow
• Limited search flexibility
• Responses are simply a listing of e-mails; can’t format reports
• Will NOT search attachments
The Options for Processing Electronic Data (4)
1. Print Everything: Print out the entire collection (from native app) and review paper for relevancy.
2. Print->Scan->Code: The “1997” model
3. “Do Nothing”: Review each custodian’s files in their native format, using the native application software itself.
4. Convert (‘process’) electronic data to another electronic form better suited to reviewing: then review the entire collection either with in-house litigation support software or online through an ASP (application service provider) repository.
Processing Electronic Data: Conversion to TIFF in the late 1990s
• Conversion of e-mails and e-docs to a TIFF image, linked to indexed bibliographic information and full text, maintaining the parent/attachment relationship.
A faster, cheaper way to convert e-data to the model we had gotten used to with paper – a database record linked to a scanned image.
By 1999, processing that electronic data meant converting it to TIFF
1. Collect: How to find/copy/compile responsive EDD?
2. Process: How to process e-data so you can review it and use it? The answer for a while was: convert to TIFF.
3. Review: How to review EDD (and its much greater volume)?
4. Produce: What is the best method for producing EDD?
But the volume kept growing!
[Chart: U.S. Corporate E-mail Volume Growth, 1999, 2000, 2001 … 2004, in trillions (annotated value: 3.400 trillion). Source: Wall Street Journal, January 10, 2000; IDC]
Sedona Conference Search and Information Retrieval,
Principle 1: In litigation… where the volume of discoverable electronically stored information is large, it may not be feasible to perform human review of every document for responsiveness or privilege, and automated search and information retrieval methods and tools may be necessary and valuable.
This isn’t just a brainstorm of words and phrases.
• Courts now expect automated processes to identify responsive data
“A responding party may satisfy its good faith obligation to preserve and produce potentially responsive electronic data and documents by using electronic tools and processes, such as data sampling, searching, or the use of selection criteria, to identify data most likely to contain responsive information.” (emphasis added)
– Zakre v. Norddeutsche Landesbank Girozentrale, 2004 WL 764895 (S.D.N.Y. Apr. 9, 2004), adopting Sedona Principle 11 verbatim.
Automated tools in e-discovery
• De-duplication
• Keywords and Boolean
• Statistical clustering
• Natural language and fuzzy searching
• Concept search tools
• Taxonomies and ontologies
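De-duplication, the first tool listed, is the easiest to illustrate. Below is a minimal sketch of exact de-duplication, assuming each document is available as raw bytes: hash each one and keep only the first copy of each digest. The SHA-256 choice and the `dedupe` name are illustrative assumptions; near-duplicates (for example, the same memo saved in two formats) need the fuzzier techniques later in the list.

```python
import hashlib
from typing import Iterable


def dedupe(documents: Iterable[bytes]) -> list[bytes]:
    """Keep the first copy of each byte-identical document (exact de-duplication)."""
    seen: set[str] = set()
    unique: list[bytes] = []
    for doc in documents:
        digest = hashlib.sha256(doc).hexdigest()  # identical bytes -> identical digest
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    print(dedupe([b"memo", b"memo", b"draft"]))
```

For example, `dedupe([b"memo", b"memo", b"draft"])` keeps `[b"memo", b"draft"]`. Hashing means only the digests, not whole documents, need to be compared or stored.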
“Search Engine” Software • Attenex • Autonomy • Cataphora • Dolphin Search • Engenium • Guidance • Stratify • Syngence
Electronic Data Discovery Processing and Review Choices
• Automated
– Statistical or Pattern Matching: Attenex, Cataphora, Dolphin Search, Stratify
– Ontologies: Cataphora, H5, Metalincs
– Contextual: Cataphora
• Linear: Applied, Concordance, EED, FIOS, Kroll, Summation
Approaches to Data Organization
• Context analysis: documents with relationships among relevant people
• Concept analysis: similarity of salient features
• Content analysis: generalized words or phrases
A Simple Ontology
• ROYALTY CONCEPT
– Royalty
– Commission
– Honorarium
– Usage Fee
– Slice of the Pie
A More Realistic Ontology
ROYALTY CONCEPT
• royalty • royalties • rty • commission • commissions • comm.
• honorarium • honorariums • honoraria • usage fee • usage charge • usg fee • use fee • fee for use • fee for usage • incent* • insent* • charge for use • charged for use • charging for use • charges for use • licence fee • license fee • lisense fee • “take cut”~2 • “takes cut”~2 • “took cut”~2 • “slice pie”~5 • “piece pie”~5 • “piece action”~5 • “slice action”~5 • -king • -queen • -prince • -princess
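The search syntax in this ontology (trailing `*` wildcards, `"…"~N` proximity, leading `-` exclusions) resembles Lucene-style query syntax. The sketch below is a simplified, hypothetical matcher for a subset of the ROYALTY terms above. It assumes that a `*` term matches any word starting with the prefix; that `"a b"~N` matches when the words appear in order with at most N other words between them (real proximity operators, Lucene’s for instance, also tolerate reordering); and that any `-` term excludes the document outright. It is not how any of the search engines named earlier actually evaluates queries.

```python
import re


def tokens(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is dropped, so "comm." becomes "comm"."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _in_order_within(toks: list[str], words: list[str], slop: int) -> bool:
    """True if `words` occur in order with at most `slop` tokens between neighbours."""
    def search(pos: int, k: int) -> bool:
        if k == len(words):
            return True
        for j in range(pos, min(len(toks), pos + slop + 1)):
            if toks[j] == words[k] and search(j + 1, k + 1):
                return True
        return False

    return any(tok == words[0] and search(i + 1, 1) for i, tok in enumerate(toks))


def matches_term(toks: list[str], term: str) -> bool:
    proximity = re.fullmatch(r'"([^"]+)"~(\d+)', term)
    if proximity:  # e.g. "take cut"~2
        return _in_order_within(toks, tokens(proximity.group(1)), int(proximity.group(2)))
    if term.endswith("*"):  # e.g. incent*
        prefix = term[:-1].lower()
        return any(tok.startswith(prefix) for tok in toks)
    words = tokens(term)  # plain word or phrase, e.g. usage fee
    return any(toks[i:i + len(words)] == words for i in range(len(toks)))


def matches_concept(text: str, ontology: list[str]) -> bool:
    """Simplified rule: no excluded term present, and at least one included term matches."""
    toks = tokens(text)
    excluded = [t[1:] for t in ontology if t.startswith("-")]
    included = [t for t in ontology if not t.startswith("-")]
    if any(matches_term(toks, t) for t in excluded):
        return False
    return any(matches_term(toks, t) for t in included)


# Hypothetical subset of the ROYALTY CONCEPT terms.
ROYALTY = ["royalty", "royalties", "usage fee", "incent*",
           '"took cut"~2', '"slice pie"~5', "-king", "-queen"]

if __name__ == "__main__":
    print(matches_concept("He took a cut of every sale", ROYALTY))      # True
    print(matches_concept("Royalties were paid to the king", ROYALTY))  # False
```

The exclusion terms show why a realistic ontology is larger than a word list: without `-king` and `-queen`, every document about actual royalty would land in the review set.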
Reviewing the Right Data
Intake Data: 100%
• Duplicates: 25%
• Non-Responsive (NR) Junk (spam, jokes, etc.): 20%
• NR Business: 20%
• NR Personal: 20%
• Privileged: 3%
• Relevant & Responsive: 12%
Estimates: these figures vary based upon the data set received.
Getting to Responsive Data: Keywords versus Ontologies (all numbers in millions of items)
• Reviewable: 1.575
• “Responsive” to keywords: 0.842
• Responsive after final ontology pass: 0.109
Yet for all that breadth, keywords still miss vital documents!
8,553 responsive documents missed by keyword search
(Almost 8% of responsive documents missed by keyword search)
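The funnel on the last two slides reduces to a few ratios. Below is a minimal sketch using the slides’ own figures, with one loud assumption: to reproduce the “almost 8%” figure, it treats the 109,000 items from the final ontology pass as the full responsive set, which the slides imply but do not state. The `share` helper is hypothetical.

```python
# Figures from the slides, converted from millions of items to item counts.
REVIEWABLE = 1_575_000
KEYWORD_HITS = 842_000
ONTOLOGY_RESPONSIVE = 109_000
MISSED_BY_KEYWORDS = 8_553


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print(share(KEYWORD_HITS, REVIEWABLE))         # 53.5: keyword hits, % of reviewable set
    print(share(ONTOLOGY_RESPONSIVE, REVIEWABLE))  # 6.9: ontology pass, % of reviewable set
    # Assumption: the ontology pass's 109,000 items are the whole responsive set.
    print(share(MISSED_BY_KEYWORDS, ONTOLOGY_RESPONSIVE))  # 7.8: "almost 8%"
```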
Cost and Time Savings
• Cost to review “keyword” docs: $2,526,000
• Cost to process, create ontologies and review docs found by them: $1,621,076
• Net cost savings: $904,924
• Keyword review time: over 11 weeks
• Ontology time: 6 weeks or less, including both review and processing time
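The savings figure is simple subtraction, and the same two costs give the percentage saved. A minimal sketch; the variable names are illustrative, and the percentage is derived here rather than stated on the slide.

```python
KEYWORD_REVIEW_COST = 2_526_000   # cost to review "keyword" docs
ONTOLOGY_TOTAL_COST = 1_621_076   # process + create ontologies + review the docs they find

savings = KEYWORD_REVIEW_COST - ONTOLOGY_TOTAL_COST
percent_saved = round(100 * savings / KEYWORD_REVIEW_COST, 1)

print(f"Net savings: ${savings:,} ({percent_saved}% of the keyword-review cost)")
# Net savings: $904,924 (35.8% of the keyword-review cost)
```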
The end-product of processing now
• Less data
• Standardized, so each e-mail, each attachment, and each free-standing electronic file will have:
– a “database record” linked to
– the data itself in its native format
– and/or other renderings, such as a TIFF or PDF image.
• So it is no longer “a whole lot of messy electronic data”
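The “database record linked to the native file” model can be pictured as a small data structure. This is a hypothetical sketch, not any vendor’s schema: the class `ProcessedItem`, its field names, and the `attach` helper are assumptions, chosen to show the native link, the optional TIFF/PDF rendering, and the parent/attachment relationship that processing has preserved since the TIFF era.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProcessedItem:
    """Hypothetical record for one e-mail, attachment, or free-standing file."""
    doc_id: str
    custodian: str
    native_path: str                       # link to the data in its native format
    image_path: Optional[str] = None       # optional TIFF or PDF rendering
    metadata: dict[str, str] = field(default_factory=dict)
    parent_id: Optional[str] = None        # for attachments: the e-mail that carried it
    children: list[str] = field(default_factory=list)


def attach(parent: ProcessedItem, child: ProcessedItem) -> None:
    """Record the parent/attachment relationship in both directions."""
    child.parent_id = parent.doc_id
    parent.children.append(child.doc_id)


if __name__ == "__main__":
    email = ProcessedItem("DOC-0001", "jsmith", "natives/DOC-0001.msg",
                          metadata={"subject": "Q3 royalties"})
    sheet = ProcessedItem("DOC-0002", "jsmith", "natives/DOC-0002.xls",
                          image_path="images/DOC-0002.pdf")
    attach(email, sheet)
    print(email.children, sheet.parent_id)  # ['DOC-0002'] DOC-0001
```

Storing the link in both directions lets a reviewer jump from an attachment back to its e-mail, or pull a whole family for production.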
But do it right, with the right people!
• The EDD industry abounds with “me-too” newcomers.
EDD done here!
• This guy can’t be your expert witness
The Evolution of Electronic Discovery Processing
• 1995 AD: Print and Review
• 1997: Print, Scan, Code, Review
• Circa 1999: TIFF and Review
• 2001: Simple Filtering
• 2002: Keyword Searching
• 2004 onward: Analytical, Defensible, Reliable Reduction, then Review
Discovery Stage 3: Review
1. Collection: How to find/copy/compile responsive EDD?
2. Organization: How to process EDD so you can review and utilize it?
3. Review: How to review EDD (and its substantially greater volume)?
4. Production: What is the best method for producing EDD?
“But I only trust humans looking at every document; it’s tried and true”
• Full review is rarely as accurate as automated searching.
• Humans make errors and get distracted, bored and tired.
• Typical human error rate is 25%.
• And the expense of human review of every document, in dollars and time, is prohibitive.
No manual review of millions of documents is cost-effective or accurate.
After culling by whatever means, you’ve still got quite a lot.
Use computing power to enhance review:
• Grouping data enables decisions on multiple documents at once
• Workflow/QA can accelerate review and improve quality
Why Context Is Important
In a hardcopy document, a prior sentence or paragraph provides the context. In an e-mail, SMS/text message, or IM, the previous or next message may provide context.
Today, a whole case could turn on… “Let’s do it!” “OK.” “Go ahead!” “Sure.” “G2G” (got to go), “SLAP” (sounds like a plan), “WIIFM” (what’s in it for me)
Reviewing without Context
What “matter”? “It”?
Is this document:
• Privileged?
• Non-Responsive?
• Relevant?
• Incriminating?
Can’t really tell?
What’s “touchy”?
Reviewing in Context
Discussion Time Line: One Month
Context Across Documents
Our bond offering has a cash shortfall. What shall we do?
Let’s issue more bonds to cover the shortfall.
Great idea. Let’s go ahead with it!
That’s illegal. Don’t even think about it.
Context Provides Meaning
• Let’s focus on the last two documents.
• Note that the word “bond” is not used.
• Nevertheless, these documents contain important evidence.
Great idea. Let’s go ahead with it!
That’s illegal. Don’t even think about it.
The Solution
• Review documents as a causally related group, not in isolation.
Our bond offering has a cash shortfall. What shall we do?
Let’s issue more bonds to cover the shortfall.
Great idea. Let’s go ahead with it!
That’s illegal. Don’t even think about it.
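One simple way to recover this kind of grouping is to follow reply chains. The sketch below assumes each message carries its own ID and the ID of the message it answers (as the e-mail `Message-ID` and `In-Reply-To` headers do). The `Message` type and `group_into_discussions` function are hypothetical, and reply-chain threading is a deliberately simple stand-in: the presentation does not say how its causally related groups are built, and a discussion that jumps between e-mail, IM and phone would need more than this.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    msg_id: str
    in_reply_to: Optional[str]  # None for a message that starts a discussion
    text: str


def group_into_discussions(messages: list[Message]) -> dict[str, list[Message]]:
    """Group messages under the ID of the earliest message in their reply chain."""
    by_id = {m.msg_id: m for m in messages}

    def root_of(m: Message) -> str:
        seen = set()
        # Walk up the chain; stop at a missing parent or a (malformed) cycle.
        while m.in_reply_to in by_id and m.msg_id not in seen:
            seen.add(m.msg_id)
            m = by_id[m.in_reply_to]
        return m.msg_id

    threads: dict[str, list[Message]] = defaultdict(list)
    for m in messages:
        threads[root_of(m)].append(m)
    return dict(threads)


if __name__ == "__main__":
    msgs = [
        Message("m1", None, "Our bond offering has a cash shortfall. What shall we do?"),
        Message("m2", "m1", "Let's issue more bonds to cover the shortfall."),
        Message("m3", "m2", "Great idea. Let's go ahead with it!"),
        Message("m4", "m2", "That's illegal. Don't even think about it."),
        Message("m5", None, "Lunch on Friday?"),
    ]
    for root, thread in group_into_discussions(msgs).items():
        print(root, [m.msg_id for m in thread])
    # m1 ['m1', 'm2', 'm3', 'm4']
    # m5 ['m5']
```

Grouped this way, “Great idea. Let’s go ahead with it!” is reviewed alongside the bond question it answers, even though it never mentions bonds.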
Native File Review of a Discussion
Discovery Stage 4: Production
1. Collection: How to find/copy/compile responsive EDD?
2. Organization: How to process EDD so you can review and utilize it?
3. Review: How to review EDD (and its substantially greater volume)?
4. Production: What is the best method for producing EDD?
FRCP 34 [amended, continued]
Rule 34. Production of Documents, Electronically Stored Information, and Things and Entry Upon Land for Inspection and Other Purposes.
(b) PROCEDURE. … the request may specify the form in which the electronically stored information is to be produced….
…[the responding party may object] to the requested form, stating the reasons, and the form it intends to use [instead].
Let the Games Begin!
Possible Production Formats
• Paper (if the other side asks for it this way, be happy to oblige: it is the least useful format in which to receive a production)
• Paper-like (TIFF or PDF images)
– TIFF images without any searchable data at all are increasingly unacceptable.
• Native files
• Hosted “production” areas of the producing party’s web repository
Don’t forget, you’re in it to win it
• After production, you still have to work with all your data and everything the other side has produced to you:
– Prepare for depositions, brief witnesses, prepare for trial, investigate and analyze
• Any database allows searching, sorting, and basic reporting
– Yawn, that’s so 1987.
We know who you are and who you’re talking to…
Traffic Analysis 1
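Traffic analysis starts from who-talks-to-whom counts. Below is a minimal sketch, assuming each message has already been reduced to a sender and a list of recipients. The addresses are made up, `traffic_counts` is a hypothetical name, and the sketch shows only raw pair counts, not the richer analytics on these slides.

```python
from collections import Counter


def traffic_counts(messages: list[tuple[str, list[str]]]) -> Counter:
    """Count messages per (sender, recipient) pair, ignoring address case."""
    counts: Counter = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            counts[(sender.lower(), recipient.lower())] += 1
    return counts


if __name__ == "__main__":
    log = [
        ("alice@example.com", ["bob@example.com"]),
        ("Alice@example.com", ["bob@example.com", "carol@example.com"]),
        ("bob@example.com", ["alice@example.com"]),
    ]
    for (sender, recipient), n in traffic_counts(log).most_common():
        print(f"{sender} -> {recipient}: {n}")
```

Lower-casing the addresses keeps “Alice@” and “alice@” from being counted as two different people; real data usually needs further alias resolution.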
Comparative Evidence Analytics
Technology has vastly improved how we practice law
[Images: 1976 vs. 2006]