
Processing and Analyzing Electronic Data

Arizona Paralegal Association, Phoenix, September 12, 2006
Cliff Shnier, JD, Director, Business Development, Cataphora Inc, Scottsdale, AZ, 480-661-6183

The New Rules: they’re He-e-e-e-ere!

– The Supreme Court approved the changes and transmitted them to Congress on April 12, 2006.
– All that’s needed is the enabling legislation.
– These rule changes affect Rules 16, 26, 33, 34, 37, 45 and Form 35.

FRCP 26(a) [amended]

Rule 26. General Provisions Governing Discovery; Duty of Disclosure

(a) REQUIRED DISCLOSURES… a party must, without awaiting a discovery request, provide a copy of all documents, electronically stored information, and tangible things… in its control that it may use to support its claim or defense.

This is as far as many countries go!

Quebec Code of Civil Procedure, Art. 331.1 and 402 – 403; French Code of Civil Procedure, Article 753.

English Translation of Art. 753 of French Code of Civil Procedure:

• Pleadings shall set out expressly the claims of the parties as well as the issues of law and fact which are the basis of each claim.

A memorandum listing the documents in support of these claims shall be annexed to the pleadings.

– And that’s all you have to produce, mon ami.

FRCP 26(b)

(b) DISCOVERY SCOPE AND LIMITS. … the scope of discovery is as follows:

• (1) In General. Parties may obtain discovery regarding any matter, not privileged,... relevant to the claim or defense of any party, including… any books, documents...

This is much more than 26(a), and this is where the U.S. goes much further than most other jurisdictions.

The view from “over there”

• “In the United Kingdom, extensive American-style discovery is viewed as a cultural anomaly and a wasteful extravagance. Computer-based discovery is viewed as particularly obtrusive.”

– Ken Withers, address at the University of Edinburgh, April 2001.

“exponentially greater volume”

From page 22 of the Commentary by the Rules Committee, Sept 2005

Electronic Data Volumes

• 1 MB = roughly 75 pages
• 1 GB = roughly 75,000 pages
• Therefore, 30 GB = 2.25M pages,
• = 1000 boxes
• = 250 lineal feet of five-tier shelves
• = 50 file cabinets.
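The rule of thumb above can be turned into a quick estimator. This is only a sketch: the pages-per-GB figure comes from the slide, while the pages-per-box, boxes-per-shelf-foot and boxes-per-cabinet ratios are assumptions back-derived from the slide's own totals (2.25M pages = 1000 boxes = 250 feet = 50 cabinets). The function name is hypothetical, and real page counts vary widely by file type.

```python
# Rule-of-thumb figure taken from the slide.
PAGES_PER_GB = 75_000

# Assumptions: ratios implied by the slide's totals, not independent facts.
PAGES_PER_BOX = 2_250          # 2.25M pages / 1000 boxes
BOXES_PER_SHELF_FOOT = 4       # 1000 boxes / 250 lineal feet
BOXES_PER_CABINET = 20         # 1000 boxes / 50 file cabinets


def estimate_paper_equivalent(gigabytes):
    """Rough paper equivalent of a given volume of electronic data."""
    pages = gigabytes * PAGES_PER_GB
    boxes = pages / PAGES_PER_BOX
    return {
        "pages": round(pages),
        "boxes": round(boxes),
        "shelf_feet": round(boxes / BOXES_PER_SHELF_FOOT),
        "file_cabinets": round(boxes / BOXES_PER_CABINET),
    }


if __name__ == "__main__":
    print(estimate_paper_equivalent(30))
```

Running it for 30 GB should reproduce the figures on the slide; any other collection size is only as good as the 75,000 pages-per-GB assumption.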

• QUIZ: A company with 10,000 employees generates 2.5 million e-mail messages per: ___ Year? ___ Month? ___ Week?

“On a single ten square inch hard drive, more data can be stored than would fit on the entire floor of a building.”

Arkfeld, Electronic Discovery and Evidence, p 1-9, quoting a 1999 article by Kimberly Richard in 21 Whittier L. Rev. 463

We had PCs since the early 80’s

• So why didn’t e-Discovery show up until the mid-1990’s?

The answer of course, is…

Connectivity -- the Internet!

• Until mid-90’s, computers were just tools to create paper documents.
• Then very quickly, business switched to written communication without paper.
– e-mail replaced paper (and fax).
– By 2000, paper “a superfluous by-product.”
– e-mail even replaced the telephone.

“Many informal messages that were previously relayed by telephone or at the water cooler are now sent via email.”

Byers v Illinois State Police (N.D. Ill, 2002)

The explosion of electronic data

• Over 95% of corporate documents are now electronic
• Email has become indispensable
• All electronic documents are discoverable
• No more “I won’t ask if you won’t ask”. They’re asking.

[Chart: U.S. Corporate E-mail Volume Growth, in trillions, 1999 to 2001; callout: 3.400 trillion. Source: Wall Street Journal, January 10, 2000; IDC]

The path not taken…

• The committee might easily have decided that broad scope was no longer tenable.
• Instead, they mostly preserved modern US-style broad discovery
• and recognized that technology, the source of the problem, is also the source of the solution …

FRCP 26(f): “Meet and Confer”

Rule 26. General Provisions Governing Discovery; Duty of Disclosure

• (f) to discuss any issues relating to preserving discoverable information …and to develop a proposed discovery plan concerning…
– (3) discovery of electronically stored information including the form/s in which it should be produced…
– (4) relating to privilege or protection as trial preparation including asserting such claims after [inadvertent] production

The Stages of Discovery when it was Paper

1. Collect: Physical collection (or delivery) of documents
2. Organize: Photocopy or scan boxes, Bates number, track documents in, or by the 90’s code into, a database
3. Review: Evaluate for production; decide relevance; decide privilege
4. Produce: Ultimate physical delivery of documents; receiving from other side

Except with Electronic Data, there’s also an earlier step -- Preservation

0. Preserve: Ensure the electronic data you need is kept intact
1. Collect: How to find/copy/compile responsive ESI?
2. Organize: How to process ESI (and its much greater volume) so you can review it and utilize it?
3. Review: How to review ESI (and its much greater volume)?

Obligation to Preserve (1)

Obligation to preserve (2):

The Stages of Discovery: the challenges when the information is Electronic

1. Collect: How to find/copy/compile responsive EDD?
2. Organize: How to process EDD (and its much greater volume) so you can review it and utilize it?
3. Review: How to review EDD (and its much greater volume)?
4. Produce: What is the best method for producing EDD? (and how would you like to receive it?)

The Stages of Discovery: Moving on from “Step Zero”, Preservation, to “Step 1”, Collection

1. Collect: How to find/copy/compile responsive EDD?
2. Organize: How to process EDD (and its much greater volume) so you can review it and utilize it?
3. Review: How to review EDD (and its much greater volume)?
4. Produce: What is the best method for producing EDD?

After you’ve collected the electronic data…

• “…remember, that’s all you’ve got at that point. A whole lot of messy electronic data.”

William Cwiklo, Panelist on Electronic Data Discovery, Glasser LegalWorks, Fairmont Hotel, San Francisco, February 1999.

Discovery Stage 2: Organize that Electronic Data, meaning Process it somehow to make it useable

1. Collect: How to find/copy/compile responsive EDD?
2. Organize: How to process E-Data (and its much greater volume) so you can review it and use it?
3. Review: How to review EDD (and its much greater volume)?
4. Produce: What is the best method for producing EDD?

The Options for Processing Electronic Data (1-2)

• 1. Print Everything: Print out the entire collection (from native app) and review paper for relevancy.
• 2. Print->Scan->Code: The “1997” model

“In the shift to a new medium, the content reflects the previous medium.”

-- Marshall McLuhan
Example: the first ten years of television were visual radio.
(Acknowledgment to Michelle Ostrom of Attenex.) http://www.mcluhan.utoronto.ca/mcluhanprojekt/allen2.htm

1997 processing: Print-Scan-Code; Electronic to Paper to Electronic

Paralegal/Word Processing Print out all files Paper Scanner (OCR) Coder Responsive review Results Litigation Database Production

The Options for Processing Electronic Data (3)

Why “process” electronic data at all?

• 1. Print Everything: Print out the entire collection (from native app) and review paper for relevancy.
• 2. Print->Scan->Code: The “1997” model
• 3. “Do Nothing”: Review each custodian’s files in their Native format, using the Native application software itself.

• So what’s wrong with “doing nothing”?

The No-Process “Do nothing” approach: Using Outlook to review Outlook

• No tagging, No annotating
• No Redacting
• Merely moving the data to another machine changes its appearance.

Using Outlook to review Outlook: “Advanced Find”

• Slow
• Limited search flexibility
• Responses are simply a listing of e-mails; can’t format reports
• Will NOT search attachments

The Options for Processing Electronic Data (4)

• 1. Print Everything: Print out the entire collection (from native app) and review paper for relevancy.
• 2. Print->Scan->Code: The “1997” model
• 3. “Do Nothing”: Review each custodian’s files in their Native format, using the Native application software itself.
• 4. Convert (‘process’) electronic data to another electronic form better suited to reviewing: Then review entire collection either with in-house litigation support software or on-line through an ASP Repository

Processing Electronic Data – Conversion to TIFF in the late 1990’s

• Conversion of e-mails and e-docs to:
– a TIFF image, linked to indexed bibliographic information;
– with full text; and
– maintains parent/attachment relation.

A faster, cheaper way to convert e-data to the model we had gotten used to with paper – a database record linked to a scanned image.

By 1999, processing that Electronic Data meant Converting it to TIFF

1. Collect: How to find/copy/compile responsive EDD?
2. Process: How to process E-Data so you can review it and use it? The answer for a while was Convert to TIFF
3. Review: How to review EDD (and its much greater volume)?
4. Produce: What is the best method for producing EDD?

But the volume kept growing!

[Chart: U.S. Corporate E-mail Volume Growth, in trillions, 1999 to 2001 and 2004; callout: 3.400 trillion. Source: Wall Street Journal, January 10, 2000; IDC]

Sedona Conference Search and Information Retrieval,

Principle 1: In litigation… where the volume of discoverable electronically stored information is large, it may not be feasible to perform human review of every document for responsiveness or privilege, and automated search and information retrieval methods and tools may be necessary and valuable.

This isn’t just a brainstorm of words and phrases.

Courts now expect automated processes to identify responsive data

“A responding party may satisfy its good faith obligation to preserve and produce potentially responsive electronic data and documents by using electronic tools and processes, such as data sampling, searching, or the use of selection criteria, to identify data most likely to contain responsive information.” (emphasis added)

Zakre v. Norddeutsche Landesbank Girozentrale, 2004 WL 764895 (S.D.N.Y. Apr. 9, 2004), adopting Sedona Principle 11 verbatim.

Automated tools in e-discovery

• De-duplication
• Keywords and Boolean
• Statistical Clustering
• Natural Language and fuzzy searching
• Concept search tools
• Taxonomies and Ontologies
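To make the first item on that list concrete, here is a minimal sketch of de-duplication by content hash. It assumes documents are already available as plain text keyed by a hypothetical document id; the function names and the whitespace/case normalisation are illustrative choices, not a description of how any vendor named in this deck does it. Production tools typically hash native files and selected metadata instead.

```python
import hashlib


def content_hash(text):
    """Hash the text after collapsing whitespace and case (an assumed rule)."""
    normalised = " ".join(text.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def deduplicate(documents):
    """Return (unique documents, map of duplicate id -> id of the copy kept)."""
    first_seen = {}   # hash -> id of the first document with that content
    unique = {}
    duplicates = {}
    for doc_id, text in documents.items():
        digest = content_hash(text)
        if digest in first_seen:
            duplicates[doc_id] = first_seen[digest]
        else:
            first_seen[digest] = doc_id
            unique[doc_id] = text
    return unique, duplicates


if __name__ == "__main__":
    # Hypothetical sample collection.
    docs = {
        "email-001": "Please review the attached royalty schedule.",
        "email-002": "Please  review the attached ROYALTY schedule.",
        "email-003": "Lunch on Friday?",
    }
    unique, duplicates = deduplicate(docs)
    print(sorted(unique), duplicates)
```

Keeping the duplicate-to-original map matters in practice: the reviewer codes one copy, and the decision can be propagated to the suppressed copies.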

“Search Engine” Software • Attenex • Autonomy • Cataphora • Dolphin Search • Engenium • Guidance • Stratify • Syngence

Electronic Data Discovery Processing and Review Choices

Automated
• Statistical or Pattern Matching: Attenex, Cataphora, Dolphin Search, Stratify
• Ontologies: Cataphora, H5, Metalincs
• Contextual: Cataphora

Linear
• Applied, Concordance, EED, FIOS, Kroll, Summation

Approaches to Data Organization

Context Concept Content

Analysis

documents with relationships among relevant people similarity of salient features generalized words or phrases

A Simple Ontology

• ROYALTY CONCEPT
– Royalty
– Commission
– Honorarium
– Usage Fee
– Slice of the Pie

A More Realistic Ontology

ROYALTY CONCEPT

• royalty • royalties • rty • commission • commissions • comm.

• honorarium • honorariums • honoraria • usage fee • usage charge • usg fee • use fee • fee for use • fee for usage • incent* • insent* • charge for use • charged for use • charging for use • charges for use • licence fee • license fee • lisense fee • “take cut”~2 • “takes cut”~2 • “took cut”~2 • “slice pie”~5 • “piece pie”~5 • “piece action”~5 • “slice action”~5 • -king • -queen • -prince • -princess
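The ontology above mixes four kinds of entries: plain terms and misspellings, wildcards (incent*), proximity phrases (“slice pie”~5) and exclusions (-king). The sketch below shows one way such a concept could be applied to a document. It is an illustration under stated assumptions, not the matching logic of any product: the dictionary layout and function names are hypothetical, only part of the slide's list is encoded, "~N" is read as "at most N words in between", and any excluded word rejects the whole document.

```python
import re

# Hypothetical, abbreviated encoding of the slide's ROYALTY CONCEPT.
ROYALTY_CONCEPT = {
    "phrases": ["royalty", "royalties", "rty", "commission", "usage fee",
                "usg fee", "licence fee", "license fee", "lisense fee"],
    "prefixes": ["incent", "insent"],                       # incent*, insent*
    "proximity": [("take", "cut", 2), ("slice", "pie", 5),
                  ("piece", "action", 5)],                  # "a b"~N
    "exclude": ["king", "queen", "prince", "princess"],     # -king, -queen, ...
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_phrase(tokens, phrase):
    """True if the words of `phrase` appear consecutively in `tokens`."""
    words = phrase.split()
    return any(tokens[i:i + len(words)] == words
               for i in range(len(tokens) - len(words) + 1))


def near(tokens, first, second, max_between):
    """True if the two words occur with at most `max_between` words between."""
    a_pos = [i for i, t in enumerate(tokens) if t == first]
    b_pos = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(a - b) - 1 <= max_between for a in a_pos for b in b_pos)


def matches_concept(text, concept):
    tokens = tokenize(text)
    # Simplification: one excluded word rejects the document outright.
    if any(t in concept["exclude"] for t in tokens):
        return False
    if any(has_phrase(tokens, p) for p in concept["phrases"]):
        return True
    if any(t.startswith(p) for t in tokens for p in concept["prefixes"]):
        return True
    return any(near(tokens, a, b, n) for a, b, n in concept["proximity"])


if __name__ == "__main__":
    for sample in ["He wants a slice of the pie.",
                   "The king met royalty in London.",
                   "New insentive plan attached."]:
        print(sample, matches_concept(sample, ROYALTY_CONCEPT))
```

The point of the exercise is the one the slide makes: a single keyword such as "royalty" would miss the first and third samples and wrongly hit the second.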

Reviewing the Right Data

Intake Data 100%

• Duplicates 25%
• Non-Responsive (NR) Junk (Spam/Jokes/etc.) 20%
• NR Business 20%
• NR Personal 20%
• Privileged 3%
• Relevant & Responsive 12%

Estimates: These figures vary based upon the data set received

Getting to Responsive Data: Keywords versus Ontologies

• Reviewable: 1.575
• “Responsive” to Keywords: 0.842
• Final Ontology Pass Responsive: 0.109

All numbers in millions of items

Yet for all that breadth, keywords still miss vital documents!

8,553 responsive documents missed by keyword search

(Almost 8% of responsive documents missed by keyword search)
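The proportions behind these slides can be checked with a few lines of arithmetic. The counts below are the slide's rounded figures (in items, not millions); treating the 8,553 missed documents as a share of the 0.109 million final responsive set is an assumption, though it is the reading that reproduces the "almost 8%" stated above.

```python
# Figures from the slides, converted from millions to items.
reviewable = 1_575_000
keyword_hits = 842_000
ontology_responsive = 109_000
missed_by_keywords = 8_553


def percent(part, whole):
    """Share of `whole` represented by `part`, as a percentage."""
    return 100 * part / whole


keyword_share = percent(keyword_hits, reviewable)
ontology_share = percent(ontology_responsive, reviewable)
# Assumption: the miss rate is measured against the final responsive set.
miss_rate = percent(missed_by_keywords, ontology_responsive)

if __name__ == "__main__":
    print(f"Keyword hits as share of reviewable set: {keyword_share:.1f}%")
    print(f"Ontology-responsive share of reviewable set: {ontology_share:.1f}%")
    print(f"Responsive documents missed by keywords: {miss_rate:.1f}%")
```

In other words, keywords flag more than half of the reviewable set for human review, while the final responsive set is under a tenth of it, and keywords still fail to reach part of that set.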

Cost and Time Savings

• Cost to review “keyword” docs: $2,526,000
• Cost to process, create ontologies and review docs found by them: $1,621,076
• Net cost savings: $904,924
• Keyword review time: Over 11 weeks
• Ontology time: 6 weeks or less including both review and processing time
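The savings figure follows directly from the two costs, and the keyword cost implies a per-document review rate. A small worked check, assuming the “keyword” docs are the 0.842 million items from the earlier slide (the deck does not say so explicitly) and using hypothetical variable names:

```python
# Dollar figures from the slide.
keyword_review_cost = 2_526_000
ontology_total_cost = 1_621_076   # processing + ontology creation + review

net_savings = keyword_review_cost - ontology_total_cost

# Assumption: the "keyword" docs are the 0.842 million items reported earlier.
keyword_docs = 842_000
implied_cost_per_doc = keyword_review_cost / keyword_docs

# Time figures from the slide: "over 11 weeks" versus "6 weeks or less",
# so the difference is at least this many weeks.
min_weeks_saved = 11 - 6

if __name__ == "__main__":
    print(f"Net savings: ${net_savings:,}")
    print(f"Implied keyword review cost per document: ${implied_cost_per_doc:.2f}")
    print(f"Weeks saved: at least {min_weeks_saved}")
```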

The end-product of Processing now

• Less data
• Standardized so each e-mail, each attachment, each free-standing electronic file will have:
– A “database record” linked to
– The data itself in its native format
– and/or with other renderings, a TIFF or PDF image.
• So no longer is it “a whole lot of messy electronic data”

But do it right, with the right people!

The EDD industry abounds with “me-too” newcomers.

EDD done here!

This guy can’t be your expert witness

The Evolution of Electronic Discovery Processing

• Print and Review: 1995 AD
• Print, Scan, Code, Review: 1997
• TIFF and Review: circa 1999
• Simple Filtering: 2001
• Keyword Searching: 2002
• Analytical, Defensible, Reliable Reduction, then Review: 2004-

Discovery Stage 3: Review

1. Collection: How to find/copy/compile responsive EDD?
2. Organization: How to process EDD so you can review and utilize?
3. Review: How to review EDD (and its substantially greater volume)?
4. Production: What is the best method for producing EDD?

“But I only trust humans looking at every document - it’s tried and true”

• Full review is rarely as accurate as automated searching.
• Humans make errors, get distracted, bored and tired.
• Typical human error rate is 25%
• And expense of human review of every document in dollars and time is prohibitive.

No manual review of millions of documents is cost-effective or accurate

• After culling by whatever means, you’ve still got quite a lot.
• Use computing power to enhance review
• Grouping data, multiple document decisions at once
• Workflow / QA can accelerate and improve quality

Why Context Is Important

In a hardcopy document, a prior sentence or paragraph provides the context

In an e-mail, SMS/Text message, or IM, the previous or next message may provide context

Today, a whole case could turn on…

• Let’s Do it!
• OK. Go Ahead!
• Sure
• G2G, SLAP, WIIFM

Reviewing without Context

What “matter”?

“It”?

What’s “touchy”?

Is this document:
• Privileged
• Non-Responsive
• Relevant?
• Incriminating?

Can’t really tell?

Reviewing in Context

Discussion Time Line: One Month

Context Across Documents

Our bond offering has a cash shortfall. What shall we do?

Let’s issue more bonds to cover the shortfall.

Great idea. Let’s go ahead with it!

That’s illegal. Don’t even think about it.

Context Provides Meaning

• Let’s focus on the last two documents.

• Note that the word “bond” is not used.

• Nevertheless, these documents contain important evidence.

Great idea. Let’s go ahead with it!

That’s illegal. Don’t even think about it.

The Solution

• Review documents as a causally-related group, not in isolation.

Our bond offering has a cash shortfall. What shall we do?

Let’s issue more bonds to cover the shortfall.

Great idea. Let’s go ahead with it!

That’s illegal. Don’t even think about it.
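The bond example can be sketched in code. A keyword search for “bond” hits only the first two messages; expanding each hit to its whole discussion pulls in the two replies that carry the evidence. Everything structural here is an assumption made for illustration: the message ids, the reply_to links (in particular that both replies answer the second message), the unrelated fifth message and the function names. Real tools reconstruct discussions from headers and other signals rather than a ready-made reply field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    msg_id: str
    reply_to: Optional[str]   # id of the message this one answers, if any
    text: str


# The four messages from the slide plus one unrelated message (hypothetical).
MESSAGES = [
    Message("m1", None, "Our bond offering has a cash shortfall. What shall we do?"),
    Message("m2", "m1", "Let's issue more bonds to cover the shortfall."),
    Message("m3", "m2", "Great idea. Let's go ahead with it!"),
    Message("m4", "m2", "That's illegal. Don't even think about it."),
    Message("m5", None, "Lunch on Friday?"),
]


def keyword_hits(messages, keyword):
    """Ids of messages whose text contains the keyword (case-insensitive)."""
    return [m.msg_id for m in messages if keyword.lower() in m.text.lower()]


def thread_root(msg_id, by_id):
    """Follow reply_to links up to the first message of the discussion."""
    current = by_id[msg_id]
    while current.reply_to is not None:
        current = by_id[current.reply_to]
    return current.msg_id


def expand_to_discussions(messages, hit_ids):
    """Ids of every message that shares a discussion with any keyword hit."""
    by_id = {m.msg_id: m for m in messages}
    roots = {thread_root(h, by_id) for h in hit_ids}
    return [m.msg_id for m in messages if thread_root(m.msg_id, by_id) in roots]


if __name__ == "__main__":
    hits = keyword_hits(MESSAGES, "bond")
    print("Keyword hits:", hits)
    print("Reviewed in context:", expand_to_discussions(MESSAGES, hits))
```

The unrelated message stays out of the review set, while the two replies that never mention “bond” are brought in alongside the messages that do.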

Native File Review of a Discussion

Discovery Stage 4: Production

1. Collection: How to find/copy/compile responsive EDD?
2. Organization: How to process EDD so you can review and utilize?
3. Review: How to review EDD (and its substantially greater volume)?
4. Production: What is the best method for producing EDD?

FRCP 34 [amended, continued]

• Rule 34. Production of Documents, Electronically Stored Information, and Things and Entry Upon Land for Inspection and Other Purposes.

(b) PROCEDURE. … the request may specify the form in which the electronically stored information is to be produced….

…[the responding party may object] to the requested form, stating the reasons, and the form it intends to use [instead].

Let the Games Begin!

Possible Production Formats

• Paper (if the other side asks for it this way, be happy to oblige. It is the least useful format in which to receive a production.)
• Paper-like (TIFF or PDF images)
– TIFF images without any searchable data at all are increasingly unacceptable.
• Native Files
• Hosted “production” areas of the producing party’s web repository.

Don’t forget, you’re in it to win it

• After production, you still have to work with all your data and everything the other side has produced to you;
• Prepare for depositions, brief witnesses, prepare for trial, investigate and analyze
• Any database allows searching, sorting, and basic reporting
– yawn, that’s so 1987.

We know who you are and who you’re talking to…

Traffic Analysis 1

Comparative Evidence Analytics

Technology has vastly improved how we practice law

1976 2006
