The Zen of Data Science Eugene Dubossarsky Chief Data Scientist – Principal Founder –


The Zen of Data Science
Eugene Dubossarsky
Chief Data Scientist –
Principal Founder –
Presentation Summary - Promised
- Key concepts, dos and don'ts of Data Science
- Science and engineering: very different!
- What are Data Scientists for?
- Where should Data Science sit in the business?
- How should data science be measured, managed, planned?
- Starting, nourishing and growing a successful Data Science function in your business: skills and experience
- Becoming an effective data scientist
Presentation Summary – But
Actually More Like...

Shameless self promotion

Parables

Metaphors

Abstract Philosophical Stuff

Surprises

Challenges and Reframes

You saying “This is relevant to my life
how?”
Presentation Summary

Tools vs Ideas – Science vs Technology

Finding vs Building – Science and Engineering

Engagement

Exploration – a legitimate, vital and strategic
business activity

Intelligence – a business function

Mastery

Apprenticeship
The “Zen” bit

The bare essence

The kernel of truth

The thing that isn't illusion

The way (Tao) to enlightenment (Satori)

Clarity and simplicity derived from meditation,
possibly quite different to everyday experience
Parable 1: Getting Airports Wrong

Everybody thinks that this is an airplane:
Parable 1: Getting Airports Wrong



Imagine your job is to build an Airport
You need to take the design of airplanes into
account.
The only problem is:
Parable 1: Getting Airports Wrong


This is what is called a “fundamental category
error”. Anything done with this misconception
in place will be a waste of time, money and
resources.
“Working around it”, and “being realistic about
the client's expectations” is a bit beside the
point.
Parable 1: Getting Airports Wrong

Most people probably want to focus on the
aerodynamics of the “airplane” as currently
conceived, the buzz around technology to
support such “airplanes” and may see this as
being “business focused”, while more
fundamental discussions would be seen as
“negative”, “academic” or too “challenging”.
Parable 1: Getting Airports Wrong

Nevertheless, getting the fundamental issue
sorted out would seem to be the first order of
business, no matter how abstract,
controversial, politically inconvenient or
offensive to some quarters, or how many
people have built careers managing, selling
and practicing in this paradigm.
Parable 1: Getting Airports Wrong

Because... Uh.. Donkey ?
Data, Science, Tools and
Definitions

Data Scientist =

“Hadoop Guy” ?

“Guy Who Does Stuff with Data” ?

Guy Who Does Stuff with Lots of Data ?

Guy Who Does Stuff with Big Data ?

Guy Who Does Stuff With Big Data That
Sounds Cool or Businessy?
(And what makes Data “Big” anyway?)
Science and Engineering

Is there a difference ?

What is it ?

What is a “Data Engineer” ?

What is a “non-Data Engineer” ?
Science and Engineering



Are actually direct opposites
Skills, positioning, personality types,
appropriate management frameworks and
place in the business are quite different.
The confusion needs sorting out.
Science and Engineering
Now I've Lost You...


That's not “realistic” - most “data scientists” are
actually “engineers” by this framework !
That sounds too “technical”, “academic” or not
“relevant to business”
Now I've Lost You...

That's not “realistic” - most “data scientists” are
actually “engineers”
Yep.

That sounds too “technical”, “academic” or “not
relevant to business”
Maybe, Too Bad and No
Engineering



Start with an identified idea, end with a design
Build or maintain something to pre-defined
parameters
Uncertainty is the enemy (time, budget,
resources, performance)
Engineering




Plans, Timeframes and Specifications, vs
ongoing (loosely focused) discussion
Delivers Products and pre-determined
KPIs. The Unexpected is a (usually
unwelcome) exception
Works to milestones and a specification
Engaged with operational and technical
management
Engineers




Outcomes are Things
An Engineer may do more or less the same
thing many times
An Engineer performs “projects” and manages
“processes”
An engineer is managed according to tight
requirements
Engineers

easier to identify

easier to manage

easier to understand

less stressful to deal with

Easier to train

more plentiful

easier to recruit
Engineers And Data



Data is a resource to move and manipulate
Focus is on building and maintaining
processes that do that
Data is a “commodity” that flows through the
system. The focus is on the system.
Science and Scientists

Start with reality - derive new insights

Uncertainty is your job

“Projects” and “processes” are anathema, and
people who manage them don't help

Explore and Interrogate Data

No two jobs are the same

No job can be specified too tightly

Findings are inherently uncertain, otherwise
why bother ?
Scientists and Data

Focused on The Data.

Tools help but don't feature.


Data is complex, an undiscovered country to
explore.
Data is not a commodity: it is complex, ever-changing and information-rich
Scientists and The CEO


Data is “The Last Frontier”, where dangers
lurk and opportunities abound. The
scientist is the guide.
Objective is to Tell the Story of the Data, to
someone who cares and matters (ideally
CEO), preferably as part of an ongoing
conversation
Science and Engineering

Scientists help you identify new risks and
opportunities, they provide transformational
insights.

Engineers make transformations tangible

Scientists explore

Engineers deliver and maintain

The personality types are actually quite
different
Science and Engineering

There is a lot of crossover

It is good to be skilled in both

Many of the tools used are the same

The distinction is not obvious to most outsiders

The distinction is crucial
Why the Confusion?

It's all “technical”, apparently

It has the word “data” in it.

Some vendors like it that way.

Much of management likes it that way.

Much of management is out of its depth

And almost all of HR and recruiting.
Science and Engineering



Real Business Needs Both
Pretend Business only needs Engineering
(and maybe not even that)
Science is crucial for real competition and
risk

Science is irrelevant otherwise

Engineering is Delivery

Science is Intelligence
The Intelligence Function – Where
Data Science Should Sit in the
Business?

Absent in most “enterprises”

Present informally in most real businesses


A strategic, secret asset not to be bragged
about or shared
“Data” is not just structured, electronic,
concrete or even conscious
The Intelligence Function






Strategic, secret role
Trusted, discreet, low-key advisor, mentor,
guide
A mix of Mr Spock, James Bond and Steve
Jobs
Many guises, many names
Well understood by militaries at war, and
organisations with real challenges, risks and
uncertainty
Often next in line for CEO
The Intelligence Function – Where
Data Science Should Sit in the
Business

Not IT

Not Operations

Right near the CEO

Reporting directly, discreetly, interactively


Not managed by Prince2, waterfall or any
other “project management” or “Business
Analysis” methods
Lean Startup, real Agile (see Manifesto) and
OODA loop much more like it
Data Science and Analytics Today

Insights or Process ?

Tools or Outcomes ?

Transformation or BAU ?

Value or Compliance ?

Asset or Vanity ?

Engaged or Disengaged ?

Measured ?
Insights vs Process

Insights CANNOT be the same each time.

But Much of “Analytics” can


Deriving value from predictive targeting is a
repeatable, mechanical process.
Deriving value from insights derived from the
same model is not.
Insights vs Process


Only one requires a scientist.
Only one is valued by businesses that don't
have real competitive, environmental and
other change pressures.
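
To make the contrast concrete, here is a minimal sketch of the "repeatable, mechanical" side: given model scores, pick the customers to target. The function name `select_targets`, the customer ids and the scores are all hypothetical, chosen only for illustration. The same few lines run unchanged every campaign; nothing comparable can be written down for the insight side.

```python
def select_targets(scores, fraction):
    """Return the customer ids with the highest scores.

    scores   -- dict mapping customer id to model score (hypothetical data)
    fraction -- share of the customer base to target, between 0 and 1
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    k = int(len(scores) * fraction)
    # Rank customers from highest to lowest score and keep the top k.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Illustrative scores only; in practice these would come from a model.
example_scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
campaign_list = select_targets(example_scores, 0.5)
print(campaign_list)
```

Asking why customer "b" scores so low, and whether that says something new about the business, is the part that cannot be scripted.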
Data Science and Analytics Today

Insights or Process ?

Tools ?

Transformation or BAU ?

Value or Compliance ?

Asset or Vanity ?

Engaged or Disengaged ?

Measured ?
Tools and Trinkets

Is “Hadoop” really the most important thing on
a “data scientist's” resume ?

Why or why not ?

What is missing ?
Data Science and Analytics Today

Insights or Process ?

Tools or Science ?

Transformation or BAU ?

Value or Compliance ?

Asset or Vanity ?

Engaged or Disengaged ?

Measured ?
Data Science and Analytics Today

Insights or Process ?

Tools or Science ?

Transformation or BAU ?

Value or Compliance ?

Vital Asset or Vanity ?

Engaged or Disengaged ?

Measured ?
Value, Compliance or Vanity ?

What would happen to the business if the
analytics/data science/data mining function
disappeared overnight ?

Who would care ?

Why ?


Why does the function exist in the business in
the first place ?
Science does not serve vanity well, and is
not necessary for compliance.
Data Science and Analytics Today

Insights or Process ?

Tools or Science ?

Transformation or BAU ?

Value or Compliance ?

Vital Asset or Vanity ?

Leadership Engaged or Disengaged ?

Measured ?
Engagement in Parables


Is investing in data analytics like investing in
stocks or investing in an education (or gym
membership) ?
If analytics were a taxi, would the CEO see the
analytics function as car mechanics, drivers
or tour guides ? Does he know ? Does he care ?
Engagement in Extremes

Analytics in a hedge fund

Analytics in a bank

Basel II compliance analytics in a bank

What are the KPIs ?

Does the CEO personally care about them ?


Can the organisation do without the analytics
function ?
Can the organisation afford the CEO ignoring
the analytics function ?
Data Science and Analytics Today

Insights or Process ?

Tools or Science ?

Transformation or BAU ?

Value or Compliance ?

Vital Asset or Vanity ?

Leadership Engaged or Disengaged ?

Measured ?
Measurement


How many predictive analytics functions in
banking, telco, insurance etc are measured
explicitly on improvement in predictive
accuracy, with the CEO keeping an eye on this
(retention, acquisition, risk, pricing models) ?
How many know/care about the predictive
accuracy of their competitors ?
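
One way such a measurement could look, as a sketch only: score the same holdout customers with last period's model and this period's, and report the change in AUC (the chance that a randomly chosen positive case is ranked above a randomly chosen negative one). The choice of AUC as the metric, the variable names and the numbers below are assumptions made for illustration, not something the talk prescribes.

```python
def auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical holdout: 1 = customer churned, 0 = customer stayed.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical scores from the previous and the current retention model.
previous_model = [0.6, 0.5, 0.4, 0.7, 0.2, 0.8, 0.3, 0.1]
current_model = [0.7, 0.4, 0.45, 0.5, 0.2, 0.9, 0.3, 0.1]

improvement = auc(labels, current_model) - auc(labels, previous_model)
print(f"AUC change since last period: {improvement:+.3f}")
```

A single number like this, tracked period over period, is something a CEO can keep an eye on without needing to understand the model itself.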
Finding, Training and Managing Data
Scientists
Not Easy
Finding Data Scientists




Data Scientists are part engineer, part
entrepreneur and part hunter/gatherer –
outcome-focused explorers !
ADHD is an asset, personality profile is not
typical corporate
Communication skills and lateral thinking as
important as technical skill
Technical skills are DEEEEP, eclectic
Finding Data Scientists

Most recruiters are severely out of their depth

Ditto most HR


The best people are un-/under-/mis-employed !
It takes one to know one
Training Data Scientists

Eclectic skill set

Hard Skills


Stats/Machine Learning/Computing/Psychology

Domain expertise
Many “soft skills”

Conceptual

Communication

Science !

Agile/Lean Startup/Cynefin/OODA
Training Data Scientists

Experience is crucial

Mistakes are valuable

Apprenticeship is Key !

Courses help, but are not a substitute. They won't teach
the soft skills and conceptual outlook
Managing Data Scientists




Yes: Real Agile, Lean Startup, Cynefin, OODA
loop
No: PRINCE2, Project Management,
“Business Analysis”, Operational
Management, the IT function.
Yes: someone who is engaged, empowered,
interested.
No: Just about everyone actually doing this out
there...
So Who Needs Data Scientists?

Businesses facing real competition, real
threats, real uncertainty and real change.
Who Doesn't Really Need Data
Scientists ?

Everyone Else.