Volume - France in the United Kingdom


Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

Big Ideas in Big Data?

French-British Workshop on Big Data - London, November 2012

Monica Marinucci, Director of Research, Oracle Global Education & Research Industry Unit

Big Data in Research: Volume

Exponential growth in data and the ability to access critical information

Volume

Very large quantities of data

The volume of worldwide climate data is expanding rapidly, creating challenges both for physical archiving and sharing and for ease of access to relevant information in a multidisciplinary environment.

[Figure: Evolution of ESA's EO Data Archives between 1986-2007 and future estimates (up to 2020). X-axis: year (1986 to 2020, with future data estimates for 2015 and 2020); y-axis: 0 to 22000 (label and units not recoverable). Series by mission: NIMBUS 7 (Nov 78-May 86), SEASAT (Jun-Oct 78), LANDSAT 2-4 MSS (75-Dec 93), LANDSAT 5 TM (April 84-today), NOAA 9-17 AVHRR (86-today), SPOT 1-4 HRV (87-today), ERS 1 HR and LBR (Jul 91-Mar 00), ERS 2 HR and LBR (May 95-today), SEA STAR SeaWifs (Apr 98-today), LANDSAT 7 ETM (April 99-Dec 03), TERRA Modis (June 01-today), QUICK SCATT (01-today), PROBA (May 02-today), ENVISAT HR (March 02-today).]

The volume of earth-observation satellite data passed 3PB in 2007, and the projection for 2020 is seven times that figure.
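Taking the slide's figures at face value (about 3PB of earth-observation data in 2007, projected to grow seven-fold by 2020), a few lines of Python show what that projection implies. The constant compound growth rate is an assumption made for illustration; the chart does not claim that growth is smooth.

```python
# Figures from the slide: ~3 PB in 2007, projected to grow seven-fold by 2020.
volume_2007_pb = 3.0
growth_factor = 7.0
years = 2020 - 2007  # 13 years

volume_2020_pb = volume_2007_pb * growth_factor

# Assumption: a constant annual growth rate r with (1 + r) ** years == growth_factor
annual_rate = growth_factor ** (1 / years) - 1

print(f"Projected 2020 volume: {volume_2020_pb:.0f} PB")
print(f"Implied compound annual growth: {annual_rate:.1%}")
```

Under that assumption the archive grows by roughly 16% a year and reaches about 21PB by 2020.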


Big Data in Research: Velocity

Rapid growth in the speed of data generation

The LOFAR radio interferometer produces 1.6TB/sec, setting new frontiers for radio astronomy.

Velocity

Extremely fast streams of data

© CERN


In high energy physics, the Large Hadron Collider generates 60TB of data per day.
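The two velocity figures are quoted in different units (per second for LOFAR, per day for the LHC). A minimal sketch that normalises both to terabytes per day, assuming LOFAR sustains 1.6TB/sec around the clock, which the slide does not state:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

lofar_tb_per_sec = 1.6  # LOFAR rate quoted on the slide
lhc_tb_per_day = 60.0   # LHC rate quoted on the slide

# Assumption: the LOFAR rate is sustained continuously for a full day
lofar_tb_per_day = lofar_tb_per_sec * SECONDS_PER_DAY
ratio = lofar_tb_per_day / lhc_tb_per_day

print(f"LOFAR: {lofar_tb_per_day:,.0f} TB/day")
print(f"LHC:   {lhc_tb_per_day:,.0f} TB/day")
print(f"LOFAR / LHC: {ratio:,.0f}x")
```

On those assumptions LOFAR's stream works out to about 138,000TB (roughly 138PB) per day, over 2,000 times the LHC figure. Raw instrument output and data actually archived are likely to differ, so the ratio is only indicative.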

Big Data in Research: Variety

Ability of enterprise infrastructure to quickly accommodate new data sources

© CERN

The proposed Large Synoptic Survey Telescope will record 30 trillion bytes of image data every day.


Variety

Wide range of data type characteristics


In genomics, scientists can on average fully sequence 167 individuals per week, generating 250GB of images, equivalent to 200 movie files.

Big Data in Research: Value

Ability to translate raw data into information and knowledge

Value

High potential value if harnessed correctly


In genomics, the cost of sequencing is dropping by 50% every 5 months.

“… analysis, not sequencing, will be the main expense hurdle”

(Chris Ponting, University of Oxford, UK, in the February 2011 article “Will Computers Crash Genomics?”)
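Halving every 5 months is exponential decay: with starting cost C0, the cost after t months is C(t) = C0 · 2^(-t/5). The sketch below evaluates that formula, assuming the rate stays constant (the slide gives the rate, not how long it will hold).

```python
HALVING_PERIOD_MONTHS = 5.0  # "dropping by 50% every 5 months"


def relative_cost(months: float) -> float:
    """Fraction of the starting sequencing cost left after `months` months,
    assuming the cost keeps halving every HALVING_PERIOD_MONTHS."""
    return 0.5 ** (months / HALVING_PERIOD_MONTHS)


for months in (5, 10, 12, 24):
    print(f"after {months:>2} months: {relative_cost(months):.3f} of the original cost")
```

After one year the cost would be under 20% of the starting figure, and after two years under 4%, which is why the quoted argument expects analysis rather than sequencing to dominate the expense.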

New Frontiers in silico

• (Extremely) Large Data Volumes: storage, access, metadata, exascale computing
• Global Collaborations: data set integration, large-scale simulations & modeling, context-based visualisation
• Cross-Discipline Research: cross-breeding of technology and innovative methods inspired by new collaborations and the exchange of methods and approaches

Image credits: Materials Science: Nanotube composites, Nature 447, http://www.bcu.ac.uk/elss; The Carleton Wind Turbine, http://onlyhdwallpapers.com; http://compbio.cs.toronto.edu/l

Oracle Labs

• To look for novel approaches and methodologies
• To focus on real-world outcomes: to develop technologies that will someday play a significant role in the evolution of technology and society
• 4 main areas:
  • Exploratory research
  • Directed research
  • Consulting
  • Product incubation

Erasmus Medical Centre

• Complex data processing and analysis.
• Ability to:
  • load huge volumes of data in minimum time
  • store these data and their genomic DNA research results on disk storage
  • have an efficient system able to deliver good query performance

Thanks to an Exadata-based solution, Erasmus Medical Centre achieved:
• Smart Scan and Flash Card: performance in analyzing data. An 11-minute query could be reduced to 1 second, giving immediate results, which is a major advantage for researchers.
• Hybrid Columnar Compression: performance in manipulating TB of data (compression from 133GB to 11GB), with increased performance.
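The Exadata figures on this slide can be restated as ratios. This short calculation uses only the numbers quoted (an 11-minute query reduced to 1 second; 133 compressed to 11) and treats the slide's "Gb" as gigabytes:

```python
# Figures quoted on the slide
query_before_s = 11 * 60  # 11 minutes, in seconds
query_after_s = 1         # 1 second
size_before_gb = 133      # before Hybrid Columnar Compression
size_after_gb = 11        # after compression

speedup = query_before_s / query_after_s
compression_ratio = size_before_gb / size_after_gb
space_saved = 1 - size_after_gb / size_before_gb

print(f"Query speed-up: {speedup:.0f}x")
print(f"Compression ratio: {compression_ratio:.1f}:1")
print(f"Storage saved: {space_saved:.0%}")
```

That is a 660-fold query speed-up and roughly a 12:1 compression ratio, about 92% less storage.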

Adding Oracle Database 11g features such as partitioning gives further performance gains in manipulating and quantifying data obtained through the study of various genomes.

More information in the press release:

Erasmus Medical Center employs Oracle Exadata for DNA research

https://emeapressoffice.oracle.com/Press-Releases/Erasmus-Medical-Center-employs-Oracle-Exadata-for-DNA-research-1a0e.aspx
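Partitioning splits one large table into separately stored pieces by a key, so a query that filters on that key can skip the pieces it does not need (partition pruning). Oracle Database declares partitions in SQL DDL; the Python below is only a toy model of the idea, not Oracle's implementation. The `Variant` record, the `ListPartitionedTable` class and the choice of chromosome as the partition key are all hypothetical; the slide does not say how Erasmus Medical Centre partitioned its data.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical genomic record, for illustration only."""
    chromosome: str
    position: int
    allele: str


class ListPartitionedTable:
    """Toy table stored as one partition per chromosome (list partitioning)."""

    def __init__(self) -> None:
        self.partitions: dict[str, list[Variant]] = defaultdict(list)

    def insert(self, row: Variant) -> None:
        # Route the row to the partition for its partition key
        self.partitions[row.chromosome].append(row)

    def total_rows(self) -> int:
        return sum(len(p) for p in self.partitions.values())

    def query(self, chromosome: str, start: int, end: int) -> tuple[list[Variant], int]:
        """Rows on `chromosome` with start <= position <= end, plus rows scanned.

        Partition pruning: only the partition matching the filter key is read.
        """
        partition = self.partitions.get(chromosome, [])
        hits = [r for r in partition if start <= r.position <= end]
        return hits, len(partition)


# Example: three chromosomes, 100 synthetic variants each
table = ListPartitionedTable()
for chrom in ("1", "2", "X"):
    for pos in range(0, 1000, 10):
        table.insert(Variant(chrom, pos, "A"))

hits, scanned = table.query("2", 100, 199)
print(f"{len(hits)} hits, {scanned} of {table.total_rows()} rows scanned")
```

Here the query touches 100 of the 300 rows; in a real database the saving comes from not reading the other partitions from disk at all.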

Visualisation

Panels: PCA; clusters; heatmap; chemical structures; chromosomes; brain atlas; patient correlation; pathway networks; DNA, RNA & protein sequencing data (Ref, Allele1, Allele2)

Questions illustrated:
• How is every record related to every other?
• What is the range and distribution of values?
• What are the major themes or concepts?
• How are the numeric attributes correlated?
• What are the supported regulatory relationships?
• What is the underlying natural sequence variation?

Courtesy of Prof. Peter van der Spek, Erasmus Medical Centre
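One common way to answer "How are the numeric attributes correlated?" is a Pearson correlation matrix, which is often what a heatmap panel displays. The slide does not say which method the Erasmus tools used, so treat this standard-library sketch as an illustration; the column names and values in the example are synthetic.

```python
from __future__ import annotations

import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("columns must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances; the 1/n factors cancel, so they are omitted
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pairwise Pearson correlations for every pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Synthetic example columns (not data from the study)
columns = {
    "attr_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "attr_b": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with attr_a
    "attr_c": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as attr_a rises
}
matrix = correlation_matrix(columns)
for (a, b), r in matrix.items():
    print(f"{a} vs {b}: {r:+.2f}")
```

Each value lies between -1 and +1; colouring the matrix cells by that value gives the familiar correlation heatmap.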

Innovating with …

© CERN


… however …


Q&A

Thank you
