MyTarget.com - University of Rhode Island

Databases as Analytical Engines for Drug Discovery
Susie Stephens
Principal Product Manager, Life Sciences
Oracle Corporation
Outline
• Data Challenges
• Case Studies
• Summary
Access Distributed Data
[Diagram: distributed query across heterogeneous sources. Sources shown: external sites, flat files, Sybase, MySQL, SRS. Access mechanisms shown: UltraSearch, External Table, Generic Connectivity, DBlinks, Transparent Gateway (twice).]
Integrate a Variety of Data Types
Data types:
• XML
• CLOBs
• Text
• Images
• Video
• Relational
• User-Defined Objects
Example content:
• Nucleotide Sequences
• Gene Expression Data
• Papers
• Cell Histology Images
• Protein Folding Video
• SwissProt
• KEGG
• Chemical Structures
Manage Vast Quantities of Data
• Partitioning
• Oracle Data Guard
• Real Application Clusters (RAC)
• Automated Storage Management
• Adaptive Instance Tuning
• Automated Application and SQL Tuning
• Automated Database Diagnostic Monitor (ADDM)
• Scheduling
[Chart: axis from 0 to 50TB in 10TB steps]
Collaborate Securely
• Integrated communications
• Single enterprise search
• Flexible access
• Fine-grained access control
• Auditing
• Workflow
• Personalized portal
Find Patterns and Insights
• Oracle Data Mining
  – Find relationships & clusters
• Oracle Discoverer & Oracle OLAP
  – Interactive query & drill-down
• Statistics
  – Mean, stdev, median, correlations, linear regression
• Oracle Text
  – Cluster & classify documents of interest
• Table Functions
  – Implement complex algorithms within the database
Regular Expression Searches
• A powerful method of describing both simple & complex patterns for searching & manipulating
• Multilingual regular expression support for SQL & PL/SQL string types
• Follows POSIX-style regexp syntax
• Supports standard regexp operators
• Includes common extensions such as case-insensitive matching, sub-expression back-references, etc.
• Compatible with popular regexp implementations like GNU, Perl, Awk
Case Study: Retrieve Protein Data from SGD using Regular Expressions
Case study courtesy of Prolexys Pharmaceuticals, Inc.
HTTP Raw Data
</script>
</head><body><body bgcolor='#FFFFFF'>
<table cellpadding="2" width="100%" cellspacing="0" border="0"><tr><td colspan="4"><hr width="100%" /></td></tr><tr><td valign="middle" align="right"><a
href="http://www.yeastgenome.org/"><img alt="SGD" border="0" src="http://www.yeastgenome.org/images/SGD-to.gif" /></a></td><th valign="middle"
nowrap="1">Quick Search:</th><td valign="middle" align="left"><form method="post" action="http://db.yeastgenome.org/cgi-bin/SGD/search/quickSearch"
enctype="application/x-www-form-urlencoded">
<input type="text" name="query" size="13" /><input type="submit" name="Submit" value="Submit" />
</form></td><th valign="middle" align="left"><a href="http://www.yeastgenome.org/sitemap.html">Site Map</a> | <a
href="http://www.yeastgenome.org/HelpContents.shtml">Help</a> | <a href="http://www.yeastgenome.org/SearchContents.shtml">Full Search</a> | <a
href="http://www.yeastgenome.org/">Home</a></th></tr><tr><td align="left" colspan="4"><table cellpadding="1" width="100%" cellspacing="0" border="0"><tr
align="center" bgcolor="navajowhite"><td><font size="-1"><a href="http://www.yeastgenome.org/ComContents.shtml">Community Info</a></font></td><td><font
size="-1"><a href="http://www.yeastgenome.org/SubmitContents.shtml">Submit Data</a></font></td><td><font size="-1"><a href="http://seq.yeastgenome.org/cgibin/SGD/nph-blast2sgd">BLAST</a></font></td><td><font size="-1"><a href="http://seq.yeastgenome.org/cgi-bin/SGD/web-primer">Primers</a></font></td><td><font
size="-1"><a href="http://seq.yeastgenome.org/cgi-bin/SGD/PATMATCH/nph-patmatch">PatMatch</a></font></td><td><font size="-1"><a
href="http://db.yeastgenome.org/cgi-bin/SGD/seqTools">Gene/Seq Resources</a></font></td><td><font size="-1"><a href="http://www.yeastgenome.org/Vlyeast.shtml">Virtual Library</a></font></td><td><font size="-1"><a href="http://db.yeastgenome.org/cgi-bin/SGD/suggestion">Contact
SGD</a></font></td></tr></table></td></tr><tr><td colspan="4"><hr width="100%" /></td></tr></table><table cellpadding="0" width="100%" cellspacing="0"
border="0"><tr><td width="10%"><br /></td><td valign="middle" align="center" width="80%"><h1>Sequence for a region of YDR099W/BMH2</h1></td><td
valign="middle" align="right" width="10%"></td></tr></table><p /><center><a target="infowin" href="http://db.yeastgenome.org/cgi-bin/SGD/suggestion">Send
questions or suggestions to SGD</a></center><p /><p /><center><a target="infowin" href="http://seq.yeastgenome.org/cgi-bin/SGD/nphblast2sgd?name=YDR099W&amp;suffix=prot">BLAST search</a> | <a target="infowin" href="http://seq.yeastgenome.org/cgi-bin/SGD/nphfastasgd?name=YDR099W&amp;suffix=prot">FASTA search</a></center><p /><center><hr width="35%" /></center><p /><font color="FF0000"><strong>Protein
translation of the coding sequence.</strong></font><p /><p />Other Formats Available: <a href="http://db.yeastgenome.org/cgibin/SGD/getSeq?map=pmap&amp;seq=YDR099W&amp;flankl=0&amp;flankr=0&amp;rev=">GCG</a><pre>>YDR099W Chr 4
MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEERNLLSVAYKNVIGARRAS
WRIVSSIEQKEESKEKSEHQVELIRSYRSKIETELTKISDDILSVLDSHLIPSATTGESK
VFYYKMKGDYHRYLAEFSSGDAREKATNSSLEAYKTASEIATTELPPTHPIRLGLALNFS
VFYYEIQNSPDKACHLAKQAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDISES
GQEDQQQQQQQQQQQQQQQQQAPAEQTQGEPTK*
</pre><hr size="2" width="75%">
<table width="100%"><tr><td valign="top" align="left"><a href="http://www.yeastgenome.org/"><img border="0"
src="http://www.yeastgenome.org/images/arrow.small.up.gif" />Return to SGD</a></td><td valign="bottom" align="right"><form method="post"
action="http://db.yeastgenome.org/cgi-bin/SGD/suggestion" enctype="application/x-www-form-urlencoded" target="infowin" name="suggestion">
<input type="hidden" name="script_name" value="/cgi-bin/SGD/getSeq" /><input type="hidden" name="server_name" value="db.yeastgenome.org" /><input
type="hidden" name="query_string" value="seq=YDR099W&amp;flankl=0&amp;flankr=0&amp;map=p3map" /><a
href="javascript:document.suggestion.submit()">Send a Message to the SGD Curators<img border="0" src="http://www.yeastgenome.org/images/mail.gif" /></a>
</form></td></tr></table></body></html>
Function to Parse out AA Sequence
create or replace function orf2seq (
  p_orf in varchar2
) return varchar2 is
  v_stream clob;
  v_chunk  varchar2(4000);
  strt     number;
begin
  -- Retrieve the HTTP stream
  -- (in SQL*Plus, SET DEFINE OFF first so '&' is not read as a substitution variable):
  v_stream := httpuritype.getclob(httpuritype.createuri(
    'http://db.yeastgenome.org/cgi-bin/SGD/getSeq?seq='||p_orf||
    '&flankl=0&flankr=0&map=p3map')
  );
  -- Trim off the head of the stream:
  strt := dbms_lob.instr(v_stream, 'Submit', 1, 1);
  -- Take up to 4000 characters from that point and strip out control characters, new lines, etc.:
  v_chunk := regexp_replace(dbms_lob.substr(v_stream, 4000, strt), '[[:cntrl:]]', '');
  -- Return the AA sequence: the first run of 10 or more uppercase letters in the cleaned chunk.
  -- (The chunk is searched from its start; the offset strt was already applied above.)
  return regexp_substr(v_chunk, '[[:upper:]]{10,}');
end;
/
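The parsing step of orf2seq can be tried outside the database with a minimal Python sketch like the one below. It assumes the page text has already been fetched. The name extract_aa_sequence, its parameters, and the shortened sample string are illustrative and not part of the original case study. Python's [A-Z] stands in for the POSIX class [[:upper:]], and the control-character class is written out explicitly.

```python
import re


def extract_aa_sequence(html, marker="Submit", window=4000, min_len=10):
    """Return the first run of >= min_len uppercase letters after `marker`, or None."""
    start = html.find(marker)
    if start == -1:
        return None
    # Keep a bounded window, mirroring dbms_lob.substr(v_stream, 4000, strt).
    chunk = html[start:start + window]
    # Strip control characters (newlines, tabs, ...), like '[[:cntrl:]]'.
    chunk = re.sub(r"[\x00-\x1f\x7f]", "", chunk)
    match = re.search(r"[A-Z]{%d,}" % min_len, chunk)
    return match.group(0) if match else None


# Shortened, hand-made stand-in for the SGD page shown on the slide.
sample = (
    '<input type="submit" name="Submit" value="Submit" />\n'
    "<pre>>YDR099W Chr 4\n"
    "MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEERNLLSVAYKNVIGARRAS\n"
    "GQEDQQQQQQQQQQQQQQQQQAPAEQTQGEPTK*\n"
    "</pre>"
)

print(extract_aa_sequence(sample))
```

Because newlines are removed before matching, the sequence lines join into one uppercase run, and the trailing * stop marker ends the match.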
AA Sequence for ORF ‘YDR099W’
SQL> select orf2seq('YDR099W') from dual;
ORF2SEQ('YDR099W')
--------------------------------------------------------------------------------
MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEERNLLSVAYKNVIG
ARRASWRIVSSIEQKEESKEKSEHQVELIRSYRSKIETELTKISDDILSVLDSHLIPSA
TTGESKVFYYKMKGDYHRYLAEFSSGDAREKATNSSLEAYKTASEIATTELPPTHPI
RLGLALNFSVFYYEIQNSPDKACHLAKQAFDDAIAELDTLSEESYKDSTLIMQLLRD
NLTLWTSDISESGQEDQQQQQQQQQQQQQQQQQAPAEQTQGEPTK
Elapsed: 00:00:01.24
SQL> insert into pseq (orf_id, sequence)
2 values ('YDR099W', orf2seq('YDR099W'));
Case Study: Motif Searching in Proteins
PROSITE database of protein sequence motifs
ID TYR_PHOSPHO_SITE; PATTERN
AC PS00007
DT APR-1990 (CREATED); APR-1990 (DATA UPDATE); APR-1990 (INFO UPDATE)
DE Tyrosine kinase phosphorylation site
PA [RK]-x(2,3)-[DE]-x(2,3)-Y
CC /TAXO-RANGE=??E?V; CC /SITE=5,phosphorylation
CC /SKIP-FLAG=TRUE
DO PDOC00007
Source: http://www.expasy.org/prosite/ps_frequent_patterns.txt


TKP Pattern: [RK]-x(2,3)-[DE]-x(2,3)-Y
– R=Arginine, K=Lysine, D=Aspartate, E=Glutamate, Y=Tyrosine, x=any AA
Oracle10g Regular Expression Equivalent
– [RK].{2,3}[DE].{2,3}[Y]
Case study courtesy of Prolexys Pharmaceuticals, Inc.
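The PROSITE-to-regular-expression translation shown above can be automated. The sketch below is a hypothetical helper (prosite_to_regex is not an Oracle or PROSITE function) and handles only a subset of the PROSITE syntax: x, bracketed alternatives, braces for excluded residues, and (n) or (n,m) repeat counts. It emits Y rather than [Y], which matches the same strings.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split each element into its core and an optional (n) or (n,m) suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # any amino acid
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any amino acid except these
        if lo is None:
            repeat = ""
        elif hi is None:
            repeat = "{%s}" % lo
        else:
            repeat = "{%s,%s}" % (lo, hi)
        parts.append(core + repeat)
    return "".join(parts)


tkp = prosite_to_regex("[RK]-x(2,3)-[DE]-x(2,3)-Y")
print(tkp)  # [RK].{2,3}[DE].{2,3}Y
```

Anchors (< and >) and other less common PROSITE constructs are deliberately left out of this sketch.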
SQL to Retrieve All Proteins Interacting with TKP
select distinct
  substr(a.refseq_id, 1, 9) refseq_id,
  length(a.seq_string_varchar) seq_length,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}[Y]', 1, 1) motif_offs1,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}[Y]', 1, 2) motif_offs2,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}[Y]', 1, 3) motif_offs3,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}[Y]', 1, 4) motif_offs4
from
  target_db a,
  y2h_interaction_p b
where
  a.refseq_id like 'NP%'
  and regexp_like(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}[Y]')
  and (substr(a.refseq_id,1,9) = b.bait_refseq or substr(a.refseq_id,1,9) = b.prey_refseq)
;
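As a rough cross-check of what the four REGEXP_INSTR columns in the query above return, the following Python sketch lists the 1-based start positions of the first four non-overlapping motif matches, padding with 0 when there are fewer. The helper motif_offsets and the test sequence are made up for illustration. Python's backtracking engine and Oracle's POSIX matcher may pick different match lengths in edge cases, so later offsets are not guaranteed to agree on every sequence.

```python
import re

# Same motif as in the SQL above; [Y] and Y are equivalent.
TKP = re.compile(r"[RK].{2,3}[DE].{2,3}[Y]")


def motif_offsets(sequence, pattern=TKP, occurrences=4):
    """1-based start of each of the first `occurrences` matches, 0 where absent."""
    starts = [m.start() + 1 for m in pattern.finditer(sequence)]
    starts = starts[:occurrences]
    return starts + [0] * (occurrences - len(starts))


# Hand-made sequence with two motif hits: KAADAAY and RTTETTY.
print(motif_offsets("AAKAADAAYGGGRTTETTY"))  # [3, 13, 0, 0]
```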
Query Results
REFSEQ_ID    SEQ_LENGTH MOTIF1_OFFS MOTIF2_OFFS MOTIF3_OFFS MOTIF4_OFFS
------------ ---------- ----------- ----------- ----------- -----------
NP_003961          1465          14         202         347         537
NP_003968           330         241           0           0           0
NP_003983           490           8          50          62          93
NP_004001          3562        3085           0           0           0
...
MHHCKRYRSPEPDPYLSYRWKRRRSYSREHEGRLRYPSRREPPPRRSRSRSHDRLPYQRRY
RERRDSDTYRCEERSPSFGEDYYGPSRSRHRRRSRERGPYRTRKHAHHCHKRRTRSCSSAS
SRSQQSSKRTGRSVEDDKEGHLVCRIGDWLQERYEIVGNLGEGTFGKVVECLDHARGKSQVAL
KIIRNVGKYREAARLEINVLKKIKEKDKENKFLCVLMSDWFNFHGHMCIAFELLGKNTFEFLKENN
FQPYPLPHVRHMAYQLCHALRFLHENQLTHTDLKPENILFVNSEFETLYNEHKSCEEKSVKNTSI
RVADFGSATFDHEHHTTIVATRHYRPPEVILELGWAQPCDVWSIGCILFEYYRGFTLFQTHENRE
HLVMMEKILGPIPSHMIHRTRKQKYFYKGGLVWDENSSDGRYVKENCKPLKSYMLQDSLEHVQL
FDLMRRMLEFDPAQRITLAEALLHPFFAGLTPEERSFHTSRNPSR
SQL to Retrieve Motif Frequency by Protein
select
  c.refseq_id "Refseq ID",
  rs2desc(c.refseq_id) "Protein Description",
  a.cnt "Repetitions",
  b.ps_ac "Prosite AC",
  b.descr "Motif Description"
from
  motif_data a,
  ps_data b,
  target_dbp c
where
  a.ps_ac = b.ps_ac
  and a.sequence_id = c.sequence_id
order by
  3 desc, 1
;
Query Results
Refseq ID    Protein Description                                              Repetitions Prosite AC Motif Description
------------ ---------------------------------------------------------------- ----------- ---------- -------------------------------------
NP_055995.2  spectrin repeat containing, nuclear envelope 2                           145 PS00006    Casein kinase II phosphorylation site
NP_056363.1  bullous pemphigoid antigen 1, 230/240kDa                                 132 PS00006    Casein kinase II phosphorylation site
NP_001139.2  ankyrin 2, neuronal                                                      115 PS00006    Casein kinase II phosphorylation site
NP_066267.1  ankyrin 3, node of Ranvier (ankyrin G)                                   110 PS00006    Casein kinase II phosphorylation site
NP_056363.1  bullous pemphigoid antigen 1, 230/240kDa                                 102 PS00005    Protein kinase C phosphorylation site
NP_005520.2  heparan sulfate proteoglycan 2 (perlecan)                                 97 PS00008    N-myristoylation site
NP_066267.1  ankyrin 3, node of Ranvier (ankyrin G)                                    97 PS00005    Protein kinase C phosphorylation site
NP_001139.2  ankyrin 2, neuronal                                                       96 PS00005    Protein kinase C phosphorylation site
NP_115495.1  monogenic, audiogenic seizure susceptibility 1 homolog (mouse)            95 PS00006    Casein kinase II phosphorylation site
...
Regular Expression Searches Quote
"Thanks to Oracle 10g's Regular Expressions (RE) query support,
it's no longer necessary to export data from the database, process
it with a RE enabled tool and then import the data back into the
database. Now, RE processing can be handled with a single
query." - Marcel Davidson, Head of Database Administration,
Myriad Proteomics
Oracle Data Mining BLAST
• Implemented using a table function interface
• BLAST search functions can be placed in SQL queries
• Different functions for match & align
• Combination of SQL queries & BLAST is very powerful & flexible
Case Study: BLAST as a Sequence Identification Tool
• Identify protein with high sequence similarity and the functional class

select function, count(seq_id) f_count
from (select t.seq_id, t.score, t.expect, g.function
      from SwissProt_DB g,
           table(BLASTP_MATCH(
             'AEQAERYDDMAAAMKRY',
             cursor (select seq_id, sequence
                     from SwissProt_DB),
             5)) t /* expect_value */
      where t.seq_id = g.seq_id)
group by function /* swissprot kw */
order by f_count

[Diagram: SwissProt_DB and (query_sequence, parameters) feed BLASTP_MATCH, which returns seq_id, score, expect; the result is joined to SwissProt_DB on t.seq_id = g.seq_id to give seq_id, function; GROUP BY then yields function, f_count.]
Case Study: Homology Search between Yeast and Human Data
[Figure: homology mapping between the yeast protein interactome and the human protein interactome. Node labels: A, B, C, X, Y, Z. Edge labels: "Determined experimentally with Y2H" (shown twice) and "Inferred through BLAST".]
Interlogs: (A|X, B|Y) and (A|X, B|Z)
Case study courtesy of Prolexys Pharmaceuticals, Inc.
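The interlog idea on this slide can be sketched in a few lines of Python: every experimentally determined yeast interaction is projected onto the human homologs of its two partners. This is only an illustration under assumed inputs. The function infer_interlogs and the in-memory dictionaries are hypothetical, whereas the case study itself works on database tables populated by BLAST.

```python
def infer_interlogs(yeast_interactions, homologs):
    """Project yeast interactions onto human proteins via homology.

    yeast_interactions: iterable of (yeast_a, yeast_b) pairs.
    homologs: dict mapping a yeast protein to a set of human homologs.
    Returns a set of unordered human pairs, stored as sorted tuples.
    """
    inferred = set()
    for yeast_a, yeast_b in yeast_interactions:
        for human_a in homologs.get(yeast_a, ()):
            for human_b in homologs.get(yeast_b, ()):
                inferred.add(tuple(sorted((human_a, human_b))))
    return inferred


# Labels follow the slide: A-B interact in yeast; X is homologous to A,
# and Y and Z are homologous to B.
pairs = infer_interlogs([("A", "B")], {"A": {"X"}, "B": {"Y", "Z"}})
print(sorted(pairs))  # [('X', 'Y'), ('X', 'Z')]
```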
Batch BLAST: Human (query) vs. Yeast (subject)
for v1 in c1 loop
  insert into yeast_human_homolog (
    human_refseq,
    yeast_orf_name,
    score,
    expect
  )
  select
    v1.refseq_id,
    t.t_seq_id,
    t.score,
    t.expect
  from
    table ( blastp_match (
      v1.sequence_string,
      cursor ( select a.yeast_acn, a.yeast_seq
               from yeast_prot_seq a )
    )
    ) t
  where
    t.expect < 0.00001
  ;
end loop;
BLAST Results
Yeast    Yeast    Human        Human
Gene 1   Gene 2   Refseq 1     Refseq 2
-------  -------  -----------  -----------
YAR018C  YIL061C  NP_XXXXX1.1  NP_YYYYY1.1
YBL016W  YDL159W  NP_XXXXX2.1  NP_YYYYY2.1
YBL016W  YDL159W  NP_XXXXX3.1  NP_YYYYY3.1
YBL016W  YDL159W  NP_XXXXX4.1  NP_YYYYY4.1
YBL016W  YDL159W  NP_XXXXX5.1  NP_YYYYY5.1
YBL063W  YIL061C  NP_XXXXX6.1  NP_YYYYY6.1
YBL063W  YIL061C  NP_XXXXX7.1  NP_YYYYY7.1
YBR109C  YDR356W  NP_XXXXX8.1  NP_YYYYY8.1
YBR109C  YDR356W  NP_XXXXX9.1  NP_YYYYY9.1
YBR109C  YDR356W  NP_XXXX10.1  NP_YYYY10.1
YBR109C  YDR356W  NP_XXXX11.1  NP_YYYY11.1
YBR109C  YFR014C  NP_XXXX12.1  NP_YYYY12.1
YBR109C  YOL016C  NP_XXXX13.1  NP_YYYY13.1
Expect 1 and Expect 2 values, in extracted order:
4.79E-12 1.11E-08 2.63E-10 4.57E-07 1.57E-22 3.17E-64 2.30E-06 1.78E-07 1.24E-08
5.19E-07 3.92E-10 3.67E-48 3.67E-48 4.58E-06 5.25E-10 9.04E-11 8.33E-09 1.11E-08
8.67E-06 4.58E-06 7.74E-11 7.74E-11 2.80E-20 4.39E-11 6.91E-17 1.82E-17
[Figure labels: Yeast Interactors, Human Interactors, Interlogs]
BLAST Quote
"Oracle 10g's new BLAST feature will enable us to easily integrate
multiple types of genomic and proteomic data for complicated
queries used in the mining of our proprietary protein-protein
interaction and cDNA sequence datasets." - Jake Chen, Principal
Bioinformatics Scientist, Myriad Proteomics
Spatial Network Data Model
• Data model for managing graph (link-node) structures
• Rich graph analysis functions
• Supports variety of network structures (hierarchical, directed, undirected, random, scale-free)
• Framework for applying network constraints and rules (e.g. path length, cost, minimum bounding rectangle)
• Bundled Java visualiser & APIs for 3rd party tools, application development
Case Study: Integration Architecture
• Data type determines available routes
• Routes can be determined using semantics
[Diagram: sources NREF, EMBL, GO, KEGG, BIND, AFCS. Labels: Native Formats, Distributed Database layer, NDM layer (semantic layer), Nodes, Edges, Graph, Network Route.]
Case study courtesy of Beyond Genomics, Inc.
Network Data Model Quote
"Beyond Genomics, Inc., as a leading systems biology company,
believes that Oracle 10g's network data model will significantly
advance the integration of metabolomic, proteomic, transcriptomic,
and clinical data sets and the applications that derive value from
these data." – Eric Neumann, Vice President Strategic Informatics,
Beyond Genomics, Inc.
Oracle Data Mining
• Unsupervised Learning
  – Hierarchical K-means Cluster
  – O-Cluster
  – Non-Negative Matrix Factorization
  – Apriori
• Supervised Learning
  – Naïve Bayes
  – Adaptive Bayes Network
  – Support Vector Machines
  – Predictor Variance
• ODM can mine structured data, text data, or structured and text data
K-Means Clustering
• Hierarchical k-means produces a tree of clusters
• All splits are binary
• Each cluster has a centroid & a histogram
• Achieves a reliable solution in a single run
• Ranked rules describe the attributes of each cluster
• Cluster assignments are probabilistic, using a Bayesian model
• Operates on very deep datasets by using a summarization module
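The binary-split tree described above can be illustrated with a small bisecting k-means sketch in plain Python. This is not ODM's algorithm: the choice to split the leaf with the largest within-cluster sum of squares, the deterministic seeding, and all function names are assumptions made for the example, and the histograms, rules, and Bayesian assignment from the slide are not reproduced.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(dist2(p, c) for p in points)


def two_means(points, max_iter=100):
    """Split one cluster in two with ordinary k-means (k = 2)."""
    # Deterministic seeds: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    left, right = [], []
    for _ in range(max_iter):
        new_left = [p for p in points if dist2(p, c0) <= dist2(p, c1)]
        new_right = [p for p in points if dist2(p, c0) > dist2(p, c1)]
        if not new_left or not new_right or (new_left, new_right) == (left, right):
            break
        left, right = new_left, new_right
        c0, c1 = centroid(left), centroid(right)
    return left, right


def bisecting_kmeans(points, n_leaves):
    """Repeatedly apply a binary split to the leaf with the largest SSE."""
    leaves = [list(points)]
    while len(leaves) < n_leaves:
        splittable = [c for c in leaves if len(set(c)) > 1]
        if not splittable:
            break
        worst = max(splittable, key=sse)
        left, right = two_means(worst)
        leaves.remove(worst)
        leaves += [left, right]
    return leaves


# Three well-separated groups of made-up 2-D points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
for leaf in bisecting_kmeans(data, 3):
    print(leaf)
```

Each call to two_means creates one internal node of the tree, so requesting n leaves performs n - 1 binary splits.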
Case Study: Brain Tumor Clustering
• Collection of 42 human brain tumors* and 7,129 gene expression profiles
• Clustering of samples according to their gene expression profiles
• It is an example of class and taxonomy discovery
• Does the data cluster according to the known biological classes?
42 Tumor Samples:
• Normal Cerebellum [Ncer] (4)
• Malignant Gliomas [MGlio] (10)
• Medulloblastomas [MD] (10)
• Rhabdoid tumors [Rhabdoid] (10)
• Primitive Neuroectodermal [PNET] (8)
* Pomeroy et al., Nature 415, 24, p436 (2002).
ODM Hierarchical k-Means Clustering
[Figure: cluster tree with Nodes 1 to 7. Labelled leaves: Node 4, Glioblastoma Cluster; Node 5, Normal Cluster; Node 6, Medulloblastoma Cluster; Node 7, Rhabdoid Cluster. Sample classes in the legend: MD, MGlio, Rhabdoid, Ncer, PNET.]
Literature Results using Hierarchical Clustering
[Figure: hierarchical clustering of the samples, with classes MD, MGlio, Rhabdoid, Ncer, PNET.]
From Pomeroy et al., Nature 415, 24, p436 (2002).
Association Rules
• Captures frequent co-occurrences of items/attribute values
  (A, B) => C: occurrence of A and B together implies C
• Can be applied in different scenarios
  • Market basket analysis
  • Pattern discovery
  • Predictive applications
• ODM uses SQL-based implementation of Apriori algorithm
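To make the (A, B) => C notation concrete, the sketch below computes the two usual measures for a single rule: support (the fraction of all records that satisfy both sides) and confidence (the fraction of records satisfying the left side that also satisfy the right side). It does not implement Apriori itself, which additionally searches for frequent itemsets. The function rule_metrics and the five toy records are invented for illustration and are not taken from the patient data.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.

    records: list of dicts mapping attribute name to value.
    antecedent, consequent: dicts of attribute/value conditions.
    """
    def matches(record, conditions):
        return all(record.get(key) == value for key, value in conditions.items())

    n_antecedent = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(1 for r in records if matches(r, antecedent) and matches(r, consequent))
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Made-up records, not the study's data.
toy = [
    {"Stage": "M0", "Outcome": "S"},
    {"Stage": "M0", "Outcome": "S"},
    {"Stage": "M0", "Outcome": "S"},
    {"Stage": "M0", "Outcome": "F"},
    {"Stage": "M1", "Outcome": "F"},
]

print(rule_metrics(toy, {"Stage": "M0"}, {"Outcome": "S"}))  # (0.6, 0.75)
```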
Case Study: Analysis of Trends in a Patient Group
Clinical Table of 60 Medulloblastoma Patients
7 Clinical attributes:
• Subtype: classic or desmoplastic medulloblastoma
• Size (tumor size): T1-T4
• Stage: M0-M4
• Sex: M, F
• Age (range): 0-5, 5-10, 10-15, ...
• Outcome: S (treatment success), F (treatment failure)
• Chemo (regime type): 0, 1, 2, 3, 4, 5, 6
* Pomeroy et al., Nature 415, 24, p436 (2002).
Association Rules Results
Over 100 rules reflecting factual or known relationships in data:
Age=1 THEN Sex=M (confidence = 0.8)
Interpretation: Most 5-10 year-old patients are male
Subtype=Desmoplastic THEN Stage=M0 (confidence = 0.79)
Interpretation: Most desmoplastic patients in the study have stage M0
Association Rules Results
Other interesting trends:
Stage=M0 THEN Outcome=S (confidence = 0.74)
Interpretation: Stage M0 vs non-M0 is a predictor of treatment outcome
Stage=M0 AND Size=T3 AND Chemo=1 THEN Outcome=S (confidence = 0.92)
Interpretation: Most patients with stage M0, size T3 who received chemo regime 1 had good response to treatment
Support Vector Machines
• SVM provides a very general multi-purpose and powerful classifier
• SVM does not require feature selection and can work well with thousands of input features
• SVM is accurate and can approximate complex functional relationships
• SVM works in binary, multi-class, sparse (text) classification and regression
• SVM is easy to train and apply and can be used in discovery mode or in production automated methodologies
Case Study: Classification of Normal Human Tissue and Tumors
Multiple examples (14) of normal human tissue and tumors
• Could a single model distinguish normal vs cancer?
• Train set: 200 samples, test set: 80 samples
• Microarray profiles for 7,129 genes
[Figure: Normal Tissue vs. Cancer]
S. Ramaswamy et al., Proc. Natl. Acad. Sci. USA 98: 15149-15154 (2001)
Support Vector Machines Results
Normal vs. Cancer (Multiple types)
SVM Test Set Predictions
                 Predicted
                 Normal  Cancer
Actual  Normal     16      10
        Cancer      3      51
Test set accuracy: 83.75%
(Naïve Bayes = 75%)
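The reported accuracy follows directly from the confusion matrix: correct predictions lie on the diagonal, so (16 + 51) / 80 = 83.75%. The short sketch below repeats that arithmetic. The dictionary layout and the function name accuracy are illustrative, and placing 10 and 3 in the off-diagonal cells is a reading of the flattened slide (only the diagonal affects the result).

```python
def accuracy(confusion):
    """Fraction of samples whose predicted class equals the actual class.

    confusion: dict mapping (actual, predicted) to a sample count.
    """
    correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
    total = sum(confusion.values())
    return correct / total


# Counts from the slide; off-diagonal placement is an assumption.
svm_confusion = {
    ("Normal", "Normal"): 16,
    ("Normal", "Cancer"): 10,
    ("Cancer", "Normal"): 3,
    ("Cancer", "Cancer"): 51,
}

print(f"{accuracy(svm_confusion):.2%}")  # 83.75%
```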
Classification of Multiple Tumor Types
DNA Microarray Data for 14 Tumor Classes
Published Datasets
• S. Ramaswamy et al., Proc. Natl. Acad. Sci. USA 98: 15149-15154 (2001)
• C. Yeang et al., Procs. of ISMB 2001. Bioinformatics Discovery Note, 1:1-7 (2001)
Results of Multiple Tumor Type Analysis
• Gene expression profiles for 7,129 genes
• Datasets tumor type composition:
Tumor Class        # Train  # Test    Tumor Class        # Train  # Test
Breast (BR)              8       3    Uterus (UT)              8       2
Prostate (PR)            8       2    Leukemia (LE)           24       6
Lung (LU)                8       3    Renal (RE)               8       3
Colorectal (CO)          8       5    Pancreas (PA)            8       3
Lymphoma (LY)           16       6    Ovary (OV)               8       3
Bladder (BL)             8       3    Mesothelioma (MS)        8       3
Melanoma (ML)            8       2    Brain (BR)              16       4
• 9 minutes training time on 500MHz Netra
• 78.3% accuracy for multi-tumor molecular classification
Summary
• Databases have functionality to access and integrate distributed data
• There are data management, performance and security benefits to performing analytics in databases
• A range of analytical functionality is now available in databases