

WORKSHOP ON ONTOLOGIES OF CELLULAR NETWORKS Biological Network Analysis and Representational Implications

27 - 28 MAR 2008

Richard H. Scheuermann, Ph.D.

Chief, Division of Biomedical Informatics
U.T. Southwestern Medical Center

Motivation

• The cell functions as a system of integrated components
– There is increasing evidence that the cell system is composed of modules
– A “module” in a biological system is a discrete unit whose function is separable from those of other modules
   – Modules defined based on functional criteria reflect the critical level of biological organization (Hartwell, et al.)
– A modular system can reuse existing, well-tested modules
• The notion of regulation requires the assembly of individual components into modular networks
• Functional modules can then be assembled together into cellular networks
• Thus, identifying functional modules and their relationships from biological networks is important to the understanding of the organization, evolution and interaction of the cellular systems they represent

MoNet

Definition of network modules

Edge betweenness

• Girvan and Newman proposed an algorithm to find social communities within human population networks
– Utilized the concept of edge betweenness as a unit of measure
   • defined, for an edge, as the number of shortest paths between all pairs of vertices that run through it
   • edges between modules tend to have higher values
– Provides a quantitative criterion to distinguish edges inside modules from the edges between modules

[Figure: example network; label "Betweenness = 20"]
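The edge betweenness measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the work presented: it assumes an undirected, unweighted graph stored as a dictionary of neighbour sets, uses a Brandes-style breadth-first accumulation, and splits credit equally when several shortest paths connect the same pair of vertices. The function name `edge_betweenness` and the toy graph are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an undirected, unweighted graph.

    adj maps each node to the set of its neighbours.
    Returns a dict keyed by frozenset({u, v}).
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {s: 1}
        preds = {s: []}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge with its
        # share of the shortest paths that start at s.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair of vertices was visited from both endpoints.
    return {edge: value / 2.0 for edge, value in bet.items()}


# Toy graph (hypothetical): two triangles joined by the single edge c-d.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
scores = edge_betweenness(toy)
print(scores[frozenset(("c", "d"))])  # 9.0: all 3 x 3 cross-triangle pairs use c-d
print(scores[frozenset(("a", "b"))])  # 1.0: only the pair (a, b) uses a-b
```

In the toy graph the bridge between the two triangles receives the highest score, which is the property the slide relies on to separate inter-module edges from intra-module edges.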

A new definition of network modules

Definition of module degree:

Given a graph G, let U be a subgraph of G (U ⊂ G). The number of edges within U is defined as the indegree of U, ind(U). The number of edges that connect U to the remaining part of G (G−U) is defined as the outdegree of U, outd(U).

Definition of module:

A subgraph U ⊂ G is a module if ind(U) > outd(U).

A subgraph is a complex module if it can be separated into at least two modules by removing edges inside it. Otherwise, it is a simple module.

Adjacent relationship between modules:

Given two subgraphs U, V ⊂ G, U and V are adjacent if U ∩ V = ∅ and there are edges in G connecting vertices in U and V.
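The definitions on this slide translate directly into code. The sketch below is illustrative only: it assumes an undirected graph given as a list of vertex pairs, treats a subgraph as the set of its vertices (an induced subgraph), and assumes the module condition is the strict inequality ind(U) > outd(U). All function names and the example graph are hypothetical.

```python
def ind(edges, U):
    """Indegree of U: number of edges with both endpoints inside U."""
    return sum(1 for a, b in edges if a in U and b in U)


def outd(edges, U):
    """Outdegree of U: number of edges with exactly one endpoint inside U."""
    return sum(1 for a, b in edges if (a in U) != (b in U))


def is_module(edges, U):
    """Assumed module condition: ind(U) > outd(U)."""
    return ind(edges, U) > outd(edges, U)


def are_adjacent(edges, U, V):
    """U and V are adjacent if they share no vertex and some edge joins them."""
    if not U.isdisjoint(V):
        return False
    return any((a in U and b in V) or (a in V and b in U) for a, b in edges)


# Example graph (hypothetical): two triangles joined by the edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
U = {"a", "b", "c"}
V = {"d", "e", "f"}

print(ind(edges, U), outd(edges, U))   # 3 1
print(is_module(edges, U))             # True
print(are_adjacent(edges, U, V))       # True
```

Each triangle has three internal edges and one outgoing edge, so both qualify as modules under the assumed condition, and the single connecting edge makes them adjacent.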

Interaction Networks

• Large component of the S. cerevisiae protein interaction network
– DIP database
– 2440 proteins & 6241 interactions
• Large component of the Homo sapiens protein interaction network
– BIOGRID database
– 6656 proteins & 19022 interactions

dMoNet Modules

S. cerevisiae network
• 99 dMoNet simple modules
• 3 to 201 nodes in size
• Include 1700 of the original 2440 nodes and 3459 of the 6241 edges

Homo sapiens network
• 156 dMoNet simple modules
• 3 to 1048 nodes in size
• Include 3169 of the original 6656 nodes and 6949 of the 19022 edges

Validation of modules

• Annotated each protein with the Gene Ontology™ (GO) terms from the Saccharomyces Genome Database (SGD) (Cherry et al. 1998; Balakrishnan et al.)
• Quantified the co-occurrence of GO terms using hypergeometric distribution analysis
• The results show that each module has statistically significant co-occurrence of functional GO categories
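A hypergeometric co-occurrence test of this kind can be sketched as follows. This is an assumed formulation, not necessarily the exact calculation behind the reported results: it computes the upper-tail probability of seeing at least k annotated proteins in a module of size n, when K of the N proteins in the background carry the annotation. The function name and the small numeric example are hypothetical.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for a hypergeometric draw.

    N: proteins in the background set
    K: background proteins annotated with the GO term
    n: proteins in the module
    k: module proteins annotated with the GO term
    """
    upper = min(K, n)
    # Exact integer arithmetic for the numerator, one division at the end.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Small hypothetical example: 10 proteins, 4 annotated, module of 3, 2 annotated.
p = hypergeom_tail(10, 4, 3, 2)
print(p)
```

Summing exact binomial coefficients before dividing avoids the underflow that floating-point products can hit when the probabilities are extremely small.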

S. cerevisiae dMoNet Module Evaluation

Annotated in Genome   Module Size   Annotated in Module   Probability   Modularity
163                   76            58                    1.66E-60      1.54
84                    93            50                    6.49E-55      2.22
99                    201           62                    2.81E-44      1.47
102                   44            36                    1.44E-43      1.62
115                   45            35                    2.64E-39      1.04
16                    21            16                    2.83E-37      23.50
103                   111           44                    9.31E-35      1.65
17                    14            14                    2.32E-34      1.67
96                    30            25                    3.63E-34      1.46
12                    12            11                    3.22E-28      8.50

Top 10 yeast network modules with lowest co-clustering p-values. The p-value threshold corresponding to a 5% chance of committing a Type I error based on the Bonferroni correction given a data set of size 2440 is 2.05E-05. Of the 99 modules, 84 have biological process co-clustering p-values below this threshold.
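The Bonferroni threshold quoted in the caption follows from dividing the 5% significance level by the data set size of 2440. A short check, using the ten probabilities listed in the table (the variable and function names are illustrative):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 2440)
print(f"{threshold:.2E}")  # 2.05E-05

# Probabilities of the ten modules listed in the table above.
top10_p_values = [1.66e-60, 6.49e-55, 2.81e-44, 1.44e-43, 2.64e-39,
                  2.83e-37, 9.31e-35, 2.32e-34, 3.63e-34, 3.22e-28]
n_significant = sum(p < threshold for p in top10_p_values)
print(n_significant)  # 10
```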

Proposed Module Naming Convention

Topologies and Measures

Ontology for Biomedical Investigations (OBI) Data Transformation Branch

Summary of terms

• Network analysis methods
• Network components - network (graph), node (protein), edge (interaction), module (subgraph)
• Component properties (qualities) - connectivity, degree, betweenness, density, modularity, edge weight
• Topologies - star, ring, mesh, linear, combinations

Acknowledgements

• Network Analysis
– Feng Luo (Clemson)
– Roger Chang (UCSD)
– Maya El Dayeh (SMU)
– Yuhang Wang (SMU)
– Preetam Ghosh (UTSW)
• OBI Data Transformation
– Helen Parkinson (EBI)
– Melanie Courtot (BCCRC)
– Ryan Brinkman (BCCRC)
– Elisabetta Manduchi (UPenn)
– James Malone (EBI)
– Monnie McGee (SMU)
• Support (NIAID)
– 1N0140041
– 1N0140076