Transcript ccsws2 9830
Jack Snoeyink & Matt O’Meara, Dept. of Computer Science, UNC Chapel Hill
Collaborators: Brian Kuhlman, UNC Biochemistry; many other members of the RosettaCommons; the Richardson lab, Duke Biochemistry
Funding: NIH, NSF

Scientific Models, esp. for Structural Molecular Biology
- Focus on statistical/computational models with a sample source, observable local features, a chosen functional form, fit parameters, and visualization/testing methods
- Models are the lens through which we view data
- Models are predominantly geometric
- Computational models are complex
- Models evolve, so testing becomes crucial
- Capture the assumptions and data used to build models in order to:
  - visualize them for making design decisions while building,
  - fit parameters to ensure best performance, and
  - record them as scientific benchmarks
- Case study: the Rosetta protein structure prediction software [B]

Physical and conceptual models
- Kept simple to aid understanding
Statistical and computational models
- Evolve by combining simple models
- Even when complex, they can still be effective at validation (MolProbity) or prediction (Rosetta)

Spiral development, much like software
- Discover problematic features in some data
- Create an energy function to adjust them
- Fit parameters to improve results
- Check it into the software as a new option
- Make it the default option if everyone likes it
- Occasionally refactor and rewrite, removing outdated or unused models
- But there is less support for testing…

Our goal: capture the data and assumptions from model building for use in model visualization and testing.

Abstraction: a simple component of a complex computational model consists of
- one or more sample sources (e.g., PDB files from natives or decoys), giving
- observable local features (e.g., hydrogen-bond distances and angles), having a
- chosen functional form (e.g., an energy computed from distances and angles) that
- depends on fitting parameters (e.g., weights for combining terms)
(A hedged sketch of this abstraction appears after the transcript.)

Feature pipeline: data sets ([KMB’03], A, B, ..., Z) -> gather features -> SQL query -> filter/transform/statistics -> ggplot2 spec -> plots (a Python analogue is sketched after the transcript)

Implemented tools
- Compare distributions from sample sources
- Tufte’s small multiples via ggplot
- Kernel density estimation
- Normalization
Opportunities
- Statistical analysis
- Dimension reduction
- …

[Figure: histogram of hydrogen-bond A-H distances in natives, 1.45-2.85 Å; data from KMB’03]

Scientific unit tests (native, HEAD, ^HEAD): run on a continuous-testing server (a hedged example appears after the transcript)
Knowledge-based score term creation (native, release, experimental): turn exploration into living benchmarks
Test design hypotheses (native, protocol, designs): how strange is this geometry?
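A minimal sketch of the model-component abstraction described above (sample source, observable local feature, functional form, fitting parameters), assuming a toy hydrogen-bond distance term. This is illustrative Python, not Rosetta code; every name (ModelComponent, AH_distance, the harmonic form, the parameter values) is invented for the example.

```python
# Hypothetical sketch of the model-component abstraction: a sample source yields
# observable local features, a chosen functional form scores them, and fitting
# parameters control that form. Not Rosetta code.
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class ModelComponent:
    """One simple component of a complex computational model."""
    name: str
    sample_source: Callable[[], Iterable[dict]]                   # e.g. records from native or decoy PDB files
    feature: Callable[[dict], float]                               # e.g. hydrogen-bond A-H distance from a record
    functional_form: Callable[[float, Dict[str, float]], float]   # e.g. energy as a function of the feature
    parameters: Dict[str, float] = field(default_factory=dict)    # e.g. weights for combining terms

    def score(self, record: dict) -> float:
        return self.functional_form(self.feature(record), self.parameters)


# Toy example: a harmonic penalty on the A-H distance (the real Rosetta
# hydrogen-bond term is more elaborate; the numbers here are placeholders).
hbond_distance_term = ModelComponent(
    name="hbond_AH_distance",
    sample_source=lambda: [{"AH_distance": 1.9}, {"AH_distance": 2.3}],
    feature=lambda rec: rec["AH_distance"],
    functional_form=lambda d, p: p["weight"] * (d - p["optimal"]) ** 2,
    parameters={"weight": 1.0, "optimal": 1.9},
)

scores = [hbond_distance_term.score(rec) for rec in hbond_distance_term.sample_source()]
print(scores)
```

Keeping the four pieces as separate fields is what makes the later steps (visualization of features, refitting parameters, recording benchmarks) operate on the same captured objects rather than on ad hoc scripts.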
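A hedged analogue of the feature pipeline and the implemented tools (gather features into a database, query with SQL, compare distributions from different sample sources with kernel density estimates drawn as small multiples). The talk's actual stack is SQL plus R/ggplot2; this sketch uses Python with sqlite3, scipy, and matplotlib, and the table/column names and synthetic data are invented.

```python
# Gather features -> SQL query -> filter/transform/statistics -> small-multiple plots.
# Synthetic data and invented schema; a Python stand-in for the SQL + ggplot2 pipeline.
import sqlite3
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

# Gather features: one table of hydrogen-bond A-H distances, tagged by sample source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hbond_features (sample_source TEXT, AH_distance REAL)")
rng = np.random.default_rng(0)
rows = [("natives", float(d)) for d in rng.normal(1.9, 0.15, 500)] + \
       [("decoys",  float(d)) for d in rng.normal(2.1, 0.25, 500)]
conn.executemany("INSERT INTO hbond_features VALUES (?, ?)", rows)

# SQL query + filter: pull distances per sample source within a plausible range,
# then compare normalized kernel density estimates as Tufte-style small multiples.
sources = ["natives", "decoys"]
fig, axes = plt.subplots(1, len(sources), sharex=True, sharey=True, figsize=(8, 3))
grid = np.linspace(1.4, 2.9, 200)
for ax, src in zip(axes, sources):
    dists = [r[0] for r in conn.execute(
        "SELECT AH_distance FROM hbond_features "
        "WHERE sample_source = ? AND AH_distance BETWEEN 1.4 AND 2.9", (src,))]
    density = gaussian_kde(dists)   # kernel density estimate, integrates to 1
    ax.plot(grid, density(grid))
    ax.set_title(src)
    ax.set_xlabel("A-H distance (Å)")
axes[0].set_ylabel("density")
fig.tight_layout()
fig.savefig("hbond_AH_distance_small_multiples.png")
```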
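A hedged example of what a "scientific unit test" over feature distributions could look like: compare the distribution of a feature produced by the current build against a recorded native benchmark and fail if they drift apart. The data here is synthetic, the Kolmogorov-Smirnov test and its threshold are one arbitrary choice of comparison, and none of this reflects Rosetta's actual test harness.

```python
# Sketch of a scientific unit test: does the HEAD build still reproduce the
# benchmark distribution of hydrogen-bond A-H distances from natives?
import unittest
import numpy as np
from scipy.stats import ks_2samp


def distributions_match(benchmark, current, alpha=0.01):
    """Two-sample Kolmogorov-Smirnov test; True if we cannot reject 'same distribution'."""
    _, p_value = ks_2samp(benchmark, current)
    return p_value > alpha


class HbondDistanceBenchmark(unittest.TestCase):
    def test_head_matches_native_benchmark(self):
        rng = np.random.default_rng(1)
        benchmark = rng.normal(1.9, 0.15, 1000)   # stand-in for recorded native A-H distances
        current = rng.normal(1.9, 0.15, 1000)     # stand-in for distances produced by the HEAD build
        self.assertTrue(distributions_match(benchmark, current),
                        "A-H distance distribution drifted from the native benchmark")


if __name__ == "__main__":
    unittest.main()
```

In the native/HEAD/^HEAD setting described above, such tests would run on the continuous-testing server, turning the exploratory plots used during model building into living benchmarks.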