Providing free or low-cost bioinformatics support at scale
NCGAS provides a complete architecture of
consulting, tools, and systems to support genomic
science.
As a virtual instrument, it uses research networks
for large data transfers among sequencing centers,
NCGAS facilities, and reference data repositories.
Large-memory systems architected for assembly,
such as the IU Mason system, are available.
The 5-petabyte high-speed Data Capacitor storage system
mounts directly to other national resources.
[Figure: NCGAS Virtual Genomics Science Instrument. Diagram labels: Extreme Science and Engineering Discovery Environment (XSEDE); NSF-Funded or XSEDE Allocation; NCGAS Galaxy Portal; Federally Funded; Texas Advanced Computing Center; Mason; 100 GB; Internet2; Pittsburgh Supercomputer Center; San Diego Supercomputer Center; POD; 10 GB; NLR; Sequencing Centers.]
NCGAS Business Model for Biomedical Research:
NSF-funded investigators are provided all services at no charge.
XSEDE-approved investigators get free access to supercomputers across the country in partnership with NCGAS.
Any federally funded investigator can purchase cycles from the IU POD (Penguin on Demand) system at reduced rates.
No data storage or transfer charges.
Galaxy portal with workflow management provided.
37 optimized analytical codes installed and professionally managed; others will be installed on request.
Trinity RNA-seq application optimized in partnership with the Broad Institute.
Medical research supported by NCGAS:
[Figure labels: 5 PB Storage; POD; Galaxy Portal; NCGAS.ORG; Reference Data.]
Identifying the causative mutation that generates ethanol sensitivity in a
mutagenized zebrafish line. To generate a source of SNPs to identify a region
containing the mutation, we cross our mutagenized stock, on an AB genetic
background, to the TU background, which is the strain used for the Ensembl
reference genome. These hybrid mutant fish are highly polymorphic throughout the
genome, except in the region containing the mutation. This suppression of
polymorphisms defines a genetic interval containing the mutation, and the genomic
sequence in that interval can then be analyzed for predicted deleterious
mutations. For this study, zebrafish reads were aligned to a reference zebrafish
genome using BWA, and variant calls were generated using SAMtools. These calls
were then mapped on the IU Mason system to identify regions of homozygosity,
which are likely to include the causative mutation. The next step will be to
identify the genes in this region that carry mutations, possibly directly locating
the gene whose mutation generates ethanol resistance.
Johann Eberhart, PI. University of Texas at Austin.
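The homozygosity-mapping step described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the input format (a list of position and heterozygosity pairs for one chromosome), the function names, the fixed non-overlapping windows, and the thresholds are all assumptions made for illustration. The idea is only that windows with many variant calls but few heterozygous ones are candidates for the interval containing the mutation.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified input: one (position_bp, is_heterozygous) pair per
# variant call on a single chromosome. A real pipeline would parse these from
# the variant-call output instead.
Call = Tuple[int, bool]


def het_fraction_by_window(calls: List[Call], window_bp: int) -> List[Tuple[int, float, int]]:
    """Return (window_start, heterozygous_fraction, n_calls) per fixed window."""
    bins: Dict[int, Tuple[int, int]] = {}
    for pos, is_het in calls:
        start = (pos // window_bp) * window_bp
        het, total = bins.get(start, (0, 0))
        bins[start] = (het + (1 if is_het else 0), total + 1)
    return [(start, het / total, total) for start, (het, total) in sorted(bins.items())]


def candidate_windows(calls: List[Call], window_bp: int,
                      max_het_fraction: float, min_calls: int) -> List[int]:
    """Return start positions of windows with enough calls and low heterozygosity."""
    return [start
            for start, frac, n in het_fraction_by_window(calls, window_bp)
            if n >= min_calls and frac <= max_het_fraction]


# Toy data: calls every 100 bp, heterozygous everywhere except 1000-1999 bp.
toy_calls = [(pos, not (1000 <= pos < 2000)) for pos in range(0, 3000, 100)]
print(candidate_windows(toy_calls, window_bp=1000, max_het_fraction=0.1, min_calls=5))
# prints [1000]
```

The `min_calls` threshold is there so that sparsely covered windows are not reported as homozygous simply because they contain too few calls to judge.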
A genome-wide association approach to study the role of common variation in the
risk for intracranial berry aneurysms (IA), testing whether a particular
allele at a single SNP or a group of SNPs is more common in individuals with an IA
than in healthy controls. Using 4,060 genotyped samples, this study increased the
number of SNPs by imputation, which generates a ‘best guess’ of the most likely
genotype of a SNP that was not on the genotyping array, using data generated as
part of the 1000 Genomes Phase I integrated variant set of 30 million SNPs. The
program IMPUTE2 v2.2.2 was used. Because of our large sample size and the large
number of SNPs, imputation is computationally demanding in terms of disk space,
speed, and memory, and so was performed on the NCGAS Mason system.
Tatiana Foroud, PI. Indiana University School of Medicine.
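The single-SNP comparison described above (is an allele more common in cases than in controls?) can be sketched as a chi-square test on a 2x2 table of allele counts. This is only an illustration of the idea, not the study's analysis: the poster does not say which association software or statistical model was used, and the function name and the counts below are hypothetical. A real analysis would also typically account for covariates and imputation uncertainty.

```python
import math
from typing import Tuple


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Returns (chi_square_statistic, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_sums)
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("every row and column of the table needs a non-zero total")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts, for illustration only.
chi2, p = allelic_chi_square(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

In a genome-wide setting this test is repeated across millions of SNPs, which is one reason the resulting p-values are judged against a much stricter threshold than the usual 0.05.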
Funded by the National Science Foundation Award #1062432. Some services are free for all researchers; some are free to NSF-funded researchers. Some services are available on demand as paid services.
William K. Barnett, Ph.D.
Richard D. LeDuc, Ph.D.
Indiana University