Providing free or low-cost bioinformatics support at scale

NCGAS provides a complete architecture of consulting, tools, and systems to support genomic science. As a virtual instrument, it uses research networks for large data transfers among sequencing centers, NCGAS facilities, and reference data repositories. Large-memory systems architected for assembly, such as the IU Mason system, are available. The 5-petabyte high-speed Data Capacitor storage mounts directly to other national resources.

[Figure: NCGAS Virtual Genomics Science Instrument. Diagram labels: Extreme Science and Engineering Discovery Environment (XSEDE); Texas Advanced Computing Center; Pittsburgh Supercomputer Center; San Diego Supercomputer Center; Mason; POD; NCGAS Galaxy Portal; 5 PB Storage; Reference Data; Sequencing Centers; 100 GB Internet2; 10 GB NLR; NSF-Funded or XSEDE Allocation; Federally Funded; NCGAS.ORG]

NCGAS Business Model for Biomedical Research:

NSF-funded investigators are provided all services at no charge. XSEDE-approved investigators get free access to supercomputers across the country in partnership with NCGAS. Any federally funded investigator can purchase cycles from the IU POD (Penguin on Demand) system at reduced rates. There are no data storage or transfer charges. A Galaxy portal with workflow management is provided. 37 optimized analytical codes are installed and professionally managed, and others will be installed on request. The Trinity RNA-seq application has been optimized in partnership with the Broad Institute.

Medical research supported by NCGAS:

Identifying the causative mutation that generates ethanol sensitivity in a mutagenized zebrafish line. To generate a source of SNPs to identify a region containing the mutation, we cross our mutagenized stock, on an AB genetic background, to the TU background, which is the Ensembl reference genome. These hybrid mutant fish are highly polymorphic throughout the genome, except in the region containing the mutation. This suppression of polymorphisms defines a genetic interval containing the mutation, and the genomic sequence in the region can then be analyzed for predicted deleterious mutations. For this study, zebrafish reads were aligned to a reference zebrafish genome using BWA, and variant calls were generated using SAMtools. These calls were then mapped on the IU Mason system to identify regions of homozygosity, which are likely to include the causative mutation. The next step will be to identify the genes in this region that carry mutations, possibly directly locating the gene whose mutation generates ethanol resistance. Johann Eberhart, PI. University of Texas at Austin.

A genome-wide association approach to study the role of common variation in the risk for intracranial berry aneurysms (IA), testing whether the presence of a particular allele at a single SNP or a group of SNPs is more common in individuals with an IA than in healthy controls. Using 4,060 genotyped samples, this study increased the number of SNPs by imputation, which generates a ‘best guess’ of the most likely genotype for a SNP that was not on the genotyping array, using data from the 1000 Genomes Phase I integrated variant set of 30 million SNPs. The program IMPUTE2 v2.2.2 was used. Because of the large sample size and the large number of SNPs, imputation is demanding in disk space, speed, and memory, so it was performed on the NCGAS Mason system. Tatiana Foroud, PI. Indiana University School of Medicine.

Funded by the National Science Foundation Award #1062432. Some services are free for all researchers; some are free to NSF-funded researchers. Some services are available on demand as paid services.

William K. Barnett, Ph.D., and Richard D. LeDuc, Ph.D., Indiana University
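
The homozygosity mapping step in the zebrafish study can be illustrated with a minimal sketch. This is not the pipeline used in the study. It assumes variant calls have already been reduced to (chromosome, position, genotype) tuples with VCF-style genotype strings such as "0/1", and the function names, window size, and thresholds are hypothetical values chosen only for illustration. The idea is to bin calls into fixed-size windows and flag windows where almost no calls are heterozygous.

```python
from collections import defaultdict


def window_stats(calls, window_size):
    """Count heterozygous and total calls per (chromosome, window start)."""
    stats = defaultdict(lambda: [0, 0])  # [heterozygous, total]
    for chrom, pos, genotype in calls:
        start = (pos // window_size) * window_size
        # Accept both unphased ("0/1") and phased ("0|1") genotype strings.
        alleles = genotype.replace("|", "/").split("/")
        stats[(chrom, start)][1] += 1
        if len(set(alleles)) > 1:
            stats[(chrom, start)][0] += 1
    return stats


def candidate_windows(calls, window_size=1_000_000,
                      max_het_fraction=0.1, min_calls=3):
    """Return windows whose heterozygous fraction is at or below the cutoff.

    All default values are illustrative assumptions, not study parameters.
    """
    stats = window_stats(calls, window_size)
    return sorted(
        key for key, (het, total) in stats.items()
        if total >= min_calls and het / total <= max_het_fraction
    )


# Toy data: the middle window has only homozygous calls.
calls = [
    ("chr1", 100_000, "0/1"), ("chr1", 400_000, "0/1"), ("chr1", 900_000, "1/1"),
    ("chr1", 1_200_000, "1/1"), ("chr1", 1_500_000, "1/1"), ("chr1", 1_800_000, "1/1"),
    ("chr1", 2_100_000, "0/1"), ("chr1", 2_600_000, "0/1"), ("chr1", 2_900_000, "0/1"),
]
print(candidate_windows(calls))  # [('chr1', 1000000)]
```

In a hybrid cross like the one described, flagged windows would correspond to intervals where polymorphism is suppressed. A real analysis would also need to account for coverage, call quality, and window boundaries.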