
Using multiple tools, including microarrays, high-throughput proteomics, cell separation methods, and bioinformatics, the Glue Grant leaders organized the program to acquire and analyze large amounts of genome-wide data and so improve our systems-level understanding of the innate inflammatory response to serious injury. Developing novel computational and bioinformatics tools for analyzing genomics and proteomics data, and for extracting biological information from genomic information, was critical to achieving the program goals. Early on, however, the program faced an important challenge: few appropriate statistical approaches existed for analyzing high-throughput genomic and proteomic data and extracting biological information from genomic data derived from total blood leukocytes and their various cell types.
Optimal approaches – learning from experience
Many of the commonly used academic and commercial programs (SAM, PAM, and others) for genome-wide expression analyses, several of which were developed by our Glue Grant investigators “BG, before Glue”, were not designed for large time-series data sets like those generated by the program. Knowing this, program leadership redirected resources early on toward developing more universal and useful bioinformatics, statistical, and pathway software tools that would be of immense value not only to this program but also to the clinical research community at large, with broad applicability to many areas of clinical medicine.
Glue Grant investigators struggled to find the best way to make the vast, diverse, and complex genomic findings and experimental data easily available to, and queryable by, the scientific community. The program leaders considered two audiences of data users: clinical investigators who are interested in their own particular genomic pathways, and basic science bioinformatics experts who might be more interested in the basic science genomics than in the clinical significance of the genomic findings. As you might expect, these audiences have varying levels of sophistication with such complex datasets. Acknowledging this, the program has established web portals: web pages that provide user-friendly software with efficient implementations of the computational algorithms, a graphical user interface with built-in standard analysis pipelines, and visualization tools for displaying analysis results. The lessons learned through publishing the Years 1-5 data from mixed blood leukocytes have become particularly important as the investigators organize the web portals to provide processed genomic and proteomic data from enriched leukocyte populations (T-cells, monocytes, and neutrophils), and from human skeletal muscle, skin and fat tissue.
For more than 10 years, program investigators have used a wide range of statistical and bioinformatics tools to analyze the large-scale gene expression data from the clinical studies, and have accumulated valuable experience and expertise in the computational analysis of genomic data. Glue Grant investigators learned firsthand, for example, that first-generation exon arrays are significantly more challenging to analyze than traditional 3’ arrays (e.g., Affymetrix U133+). Most existing microarray analysis programs can handle the exon array data format and can compute expression indexes for genes. However, few, if any, support isoform-level and alternative splicing analysis. Furthermore, there are no visualization tools that allow investigators to examine their raw data and compare the data across multiple samples and time points. As a result, investigators often fail to exploit the rich exon-level information in the data and use the array primarily to obtain gene-level information. Glue Grant investigators have recognized that a software program with a graphical user interface, built-in standard analysis pipelines, and visualization tools, together with a supporting website that covers the design and annotation of the array, analysis methods and software, and tutorials, will speed the transfer of the technology to the research community.
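To make the gap between gene-level and exon-level information concrete, the following is a minimal sketch of a splicing-index style calculation, a commonly used exon-array measure in which each exon's log2 signal is normalized by the gene-level signal before two samples are compared. It is an illustration only and not the method implemented in any Glue Grant tool; the function names, exon labels, and intensities are all hypothetical.

```python
import math
from statistics import mean


def gene_level(exon_signals):
    """Gene-level summary for one sample: mean of the log2 exon signals."""
    return mean(math.log2(v) for v in exon_signals.values())


def splicing_index(sample_a, sample_b):
    """Per-exon splicing index of sample_a relative to sample_b.

    Each exon's log2 signal is first normalized by its sample's gene-level
    value; the index is the difference of the normalized values.
    Assumes both samples report the same exons.
    """
    gene_a = gene_level(sample_a)
    gene_b = gene_level(sample_b)
    return {
        exon: (math.log2(sample_a[exon]) - gene_a)
        - (math.log2(sample_b[exon]) - gene_b)
        for exon in sample_a
    }


# Made-up intensities for one hypothetical four-exon gene.
control = {"exon1": 256.0, "exon2": 256.0, "exon3": 256.0, "exon4": 256.0}
injury = {"exon1": 512.0, "exon2": 512.0, "exon3": 32.0, "exon4": 512.0}

si = splicing_index(injury, control)

# The gene-level summaries are identical (8.0 in both samples), so a
# gene-level analysis sees no change, while exon3 stands out at -3.0.
print(gene_level(control), gene_level(injury))
print(si)
```

In this toy example the gene-level value hides a strong exon-specific change, which is the kind of signal that is lost when an exon array is used only for gene-level information.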
Novel bioinformatics tools for disease class, time course, and genetics of gene expression studies
Glue Grant funded investigators have developed several novel bioinformatics, statistical, and pathway analysis tools – now used by many laboratories in the field of translational genomics – for the exploration of disease class and time series genomic and proteomic data.
As important products for the research community, these software tools include:
Pathway analysis – In collaboration with Ingenuity Inc. (Redwood City, CA), program investigators pioneered a novel systems-based approach that generates genomic information at the level of functional modules, i.e., knowledge-based network analysis. For the first time, this method allowed genome-wide identification of specific transcriptional networks and protein complexes significantly perturbed in microarray datasets.
Calvano SE, Xiao W, Richards DR, Felciano RM, Baker HV, Cho RJ, Inflammation and Host Response to Injury Large Scale Collab. Res. Program. A network-based analysis of systemic inflammation in humans. Nature. 2005 Oct 13;437(7061):1032-7. PubMed PMID: 16136080
EDGE – Extraction of Differential Gene Expression. EDGE is a software package for the significance analysis of DNA microarray experiments for both standard and time course experiments based on the Optimal Discovery Procedure and Time Course Methodology. EDGE is available for free download to investigators. Learn more about EDGE.
Storey JD, Xiao W, Leek JT, Tompkins RG, Davis RW. Significance analysis of time course microarray experiments. Proc Natl Acad Sci U S A. 2005 Sep 6;102(36):12837-42. PMCID: PMC1201697
Leek JT, Monsen E, Dabney AR, Storey JD. EDGE: extraction and analysis of differential gene expression. Bioinformatics. 2006 Feb 15;22(4):507-8. Epub 2005 Dec 15. Erratum in: Bioinformatics. 2006 Jun 1;22(11):1412. PubMed PMID: 16357033
Storey JD, Dai JY, Leek JT. The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics. 2007 Apr;8(2):414-32. Epub 2006 Aug 23. PubMed PMID: 16928955
SVA – Surrogate Variable Analysis. SVA is a methodology used to overcome the problems caused by heterogeneity in expression studies. SVA can be applied to disease class, time course, and genetics of gene expression studies. Learn more about SVA.
Leek JT, Storey JD. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007 Sep;3(9):1724-35. PubMed PMID: 17907809; PubMed Central PMCID: PMC1994707
TANOVA – Time Course Analysis of Variance. TANOVA is an R package for the factorial analysis of both longitudinal and cross-sectional time course microarray data. Learn more about TANOVA.
Zhou B, Xu W, Herndon D, Tompkins R, Davis R, Xiao W, Wong WH; Inflammation and the Host Response to Injury Program. Analysis of factorial time-course microarrays with application to a clinical study of burn injury. Proc Natl Acad Sci U S A. 2010 Jun 1;107(22):9923-8. PubMed PMID: 20479259; PubMed Central PMCID: PMC2890487
JETTA – Junction and Exon Toolkits for Transcriptome Analysis. JETTA is an integrated software tool for gene and exon expression calculation and alternative splicing analyses. It can be applied to the analysis and visualization of both exon-junction array and RNA-Seq data. Learn more about JETTA.
Seok J, Xu W, Davis RW, Xiao W. RASA: robust alternative splicing analysis for human transcriptome arrays. Bioinformatics. 2014. Accepted for publication
Seok J, Xu W, Gao H, Davis RW, Xiao W. JETTA: junction and exon toolkits for transcriptome analysis. Bioinformatics. 2012 May 1;28(9):1274-5. PubMed PMID: 22433281; PMCID: PMC3338022
HTA – Human Transcriptome Array. For high-throughput and cost-effective analyses of alternative splicing, Glue Grant investigators, in collaboration with their strategic partner Affymetrix, developed a new 6.9-million-feature human transcriptome array (Affymetrix Human Transcriptome Array, HTA 2.0), which allows comprehensive and reproducible assay of exons and exon-exon junctions in the human transcriptome. Learn more about HTA 2.0.
Xu W, Seok J, Mindrinos MN, Schweitzer AC, Jiang H, Wilhelmy J, Inflammation and Host Response to Injury Large-Scale Collaborative Research Program. Human transcriptome array for high-throughput clinical studies. Proc Natl Acad Sci U S A. 2011 Mar 1;108(9):3707-12. PubMed PMID: 21317363; PMCID: PMC3048146
TRT – An interactive website developed by the Glue Grant investigators as a supplement to the first genomics of trauma publication. It allows the user to enter a favorite gene, or view the heat map, to see how expression is affected by different clinical attributes observed in severe trauma. Learn more about the TRT.
Xiao W, Mindrinos MN, Seok J, Cuschieri J, Cuenca AG, Gao H, et al. Inflammation and Host Response to Injury Large-Scale Collaborative Research Program. A genomic storm in critically injured humans. J Exp Med. 2011 Dec 19;208(13):2581-90. PubMed PMID: 22110166; PMCID: PMC3244029
Improving the usability of the bioinformatics resources
The TRT is a remarkably intuitive tool and represents the paradigm for future efforts to make genomics data accessible and usable by the scientific community. It is one component of the overall vision to support a global genomics web portal that displays heterogeneous datasets and allows first-of-its-kind comparative analyses of groups that differ by injury etiology, blood cell and tissue type, sampling time points, and time course. In addition, to help investigators study their “favorite” gene or pathway, the program is working to improve the web portals for querying single genes or pathways across differing injuries and differing cell and tissue types.
Based upon our program experience with trauma and burns, it is highly likely that other investigators can use the tools developed in the Glue Grant program to predict clinical trajectories in other disease groups beyond the field of injury, such as acute decompensated heart failure and solid organ rejection after transplantation.
Contact
Wenzhong Xiao, PhD: 617-724-7261
Ronald Tompkins, MD, ScD: 617-726-3447
Visit the Glue Grant program website