
The completion of the Human Genome Project has resulted in the identification of more than 20,000 genes in the human genome. High-throughput analytical tools developed during this genomic age make it possible to measure the RNA expression level of all known genes, collectively called the transcriptome, simultaneously. Importantly, human transcriptome data are giving scientists and clinicians new perspectives on health and disease. By studying global transcriptome expression, scientists can devise methods of class prediction based on global gene expression and, more importantly, can do so without a priori knowledge of the function of any of the genes.
Gene instructions
Genetic information is carried by DNA (deoxyribonucleic acid) in each gene, which has a set of instructions for making various substances that the body needs to survive. The cells in the body cannot use a gene by itself to do anything; rather, genes provide the instructions for making important products for the body, such as proteins. The instructions are written in a “sequence” of chemicals called bases, arranged in a specific order that encodes a specific genetic message. Not all DNA sequences have a known function or directly make proteins for the body, and it is believed that some of these non-protein-coding genes affect how other genes are expressed.
The body uses very sophisticated mechanisms to monitor what it needs, at what time, and in what quantity. By analyzing these signals, the body decides whether a gene product is needed; the process by which the information in the gene is used to make that product is gene expression. “Genetic” testing examines specific pieces of DNA, usually in a protein-coding gene on one of the 23 pairs of human chromosomes, and requires that the investigator know which gene or genes to examine. “Genomic,” or genome-wide, testing examines every gene present in a cell, whether protein-coding or non-protein-coding.
Expression is a snapshot in time
The gene expression process has many complicated steps, all of which can be modulated by the genes. These include transcription, ribonucleic acid (RNA) splicing, translation, and post-translational modification of a protein. Through gene transcription and messenger RNA processing, the information stored in the cell’s DNA gives rise to the phenotype (the observable characteristics of an organism).
For years, cancer tumors had been screened for the expression level of specific genes. Using genome-scale expression profiling of tumors, distinct groups of genes were identified that corresponded to the presence of specific cell types in the tumors. The study of cancer tumors represents perhaps one of the earliest examples of using expression profiling in the clinical setting for the diagnosis and classification of human disease.
Studying expression in critically ill patients
Conducting genome-wide expression profiling on blood and tissue samples from critically ill patients poses many challenges, including acquiring high-quality RNA in sufficient quantity from the patient samples to generate detectable signals that can be analyzed for gene expression. Transcriptome analysis supports several types of medical studies:
- class comparison in which the transcriptomes of two groups (complicated versus uncomplicated recovery patients, for example) are compared
- class prediction in which a “classifier” is built that can distinguish between two predefined classes of patients based upon the gene expression profiles of the samples
- class discovery in which a set of gene expression profiles are analyzed to discover subgroups that share common features or groups of genes that behave similarly in a disease state
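The three study types above can be illustrated with a small sketch. This is not the Glue Grant program's analysis pipeline; the gene names, expression values, and the choice of a Welch t statistic, a nearest-centroid classifier, and two-means clustering are all assumptions made only to show how each study type differs in what it takes as input and what it produces.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy data: each row is one patient, each column is the
# log2 expression of one made-up gene. None of these values are real.
genes = ["GENE_A", "GENE_B", "GENE_C"]
complicated = [[8.1, 5.0, 6.9], [8.4, 5.2, 7.1], [7.9, 4.9, 7.0]]
uncomplicated = [[6.0, 5.1, 7.0], [6.3, 5.0, 6.8], [5.9, 5.3, 7.2]]


def column(rows, j):
    """Expression of gene j across all patients in a group."""
    return [row[j] for row in rows]


def centroid(rows):
    """Per-gene mean expression of a group of patients."""
    return [mean(column(rows, j)) for j in range(len(rows[0]))]


def sq_dist(x, y):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / sqrt(
        stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)
    )


def class_comparison(group1, group2):
    """Class comparison: score each gene for a difference between groups."""
    return {
        gene: welch_t(column(group1, j), column(group2, j))
        for j, gene in enumerate(genes)
    }


def predict(sample, centroids):
    """Class prediction: assign a new sample to the nearest class centroid."""
    return min(centroids, key=lambda label: sq_dist(sample, centroids[label]))


def two_means(samples, iterations=10):
    """Class discovery: split unlabeled samples into two clusters."""
    centers = [samples[0], samples[-1]]  # deterministic seeds for the sketch
    for _ in range(iterations):
        clusters = [[], []]
        for s in samples:
            k = 0 if sq_dist(s, centers[0]) <= sq_dist(s, centers[1]) else 1
            clusters[k].append(s)
        centers = [
            centroid(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return [
        0 if sq_dist(s, centers[0]) <= sq_dist(s, centers[1]) else 1
        for s in samples
    ]


t_scores = class_comparison(complicated, uncomplicated)
centroids = {
    "complicated": centroid(complicated),
    "uncomplicated": centroid(uncomplicated),
}
predicted = predict([8.0, 5.1, 7.0], centroids)
clusters = two_means(complicated + uncomplicated)
```

Class comparison and class prediction both start from predefined group labels, whereas class discovery receives only the expression profiles. A real study would also need normalization, multiple-testing correction, and validation on independent samples, none of which this sketch attempts.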
Creation of a novel human transcriptome array
Technological advances in molecular biology have led to many different techniques for performing genome-wide examinations at the DNA, RNA, and protein levels. Stanford Genome Technology Center (SGTC) colleagues, who are also participating investigators in the Glue Grant program, developed many of the techniques currently used in academic and industrial biotechnology labs for DNA sequencing, genetic polymorphism, and in particular, DNA microarrays for the comprehensive analysis of messenger RNA (mRNA) expression.
Together with our strategic partner Affymetrix, our Glue Grant program investigators have developed a new 6.9 million feature oligonucleotide array of the human transcriptome for high-throughput and cost-effective analyses in clinical studies. The Affymetrix GeneChip® Human Transcriptome Array 2.0 (HTA) enables, for the first time, comprehensive examination of the multiple mechanisms human cells use to regulate the transcriptome in response to disease, including improved analysis of gene expression, genome-wide quantitation of gene isoforms and identification of alternative splicing, detection of coding single nucleotide polymorphisms (SNPs), allele-specific expression analysis, examination of non-coding transcription and antisense expression, and analysis of small RNAs.
In a comparison with mRNA sequencing (RNA-Seq) at 46 million uniquely mappable reads per replicate, the HTA is highly reproducible in estimating gene and exon abundance and, in some instances, more sensitive than other technologies.
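The text does not say how reproducibility was quantified. One common way, shown here purely as an assumption, is the Pearson correlation of log-scale signal between replicate measurements of the same sample. The replicate values below are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical log2 signal for the same five genes on two replicates.
replicate_1 = [4.2, 6.8, 9.1, 11.5, 7.3]
replicate_2 = [4.4, 6.7, 9.0, 11.7, 7.2]

r = pearson(replicate_1, replicate_2)
```

A value of `r` close to 1 indicates that the two replicates rank and scale the genes almost identically. The same calculation applies at the exon level if exon-level signals are used instead of gene-level ones.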
The array has been implemented in our clinical programs and has generated high-quality, reproducible data. Considering the clinical trial requirements of cost, sample availability, and throughput, the HTA (now available as HTA 2.0) has a wide range of applications. An emerging approach for large-scale clinical genomic studies is to first use RNA sequencing at sufficient depth to discover the transcriptome elements relevant to the disease process, and then to screen these elements reliably and at high throughput on thousands of patient samples using custom-designed arrays.
New transcriptome assays based on the features of the HTA have also been developed in collaboration with Affymetrix for whole-transcript coverage in murine and monkey model research and are becoming commercially available.
Relevant publications
Xu W, Seok J, Mindrinos MN, Schweitzer AC, Jiang H, Wilhelmy J, Inflammation and Host Response to Injury Large-Scale Collaborative Research Program. Human transcriptome array for high-throughput clinical studies. Proc Natl Acad Sci U S A. 2011 Mar 1;108(9):3707-12. PubMed PMID: 21317363; PubMed Central PMCID: PMC3048146
Fu GK, Xu W, Wilhelmy J, Mindrinos MN, Davis RW, Xiao W, et al. Molecular indexing enables quantitative targeted RNA sequencing and reveals poor efficiencies in standard library preparations. Proc Natl Acad Sci U S A. 2014 Feb 4;111(5):1891-6. PubMed PMID: 24449890; PubMed Central PMCID: PMC3918775
The SEQC consortium. Power and limitations of RNA-Seq: findings from the SEQC (MAQC-III) consortium. Nature Biotechnology 2014. Accepted for publication
Contact
Wenzhong Xiao, PhD | 617-724-7261
Ronald Davis, PhD | 650-812-2020
Ronald Tompkins, MD, ScD | 617-726-3447
Visit the Glue Grant program website