A Regression-Based Analysis of Ribosome-Profiling Data Reveals a Conserved Complexity to Mammalian Translation

Alexander P Fields; Edwin H Rodriguez; Marko Jovanovic; Noam Stern-Ginossar; Brian J Haas; Philipp Mertins; Raktima Raychowdhury; Nir Hacohen; Steven A Carr; Nicholas T Ingolia; Aviv Regev; Jonathan S Weissman

doi:10.1016/j.molcel.2015.11.013

Mol Cell. 2015 Dec 3;60(5):816-827. doi: 10.1016/j.molcel.2015.11.013.

Affiliations

¹ Howard Hughes Medical Institute, Department of Cellular and Molecular Pharmacology, University of California, San Francisco and California Institute for Quantitative Biomedical Research, San Francisco, CA 94158, USA.
² The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
³ Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel.
⁴ The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Boston, MA 02114, USA; Harvard Medical School, Boston, MA 02115, USA.
⁵ Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA.
⁶ The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02140, USA.
⁷ Howard Hughes Medical Institute, Department of Cellular and Molecular Pharmacology, University of California, San Francisco and California Institute for Quantitative Biomedical Research, San Francisco, CA 94158, USA. Electronic address: jonathan.weissman@ucsf.edu.

Abstract

A fundamental goal of genomics is to identify the complete set of expressed proteins. Automated annotation strategies rely on assumptions about protein-coding sequences (CDSs), e.g., that they are conserved, do not overlap, and exceed a minimum length. However, an increasing number of newly discovered proteins violate these rules. Here we present an experimental and analytical framework, based on ribosome profiling and linear regression, for systematic identification and quantification of translation. Application of this approach to lipopolysaccharide-stimulated mouse dendritic cells and HCMV-infected human fibroblasts identifies thousands of novel CDSs, including micropeptides and variants of known proteins, that bear the hallmarks of canonical translation and exhibit translation levels and dynamics comparable to those of annotated CDSs. Remarkably, many translation events are identified in both mouse and human cells even when the peptide sequence is not conserved. Our work thus reveals an unexpected complexity to mammalian translation suited to provide both conserved regulatory and protein-based functions.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Animals
Cells, Cultured
Conserved Sequence
Dendritic Cells / drug effects
Humans
Lipopolysaccharides / pharmacology
Mice
Open Reading Frames
Proteome / metabolism*
Proteomics / methods*
Regression Analysis
Ribosomes / metabolism*
