Proteotranscriptomics assisted gene annotation and spatial proteomics of Bombyx mori

Published in BMC Genomics.

Background: Identifying all coding regions in a genome is crucial for any study at the level of molecular biology, ranging from single-gene cloning to genome-wide measurements using RNA-seq or mass spectrometry. While satisfactory annotation has been made feasible for well-studied model organisms through the sustained efforts of large consortia, for most systems such annotation is either absent or insufficiently precise.

Results: Combining in-depth transcriptome sequencing and high-resolution mass spectrometry, we use proteotranscriptomics to improve the annotation of protein-coding genes in the Bombyx mori cell line BmN4, which is an increasingly used tool for the analysis of piRNA biogenesis and function. Using this approach, we provide the exact coding sequence and protein-level evidence for more than 6200 genes. Furthermore, using spatial proteomics, we establish the subcellular localization of thousands of these proteins. We show that our approach outperforms current Bombyx mori annotation attempts in terms of accuracy and coverage.

Conclusions: We show that proteotranscriptomics is an efficient, cost-effective and accurate approach to improve previous annotations or generate new gene models. As this technique is based on de novo transcriptome assembly, it also makes it possible to study species that lack genome sequence information, for which proteogenomics would be impossible.

The developmental proteome of Drosophila melanogaster

Published in Genome Research.

Drosophila melanogaster is a widely used genetic model organism in developmental biology. While this model organism has been intensively studied at the RNA level, a comprehensive proteomic study covering the complete life cycle is still missing. Here, we apply label-free quantitative proteomics to explore proteome remodeling across Drosophila's life cycle, yielding a dataset of 7,952 proteins, and provide a highly time-resolved embryogenesis proteome of 5,458 proteins. Our proteome data enabled us to monitor isoform-specific expression of 34 genes during development, to identify the pseudogene Cyp9fpsi as a protein-coding gene and to obtain evidence of 268 small proteins. Moreover, the comparison with available transcriptomic data uncovered examples of poor correlation between mRNA and protein, underscoring the importance of proteomics for studying developmental progression. Integrating our embryogenesis proteome with tissue-specific data revealed spatial and temporal information for further functional studies of as yet uncharacterized proteins. Overall, our high-resolution proteomes provide a powerful resource and can be explored in detail in our interactive web interface.