How do we analyse a proteome?

The human genome contains between 20,000 and 25,000 protein-coding genes. Many of these genes give rise to several splice variants, and the resulting proteins can carry post-translational modifications (e.g. phosphorylation, methylation, acetylation). In theory, the number of distinct protein forms could easily be a hundred times greater than the number of protein-coding genes.

Thankfully, not all proteins are expressed at the same time or in the same location. Many additional factors influence which proteins are present and, crucially, their abundance. Understanding this complexity and its importance in biology is the basis of proteomics, and understanding the differences between proteomes (e.g. health vs disease) is the field of quantitative proteomics. Even though only a fraction of the theoretical proteins is present in a given cell, analysing the proteome remains a technological challenge.

Investigating the proteins that constitute a proteome. 

One of the significant advances in instrumentation has been the ability to couple two mass analysers together. This coupling is referred to as “tandem mass spectrometry” (commonly known as MS/MS). Tandem MS allows the analysis of complex samples such as mixtures of proteins and peptides. The molecules of a given sample are ionised, and the first spectrometer (MS1) separates the ions by their mass-to-charge ratio (m/z). Ions of a particular m/z coming from MS1 are selected and then fragmented into smaller ions. These fragments are introduced into the second mass spectrometer (MS2), which separates them by their m/z and detects them. The fragmentation step makes it possible to distinguish ions whose m/z values are too similar to tell apart in a regular, single-stage mass spectrometer.
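As a rough illustration of why the fragmentation step helps, the sketch below computes the precursor m/z and the b-type fragment ions of two peptides that contain the same amino acids in a different order. The residue masses are standard monoisotopic values; the peptide sequences and the function names (peptide_mass, mz, b_ions) are made up for this example and are not taken from any particular software package.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the summed residues
PROTON = 1.007276  # mass of the charge carrier


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(mass, charge):
    """m/z of a neutral mass carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


def b_ions(sequence):
    """Singly charged b-ion m/z values (N-terminal fragments)."""
    ions, running = [], 0.0
    for aa in sequence[:-1]:  # the full-length peptide is not a b-ion
        running += RESIDUE_MASS[aa]
        ions.append(running + PROTON)
    return ions


if __name__ == "__main__":
    for pep in ("PEPTIDE", "PEPTDIE"):  # hypothetical example peptides
        precursor = mz(peptide_mass(pep), 2)
        fragments = ", ".join(f"{x:.3f}" for x in b_ions(pep))
        print(f"{pep}: precursor m/z (2+) = {precursor:.4f}; b-ions = {fragments}")
```

Both peptides give the same precursor m/z, so MS1 alone cannot tell them apart, but their fifth b-ion differs by about 1.94 Da, which MS2 can resolve.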

Whilst tandem mass spectrometry can help with the analysis of complex samples, the proteome is still very complex!

Not only do we want to know which proteins constitute a proteome, we also want to know their amino acid sequences. Knowing the amino acid sequence helps us identify a protein precisely (rather than relying on a simple mass-only identification). In addition, we can detect modifications of the amino acids (e.g. phosphorylation) and any variation relative to the known genome (e.g. splice variants, point mutations).

To make the samples less complex for analysis, we use a technique called bottom-up proteomics. In bottom-up proteomics, each protein is broken up (enzymatically digested) into peptide fragments. These peptides are then used to piece the protein back together, hence the name bottom-up. The bottom-up technique is relatively straightforward for a single protein, but what about the thousands of proteins present in a sample? To tackle this challenge, we couple bottom-up proteomics with chromatography, which separates the peptides before they enter the mass spectrometer. This technique is commonly referred to as shotgun proteomics and takes its name from the shotgun DNA sequencing technique used to sequence the human genome. By comparing the masses of the proteolytic peptides, or their tandem mass spectra, with those predicted from a sequence database (or a peptide spectral library), peptides can be identified and multiple peptide identifications assembled into a protein identification. Shotgun proteomics typically relies on the data-dependent selection of precursor ions for fragmentation to generate fragment ion scans, commonly referred to as DDA (data-dependent acquisition).
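The database comparison described above can be sketched in a few lines. This is a minimal illustration, not how a production search engine works: it assumes the digestion enzyme is trypsin (cleaving after K or R unless the next residue is P), matches on peptide mass only, and uses a tiny made-up database. The protein names, sequences, tolerance and function names are all hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def tryptic_digest(protein):
    """In-silico digest, assuming trypsin: cut after K/R, not before P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide without K/R
        peptides.append(protein[start:])
    return peptides


def identify(observed_masses, database, tol_ppm=10.0):
    """Map each protein to the predicted peptides matched by an observed mass."""
    hits = {}
    for name, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            theoretical = peptide_mass(peptide)
            if any(abs(obs - theoretical) / theoretical * 1e6 <= tol_ppm
                   for obs in observed_masses):
                hits.setdefault(name, []).append(peptide)
    return hits


if __name__ == "__main__":
    # Hypothetical two-protein database and simulated observed masses.
    database = {"prot_A": "GASPKVTLR", "prot_B": "DEFKHWYR"}
    observed = [peptide_mass("GASPK"), peptide_mass("VTLR")]
    print(identify(observed, database))
```

Matching on mass alone is ambiguous in practice (leucine and isoleucine, for instance, have identical masses), which is one reason the fragment spectra from MS2 are normally compared as well.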

The techniques above describe how to qualitatively understand the proteome of a complex sample, but how do we quantitatively understand the proteomes of many samples? Quantitative proteomics can help us understand the global protein expression and modifications underlying the molecular mechanisms of biological processes and disease states. It enables us to compare the biological systems we are investigating and can ultimately help in the development of targeted therapeutics. Our next article will describe the most commonly used relative quantitation methods. Stay tuned!

Platinum Discovery

Featured image – “Cellular Landscapes” created by Evan Ingersoll & Gael McGill (Digizyme Inc, Brookline MA)
