Thierry Le Bihan¹, Paul Taylor¹, Zac McDonald¹, Qixin Liu¹, Jianqiao Shen¹, Kathleen Gorospe¹, Xin Xu¹, Chris Hosfield², Bin Ma¹,³
¹Rapid Novor, Inc., Kitchener, Ontario, Canada
²Promega Corporation, Madison, WI
³University of Waterloo, Waterloo, Ontario, Canada
Abstract
In this study, we conducted a large-scale statistical analysis of protein sequencing data from samples digested with multiple proteases. The goal was to understand how different protease combinations improve the depth of sequence coverage in de novo protein sequencing. MS data for 166 monoclonal antibodies were compiled for this study. Each antibody sample was digested separately with different proteases and analyzed by LC-MS/MS. The new Pro/Ala (A/P) protease was tested to characterize the NIST and MAB04HC antibody standards. De novo peptide sequencing was performed with the Novor algorithm, and protein sequences were assembled using REmAb®. The assembled mAbs demonstrate that combining existing proteases with orthogonal activities significantly increases confidence scores in de novo protein sequencing. However, new proteases that target specific amino acids (a.a.) or a.a. sequences are still needed to increase antibody sequencing accuracy.
Key Takeaways
- A combination of proteases can maximize coverage during LC-MS/MS-based sequencing
- Other orthogonal approaches also increase the accuracy of de novo protein sequencing
- REmAb®’s established de novo protein sequencing protocol includes orthogonal approaches and protease cocktails optimized to deliver highly accurate, full-coverage protein sequences
Introduction
Sample preparation for complete LC-MS/MS sequencing of a pure protein differs substantially from the preparation used for protein identification in complex samples. To sequence a protein de novo, the experimental design should aim to identify each amino acid (a.a.) multiple times within different peptides. This contrasts with the analysis of complex protein mixtures, where a few peptides per protein are usually enough. Efforts to increase protein sequence coverage typically rely on digesting the protein with multiple proteases. In this study, we conducted a large-scale statistical analysis of protein sequencing data from samples digested with multiple proteases to understand how different protease combinations improve the depth of sequence coverage in de novo protein sequencing. The data presented here can help guide the choice of proteases for maximum coverage during protein sequencing.
Materials & Methods
Proteases and Target Proteins:
Trypsin, Lys-C, Chymotrypsin, Pepsin, A/P, and recombinant Asp-N proteases (Promega, WI, US) were used to digest the NISTmAb humanized IgG1 monoclonal antibody RM8671 (National Institute of Standards and Technology, U.S. Department of Commerce), the monoclonal antibody mAb04, and yeast extracts (Promega, WI, US).
LC-MS/MS:
Protein lysates were analyzed with an Orbitrap Fusion™ Series Tribid™ instrument (ThermoFisher Scientific, CA, US) coupled to an Evosep One LC system (Evosep, Denmark), using both HCD and ETD fragmentation. ETD spectra were acquired with three different collision energies.
Data analysis:
De novo peptide sequencing was performed with the Novor algorithm, and protein sequences were assembled with REmAb® (Figure 1). Complex yeast samples were analyzed with SearchGUI 3.3.13 and PeptideShaker 1.16.37, combining the X! Tandem, MS-GF+, and Comet search engines. These searches used no enzyme restriction, a 25 ppm mass tolerance, and a 1% FDR at the protein, peptide, and PSM levels. Sequence motif analysis was performed using SeqtoLogo.
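SearchGUI and PeptideShaker perform FDR control internally. The following is only a minimal sketch of the common target-decoy idea behind a 1% PSM-level FDR, not PeptideShaker's actual implementation. It assumes each PSM carries a single score (higher is better) and a decoy flag. The function name `score_cutoff_at_fdr` and the toy data are hypothetical.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher scores are better.
    Returns None if no cutoff satisfies max_fdr.
    """
    targets = decoys = 0
    cutoff = None
    # Walk PSMs from best to worst score, tracking target and decoy counts.
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR for all PSMs at or above this score: decoys / targets.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical toy list of (score, is_decoy) pairs.
toy = [(10, False), (9, False), (8, True), (7, False)]
print(score_cutoff_at_fdr(toy, max_fdr=0.01))  # 9
print(score_cutoff_at_fdr(toy, max_fdr=0.5))   # 7
```

Keeping the deepest cutoff that still satisfies the threshold accepts as many PSMs as possible at the requested error rate.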
Results
The A/P Protease can be used to further validate protein sequencing
Use of diverse proteases facilitates maximum coverage of CDR3
Protease combinations are important to achieve maximum coverage
The use of many different proteases translates to wider coverage
The following table lists, for each protease combination used in this study, the median depth of coverage for the HCDR3 region and for the remaining portion of the variable regions. The first column lists the protease combination, where each protease is represented by a single-letter code: P (Pepsin), T (Trypsin), C (Chymotrypsin), A (Asp-N), and L (Lys-C). The second and third columns list the median depths of coverage of the HCDR3 and of the other variable portions, respectively. The “depth of coverage” of an amino acid is defined as the number of unique PSMs covering that amino acid. Repeated MS/MS scans of the same precursor were counted as a single PSM. We compared the median depth of coverage achieved by different protease combinations across all amino acids from the 166 antibodies. As expected, using a greater number of different proteases increases coverage. In all cases, the HCDR3 has lower coverage than the rest of the variable regions. Surprisingly, when only a few proteases are employed, pepsin appears to contribute substantially to amino acid coverage. We propose that this is most likely because pepsin generates more peptides with missed cleavage sites, and therefore a wider pool of different peptides.
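The depth-of-coverage definition above can be sketched in a few lines. The sketch assumes that each PSM is reduced to a hypothetical `(precursor_key, start, end)` tuple with 0-based, end-exclusive residue positions. Under that assumption, repeated MS/MS scans of the same precursor and peptide collapse to one entry. The function names, toy PSMs, and HCDR3 positions are illustrative, not data from the study.

```python
from statistics import median


def depth_of_coverage(protein_length, psms):
    """Per-residue depth: number of unique PSMs covering each position.

    psms: iterable of (precursor_key, start, end) tuples, with 0-based,
    end-exclusive positions. Identical tuples (repeated scans of the same
    precursor) are counted once.
    """
    unique_psms = set(psms)  # collapse repeated MS/MS scans into one PSM
    depth = [0] * protein_length
    for _key, start, end in unique_psms:
        for i in range(start, end):
            depth[i] += 1
    return depth


def median_depth(depth, positions):
    """Median depth over a region, e.g. HCDR3 vs. the rest of the variable region."""
    return median(depth[i] for i in positions)


# Hypothetical toy data: a 10-residue region; precursor p1 was scanned twice.
psms = [("p1", 0, 5), ("p1", 0, 5), ("p2", 3, 8), ("p3", 6, 10)]
depth = depth_of_coverage(10, psms)
hcdr3 = range(3, 8)  # hypothetical HCDR3 positions
other = [i for i in range(10) if i not in hcdr3]

print(depth)  # [1, 1, 1, 2, 2, 1, 2, 2, 1, 1]
print(median_depth(depth, hcdr3), median_depth(depth, other))  # 2 1
```

Using the median rather than the mean keeps a few very deeply covered residues from masking poorly covered stretches, such as the HCDR3.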
Conclusions
Successful de novo antibody sequencing depends on full coverage of the protein of interest. This is best achieved by repeatedly identifying each amino acid in different peptides with overlapping sequences. Proteases with different cleavage rules can be combined to make de novo antibody protein sequencing a success.
Less specific proteases such as chymotrypsin and pepsin generate more overlapping peptides than more specific proteases such as trypsin and Lys-C. This explains why, when only a small number of proteases is employed, including a less specific protease can result in higher amino acid coverage.
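To make the idea of overlapping peptides concrete, the sketch below enumerates an idealized in-silico digest. In this model, an enzyme cuts C-terminal to a given set of residues, and up to `max_missed` sites may be skipped. The single-residue rules are simplifying assumptions: real proteases, especially pepsin, have broader and context-dependent preferences. All names and the toy sequence are hypothetical. With no missed cleavages, every residue is covered exactly once. Allowing missed cleavages, which less specific digestion tends to produce, yields overlapping peptides and a depth above one.

```python
def cut_positions(seq, residues):
    """Cut C-terminal to any residue in `residues` (simplified, assumed rule)."""
    return [i + 1 for i, aa in enumerate(seq[:-1]) if aa in residues]


def digest(seq, residues, max_missed=1):
    """All (start, end) peptides with up to `max_missed` missed cleavages."""
    bounds = [0] + cut_positions(seq, residues) + [len(seq)]
    peptides = set()
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of internal sites left uncut.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.add((bounds[i], bounds[j]))
    return peptides


def overlap_depth(seq, peptides):
    """Number of distinct peptides covering each residue."""
    depth = [0] * len(seq)
    for start, end in peptides:
        for k in range(start, end):
            depth[k] += 1
    return depth


seq = "AAKAAKAAKAA"  # hypothetical toy sequence
print(overlap_depth(seq, digest(seq, "K", max_missed=0)))  # all 1s
print(len(digest(seq, "K", max_missed=1)))  # 7
print(overlap_depth(seq, digest(seq, "K", max_missed=1)))
# [2, 2, 2, 3, 3, 3, 3, 3, 3, 2, 2]
```

Because peptides are stored as `(start, end)` spans in a set, digests from several enzymes can be merged with a set union. Identical peptides produced by different proteases are then counted only once.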
However, we observed that the presence of proline can result in inefficient cleavage by trypsin, chymotrypsin, Asp-N, and pepsin. The recently commercialized A/P protease is capable of cutting peptides at C1 proline sites. Our findings show that the A/P protease can be used as a complementary tool for de novo protein sequencing. In antibody sequencing in particular, additional proteases that target conserved amino acids or specific motifs will be important to facilitate their sequencing. We found this to be especially important for the CDR3 region, which is often a difficult antibody region to sequence.
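The proline effect can be illustrated with two simplified cleavage rules, both assumptions made for this sketch. The first is a trypsin-like rule that cuts after K or R unless the next residue is P, the classic textbook exception. The second is an A/P-like rule assumed to cut C-terminal to proline and alanine. The sequence and function names are hypothetical, and real enzyme behavior is more nuanced than either rule.

```python
def cut_after(seq, residues, blocked_by_next_proline=False):
    """Cut positions C-terminal to `residues`; optionally skip sites followed by P."""
    cuts = []
    for i in range(len(seq) - 1):
        if seq[i] not in residues:
            continue
        if blocked_by_next_proline and seq[i + 1] == "P":
            continue  # e.g. the textbook trypsin K/R-P exception
        cuts.append(i + 1)
    return cuts


def split(seq, cuts):
    """Split a sequence into peptides at the given cut positions."""
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


seq = "GKPLVRPAEK"  # hypothetical proline-rich stretch

# Trypsin-like rule: both internal K/R sites precede P, so nothing is cut.
print(split(seq, cut_after(seq, "KR", blocked_by_next_proline=True)))
# ['GKPLVRPAEK']

# Assumed A/P-like rule: cut after every internal P or A.
print(split(seq, cut_after(seq, "PA")))
# ['GKP', 'LVRP', 'A', 'EK']
```

In this toy case the trypsin-like rule leaves one long peptide, while the assumed A/P-like rule yields several shorter ones. This is the kind of complementary coverage the A/P protease is meant to add around proline.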
This case study was adapted, with permission, from Le Bihan, T., Taylor, P., McDonald, Z., Liu, Q., Shen, J., Gorospe, K., Xu, X., Hosfield, C., Ma, B. (2019). Increased De Novo Protein Sequencing Coverage with Optimal Protease Cocktail. ASMS 2019, Atlanta, TP 020.
Talk to Our Scientists.
We Have Sequenced 9000+ Antibodies and We Are Eager to Help You.
Through next generation protein sequencing, Rapid Novor enables reliable discovery and development of novel reagents, diagnostics, and therapeutics. Thanks to our Next Generation Protein Sequencing and antibody discovery services, researchers have furthered thousands of projects, patented antibody therapeutics, and developed the first recombinant polyclonal antibody diagnostics.