Introducing CCSondemand and the power of machine learning for CCS prediction

Collision cross section (CCS) values have long been established as a complementary identification metric that can aid the characterization of complex samples. The question is whether CCS prediction, in the absence of experimental data, can significantly help customers during semi-targeted and untargeted screening of compounds.

To find out, Waters has worked in close consultation with customers to develop its CCSondemand software. The research-grade model behind the software has been developed using machine learning algorithms that can significantly accelerate the prediction of CCS values. In this blog post, Waters scientists Mike McCullagh and Russell Mortishire-Smith discuss the fundamentals of CCS prediction and the exciting potential of the CCSondemand software.

Russell Mortishire-Smith
Mike McCullagh

Where are CCS predictions most useful?

Mike: CCS prediction can be used in a large number of application areas, from metabolomics, natural product analysis and impurity profiling to petroleomics. It can be used to characterize unknowns where there are several possible chemical explanations for a given component. It is particularly useful when no standards are available, as it helps you home in on the correct identification by narrowing down the possibilities. In these situations, CCS prediction therefore supplies a “sanity check”.

Russell: I completely agree that “sanity check” is the key phrase here. The prediction gives you a reference point against which to anchor your measurement: if the measured and predicted values are a long way apart, the candidate structure you are working from is probably incorrect. If they are close together, you can be confident that your answer is at least plausible.
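The “sanity check” Russell describes can be expressed as a percentage deviation between a measured and a predicted CCS value. The Python sketch below is illustrative only: the function names, the 3% default tolerance and the example values are hypothetical assumptions for this post, not CCSondemand settings or measured data. In practice, the tolerance would be chosen from the prediction model’s reported error distribution.

```python
def ccs_deviation_percent(measured: float, predicted: float) -> float:
    """Signed deviation of a measured CCS value from a predicted one, in percent."""
    if predicted <= 0:
        raise ValueError("predicted CCS must be positive")
    return (measured - predicted) / predicted * 100.0


def is_plausible(measured: float, predicted: float, tolerance_pct: float = 3.0) -> bool:
    """Sanity check: True if the absolute deviation is within the tolerance.

    tolerance_pct = 3.0 is a hypothetical default chosen for illustration,
    not a CCSondemand setting.
    """
    return abs(ccs_deviation_percent(measured, predicted)) <= tolerance_pct


# Hypothetical values in Å² (illustrative only, not measured data)
print(round(ccs_deviation_percent(201.5, 200.0), 2))  # 0.75
print(is_plausible(201.5, 200.0))  # True
print(is_plausible(215.0, 200.0))  # False (7.5% apart)
```

Keeping the deviation signed, rather than absolute, leaves any systematic over- or under-prediction visible when many compounds are checked.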

What was the inspiration behind the CCSondemand project?

Russell: Ever since Waters first deployed instruments capable of measuring CCS values, customers have asked whether we can predict CCS values from structures. Watching the literature develop over the last couple of years, we saw that CCS prediction via machine learning had matured to the point where it was practically useful. It therefore seemed the right time to build our own model and understand for ourselves how CCS predictions can inform customer decision-making.

The biggest challenge we had was collecting and curating all of the available CCS data to ensure it was as robust and reliable as possible. Once that was done, creating the first model was relatively fast, and we began looking at the best way of putting the model into the hands of users. The final stage was to implement a web browser interface to showcase and evaluate the performance of CCSondemand in a user-friendly manner.

Which instruments will CCSondemand software be used with?

Russell: We know that CCS data from all our ion mobility-enabled instruments are consistent and essentially independent of instrument geometry. Therefore, in a full customer-facing implementation, it should be possible to embed this functionality in our core software as an application on the hub. It would then be available for analyzing data generated on any of our T-Wave ion mobility platforms, including Vion, SYNAPT XS and Cyclic IMS.

Has the CCS prediction approach been applied successfully in research?

Mike: We have been working in collaboration with the University of São Paulo to explore the application area of medicinal plant speciation. This is a complex analysis that uses accurate mass (m/z), retention time and CCS values together to profile the variants of four Passiflora species and to determine whether the same components are present across all four. These components are labeled “known-unknowns” because, even though we don’t know the identity of the compounds, we do have their measured CCS, retention time and accurate mass; so we know they can be consistently detected and characterized by multiple metrics.
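Matching features across species on all three metrics at once can be sketched as a tolerance-based comparison. The Python example below is a minimal illustration under stated assumptions: the Feature type, the tolerances (5 ppm for m/z, 0.1 min for retention time, 2% for CCS) and the four-species data are hypothetical and do not reproduce the workflow or results of the Passiflora study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Feature:
    mz: float   # accurate m/z
    rt: float   # retention time, minutes
    ccs: float  # measured CCS, Å²


def same_component(a: Feature, b: Feature,
                   mz_ppm: float = 5.0, rt_min: float = 0.1, ccs_pct: float = 2.0) -> bool:
    """True if two features agree on m/z, retention time AND CCS (hypothetical tolerances)."""
    mz_ok = abs(a.mz - b.mz) / b.mz * 1e6 <= mz_ppm
    rt_ok = abs(a.rt - b.rt) <= rt_min
    ccs_ok = abs(a.ccs - b.ccs) / b.ccs * 100.0 <= ccs_pct
    return mz_ok and rt_ok and ccs_ok


def common_features(species: Dict[str, List[Feature]]) -> List[Feature]:
    """Features from the first species that have a three-metric match in every other species."""
    names = list(species)
    reference, others = species[names[0]], names[1:]
    return [
        f for f in reference
        if all(any(same_component(f, g) for g in species[n]) for n in others)
    ]


# Hypothetical feature lists for four species labeled A-D (not real data)
hypothetical_species = {
    "A": [Feature(400.1500, 5.20, 200.0), Feature(500.2000, 6.10, 230.0)],
    "B": [Feature(400.1502, 5.22, 200.6), Feature(300.1000, 3.00, 170.0)],
    "C": [Feature(400.1498, 5.18, 199.5), Feature(500.2001, 6.12, 230.4)],
    "D": [Feature(400.1503, 5.21, 200.3)],
}
print(common_features(hypothetical_species))
# [Feature(mz=400.15, rt=5.2, ccs=200.0)]
```

Requiring agreement on all three metrics simultaneously is what lets a feature count as consistently detected; relaxing any single tolerance would admit more coincidental matches.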

At this point we knew that the known-unknowns were shared across the four plant species; however, we had only four identified analytes with which to characterize each species. We used the CCS prediction software to predict CCS values for structures that had previously been reported in Passiflora species. This enabled us to quickly exclude some candidate identifications whose measured CCS values differed significantly from the predicted values, and to use the prediction as a sanity check on the rest. By using CCS prediction to support the identification of compounds present in these samples, we went from four compounds identified with purchased standards to around 22. This research is an example of how CCS prediction can help solve the real challenge of characterizing complex samples, and the approach can be applied to other areas, including natural products discovery, food profiling, nutraceutical profiling and metabolite identification.
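The exclusion step can be sketched as a simple screen of candidate structures against a measured CCS value. In this hypothetical Python example, the predicted CCS values are assumed to have already been obtained from a prediction tool and entered by hand; the candidate names, values and 3% tolerance are illustrative assumptions, and no CCSondemand API is implied.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str             # hypothetical candidate structure label
    predicted_ccs: float  # predicted CCS in Å², obtained beforehand from a prediction tool


def screen_candidates(
    measured_ccs: float, candidates: List[Candidate], tolerance_pct: float = 3.0
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split candidates into kept/excluded by absolute % deviation; rank kept ones by closeness.

    tolerance_pct is a hypothetical threshold chosen for illustration.
    """
    kept: List[Tuple[str, float]] = []
    excluded: List[Tuple[str, float]] = []
    for cand in candidates:
        delta_pct = (measured_ccs - cand.predicted_ccs) / cand.predicted_ccs * 100.0
        target = kept if abs(delta_pct) <= tolerance_pct else excluded
        target.append((cand.name, delta_pct))
    kept.sort(key=lambda item: abs(item[1]))  # closest agreement first
    return kept, excluded


# Hypothetical measured value and candidate predictions (Å², illustrative only)
hypothetical_candidates = [
    Candidate("candidate_1", 210.0),
    Candidate("candidate_2", 225.0),
    Candidate("candidate_3", 214.5),
    Candidate("candidate_4", 198.0),
]
kept, excluded = screen_candidates(212.0, hypothetical_candidates)
print([name for name, _ in kept])      # ['candidate_1', 'candidate_3']
print([name for name, _ in excluded])  # ['candidate_2', 'candidate_4']
```

Surviving candidates are not confirmed by this step; they simply remain plausible and still need support from accurate mass, retention time and literature evidence.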

What do you believe are the advantages of CCS prediction?

Mike: First of all, it’s very rapid. For our project, we were able to generate predicted CCS values for 18 analytes in less than a minute, which makes the whole approach much more accessible than structural molecular modeling. Accessibility is also increased by reducing the need for standards, which are used to build compound libraries. Unfortunately, a few standards can cost thousands of pounds, and often the standard you need is not available and the compound cannot be synthesized. CCS prediction helps to get around these problems, because information available in the literature, combined with a predicted CCS value, can be used to confirm an identification. Overall, CCS prediction software enables a form of multi-factor authentication, combining accurate mass, retention time and CCS, to give greater confidence in identification.

How important is customer input for this type of project?

Russell: Customer input is absolutely essential. As Waters scientists, we have a good understanding of the kinds of things our customers do, and the applications where our software and hardware work best. However, there’s no substitute for hearing directly from customers about the strengths and weaknesses of our ideas and products. We are asking customers from a range of markets and application areas for feedback on the CCSondemand software, and on any further capabilities they would like to see in a commercial product.

How will CCS prediction software impact researchers moving forwards?

Russell: This comes back to the point Mike made at the very beginning: CCS prediction provides a sanity check that lets researchers quickly evaluate how likely a given molecular explanation is for the component being characterized. This will hopefully translate into faster decision-making and a lower cost of analysis.

Want to learn more? Check out the resources below or visit our pages on the fundamentals of CCS and CCS prediction.

Further Reading

  1. McCullagh, M., Goshawk, J., Mortishire-Smith, R.J., Pereira, C.A.M., Yariwake, J.H. and Vissers, J.P.C. (2020). Profiling of The Known-Unknowns Passiflora Complement by Liquid Chromatography – Ion Mobility – Mass Spectrometry. Talanta (Elsevier), Vol. 221, pp. 1-9. https://doi.org/10.1016/j.talanta.2020.121311
  2. Yu, Y., Vissers, H. and Yu, K. (2020). Investigation and Performance Evaluation of a Research Prototype Tool for CCS Prediction. Waters Application Note. https://www.waters.com/nextgen/us/en/library/application-notes/2020/investigation-and-performance-evaluation-of-a-research-prototype-tool-for-ccs-prediction.html