ProteoWizard Data Access Layer Design

The ProteoWizard data access layer lives in pwiz/msdata; its interface and data structure definitions are in MSData.hpp.

The data model is a one-to-one translation from mzML data elements to C++ structs. The root mzML element corresponds to an MSData struct, and the sub-elements correspond to structs with similar names. SpectrumList is a virtual interface, so an implementation can read spectra lazily from a backing data file instead of holding them all in memory.
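The sketch below illustrates why a virtual SpectrumList permits lazy evaluation. It is not the declaration from MSData.hpp: everything lives in a hypothetical namespace `sketch`, the structs are heavily reduced, and names such as `InMemorySpectrumList`, `LazySpectrumList`, and `loads()` are assumptions made for this example. The lazy class fabricates placeholder values where a real implementation would read from the data file.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

namespace sketch {

// Simplified stand-in for one spectrum record.
struct Spectrum {
    std::size_t index = 0;
    std::string id;
    std::vector<double> mz;
    std::vector<double> intensity;
};

// Abstract interface: client code sees only size() and spectrum().
class SpectrumList {
public:
    virtual ~SpectrumList() = default;
    virtual std::size_t size() const = 0;
    virtual Spectrum spectrum(std::size_t index) const = 0;
};

// Eager implementation: every spectrum is already in memory.
class InMemorySpectrumList : public SpectrumList {
public:
    std::vector<Spectrum> spectra;

    std::size_t size() const override { return spectra.size(); }
    Spectrum spectrum(std::size_t index) const override {
        assert(index < spectra.size());
        return spectra[index];
    }
};

// Lazy implementation: each spectrum is produced only when requested.
// A real version would seek into the data file here; this one fabricates
// placeholder values so the example stays self-contained.
class LazySpectrumList : public SpectrumList {
public:
    explicit LazySpectrumList(std::size_t count) : count_(count) {}

    std::size_t size() const override { return count_; }
    Spectrum spectrum(std::size_t index) const override {
        assert(index < count_);
        ++loads_;  // count how many spectra were actually materialized
        Spectrum s;
        s.index = index;
        s.id = "scan=" + std::to_string(index + 1);
        s.mz = {100.0 + static_cast<double>(index)};
        s.intensity = {1.0};
        return s;
    }
    std::size_t loads() const { return loads_; }

private:
    std::size_t count_;
    mutable std::size_t loads_ = 0;
};

// Root of the model: the run holds its list through the interface only.
struct Run {
    std::shared_ptr<SpectrumList> spectrumList;
};

struct MSData {
    std::string id;
    Run run;
};

} // namespace sketch
```

Because `Run` stores only a pointer to the interface, client code that iterates over `size()` and calls `spectrum(i)` works unchanged whether the list is in memory or backed by a file.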

The mzML controlled vocabulary (CV) is parsed as part of the build, which generates cv.hpp and cv.cpp. This allows CV terms to be used in a typesafe manner, and also makes the various CV relations and synonyms available to C++ client code.
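To make the idea of generated, typesafe CV code concrete, here is a minimal sketch of what such output could look like. It is an assumption-laden illustration, not the contents of cv.hpp: the namespace `sketch`, the `EX:` accessions, the enumerator names, and the functions `termInfo`, `accession`, and `isA` are all made up for this example, and each term has a single parent, whereas a real ontology can have several.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// One enumerator per vocabulary term. A misspelled term becomes a compile
// error instead of a silently wrong accession string.
enum class CVID {
    Unknown = -1,
    EX_spectrum_type = 1,
    EX_mass_spectrum = 2,
    EX_MS1_spectrum = 3
};

// Per-term metadata that a generator could emit alongside the enum.
struct CVTermInfo {
    CVID cvid;
    const char* accession;
    const char* name;
    CVID parent;  // single is_a parent; Unknown marks a root term
};

// Placeholder table; the "EX:" accessions are invented for illustration.
const CVTermInfo kTerms[] = {
    {CVID::EX_spectrum_type, "EX:0000001", "spectrum type", CVID::Unknown},
    {CVID::EX_mass_spectrum, "EX:0000002", "mass spectrum", CVID::EX_spectrum_type},
    {CVID::EX_MS1_spectrum,  "EX:0000003", "MS1 spectrum",  CVID::EX_mass_spectrum},
};

// Look up the metadata for a term; returns nullptr if it is not in the table.
const CVTermInfo* termInfo(CVID id) {
    for (const CVTermInfo& t : kTerms)
        if (t.cvid == id) return &t;
    return nullptr;
}

// Accession string for a term that is known to be in the table.
std::string accession(CVID id) {
    const CVTermInfo* info = termInfo(id);
    assert(info != nullptr);
    return info->accession;
}

// True if child equals parent or reaches it by following is_a links.
bool isA(CVID child, CVID parent) {
    CVID cur = child;
    while (cur != CVID::Unknown) {
        if (cur == parent) return true;
        const CVTermInfo* info = termInfo(cur);
        if (info == nullptr) return false;
        cur = info->parent;
    }
    return false;
}

} // namespace sketch
```

With a table like this, client code can ask relational questions such as `isA(CVID::EX_MS1_spectrum, CVID::EX_spectrum_type)` without parsing the vocabulary file at run time.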

Mapping from the various structs to mzML is done in the IO module, and diff calculations are done in Diff. Serializer_mzML and Serializer_mzXML handle serialization to and from iostreams in mzML and mzXML formats, respectively.
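The division of labor between a stream-based serializer and a diff can be sketched with a toy round trip. Nothing here is the pwiz API: `Sample`, `Serializer_Lines`, `Diff`, and `roundTrip` are hypothetical names in namespace `sketch`, and the line-based format is a stand-in that is neither mzML nor mzXML (it breaks if a field contains a newline).

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// Tiny stand-in for one struct of the data model.
struct Sample {
    std::string id;
    std::string name;
};

// Serializer that talks only to iostreams, so the same code can target
// files, string buffers, or any other stream.
class Serializer_Lines {
public:
    void write(std::ostream& os, const Sample& s) const {
        os << s.id << '\n' << s.name << '\n';
    }
    // Returns false if either field could not be read.
    bool read(std::istream& is, Sample& s) const {
        return static_cast<bool>(std::getline(is, s.id)) &&
               static_cast<bool>(std::getline(is, s.name));
    }
};

// Field-by-field comparison that records the names of differing fields.
struct Diff {
    std::vector<std::string> fields;

    Diff(const Sample& a, const Sample& b) {
        if (a.id != b.id) fields.push_back("id");
        if (a.name != b.name) fields.push_back("name");
    }
    bool empty() const { return fields.empty(); }
};

// Write a Sample to an in-memory stream and read it back.
Sample roundTrip(const Sample& in) {
    Serializer_Lines serializer;
    std::stringstream buffer;
    serializer.write(buffer, in);

    Sample out;
    const bool ok = serializer.read(buffer, out);
    assert(ok && "round trip through the toy format failed");
    (void)ok;  // silence unused-variable warnings when NDEBUG is defined
    return out;
}

} // namespace sketch
```

A round trip followed by a diff is a convenient self-check: if `Diff(original, roundTrip(original))` is not empty, the writer and reader disagree about some field.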

MSDataFile is a subclass of MSData that adds file I/O handling. MSDataFile::Reader provides a generic interface for file readers. By default, Readers for mzML, mzXML, and Thermo RAW files are provided. On instantiation with a filename, MSDataFile finds a Reader that will accept the file, and uses that Reader to fill in the internal data structures.
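The Reader lookup described above amounts to asking each registered reader in turn whether it accepts the file. The sketch below shows that selection step only. It is a simplified illustration, not the MSDataFile::Reader declaration: the namespace `sketch`, the `accept(filename, head)` signature (where `head` is assumed to be the first bytes of the file), the classes `XmlRootReader` and `ExtensionReader`, and the detection rules are all assumptions for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Generic reader interface: can this reader handle the given file?
class Reader {
public:
    virtual ~Reader() = default;
    virtual std::string type() const = 0;
    virtual bool accept(const std::string& filename,
                        const std::string& head) const = 0;
};

// Accepts XML files whose leading bytes contain a given opening tag.
class XmlRootReader : public Reader {
public:
    XmlRootReader(std::string type, std::string tag)
        : type_(std::move(type)), tag_(std::move(tag)) {
        assert(!tag_.empty());
    }
    std::string type() const override { return type_; }
    bool accept(const std::string&, const std::string& head) const override {
        return head.find(tag_) != std::string::npos;
    }

private:
    std::string type_;
    std::string tag_;
};

// Accepts files by case-insensitive filename extension.
class ExtensionReader : public Reader {
public:
    ExtensionReader(std::string type, std::string ext)
        : type_(std::move(type)), ext_(std::move(ext)) {}
    std::string type() const override { return type_; }
    bool accept(const std::string& filename, const std::string&) const override {
        if (filename.size() < ext_.size()) return false;
        const auto start =
            filename.end() - static_cast<std::ptrdiff_t>(ext_.size());
        return std::equal(ext_.begin(), ext_.end(), start, [](char a, char b) {
            return std::tolower(static_cast<unsigned char>(a)) ==
                   std::tolower(static_cast<unsigned char>(b));
        });
    }

private:
    std::string type_;
    std::string ext_;
};

using ReaderList = std::vector<std::unique_ptr<Reader>>;

// Default set mirroring the three formats named in the text.
ReaderList defaultReaders() {
    ReaderList readers;
    readers.push_back(std::make_unique<XmlRootReader>("mzML", "<mzML"));
    readers.push_back(std::make_unique<XmlRootReader>("mzXML", "<mzXML"));
    readers.push_back(std::make_unique<ExtensionReader>("Thermo RAW", ".raw"));
    return readers;
}

// Return the first reader that accepts the file, or nullptr if none does.
const Reader* findReader(const ReaderList& readers,
                         const std::string& filename,
                         const std::string& head) {
    for (const auto& reader : readers)
        if (reader->accept(filename, head)) return reader.get();
    return nullptr;
}

} // namespace sketch
```

Keeping the selection logic behind a small interface means support for another format can be added by registering one more reader, without touching the code that opens files.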

