The Softberry SMS program package makes it possible to predict a patient's disease outcome (for example, cancer/normal) from the MS data of the samples. For this analysis, the peak intensities of the training set of MS data should be presented in table format. In this table, rows correspond to samples, and each column corresponds to the MS intensity of one of the peak groups identified at the preprocessing steps. The table can also contain additional information (which can be passed to the table from additional files). For example, the patient ID, time of sampling, and patient status (cancer or non-cancer) can be added for each sample. Additional patient parameters that can be used for prognosis can also be added to the table. For example, it is known that the tumor marker serum CA125 is useful for early detection of ovarian cancer, also in combination with mass spectra data. The CreateTable program creates such a table. The table can then be used by the CalculateLDAParameters function to calculate a linear discriminant function (LDF) for sample classification.
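The table layout described above can be illustrated with a minimal Python sketch. It is not the CreateTable implementation: the function name build_table, the dictionary layout, the column naming, and the sample values are all hypothetical and serve only to show one row per sample, with metadata columns followed by one intensity column per peak group.

```python
# Hypothetical illustration only: none of these names come from the SMS package.

def build_table(intensities, metadata, peak_groups):
    """Return (header, rows): one row per sample, one column per peak group.

    intensities: {sample_id: {peak_group_mz: intensity}}
    metadata:    {sample_id: {"status": ..., "CA125": ...}}
    peak_groups: ordered list of peak-group m/z values (the intensity columns)
    """
    meta_keys = sorted({k for m in metadata.values() for k in m})
    header = ["SampleID"] + meta_keys + [f"mz_{mz}" for mz in peak_groups]
    rows = []
    for sample_id in sorted(intensities):
        meta = metadata.get(sample_id, {})
        row = [sample_id]
        row += [meta.get(k, "") for k in meta_keys]
        # A peak group absent from a sample is recorded as 0.0 (an assumption).
        row += [intensities[sample_id].get(mz, 0.0) for mz in peak_groups]
        rows.append(row)
    return header, rows


# Invented example values, not real measurements.
intensities = {
    "S1": {1047.2: 12.5, 1297.5: 3.1},
    "S2": {1047.2: 8.0},
}
metadata = {
    "S1": {"status": "cancer", "CA125": 65.0},
    "S2": {"status": "normal", "CA125": 12.0},
}
header, rows = build_table(intensities, metadata, [1047.2, 1297.5])
```

In this sketch the status column would play the role of the class label for a later discriminant analysis, while CA125 and the intensity columns would be candidate predictors.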
Input: Set of MS data preprocessed by the MSPreprocess function; calibration data; location of the calibrant markers.
Output: Calibrated m/z - Intensity data in the same format as input data.
Infile - This parameter specifies the name of the file with raw data.
FileFormat - This parameter specifies the file format: SSV (space-separated values), CSV (comma-separated values), or TSV (tab-separated values).
Calibration - Text file that should contain calibration data in a format identical to that of the Infile data.
MZCalibration - This parameter should contain a comma-separated list of m/z values for the calibration peaks (calibrant markers). For example: 782.402,1047.20,1297.51
Mass separation - This parameter specifies the minimal mass separation for peak assignment in the calibration data.
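As a rough illustration of how the MZCalibration list might be used, the Python sketch below parses the comma-separated value and pairs each calibrant marker with the nearest observed peak. This is an assumption, not the package's documented algorithm: the function names are hypothetical, the observed peak positions are invented, and the max_distance tolerance is only a stand-in for the Mass separation parameter, whose exact semantics are not specified here.

```python
def parse_mz_calibration(value):
    """Parse a comma-separated list of calibrant m/z values."""
    return [float(item) for item in value.split(",") if item.strip()]


def assign_calibrants(peak_mz, calibrant_mz, max_distance):
    """Pair each calibrant with the nearest observed peak.

    A calibrant is left unassigned (None) when no peak lies within
    max_distance. This tolerance is a hypothetical stand-in for the
    Mass separation parameter.
    """
    assignment = {}
    for target in calibrant_mz:
        nearest = min(peak_mz, key=lambda mz: abs(mz - target), default=None)
        if nearest is not None and abs(nearest - target) <= max_distance:
            assignment[target] = nearest
        else:
            assignment[target] = None
    return assignment


calibrants = parse_mz_calibration("782.402,1047.20,1297.51")
peaks = [782.9, 1046.8, 1500.0]  # invented observed peak positions
pairs = assign_calibrants(peaks, calibrants, max_distance=1.0)
```

Here the first two markers are matched to nearby peaks, and the third stays unassigned because no peak lies within the tolerance.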
Mass spectra data are sets of value pairs: the mass-to-charge ratio (m/z; for convenience, referred to below as m, mass) and the corresponding signal intensity (I). On a spectrum plot, the mass corresponds to the X coordinate and the signal intensity to the Y coordinate. A typical spectrum consists of several thousand such value pairs (points). Data are stored as text files in which each pair (mi,Ii) of mass-intensity values occupies one line, with the two values separated by a special separator symbol. The SMS package supports several separator types: space (SSV, space-separated values file format), comma (CSV, comma-separated values file format), and tab (TSV, tab-separated values file format). Data files may contain comment lines, which are skipped when the file is read. Comment lines should begin with the "#" symbol in the first position. Figure 2 shows an example of a data file in CSV format.
#M/Z,Intensity
-7.8602611e-005,4.1126194
2.1773576e-007,4.0764203
9.6021472e-005,4.0040221
0.00036601382,4.1186526
0.00081019477,4.0040221
0.0014285643,3.9617898
....
19742.941,4.077895
19745.564,4.0772248
19748.187,4.0772248
Figure 2. Example file with mass spectra data in CSV format.
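A file in the format described above could be read with a short Python sketch such as the following. It is not part of the SMS package: the names read_spectrum and SEPARATORS are hypothetical, and skipping blank lines is an added assumption. In this sketch, SSV splits on any run of whitespace.

```python
import io

# None makes str.split() split on any run of whitespace (used here for SSV).
SEPARATORS = {"SSV": None, "CSV": ",", "TSV": "\t"}


def read_spectrum(lines, file_format="CSV"):
    """Return a list of (mass, intensity) pairs from an iterable of lines."""
    sep = SEPARATORS[file_format]
    points = []
    for line in lines:
        if line.startswith("#"):  # comment lines have '#' in the first position
            continue
        line = line.strip()
        if not line:              # skipping blank lines is an assumption
            continue
        mass, intensity = line.split(sep)[:2]
        points.append((float(mass), float(intensity)))
    return points


# A few lines taken from the Figure 2 example.
sample = io.StringIO(
    "#M/Z,Intensity\n"
    "-7.8602611e-005,4.1126194\n"
    "2.1773576e-007,4.0764203\n"
    "19748.187,4.0772248\n"
)
spectrum = read_spectrum(sample, "CSV")
```

The same function accepts an open file object, for example read_spectrum(open(path), "TSV"), since it only iterates over lines.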