Data Formats

Example datasets are available for download, including zipped (.zip) datasets. Note: please refer to the detailed instructions and screenshots listed below.

Comma Separated Values (.csv) or Tab Delimited Text (.txt): These two formats are used for concentration data, peak intensity tables, and MS/NMR spectral bins. Samples can be in either rows or columns.
Zipped files (.zip): For NMR/MS peak list files and GC/LC-MS spectra data, users need to upload a zipped folder containing data files from the different groups under study (one file per spectrum and one sub-folder for each group). For paired comparisons, users also need to upload a separate text file specifying the pairing information. GC/LC-MS spectra must be in NetCDF, mzXML, or mzDATA format. The spectra should be stored in two separate folders according to their class labels and then compressed into a zip file. Please note that the program is not compatible with WinZip v12.0 using its default option; make sure to select Legacy compression (Zip 2.0 compatible) when compressing files. No spaces are allowed in either the folder names or the spectra names. The size limit for each uploaded zip file is 50M. Please contact the author if you wish to upload a larger dataset.

The peak list data is composed of peak list files organized into separate folders named by their class labels. For example, if your data contains three groups, the peak list files should be organized into three folders accordingly. Compress these folders into a single zip file, then upload it to MetaboAnalyst. NMR peak list files should contain two comma-separated columns, with the 1st column for peak positions (ppm) and the 2nd column for peak intensities. MS peak list files can be in either two-column format (mass and intensities) or three-column format (mass, retention time and intensities), but not a mixture of both. The first line of each peak list file is reserved for column labels. The file must be saved in comma separated values (.csv) format.

Samples in rows (unpaired)
Each row represents data from a sample. The class label is in the second column. For unpaired comparisons, the class label can be either numeric (e.g. 0/1) or character (e.g. Healthy/Disease).
![]()

Samples in rows (paired)
For paired comparisons, there must be an even number (n) of samples.
The class labels must be integers from -1 to -n/2 and from 1 to n/2. Samples whose class labels have the same absolute value are considered a pair. In the example below, Patient1_d0 and Patient1_d3 are a pair.
![]()

Samples in columns (unpaired)
Samples can also be in columns, where each row represents a measured variable. The class label must be in the second row. The requirements for class labels are the same as for samples in rows, for both paired and unpaired comparisons. The screenshot below shows the unpaired case.
![]()

Samples in columns (paired)
The screenshot below shows a subset of binned NMR spectra data (bin width 0.04 ppm). In this table, the samples from controls (e.g. Contr_1) are paired with the samples from subjects with the disease (e.g. Disease_1) based on some criteria (e.g. age, weight, gender). Each sample occupies a column, and the second row is used for the class labels.
![]()

Peak intensity table
The screenshot below is an LC-MS peak intensity table. Each column represents peaks from a sample. These peaks are grouped and identified by their retention time and mass. The class labels are in the second row, immediately following the sample names.
![]()

Time-series data only (samples can be in rows or columns)
This design requires two factors: the time point factor must be labeled Time, and the other factor, labeled Subject, contains the subject IDs across the different time points.
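The paired labelling rule for samples in rows or columns can be checked before upload. The sketch below uses a hypothetical helper, `check_paired_labels`, which is not part of MetaboAnalyst. It assumes the class labels have already been read into a Python list of integers and applies the rule as stated: with n samples (n even), labels are integers in -n/2..-1 and 1..n/2, each used once, and the two samples sharing an absolute value form a pair.

```python
from collections import Counter


def check_paired_labels(labels: list[int]) -> list[tuple[int, int]]:
    """Check paired class labels and return (negative, positive) index pairs.

    Hypothetical helper. Assumed rule: with n samples (n even), each label is
    an integer in -n/2..-1 or 1..n/2, every label is used exactly once, and
    the two samples sharing an absolute value form a pair.
    """
    n = len(labels)
    if n == 0 or n % 2 != 0:
        raise ValueError(f"paired data needs an even number of samples, got {n}")
    half = n // 2
    counts = Counter(labels)
    for label, count in counts.items():
        if label == 0 or abs(label) > half:
            raise ValueError(f"label {label} is outside -{half}..-1 and 1..{half}")
        if count != 1:
            raise ValueError(f"label {label} appears {count} times")
    # n distinct labels drawn from exactly n allowed values means every +k and
    # -k is present, so each pair can be located directly by position.
    return [(labels.index(-k), labels.index(k)) for k in range(1, half + 1)]
```

For example, `check_paired_labels([2, -1, 1, -2])` returns `[(1, 2), (3, 0)]`: samples 1 and 2 form pair 1, and samples 3 and 0 form pair 2. Returning positions rather than names keeps the check independent of whether samples are stored in rows or columns.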
![]()

Time-series + one experimental factor (samples in rows)
The screenshot illustrates the appropriate structure of a time-series data table. In this example, there are three time points (the second column) and two experimental conditions (the third column). The data contains 12 samples measured at three time points from 4 subjects: two subjects are wild type (WT) and the other two are mutant type (MT). There are two special requirements for time-series data: (1) the time points must be labeled as Time; and (2) all samples measured from the same subject must appear consecutively, ordered by time point.
![]()

Time-series + one experimental factor (samples in columns)
This is the same data saved in transposed form. Note that all samples measured from the same subject must appear consecutively, ordered by time point, and the time points row must be labeled as Time. More detailed explanations can be found in the example above.
![]()

Two-factor independent samples (samples in rows)
The screenshot below is a compound concentration table. Note that the two factors (Phenotype and Gender) immediately follow the sample names.
![]()

Two-factor independent samples (samples in columns)
The screenshot below is a compound concentration table with samples in columns. Note that the two factors (Phenotype and Gender) occupy the rows immediately following the sample names.
![]()

Peak list and spectra data
The zipped-folder requirements are the same as those described for zipped files above: one sub-folder per group named by its class label, no spaces in folder or file names, Legacy (Zip 2.0 compatible) compression, and a 50M size limit per zip file. For paired comparisons, a separate text file specifying the pairing information is also required. For larger datasets, we recommend the XCMS Online service for raw spectra processing.
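The folder layout for peak list data can be packaged with Python's standard `zipfile` module. This is a sketch under stated assumptions: `zip_peak_lists` is a hypothetical helper, the input is assumed to be laid out as root/<class label>/<file>.csv, "50M" is read as 50 MiB, and the archive uses Deflate (`zipfile.ZIP_DEFLATED`), the classic method that Zip 2.0-compatible readers support, in place of the WinZip Legacy option.

```python
import zipfile
from pathlib import Path

# Assumption: the "50M" limit is read as 50 MiB.
SIZE_LIMIT_BYTES = 50 * 1024 * 1024


def zip_peak_lists(root: Path, out_zip: Path) -> list[str]:
    """Zip root/<class label>/<file> into out_zip, one folder per group.

    Hypothetical helper. Raises ValueError if a folder or file name contains
    a space, or if the finished archive exceeds SIZE_LIMIT_BYTES.
    """
    members: list[str] = []
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Each top-level sub-folder is treated as one group (class label).
        for group_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            for data_file in sorted(p for p in group_dir.iterdir() if p.is_file()):
                arcname = f"{group_dir.name}/{data_file.name}"
                if " " in arcname:
                    raise ValueError(f"spaces are not allowed in names: {arcname!r}")
                zf.write(data_file, arcname)
                members.append(arcname)
    size = out_zip.stat().st_size
    if size > SIZE_LIMIT_BYTES:
        raise ValueError(f"{out_zip.name} is {size} bytes, above the 50M limit")
    return members
```

Because each sub-folder name is stored as the leading path component of its files, the folder names should be exactly the class labels you want to see in the analysis.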
The paired sample information is encoded by listing both sample names (without the file suffix) separated by a colon ":", one pair per line, and uploaded as a text file (.txt). The screenshot below illustrates the data structure for peak list data as well as the specification of paired samples:
![]()

Exploratory biomarker analysis
The data format is the same as the one-factor data with samples in rows or columns, followed immediately by class labels. Please note that ROC curve-based biomarker analysis is only defined for two-group analysis. If your data contains multiple groups, you need to specify which two groups you want to investigate.

Creating biomarker models to predict new samples
You can create biomarker models to predict new samples (with unknown class) using the ROC Tester. To do this, upload a dataset that contains both the samples with class labels and the samples whose class labels need to be predicted (leave their class labels empty). A screenshot is shown below.
![]()

The data format is the same as the one-factor data with samples in rows or columns, followed immediately by class labels. Before uploading your data to the module, please make sure that the names of your features (compound names, spectral bins, peaks) are consistent between the individual studies. At least 25% of the features must match between the studies. Also make sure that the group labels are consistent between the studies (e.g. Cancer and Healthy). Finally, all uploaded sample identifiers must be unique.
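The multi-study requirements (consistent feature names with at least 25% overlap, consistent group labels, unique sample identifiers) can be sanity-checked locally before upload. In this sketch the `Study` container and `check_meta_studies` are hypothetical, and measuring the 25% overlap as the features shared by all studies divided by each study's own feature count (taking the minimum) is an assumption about how the threshold is applied.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical container for one study's upload."""
    features: list[str]  # compound names, spectral bins or peaks
    samples: list[str]   # sample identifiers
    groups: list[str]    # class label of each sample


def check_meta_studies(studies: list[Study], min_overlap: float = 0.25) -> float:
    """Return the smallest per-study fraction of features shared by all studies.

    Raises ValueError when the overlap is below min_overlap, the studies use
    different sets of group labels, or a sample identifier is reused.
    """
    if len(studies) < 2:
        raise ValueError("need at least two studies")
    feature_sets = [set(s.features) for s in studies]
    if any(not fs for fs in feature_sets):
        raise ValueError("every study needs at least one feature")
    shared = set.intersection(*feature_sets)
    # Assumption: the 25% rule is applied to every study's own feature count.
    overlap = min(len(shared) / len(fs) for fs in feature_sets)
    if overlap < min_overlap:
        raise ValueError(f"only {overlap:.0%} of features are shared")
    if len({frozenset(s.groups) for s in studies}) != 1:
        raise ValueError("group labels differ between studies")
    id_counts = Counter(sid for s in studies for sid in s.samples)
    duplicates = sorted(sid for sid, c in id_counts.items() if c > 1)
    if duplicates:
        raise ValueError(f"duplicate sample identifiers: {duplicates}")
    return overlap
```

Comparing group labels as sets means the order in which groups appear within each study does not matter, only that the same labels (e.g. Cancer and Healthy) are used everywhere.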
A screenshot example is shown below:
![]()

Upload your data in tab-delimited (.txt) format. Make sure that the uploaded table contains three columns with these exact names: m.z, p.value, and t.score. Each feature should be in a row. If p-values have not yet been calculated for your data, you can use the exploratory statistical analysis module to upload your raw peak tables, process the data, perform t-tests or fold-change analysis, and then upload these results into the module. An example dataset is shown below:
![]()

Data format overview
Metabolite or gene list data: a list of metabolite or gene IDs with optional fold-changes. Each feature should be in a row. Please refer to the example data for further details.

Metabolite/Gene list labels
It is critical for your data to be properly labeled so that it can be uploaded into the Joint Pathway Analysis or Network Explorer module. The following common metabolite and gene IDs are supported:
An example of what your data should look like in any text editor (e.g. WordPad, TextEdit) is shown in the screenshot below.
![]()
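For the tab-delimited peak table that requires the exact column names m.z, p.value and t.score, a local pre-upload check could look like the sketch below. `read_peak_table` is a hypothetical helper built on Python's standard `csv.DictReader`; checking that p-values lie in [0, 1] is an added sanity check, not a documented MetaboAnalyst rule.

```python
import csv
import io

REQUIRED_COLUMNS = ("m.z", "p.value", "t.score")


def read_peak_table(text: str) -> list[dict[str, float]]:
    """Parse a tab-delimited peak table and check the required columns.

    Hypothetical helper: extra columns are ignored, missing or non-numeric
    values in the required columns raise ValueError.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    rows: list[dict[str, float]] = []
    # Assumes one record per physical line, so data starts on line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            # A short row yields None for absent fields, which float() rejects.
            record = {c: float(row[c]) for c in REQUIRED_COLUMNS}
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: expected numbers in {REQUIRED_COLUMNS}") from exc
        if not 0.0 <= record["p.value"] <= 1.0:
            raise ValueError(f"line {line_no}: p.value {record['p.value']} is not in [0, 1]")
        rows.append(record)
    return rows
```

Matching the header names exactly (including the dots) mirrors the requirement that the columns carry these exact names; a header such as `mz` is reported as missing rather than silently accepted.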
Xia Lab @ McGill (last updated 2019-12-05)