Data Formats:

Various example datasets are available for different analysis purposes. You can download them to inspect their formats, or scroll down for more detailed instructions.
Analysis Path | Title | Download | Description
From LC-MS Spectra to Feature/Compound Table | Small test spectra (mzML) | IBD_small.zip | A trimmed small MS1 dataset (10 samples)
From LC-MS Spectra to Feature/Compound Table | Malaria raw spectra (mzML) | malaria_raw.zip | An experimental raw MS1 spectra dataset (15 samples)
From LC-MS Spectra to Feature/Compound Table | Blood samples (mzML) | blood_samples.zip | A blood spectra dataset (MS1+DDA), containing MS1 and DDA-based MS2
From LC-MS Spectra to Feature/Compound Table | COVID-19 dataset (mzML) | swath_dia_covid.zip | An experimental raw MS1+SWATH-DIA spectra COVID-19 dataset (16 samples)
From MS Peaks to Functions | MS peak table | malaria_feature_table.csv | Peak table of the Malaria (MTBLS665) study
From MS Peaks to Functions | MS peak list | mummichog_ibd.txt | An MS peak list (3 columns: m/z, p value, and t-score) for functional analysis
From MS Peaks to Functions | Multiple peak tables | A1_pos.csv, B1_pos.csv, C1_pos.csv | 3 MS peak tables from a COVID-19 study for functional meta-analysis
Statistics [one factor] and Biomarker Analysis | Concentration table | cow_diet.csv | A metabolite concentration table from cow rumen samples with four groups
Statistics [one factor] and Biomarker Analysis | Concentration table | human_cachexia.csv | A metabolite concentration table from human urine samples with two groups
Statistics [one factor] and Biomarker Analysis | Peak Intensity table | lcms_table.csv | A peak intensity table from mice spinal cord samples with two groups
Statistics [one factor] and Biomarker Analysis | NMR/MS spectra data | nmr_bins.csv | Binned spectra data for statistical analysis
Statistics [one factor] and Biomarker Analysis | mzTab 2.0-M | MouseLiver_negative.mzTab | mzTab 2.0-M file example data
Statistics [one factor] and Biomarker Analysis | Zipped files | nmr_peaks.zip | NMR data with 2 columns (chemical shift and intensity)
Statistics [one factor] and Biomarker Analysis | Zipped files | lcms_peaks_2col.zip | MS data with 2 columns (mass and intensity)
Statistics [one factor] and Biomarker Analysis | Zipped files | lcms_peaks_3col.zip | MS data with 3 columns (mass, retention time, and intensity)
Statistics [metadata table] and Covariate Analysis | Time-series data | cress_time.csv, cress_time_meta.csv | Peak table of a time-series study across two conditions
Statistics [metadata table] and Covariate Analysis | Data and metadata | TCE_feature_table.csv, TCE_metadata.csv | A peak intensity table from a trichloroethylene (TCE) exposure study for covariate analysis. Two files included (a peak table + metadata).
Multi-omics Integration | Gene and compound lists | integ_genes_1.txt, integ_cmpds.txt | Integration analysis of transcriptomics and metabolomics data (compounds) from a study of COVID-19.
Multi-omics Integration | Gene and peak lists | integ_genes_2.txt, integ_peaks.txt | Integration analysis of transcriptomics and metabolomics data (untargeted, peaks) from a study of Malaria.
Multi-omics Integration | Protein and compound lists | integ_genes_3.txt, integ_cmpds_3.txt | Integration analysis of proteomics and metabolomics data (compounds, HMDB) from a study of COVID-19.

Comma Separated Values (.csv) or Tab Delimited Text (.txt):

These two formats are used for concentration data, peak intensity tables, and MS/NMR spectral bins. Samples can be in either rows or columns. Note:
  1. Both sample and feature names must be unique and consist only of common English letters, underscores, and numbers. Latin/Greek letters are not supported.
  2. Statistical Analysis [one factor] module: for statistical analysis with one factor (two or multiple groups), class labels must immediately follow sample names. Statistical Analysis [metadata table] module: for statistical analysis with multiple factors (including time series), users need to upload a separate metadata table.
  3. For time-series data, the time-point group must be named Time. In addition, samples collected from the same subject at different time points should be consecutive. For more details, please see the screenshots demo for "Metadata / Time-series".
  4. Data values (concentrations, bins, peak intensities) should contain only numeric, positive values (use empty or NA for missing values). In addition, there should be no spaces within numbers. For instance, 1 600 should be formatted as 1600; otherwise the value will be read as 1.
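
The naming and value rules above can be checked before upload. The following is a minimal sketch, not MetaboAnalyst's own validation: it assumes a samples-in-rows table whose first column holds sample names and whose second column holds class labels, and it reads "positive" strictly (zero is flagged). The function name check_table and the problem messages are hypothetical.

```python
import csv
import io
import math
import re

# Rule 1: names may use only English letters, digits, and underscores.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")
# Rule 4: empty cells or NA mark missing values.
MISSING = {"", "NA"}


def check_table(text):
    """Return a list of problems found in a samples-in-rows CSV table.

    Assumed layout: header = sample column, class-label column, features;
    each later row = sample name, class label, one value per feature.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    features = header[2:]
    samples = [row[0] for row in body]
    problems = []

    # Uniqueness and allowed characters, for sample and feature names.
    for kind, names in (("sample", samples), ("feature", features)):
        seen = set()
        for name in names:
            if name in seen:
                problems.append(f"duplicate {kind} name: {name}")
            seen.add(name)
            if not NAME_PATTERN.match(name):
                problems.append(f"unsupported characters in {kind} name: {name}")

    # Every non-missing value must parse as one finite, positive number.
    for row in body:
        for feature, value in zip(features, row[2:]):
            value = value.strip()
            if value in MISSING:
                continue
            try:
                number = float(value)  # "1 600" fails here because of the space
            except ValueError:
                problems.append(f"non-numeric value {value!r} for {row[0]}/{feature}")
                continue
            if not math.isfinite(number) or number <= 0:
                problems.append(f"non-positive value {value!r} for {row[0]}/{feature}")
    return problems
```

An empty list means the table passed these particular checks; it does not guarantee that the upload will be accepted.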

mzTab 2.0-M files (.mzTab)

MetaboAnalyst now supports the upload of mzTab files in the Statistical Analysis module. It parses both the Metadata Table (MTD) and the Small Molecule Table (SML) into a MetaboAnalyst-ready data table. From the SML, users can choose to name their features using either "chemical_name" or "theoretical_neutral_mass". If too many of these values are missing, however, the features will be named with the "SML_ID". Further, if there are duplicate names, the "SML_ID" will be appended to the end of the selected feature identifier. From the MTD, any "study_variable" labeled "Blank" will be excluded from the final data table. Note that MetaboAnalyst supports only mzTab-M 2.0 files that have been validated to ensure that the files can be read by our software.
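
The feature-naming behaviour described above can be illustrated roughly as follows. This is a sketch under several assumptions, not MetaboAnalyst's parser: the cut-off for "too many missing" is not stated here, so max_missing_fraction is a hypothetical parameter; rows with a missing name below that cut-off are assumed to fall back to their SML_ID; the SML_ID is assumed to be appended with an underscore to every occurrence of a duplicated name; and missing values are assumed to appear as "null" or an empty string.

```python
from collections import Counter


def name_features(sml_rows, name_field="chemical_name", max_missing_fraction=0.5):
    """Pick one feature name per SML row.

    sml_rows: list of dicts with keys "SML_ID" and name_field.
    max_missing_fraction: hypothetical cut-off for "too many missing".
    """
    def is_missing(value):
        return value is None or str(value).strip() in ("", "null")

    # Too many missing names: name every feature by its SML_ID.
    missing = sum(is_missing(row.get(name_field)) for row in sml_rows)
    if sml_rows and missing / len(sml_rows) > max_missing_fraction:
        return [str(row["SML_ID"]) for row in sml_rows]

    # Otherwise use the chosen field (assumption: SML_ID where it is missing).
    names = [
        str(row["SML_ID"]) if is_missing(row.get(name_field))
        else str(row[name_field]).strip()
        for row in sml_rows
    ]

    # Duplicated names get the SML_ID appended (assumption: "_" separator).
    counts = Counter(names)
    return [
        f"{name}_{row['SML_ID']}" if counts[name] > 1 else name
        for name, row in zip(names, sml_rows)
    ]
```

Passing name_field="theoretical_neutral_mass" selects the other naming option mentioned in the text.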

Zipped files (.zip)

For NMR/MS peak list files and GC/LC-MS spectra data, users need to upload a zipped folder containing the data files from the different groups under study (one file per spectrum and one sub-folder for each group). For paired comparisons, users need to upload a separate text file specifying the pairing information.

GC/LC-MS spectra must be in NetCDF, mzXML, or mzData format. The spectra should be stored in two separate folders according to their class labels and then compressed into a zip file. Please note that the program is not compatible with the most recent WinZip (v12.0) using its default option; make sure to select Legacy compression (Zip 2.0 compatible) when compressing files. No spaces are allowed in either the folder names or the spectra names. The size limit for each uploaded zip file is 50 MB. Please contact the author if you wish to upload a larger dataset.

The peak list data is composed of peak list files organized into separate folders named by their class labels. For example, if your data contains three groups, the peak list files should be organized into three folders accordingly. Compress these folders into a single zip file, then upload it to MetaboAnalyst.
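
One way to build such an archive is sketched below with Python's standard zipfile module. It assumes a local directory whose immediate sub-folders are the class-label folders; the function name zip_group_folders is hypothetical. The check for spaces applies the rule stated above for spectra names to peak list folders as well, which is an assumption.

```python
import zipfile
from pathlib import Path


def zip_group_folders(root, zip_path):
    """Zip every sub-folder of `root` (one per group) into a single archive.

    Returns the group names that were written.
    """
    root = Path(root)
    groups = sorted(p for p in root.iterdir() if p.is_dir())
    if len(groups) < 2:
        raise ValueError("expected at least two group folders")

    # Validate all names first so that no partial archive is left behind.
    entries = []
    for group in groups:
        files = sorted(f for f in group.iterdir() if f.is_file())
        for path in [group, *files]:
            if " " in path.name:
                raise ValueError(f"space in name: {path.name}")
        entries.extend((f, f"{group.name}/{f.name}") for f in files)

    # Store each file under its group folder inside the archive.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for source, arcname in entries:
            archive.write(source, arcname=arcname)
    return [group.name for group in groups]
```

For example, a directory with sub-folders control and treated yields an archive containing entries such as control/sample1.csv and treated/sample2.csv.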

NMR peak list files should contain two comma-separated columns, with the 1st column for peak positions (ppm) and the 2nd column for peak intensities. MS peak list files can be in either two-column (mass and intensities) or three-column format (mass, retention time, and intensities), but not a mixture of both. The first line of each peak list file is reserved for column labels. The file must be saved in comma-separated values (.csv) format.
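
The column rules above can be checked with a short script before zipping. This is a sketch only: the function names are hypothetical, and requiring at least one peak row and numeric cells in every column are assumptions beyond what the text states.

```python
import csv
import io


def peak_list_width(text):
    """Return the column count (2 or 3) of one peak list, or raise ValueError."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if len(rows) < 2:
        raise ValueError("need a header line plus at least one peak")
    header, peaks = rows[0], rows[1:]  # first line is reserved for labels
    width = len(header)
    if width not in (2, 3):
        raise ValueError(f"expected 2 or 3 columns, found {width}")
    for row in peaks:
        if len(row) != width:
            raise ValueError(f"row {row} does not have {width} columns")
        for cell in row:
            float(cell)  # raises ValueError on non-numeric cells
    return width


def check_consistent(files):
    """files: dict of file name -> file text. All must share one layout."""
    widths = {name: peak_list_width(text) for name, text in files.items()}
    if len(set(widths.values())) > 1:
        raise ValueError(f"mixed two- and three-column files: {widths}")
    return widths
```

check_consistent raises when two-column and three-column files are mixed, which mirrors the "not a mixture of both" rule.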
