Frequently Asked Questions (FAQs):

  1. When should I use MetaboAnalyst?
  2. What types of input does MetaboAnalyst accept?
  3. Is the data I uploaded kept confidential?
  4. What's the recommended way to perform GC/LC-MS spectra analysis?
  5. What's the recommended way to perform LC-MS/MS spectra analysis?
  6. I received a format error when uploading compressed GC/LC-MS spectra (.zip)
  7. How to deal with missing values?
  8. How to identify and deal with outliers?
  9. Why should I filter baseline noises (spectral binning)?
  10. Some images did not show up after I clicked the corresponding tab (PCA, PLS-DA)?
  11. Can I analyze unlabeled data?
  12. How to deal with technical replications?
  13. How does "normalization by a reference sample" work?
  14. What is generalized logarithm transformation?
  15. How does "Auto/Pareto/Range scaling" work?
  16. When should I use Data editor or Data filter?

  1. When should I use MetaboAnalyst?

    The purpose of MetaboAnalyst is to provide a free, user-friendly, and easily accessible tool for analyzing data from high-throughput metabolomics studies. It is designed to address two common types of problems: 1) identifying features that differ significantly between two conditions (biomarker discovery); 2) using the metabolomic data to predict the conditions under study (classification). In addition, MetaboAnalyst provides tools for compound identification and pathway mapping to annotate significant features. Note that MetaboAnalyst is designed for analyzing a large number of samples; very few samples (fewer than 10) will cause some functions to work improperly.

  2. What types of input does MetaboAnalyst accept?

    MetaboAnalyst accepts data from either targeted profiling (concentration tables) or metabolic fingerprinting approaches (spectral bins, peak lists) produced by NMR, LC-MS, or GC-MS. In addition, GC/LC-MS spectra saved in an open data format (NetCDF, mzData, mzXML) can be processed using the XCMS package. For sample files and format specifications, please check "Data Formats".

  3. Is the data I uploaded kept confidential?

    Yes. The data files you upload for analysis, as well as any analysis results, are not downloaded or examined in any way by the administrators unless required for system maintenance and troubleshooting. All files are deleted from the server after no more than 72 hours, and no archives or backups are kept. You are advised to download your results as a zip file immediately after performing an analysis.

  4. What's the recommended way to perform GC/LC-MS spectra analysis?

    MetaboAnalyst supports GC/LC-MS spectra through the popular XCMS package. However, because of the limits of the web interface, only the most commonly used procedures and parameters are enabled. The package itself offers more advanced options for tuning peak picking and alignment parameters, as well as other spectra visualization options. These options are either computationally very intensive or require a lot of user interaction. Therefore, users are encouraged to first process their spectra using XCMS installed on their local machine or using tools provided by the instrument manufacturers, and then upload the processed peak list files or peak intensity table for analysis.

    For detailed step-by-step instructions on how to use R and the XCMS package to process raw spectra, please read "Raw LC-MS spectra processing using R and XCMS" on our tutorial page. For web-based tools, we recommend the XCMS Online service for LC-MS spectra processing and MetabolomeExpress for GC-MS spectra processing. After you obtain a peak list table, you can upload it to MetaboAnalyst for statistical analysis.

  5. What's the recommended way to perform LC-MS/MS spectra analysis?

    MetaboAnalyst currently does not support LC-MS/MS spectra analysis. For such data, users can try the MetFamily tool. By integrating the analysis of MS1 abundances and MS/MS spectra, the tool can discover regulated metabolite families in untargeted metabolomics studies. For details, please refer to the original MetFamily paper.

  6. I received a format error when uploading compressed GC/LC-MS spectra (.zip)

    There are two possible causes of this error:

    The zip file cannot be decompressed by our program. This happens if you used WinZip (v12.0) with the default compression option. Make sure to select Legacy compression (Zip 2.0 compatible).

    The error can also be caused by the spectra files themselves. They must be in NetCDF (.CDF) or mzXML format. Please also note that the program does not handle tandem-MS files. In addition, make sure that no other files (for example, log files or parameter files) are included in the spectra folder.
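
    As a rough local pre-check before uploading, the two causes above can be tested with a short script. This is only a sketch, not the check the server performs: the function name check_spectra_zip, the .mzXML file extension, and the reading of "Zip 2.0 compatible" as "stored or deflated entries only" are all assumptions.

```python
import io
import zipfile

# Assumption: NetCDF files end in .cdf and mzXML files end in .mzxml (case-insensitive).
ALLOWED_EXTENSIONS = (".cdf", ".mzxml")
# Assumption: "Zip 2.0 compatible" means entries are either stored or deflated.
LEGACY_METHODS = (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED)


def check_spectra_zip(path_or_file):
    """Return a list of human-readable problems found in a spectra archive."""
    problems = []
    with zipfile.ZipFile(path_or_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # folders themselves are fine
            if info.compress_type not in LEGACY_METHODS:
                problems.append(f"{info.filename}: non-legacy compression method")
            if not info.filename.lower().endswith(ALLOWED_EXTENSIONS):
                problems.append(f"{info.filename}: not a NetCDF or mzXML file")
    return problems


# Build a small in-memory archive containing one spectrum and one stray log file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("group1/sample1.CDF", b"placeholder")
    archive.writestr("group1/run.log", b"placeholder")
buffer.seek(0)

problems = check_spectra_zip(buffer)
print(problems)
```

    An empty list means neither of the two problems was detected; it does not guarantee that the upload will succeed.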

  7. How to deal with missing values?

    Depending on the type of experiment, there may be a significant number of missing values in the data set. Missing values must be given either as empty values or as NA (without quotes) in order to be accepted by MetaboAnalyst. Any other symbol will be treated as a character string and will cause errors during data processing. MetaboAnalyst offers a variety of methods to deal with missing values. By default, missing values are treated as the result of low signal intensity and are replaced by half of the minimum positive value detected in your data. Users can also specify other methods, such as replacement by the mean or median, Probabilistic PCA (PPCA), Bayesian PCA (BPCA), or Singular Value Decomposition (SVD), to impute the missing values (Stacklies W. et al).
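
    The default replacement rule can be illustrated with a minimal sketch. It assumes the minimum is taken over the whole table rather than per feature, and the names parse_cell and impute_half_min are hypothetical; this is not MetaboAnalyst's own code.

```python
def parse_cell(cell):
    """Turn a raw text cell into a float, or None for a missing value."""
    text = cell.strip()
    if text in ("", "NA"):
        return None
    return float(text)  # any other non-numeric symbol raises ValueError


def impute_half_min(table):
    """Replace missing entries (None) with half of the smallest positive value."""
    positives = [v for row in table for v in row if v is not None and v > 0]
    if not positives:
        raise ValueError("no positive values to derive a replacement from")
    fill = min(positives) / 2.0
    return [[fill if v is None else v for v in row] for row in table]


raw = [
    ["5.0", "", "12.0"],
    ["1.0", "3.0", "NA"],
]
parsed = [[parse_cell(cell) for cell in row] for row in raw]
imputed = impute_half_min(parsed)
print(imputed)
```

    Here the smallest positive value is 1.0, so both missing cells are filled with 0.5.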

  8. How to identify and deal with outliers?

    Potential outliers can be identified from PCA or PLS-DA plots. The scores plot can be used to identify sample outliers, while the loadings plot can be used to identify feature outliers. A potential outlier stands out as a point located far away from the major clusters formed by the remaining samples or features.

    To deal with outliers, the first step is to check whether those samples or features were measured properly. In many cases, outliers are the result of operational errors during the analytical process. If those values cannot be corrected, they should be removed from the analysis. MetaboAnalyst provides the Data editor to enable easy removal of sample or feature outliers. Please note that you may need to re-normalize the data after outlier removal.

  9. Why should I filter baseline noises (spectral binning)?

    For NMR spectra, there are several regions where no known compounds in biofluids have a resonance peak. The corresponding bins contain only baseline noise. In addition, as a signal approaches the background level, its relative error increases, and conclusions based on such data become questionable. Therefore, it is best to remove this noise before further analysis. The current implementation supports an absolute cut-off threshold for acceptable signal values. The default is a percentage-based cut-off in which a fixed fraction (default 25%) of the bins is discarded. Users are provided with visual guidance for selecting an appropriate value. In the future, the program will estimate the baseline and its standard deviation for each spectrum, and signals less than two standard deviations above the baseline will be excluded.
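
    The percentage-based cut-off can be sketched as follows. The text does not say how bins are ranked, so ranking by mean intensity across spectra is an assumption here, and the function name drop_lowest_bins is hypothetical.

```python
def drop_lowest_bins(spectra, fraction=0.25):
    """Discard the given fraction of bins with the lowest mean intensity.

    spectra: list of rows, one per spectrum, each with the same number of bins.
    Returns the indices of the kept bins and the filtered table.
    """
    n_bins = len(spectra[0])
    means = [sum(row[j] for row in spectra) / len(spectra) for j in range(n_bins)]
    n_drop = int(n_bins * fraction)
    ranked = sorted(range(n_bins), key=lambda j: means[j])  # lowest mean first
    keep = sorted(ranked[n_drop:])  # restore the original bin order
    return keep, [[row[j] for j in keep] for row in spectra]


spectra = [
    [1.0, 50.0, 2.0, 90.0],
    [1.0, 70.0, 4.0, 80.0],
]
kept, filtered = drop_lowest_bins(spectra)
print(kept, filtered)
```

    With four bins and the default fraction of 25%, exactly one bin (the first, with a mean of 1.0) is discarded.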

  10. Some images did not show up after I clicked the corresponding tab (PCA, PLS-DA)?

    This implies that MetaboAnalyst failed to execute the command with the given parameters. Users should try adjusting the parameter values. We found that in most cases the problem is associated with sample size. In particular, if the sample size is very small (below 10), unpredictable errors may occur. For instance, by default PCA and PLS-DA try to generate summary, classification, and permutation plots for the top 5 components; if the sample size is too small, this will fail.

  11. Can I analyze unlabeled data?

    There are several unsupervised methods (PCA, hierarchical clustering, SOM, K-means) that can be used to detect inherent patterns in unlabeled data. However, you need to trick MetaboAnalyst into accepting the data by providing dummy two-group labels. In this case, results from feature selection or supervised classification methods will be meaningless.

  12. How to deal with technical replications?

    This depends on the biological questions under investigation. For example, if the purpose of the technical replicates is to see whether systematic variance is introduced by sample handling or instrumentation, clustering methods such as PCA or hierarchical clustering can be used to investigate whether replicates of the same sample tend to group together.

    After checking the clustering pattern of the technical replicates, users may decide to merge them (e.g. by averaging). In that case, the processed data can be downloaded and merged with a different program (e.g. a spreadsheet), then re-uploaded to MetaboAnalyst for further data analysis and annotation.
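
    As an alternative to a spreadsheet, the averaging step could be scripted. The sketch below assumes the downloaded table has been read into (sample ID, values) rows with one row per replicate; the layout and the function name average_replicates are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_replicates(rows):
    """Average technical replicates feature by feature.

    rows: list of (sample_id, [feature values]), one entry per replicate.
    Returns a dict mapping each sample_id to its averaged feature values.
    """
    grouped = defaultdict(list)
    for sample_id, values in rows:
        grouped[sample_id].append(values)
    # zip(*replicates) walks the replicates column by column (feature by feature).
    return {
        sample_id: [mean(column) for column in zip(*replicates)]
        for sample_id, replicates in grouped.items()
    }


rows = [
    ("S1", [1.0, 4.0]),
    ("S1", [3.0, 6.0]),
    ("S2", [5.0, 5.0]),
]
merged = average_replicates(rows)
print(merged)
```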

  13. How does "normalization by a reference sample" work?

    Normalization by a reference sample, also known as probabilistic quotient normalization (Dieterle F. et al), is a robust method to account for different dilution effects of biofluids. The method calculates a most probable dilution factor as the median of the distribution of quotients obtained by dividing the amplitudes of a test spectrum by those of a reference spectrum.
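
    The core of the calculation can be sketched in a few lines. This is a simplified illustration only: it skips any preliminary normalization step, ignores reference values that are not positive, and the function name pqn_normalize is hypothetical.

```python
from statistics import median


def pqn_normalize(sample, reference):
    """Scale a sample by its most probable dilution factor against a reference.

    The factor is the median of the feature-wise quotients sample / reference.
    Returns the normalized sample and the estimated factor.
    """
    quotients = [s / r for s, r in zip(sample, reference) if r > 0]
    factor = median(quotients)
    return [s / factor for s in sample], factor


reference = [10.0, 20.0, 30.0, 40.0]
# A sample diluted to half concentration, with the last feature genuinely elevated.
diluted = [5.0, 10.0, 15.0, 80.0]

normalized, factor = pqn_normalize(diluted, reference)
print(factor, normalized)
```

    Because the median is used, the single elevated feature does not distort the estimated dilution factor of 0.5, which is why the method is described as robust.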

  14. What is generalized logarithm transformation (glog)?

    Generalized logarithm (glog) is a simple variation of the ordinary log that can handle zero or negative values in the data set. It has many desirable features (for details, see Durbin BP. et al). Its formula includes a constant a, which has a default value of 1.
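
    The formula itself is not reproduced in the text above. As a hedged sketch, one widely used form of glog is log2((x + sqrt(x^2 + a^2)) / 2); treating this as the form intended here is an assumption, as is the function name glog2.

```python
import math


def glog2(x, a=1.0):
    """Generalized log (base 2), assumed form: log2((x + sqrt(x^2 + a^2)) / 2)."""
    return math.log2((x + math.sqrt(x * x + a * a)) / 2.0)


print(glog2(0.0))     # defined at zero, unlike the ordinary log
print(glog2(-3.0))    # defined for negative values as well
print(glog2(1024.0))  # close to log2(1024) for large values
```

    Under this form with a = 1, glog2(0) equals -1, negative inputs remain finite, and for large x the result approaches the ordinary log2(x).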

  15. How does "Auto/Pareto/Range scaling" work?

    Please see the summary table by van den Berg et al. Here s_i is the standard deviation.
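
    The table is not reproduced here. As a sketch, the definitions usually attributed to van den Berg et al. are: autoscaling divides each mean-centered value by the feature's standard deviation, Pareto scaling divides it by the square root of the standard deviation, and range scaling divides it by the difference between the feature's maximum and minimum. The function names and the use of the sample (n - 1) standard deviation are assumptions.

```python
from statistics import mean, stdev


def auto_scale(values):
    """Mean-center, then divide by the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pareto_scale(values):
    """Mean-center, then divide by the square root of the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s ** 0.5 for v in values]


def range_scale(values):
    """Mean-center, then divide by the range (max - min)."""
    m = mean(values)
    span = max(values) - min(values)
    return [(v - m) / span for v in values]


feature = [2.0, 4.0, 6.0]  # one feature measured in three samples
auto = auto_scale(feature)
pareto = pareto_scale(feature)
ranged = range_scale(feature)
print(auto, pareto, ranged)
```

    Each function operates on one feature (one column of the data table) at a time. Pareto scaling shrinks large features less aggressively than autoscaling, so it keeps more of the original data structure.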

  16. When should I use Data editor or Data filter?

    The purpose of the Data editor and the Data filter is to help improve data quality for better separation, prediction, or interpretation. In particular, users can use the Data editor to remove outliers (which can be visually identified from PCA or PLS-DA scores plots) and the Data filter to remove noisy or uninformative features (e.g. baseline noise, near-constant features). Such features tend to dilute the signal and decrease the performance of most statistical procedures. By removing outliers and low-quality features, the resulting data will be more consistent and reliable.
