Decodon - Exploring Life.

Statistical Methods: Overview and References

Delta2D and GasPedal include an adapted, seamlessly integrated version of the TIGR Multiple Experiment Viewer (MeV), which is free, open-source software.

Clustering

  • Clustering can be applied to samples and/or expression profiles.
  • Hierarchical clustering and k-Means / k-Medians clustering are available.
  • Average linkage, complete linkage, and single linkage are supported for determining cluster-to-cluster distances.
  • Supported distance metrics: Euclidean distance, Manhattan distance, Pearson correlation, Pearson uncentered correlation, Pearson squared correlation, average dot product, cosine correlation, covariance, Spearman's rank correlation, Kendall's tau.
  • Support trees can be constructed by resampling: bootstrapping (resampling with replacement) or jackknifing (resampling by leaving out one observation).
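As an illustration of how a distance metric and a linkage rule combine into a cluster-to-cluster distance, here is a minimal Python sketch. It is not the MeV implementation: the function names are hypothetical, only three of the listed metrics are shown, and turning Pearson correlation into a distance as 1 - r is an assumption made for this example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a, b):
    """1 - Pearson correlation (assumed convention; profiles must not be constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (norm_a * norm_b)

def cluster_distance(cluster_1, cluster_2, metric=euclidean, linkage="average"):
    """Distance between two clusters, derived from all pairwise member distances."""
    pairwise = [metric(p, q) for p in cluster_1 for q in cluster_2]
    if linkage == "single":      # closest pair of members
        return min(pairwise)
    if linkage == "complete":    # farthest pair of members
        return max(pairwise)
    if linkage == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError("unknown linkage: " + linkage)

# Two small example clusters of two-point profiles (made-up values).
cluster_a = [[0.0, 0.0], [0.0, 1.0]]
cluster_b = [[3.0, 0.0], [4.0, 0.0]]

for rule in ("single", "complete", "average"):
    print(rule, cluster_distance(cluster_a, cluster_b, linkage=rule))
```

Hierarchical clustering repeatedly merges the two clusters with the smallest such distance, so the choice of linkage changes the shape of the resulting tree even when the metric stays the same.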

HCL - Hierarchical Clustering

Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95:14863-14868.

ST - Support trees (Bootstrapping)

Graur, D., and Li, W.-H. (2000). Fundamentals of Molecular Evolution. Second Edition. Sinauer Associates, Sunderland, MA. pp 209-210.
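The two resampling schemes used for support trees can be sketched as follows. This is a simplified illustration under stated assumptions, not MeV's code: it resamples the columns of a data matrix, and `grouping_holds` is a hypothetical callback standing in for "re-cluster the resampled data and check whether a given cluster reappears".

```python
import random

def bootstrap_indices(n, rng):
    """Draw n column indices with replacement (bootstrapping)."""
    return [rng.randrange(n) for _ in range(n)]

def jackknife_indices(n):
    """All leave-one-out index sets (jackknifing)."""
    return [[j for j in range(n) if j != i] for i in range(n)]

def resample_columns(matrix, indices):
    """Build a resampled matrix from the chosen column indices."""
    return [[row[j] for j in indices] for row in matrix]

def support(matrix, grouping_holds, n_iter=100, seed=0):
    """Fraction of bootstrap resamples in which a grouping is recovered.

    grouping_holds is a hypothetical predicate: it receives a resampled
    matrix and returns True if the cluster of interest is found again.
    """
    rng = random.Random(seed)
    n_cols = len(matrix[0])
    hits = 0
    for _ in range(n_iter):
        resampled = resample_columns(matrix, bootstrap_indices(n_cols, rng))
        if grouping_holds(resampled):
            hits += 1
    return hits / n_iter

print(jackknife_indices(3))
```

A support value close to 1 indicates that the grouping is stable under resampling; low values suggest it depends on a few individual observations.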

KMC - K-Means Clustering

Soukas, A., Cohen, P., Socci, N.D., and Friedman, J.M. (2000). Leptin-specific patterns of gene expression in white adipose tissue. Genes Dev. 14:963-980.

Template Matching

  • Templates can be defined for expression profiles and samples.
  • Templates can be defined interactively, from a given expression profile, or from a cluster.
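The idea behind template matching can be illustrated with a short Python sketch: each profile is correlated with the template, and profiles above a threshold are reported. The names, the example values, and the fixed correlation cutoff `min_r` are assumptions made for this illustration; the software's actual selection criterion may differ (for example, a p-value threshold).

```python
import math

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return cov / (norm_a * norm_b)

def template_from_cluster(members):
    """Derive a template as the mean profile of a cluster."""
    return [sum(column) / len(members) for column in zip(*members)]

def match_template(profiles, template, min_r=0.8):
    """Return (name, r) for profiles correlating with the template, best first."""
    matches = []
    for name, values in profiles.items():
        r = pearson(values, template)
        if r >= min_r:
            matches.append((name, r))
    return sorted(matches, key=lambda m: m[1], reverse=True)

# Made-up expression profiles over four samples.
profiles = {
    "spot_1": [1.0, 2.0, 3.0, 4.0],
    "spot_2": [4.0, 3.0, 2.0, 1.0],
    "spot_3": [1.0, 1.1, 3.2, 3.9],
}
template = [0.0, 0.0, 1.0, 1.0]  # "low in samples 1-2, high in samples 3-4"

for name, r in match_template(profiles, template):
    print(name, round(r, 3))
```

Here the step-like template selects the two profiles that rise across the samples and rejects the one that falls.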

PTM - Template matching

Pavlidis, P., and Noble, W.S. (2001). Analysis of strain and regional variation in gene expression in mouse brain. Genome Biology 2:research0042.1-0042.15.

Principal Component Analysis (PCA): Grouping and Visualization

  • Principal component analysis is available for both samples and expression profiles.
  • Three-dimensional and two-dimensional displays are available.
  • New clusters can be defined by dragging in a two-dimensional display.
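To show what a principal component is, the following Python sketch centers a small data matrix, builds its covariance matrix, and finds the first principal component by power iteration. It uses only the standard library and is a teaching sketch under simplifying assumptions, not the method used by the software: the function names and data are made up, and power iteration fails if the start vector happens to be orthogonal to the leading component.

```python
import math

def center_columns(rows):
    """Subtract each column's mean so that every variable has mean zero."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    """Sample covariance matrix of centered data (rows are observations)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Leading eigenvector of cov by power iteration (unit length)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, component):
    """Score of each observation along the given component."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]

# Made-up data: two strongly correlated variables, five observations.
rows = [[2.0, 1.9], [0.0, 0.1], [-2.0, -2.0], [1.0, 1.0], [-1.0, -1.0]]
centered = center_columns(rows)
component = first_component(covariance_matrix(centered))
scores = project(centered, component)
print(component, scores)
```

Plotting the scores along the first two or three components gives the kind of low-dimensional display in which groups of similar samples or profiles become visible.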

Raychaudhuri, S., Stuart, J. M., and Altman, R. B. (2000). Principal components analysis to summarize microarray experiments: application to sporulation time series. Pacific Symposium on Biocomputing 2000, Honolulu, Hawaii, 452-463.

Statistical Hypothesis Testing

T-Test

  • T-test designs: one-sample, between samples, and paired.
  • Group variances can be assumed equal or unequal.
  • P-values can be computed from the normal distribution or by randomization.
  • Corrections for multiple testing: Bonferroni, adjusted Bonferroni, Westfall-Young.
  • Control of the false discovery rate.
  • Volcano plot.
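The options above can be made concrete with a small Python sketch of a two-sample test with unequal variances (Welch's statistic), a randomization p-value, and the standard Bonferroni correction. This is an illustration under stated assumptions, not the software's implementation: names and data are made up, and the adjusted Bonferroni, Westfall-Young, and false discovery rate procedures are not shown.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    qa = statistics.variance(a) / len(a)
    qb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(qa + qb)
    df = (qa + qb) ** 2 / (qa ** 2 / (len(a) - 1) + qb ** 2 / (len(b) - 1))
    return t, df

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided p-value from random reassignments of the group labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b)[0])
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        t = abs(welch_t(pooled[:len(a)], pooled[len(a):])[0])
        if t >= observed:
            hits += 1
    return hits / n_perm

def bonferroni(p_values):
    """Standard Bonferroni: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Made-up expression values for one spot in two groups of gels.
group_a = [5.1, 4.9, 5.3, 5.0]
group_b = [3.9, 4.2, 4.0, 4.1]

t, df = welch_t(group_a, group_b)
print(t, df, permutation_p_value(group_a, group_b))
print(bonferroni([0.01, 0.04, 0.5]))
```

The randomization approach avoids distributional assumptions but, with few replicates, can only produce a limited set of distinct p-values, which matters when a multiple-testing correction is applied afterwards.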

Pan, W. (2002). A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics 18:546-554.

Dudoit, S., Yang, Y.H., Callow, M.J., and Speed, T. (2000). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical report, Statistics Department, University of California, Berkeley.

Welch, B.L. (1947). The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika 34:28-35.

ANOVA - One-way Analysis of Variance

  • P-values can be computed from the F-distribution or by randomization.
  • Corrections for multiple testing: Bonferroni, adjusted Bonferroni, Westfall-Young.
  • Control of the false discovery rate.
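For orientation, the F statistic of a one-way ANOVA and a randomization p-value can be sketched in a few lines of Python. This is a minimal sketch with hypothetical function names and made-up data; it does not reproduce the software's implementation.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def randomization_p_value(groups, n_perm=1000, seed=0):
    """Share of random regroupings whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:  # cut the shuffled values into groups of the original sizes
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed - 1e-12:
            hits += 1
    return hits / n_perm

# Made-up values for one spot under three conditions.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(f_statistic(groups), randomization_p_value(groups))
```

For this data the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F = (26 / 2) / (6 / 6) = 13.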

Zar, J.H. (1999). Biostatistical Analysis. 4th ed. Prentice Hall, NJ.

ANOVA2 - Two-factor Analysis of Variance

Keppel, G., and Zedeck, S. (1989). Data Analysis for Research Designs. W. H. Freeman and Co., NY.

Manly, B.F.J. (1997). Randomization, Bootstrap and Monte Carlo Methods in Biology. 2nd ed. Chapman and Hall/CRC, FL.

Zar, J.H. (1999). Biostatistical Analysis. 4th ed. Prentice Hall, NJ.

Further References

Saeed, A.I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati, N., Braisted, J., Klapa, M., Currier, T., Thiagarajan, M., Sturn, A., Snuffin, M., Rezantsev, A., Popov, D., Ryltsov, A., Kostukovich, E., Borisovsky, I., Liu, Z., Vinsavich, A., Trush, V., and Quackenbush, J. (2003). TM4: a free, open-source system for microarray data management and analysis. BioTechniques 34(2):374-378.

Alter, O., Brown, P.O., and Botstein, D. (2000). Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97:10101-10106.

Holter, N.S., Mitra, M., Maritan, A., Cieplak, M., Banavar, J.R., and Fedoroff, N.V. (2000). Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc. Natl. Acad. Sci. USA 97:8409-8414.

TIGR Multiple Experiment Viewer (MeV):
https://mev.tm4.org/ or T-MeV on Sourceforge

TIGR MeV manual:
Manual for version 4.0
