decodon - exploring life.

Principal Component Analysis (PCA): Grouping and Visualization

When you run Principal Component Analysis (PCA) on a set of gel images, you get a two- or three-dimensional visualization of the image set that is optimal in a certain sense: it preserves as much of the variation as possible. PCA works by taking the spot intensities on every gel image and assembling them into a vector. An experiment of 24 gel images with 1200 spots each would thus be represented as a cloud of 24 points in a space with 1200 dimensions. The goal of principal component analysis is then to find a projection of the point cloud into two- or three-dimensional space such that as much as possible of the variation of the point cloud is preserved. One hopes that the gels from different samples will fall into separate regions of the resulting diagram. The principal components can then be interpreted as "typical spot patterns" or "eigengels". Their coordinates can be analyzed to determine which spots contribute most to the variance, making those spots candidates for protein identification and biological interpretation.
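The projection described above can be sketched outside of Delta2D with a few lines of NumPy. This is a minimal illustration and not the Delta2D implementation: the intensity matrix is simulated, the treatment effect on the first 50 spots is a hypothetical assumption, and the function name `pca_project` is made up for this example. It computes the principal components through a singular value decomposition of the mean-centered data.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project the rows of `data` onto the first principal components."""
    # Center each column (spot) so the point cloud has zero mean.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions ("eigengels").
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Coordinates of each gel in the reduced space.
    scores = centered @ components.T
    # Fraction of the total variance captured by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, components, explained[:n_components]

# Simulated experiment: 24 gels with 1200 spots each (assumed data).
rng = np.random.default_rng(0)
n_gels, n_spots = 24, 1200
intensities = rng.normal(size=(n_gels, n_spots))
# Hypothetical treatment effect: the last 12 gels show higher
# intensity on the first 50 spots.
intensities[12:, :50] += 5.0

scores, components, explained = pca_project(intensities, n_components=2)

# Spots with the largest absolute loading on the first component
# contribute most to the variance along it.
top_spots = np.argsort(np.abs(components[0]))[::-1][:10]
print("variance explained:", explained)
print("top contributing spots:", top_spots)
```

In this simulated setting the two groups of gels end up on opposite sides of the first component, and the spots with the largest loadings are the ones that carry the assumed treatment effect. Real gel data would usually need normalization first, which this sketch leaves out.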

Click on the PCA (Principal Component Analysis) button in the toolbar, or choose Analysis > Data Reduction > Principal Component Analysis from the menu.

Principal component analysis of 24 gel images in 3 dimensions. Parallels (replicates) have the same color and are placed close together. The view can be rotated by dragging with the mouse.
The same principal component analysis of 24 gel images, projected onto the first two principal components. Treated and control samples (reddish vs. greenish colors) can be separated.

When principal component analysis is applied to the expression profiles instead, our example becomes a point cloud of 1200 vectors (one vector for each expression profile) with 24 dimensions (the expression levels on the 24 gels). The result is a display of the proteins in which, one hopes, proteins with close positions are biologically related. Consider a time series experiment in which proteins are switched on and off in stages. If there is a "hidden parameter", such as a stage in the cell cycle, it will have a systematic influence on the expression levels and thus increase the variance for the proteins taking part in it. This increased variance will then become part of the directions that are used for the projection (the principal components). The principal components can also be called "eigenprofiles"; they can be seen as "classes of most prominent expression profiles" (see, for example, Alter et al. 2000 and Holter et al. 2000).
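The effect of a hidden parameter on the eigenprofiles can be illustrated with simulated data. The following is a sketch under stated assumptions, not the Delta2D implementation: the data are random, the hidden parameter is a hypothetical on/off switch between the first and the last 12 gels, and only the first 100 proteins are assumed to respond to it. Compared with the analysis of gel images, the matrix is simply transposed, so that each row is one expression profile.

```python
import numpy as np

rng = np.random.default_rng(1)
n_spots, n_gels = 1200, 24

# Hypothetical hidden parameter: "off" on the first 12 gels,
# "on" on the last 12 gels.
hidden = np.where(np.arange(n_gels) < 12, -1.0, 1.0)

# One row per expression profile, one column per gel (assumed data).
profiles = rng.normal(scale=0.5, size=(n_spots, n_gels))
# The first 100 proteins follow the hidden parameter, some up- and
# some down-regulated, with varying amplitude.
amplitudes = rng.uniform(1.0, 3.0, size=100) * rng.choice([-1.0, 1.0], size=100)
profiles[:100] += np.outer(amplitudes, hidden)

# PCA via SVD of the centered matrix; rows of vt are "eigenprofiles".
centered = profiles - profiles.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
eigenprofiles = vt[:3]                  # shape (3, 24)
coords = centered @ eigenprofiles.T     # shape (1200, 3), one point per protein

# The first eigenprofile should follow the hidden parameter
# (up to an arbitrary sign).
corr = np.corrcoef(eigenprofiles[0], hidden)[0, 1]
# Proteins lying farthest out along the first component.
extreme = np.argsort(np.abs(coords[:, 0]))[::-1][:20]
print("correlation with hidden parameter:", corr)
print("most extreme proteins:", extreme)
```

Because the sign of a principal component is arbitrary, the correlation may come out close to +1 or to -1. The proteins with the largest absolute coordinate on the first component are those driven by the assumed hidden parameter.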

Principal component analysis of expression profiles in three dimensions. Differentially expressed spots were determined by t-test and highlighted orange and blue, respectively. Inset: First principal component.