Principal Component Analysis (PCA): Grouping and Visualization
When you do Principal Component Analysis (PCA) on a set of gel images, you
get a two- or three-dimensional visualization of the image set that is optimal
in a certain sense: it preserves as much of the variation as possible. PCA works
by taking the spot intensities on each gel image and assembling them into one vector per gel.
So an experiment of 24 gel images with 1200 spots each would be represented as a
cloud of 24 points in a space with 1200 dimensions. The goal of principal
component analysis is then to find a projection of the point cloud onto a two- or
three-dimensional space such that as much as possible of the variation of the
point cloud is preserved. One hopes that the gels from different samples will be
in separate regions of the resulting diagram. The principal components can then
be interpreted as "typical spot patterns" or "eigengels". Their
coordinates (one weight per spot) can be analyzed to determine which spots contribute
most to the variance, making them candidates for protein identification and
biological interpretation.
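The computation described above can be illustrated with a short NumPy sketch. It is a minimal sketch under stated assumptions, not the program's own implementation: it assumes the spot intensities are already arranged as a matrix with one row per gel and one column per spot, centers each spot, and computes the principal components with a singular value decomposition. The function names `pca_gels` and `top_spots` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_gels(intensities, n_components=2):
    """PCA with gels as points (hypothetical helper, for illustration only).

    intensities: array of shape (n_gels, n_spots), one row per gel image.
    Returns (scores, loadings, explained_ratio):
      scores          (n_gels, n_components)  coordinates of each gel in the plot
      loadings        (n_components, n_spots) spot weights ("eigengels")
      explained_ratio (n_components,)         fraction of total variance kept
    """
    # Center each spot (column) so the components describe variation
    # around the average gel rather than the average itself.
    centered = intensities - intensities.mean(axis=0)
    # Thin SVD: centered = U @ diag(s) @ Vt; rows of Vt are the principal directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]
    explained_ratio = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained_ratio

def top_spots(loadings, component=0, k=10):
    """Indices of the k spots with the largest absolute weight on one component."""
    return np.argsort(-np.abs(loadings[component]))[:k]

# Hypothetical data: 24 gels x 1200 spots of noise, with spots 0..19
# shifted upward in the first 12 gels ("treated") only.
rng = np.random.default_rng(0)
data = rng.normal(size=(24, 1200))
data[:12, :20] += 4.0
scores, loadings, ratio = pca_gels(data, n_components=2)
print(scores.shape)  # (24, 2)
print(ratio)         # share of the total variance on PC1 and PC2
print(top_spots(loadings, 0, k=5))
```

In this synthetic example the first component separates the "treated" from the "untreated" gels, and the spots with the largest weights on it are the shifted ones. Whether intensities should additionally be transformed or scaled (for example, log-transformed or standardized per spot) before PCA is a separate choice that this sketch leaves out.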
Click the PCA (Principal Component Analysis) button in the toolbar, or choose it from the menu.
Figure: Principal component analysis of 24 gel images in three dimensions. Replicates (parallels) have the same color and are placed close together. The view can be rotated by dragging with the mouse.
Figure: The same principal component analysis of 24 gel images, projected onto the first two principal components. Treated and control samples (reddish vs. greenish colors) can be separated.
When principal component analysis is instead applied to the
expression profiles, the roles of gels and spots are swapped: in our example, we consider a point cloud of 1200
vectors (one vector for each expression profile) with 24 dimensions (the
expression levels on the 24 gels). The result is a display of the proteins where
(hopefully) proteins with close positions are biologically related. Consider a
time series experiment, where proteins are switched on and off in stages. If
there is a "hidden parameter", such as a stage in the cell cycle, it will have a
systematic influence on the expression levels and thus increase the variance
for the proteins taking part in it. This increased variance is then captured by
the directions used for the projection (the principal components).
The principal components can also be called "eigenprofiles"; they can be seen as
"classes of most prominent expression profiles" (see, for example, Alter et al.
2000 and Holter et al. 2000).
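Swapping the roles of gels and spots amounts to transposing the intensity matrix. The sketch below, again assuming a NumPy matrix with one row per gel, treats each spot's expression profile as a point and returns the eigenprofiles, each with one value per gel. The names `eigenprofiles` and `hidden_stage`, the sine-shaped stage pattern, and the choice to center each gel column across proteins are illustrative assumptions, not the program's documented behavior. The synthetic time series mimics the "hidden parameter" scenario described above.

```python
import numpy as np

def eigenprofiles(intensities, n_components=3):
    """PCA with expression profiles as points (hypothetical helper).

    intensities: array of shape (n_gels, n_spots), one row per gel.
    Returns (coords, comps):
      coords (n_spots, n_components)  position of each protein in the display
      comps  (n_components, n_gels)   eigenprofiles, one value per gel
    """
    profiles = intensities.T                     # (n_spots, n_gels): one row per profile
    centered = profiles - profiles.mean(axis=0)  # subtract the mean profile (center each gel column)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    return coords, vt[:n_components]

# Hypothetical time series: 24 gels in temporal order, 1200 spots of noise.
rng = np.random.default_rng(1)
n_gels, n_spots = 24, 1200
t = np.arange(n_gels)
hidden_stage = np.sin(2 * np.pi * t / n_gels)  # assumed "hidden parameter"
data = rng.normal(size=(n_gels, n_spots))
data[:, :50] += 3.0 * hidden_stage[:, None]    # spots 0..49 follow the stage
coords, comps = eigenprofiles(data)
r = np.corrcoef(comps[0], hidden_stage)[0, 1]
print(f"|correlation| of first eigenprofile with hidden stage: {abs(r):.2f}")
```

Because the 50 stage-dependent proteins add variance along one common pattern, the first eigenprofile follows the hidden stage (up to sign), and those proteins end up farthest from the origin along the first component.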
Figure: Principal component analysis of expression profiles in three dimensions. Differentially expressed spots, determined by t-test, are highlighted in orange and blue. Inset: first principal component.