Adapted from glab.library::PCA_from_file.

Usage

run_PCA(
  df,
  savename = NULL,
  summary = FALSE,
  center = TRUE,
  scale = FALSE,
  tol = 0.05,
  rank = NULL,
  screeplot = TRUE
)

Arguments

df

(path to) numeric dataframe; samples as columns, genes/features as rows

savename

string; file path (without extension) under which the PCA scores, loadings, and sdev are saved

summary

logical; output summary info

center

logical; indicate whether the variables should be shifted to be zero centered

scale

logical; indicate whether the variables should be scaled to have unit variance

tol

numeric; indicate the magnitude below which components should be omitted

rank

integer; a number specifying the maximal rank, i.e., maximal number of principal components to be used

screeplot

logical; whether to output and save a scree plot

Value

prcomp object
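As a rough guide to working with the return value, the sketch below fits a plain `stats::prcomp` model as a stand-in and reads its standard components. It assumes the object returned by `run_PCA` carries the usual `prcomp` fields (`sdev`, `rotation`, `x`). The names `var_exp`, `cum_var`, and `first_pc_80` are illustrative and not part of the package.

```r
# Stand-in for the object run_PCA() returns: a plain stats::prcomp fit.
# prcomp() expects samples as rows, so iris is used untransposed here.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)

pca$sdev       # standard deviation of each principal component
pca$rotation   # loadings: one row per feature, one column per PC
head(pca$x)    # scores: one row per sample, one column per PC

# Proportion and cumulative proportion of variance explained (illustrative names)
var_exp <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_exp)

# First PC at which the cumulative variance explained reaches 80%
first_pc_80 <- which(cum_var >= 0.80)[1]
```

`summary(pca)` prints the same variance proportions in tabular form.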

Details

In general, Z-score standardization (center = TRUE, scale = TRUE) before PCA is advised. For (transformed) gene expression data, it is generally better to center but not scale.

center = TRUE: The first PC maximizes the sum of squared deviations from the origin. This is the same as maximizing variance only if the data are centered beforehand.

scale = TRUE: If one feature varies much more than the others, that feature will dominate the resulting principal components. Scaling also results in components of the same order of magnitude.
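The effect of both options can be seen on a small dataset. This is a minimal sketch that calls `stats::prcomp` directly rather than `run_PCA`, on the assumption that `center` and `scale` behave like `prcomp`'s `center` and `scale.`. The helper `var_exp` and the other variable names are illustrative only.

```r
X <- iris[, 1:4]  # samples as rows, as prcomp() expects

pca_unscaled   <- prcomp(X, center = TRUE,  scale. = FALSE)
pca_scaled     <- prcomp(X, center = TRUE,  scale. = TRUE)
pca_uncentered <- prcomp(X, center = FALSE, scale. = FALSE)

# Proportion of variance explained per PC (illustrative helper)
var_exp <- function(p) p$sdev^2 / sum(p$sdev^2)

# Without scaling, the feature with the largest variance dominates PC1
sapply(X, var)
round(pca_unscaled$rotation[, 1], 3)
round(pca_scaled$rotation[, 1], 3)

# Scaling spreads the explained variance more evenly across PCs
round(var_exp(pca_unscaled), 3)
round(var_exp(pca_scaled), 3)

# Without centering, PC1 mostly points from the origin toward the data mean
mean_dir <- colMeans(X) / sqrt(sum(colMeans(X)^2))
cos_to_mean <- abs(sum(pca_uncentered$rotation[, 1] * mean_dir))
cos_to_mean
```

A value of `cos_to_mean` close to 1 means the first uncentered component is describing where the data sit rather than how they vary.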

Use either tol or rank, but not both.
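The two truncation options can be compared with `stats::prcomp`, which documents `tol` as dropping components whose standard deviation is less than or equal to `tol` times that of the first component, and `rank.` (available since R 3.4.0) as a cap on the number of components. The sketch below assumes `run_PCA` forwards `tol` and `rank` to these arguments; how it does so is not shown in this page.

```r
X <- iris[, 1:4]  # samples as rows, as prcomp() expects

# tol: keep only components with sdev > tol * sdev of PC1
pca_tol <- prcomp(X, center = TRUE, scale. = FALSE, tol = 0.2)

# rank.: keep at most this many components
pca_rank <- prcomp(X, center = TRUE, scale. = FALSE, rank. = 2)

# Number of components retained by each approach
ncol(pca_tol$rotation)
ncol(pca_rank$rotation)
```

`tol` is relative to the data, so the number of components it keeps changes from dataset to dataset, while `rank.` fixes that number in advance.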

Examples

data(iris)
Rubrary::run_PCA(t(iris[,c(1:4)]))
#> ** Cumulative var. exp. >= 80% at PC 1 (92.5%)

#> Standard deviations (1, .., p=4):
#> [1] 2.0562689 0.4926162 0.2796596 0.1543862
#> 
#> Rotation (n x k) = (4 x 4):
#>                      PC1         PC2         PC3        PC4
#> Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
#> Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
#> Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
#> Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574