在做GEO数据挖掘时,有一步操作是整合表达矩阵,即多个探针对应一个基因的情况下,只保留在所有样本里面平均表达量最大的那个探针。
tmp <- by(exprSet,ids$symbol,
function(x) rownames(x)[which.max(rowMeans(x))])
一开始不是很理解,所以去找了by函数的例子,如下。
类比了一下可以看出:
这个操作首先是根据symbol对exprSet进行分类;然后对同一类数据计算其行平均值;最后找出平均值最大的行,取其行名。
> library(stats)
# 看一下warpbreaks数据集的结构
> head(warpbreaks)
breaks wool tension
1 26 A L
2 30 A L
3 54 A L
4 25 A L
5 70 A L
6 52 A L
> table(warpbreaks$wool)
A B
27 27
> table(warpbreaks$tension)
L M H
18 18 18
# 可以看到by是根据tension的分类进行操作的
> by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
warpbreaks[, "tension"]: L
breaks wool
Min. :14.00 A:9
1st Qu.:26.00 B:9
Median :29.50
Mean :36.39
3rd Qu.:49.25
Max. :70.00
-----------------------------------------------------------------
warpbreaks[, "tension"]: M
breaks wool
Min. :12.00 A:9
1st Qu.:18.25 B:9
Median :27.00
Mean :26.39
3rd Qu.:33.75
Max. :42.00
-----------------------------------------------------------------
warpbreaks[, "tension"]: H
breaks wool
Min. :10.00 A:9
1st Qu.:15.25 B:9
Median :20.50
Mean :21.67
3rd Qu.:25.50
Max. :43.00
# warpbreaks[, -1]代表取除了第一列的数据
# 然后根据wool和tension的分类进行操作
> by(warpbreaks[, 1], warpbreaks[, -1], summary)
wool: A
tension: L
Min. 1st Qu. Median Mean 3rd Qu. Max.
25.00 26.00 51.00 44.56 54.00 70.00
-----------------------------------------------------------------
wool: B
tension: L
Min. 1st Qu. Median Mean 3rd Qu. Max.
14.00 20.00 29.00 28.22 31.00 44.00
-----------------------------------------------------------------
wool: A
tension: M
Min. 1st Qu. Median Mean 3rd Qu. Max.
12 18 21 24 30 36
-----------------------------------------------------------------
wool: B
tension: M
Min. 1st Qu. Median Mean 3rd Qu. Max.
16.00 21.00 28.00 28.78 39.00 42.00
-----------------------------------------------------------------
wool: A
tension: H
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.00 18.00 24.00 24.56 28.00 43.00
-----------------------------------------------------------------
wool: B
tension: H
Min. 1st Qu. Median Mean 3rd Qu. Max.
13.00 15.00 17.00 18.78 21.00 28.00
# 根据tension进行分类,然后再对三类数据进行回归分析
> by(warpbreaks, warpbreaks[,"tension"],
+ function(x) lm(breaks ~ wool, data = x))
warpbreaks[, "tension"]: L
Call:
lm(formula = breaks ~ wool, data = x)
Coefficients:
(Intercept) woolB
44.56 -16.33
-----------------------------------------------------------------
warpbreaks[, "tension"]: M
Call:
lm(formula = breaks ~ wool, data = x)
Coefficients:
(Intercept) woolB
24.000 4.778
-----------------------------------------------------------------
warpbreaks[, "tension"]: H
Call:
lm(formula = breaks ~ wool, data = x)
Coefficients:
(Intercept) woolB
24.556 -5.778
版权声明:本文为narutodzx原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。