R语言学习——by函数的一些理解

  • Post author:
  • Post category:其他


在做GEO数据挖掘时,有一步操作是整合表达矩阵,即多个探针对应一个基因的情况下,只保留在所有样本里面平均表达量最大的那个探针。

tmp <- by(exprSet,ids$symbol,
          function(x) rownames(x)[which.max(rowMeans(x))])

一开始不是很理解,所以去找了by函数的例子,如下。

类比了一下可以看出:


这个操作首先是根据symbol对exprSet进行分类;然后对同一类数据计算其行平均值;最后找出平均值最大的行,取其行名。

> library(stats)

# 看一下warpbreaks数据集的结构
> head(warpbreaks)
  breaks wool tension
1     26    A       L
2     30    A       L
3     54    A       L
4     25    A       L
5     70    A       L
6     52    A       L

> table(warpbreaks$wool)
 A  B 
27 27 

> table(warpbreaks$tension)
 L  M  H 
18 18 18 

# 可以看到by是根据tension的分类进行操作的
> by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
warpbreaks[, "tension"]: L
     breaks      wool 
 Min.   :14.00   A:9  
 1st Qu.:26.00   B:9  
 Median :29.50        
 Mean   :36.39        
 3rd Qu.:49.25        
 Max.   :70.00        
----------------------------------------------------------------- 
warpbreaks[, "tension"]: M
     breaks      wool 
 Min.   :12.00   A:9  
 1st Qu.:18.25   B:9  
 Median :27.00        
 Mean   :26.39        
 3rd Qu.:33.75        
 Max.   :42.00        
----------------------------------------------------------------- 
warpbreaks[, "tension"]: H
     breaks      wool 
 Min.   :10.00   A:9  
 1st Qu.:15.25   B:9  
 Median :20.50        
 Mean   :21.67        
 3rd Qu.:25.50        
 Max.   :43.00 

# warpbreaks[, -1]代表取除了第一列的数据
# 然后根据wool和tension的分类进行操作
> by(warpbreaks[, 1],   warpbreaks[, -1],       summary)
wool: A
tension: L
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  25.00   26.00   51.00   44.56   54.00   70.00 
----------------------------------------------------------------- 
wool: B
tension: L
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  14.00   20.00   29.00   28.22   31.00   44.00 
----------------------------------------------------------------- 
wool: A
tension: M
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     12      18      21      24      30      36 
----------------------------------------------------------------- 
wool: B
tension: M
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  16.00   21.00   28.00   28.78   39.00   42.00 
----------------------------------------------------------------- 
wool: A
tension: H
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.00   18.00   24.00   24.56   28.00   43.00 
----------------------------------------------------------------- 
wool: B
tension: H
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  13.00   15.00   17.00   18.78   21.00   28.00


# 根据tension进行分类,然后再对三类数据进行回归分析
> by(warpbreaks, warpbreaks[,"tension"],
+    function(x) lm(breaks ~ wool, data = x))

warpbreaks[, "tension"]: L

Call:
lm(formula = breaks ~ wool, data = x)

Coefficients:
(Intercept)        woolB  
      44.56       -16.33  

----------------------------------------------------------------- 
warpbreaks[, "tension"]: M

Call:
lm(formula = breaks ~ wool, data = x)

Coefficients:
(Intercept)        woolB  
     24.000        4.778  

----------------------------------------------------------------- 
warpbreaks[, "tension"]: H

Call:
lm(formula = breaks ~ wool, data = x)

Coefficients:
(Intercept)        woolB  
     24.556       -5.778 



版权声明:本文为narutodzx原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。