2. Example
Sample data:
set.seed(123)
dat = data.frame(ID = paste0("ID_",1:10),y1 = rnorm(10),y2=rnorm(10),y3=rnorm(10),y4 = rnorm(10))
dat
Result:
> dat
ID y1 y2 y3 y4
1 ID_1 -0.56047565 1.2240818 -1.0678237 0.42646422
2 ID_2 -0.23017749 0.3598138 -0.2179749 -0.29507148
3 ID_3 1.55870831 0.4007715 -1.0260044 0.89512566
4 ID_4 0.07050839 0.1106827 -0.7288912 0.87813349
5 ID_5 0.12928774 -0.5558411 -0.6250393 0.82158108
6 ID_6 1.71506499 1.7869131 -1.6866933 0.68864025
7 ID_7 0.46091621 0.4978505 0.8377870 0.55391765
8 ID_8 -1.26506123 -1.9666172 0.1533731 -0.06191171
9 ID_9 -0.68685285 0.7013559 -1.1381369 -0.30596266
10 ID_10 -0.44566197 -0.4727914 1.2538149 -0.38047100
3. Reshape to three columns: ID, Loc, y
melt code:
re1 = melt(data = dat,id.vars=c("ID"),variable.name="Loc",value.name="y")
head(re1)
Result preview:
> head(re1)
ID Loc y
1 ID_1 y1 -0.56047565
2 ID_2 y1 -0.23017749
3 ID_3 y1 1.55870831
4 ID_4 y1 0.07050839
5 ID_5 y1 0.12928774
6 ID_6 y1 1.71506499
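A quick sanity check on the long table can be sketched as follows. This assumes the melt() above comes from the reshape2 package (data.table provides a melt() with the same arguments); measure.vars is spelled out here only to make explicit which columns get stacked.

```r
# Assumption: reshape2 is installed and is the package providing melt().
library(reshape2)

set.seed(123)
dat <- data.frame(ID = paste0("ID_", 1:10),
                  y1 = rnorm(10), y2 = rnorm(10),
                  y3 = rnorm(10), y4 = rnorm(10))

# Name the measured columns explicitly instead of relying on
# "every column that is not an ID column".
re1 <- melt(dat, id.vars = "ID",
            measure.vars = c("y1", "y2", "y3", "y4"),
            variable.name = "Loc", value.name = "y")

dim(re1)        # 10 IDs x 4 measured columns = 40 rows, 3 columns
table(re1$Loc)  # each former column name appears 10 times
```

If the row count is not (number of rows) x (number of measured columns), an ID column was probably stacked by mistake.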
4. dcast code
dcast(data=re1,ID ~Loc)
Result:
> dcast(data=re1,ID ~Loc)
Using 'y' as value column. Use 'value.var' to override
ID y1 y2 y3 y4
1 ID_1 -0.56047565 1.2240818 -1.0678237 0.42646422
2 ID_10 -0.44566197 -0.4727914 1.2538149 -0.38047100
3 ID_2 -0.23017749 0.3598138 -0.2179749 -0.29507148
4 ID_3 1.55870831 0.4007715 -1.0260044 0.89512566
5 ID_4 0.07050839 0.1106827 -0.7288912 0.87813349
6 ID_5 0.12928774 -0.5558411 -0.6250393 0.82158108
7 ID_6 1.71506499 1.7869131 -1.6866933 0.68864025
8 ID_7 0.46091621 0.4978505 0.8377870 0.55391765
9 ID_8 -1.26506123 -1.9666172 0.1533731 -0.06191171
10 ID_9 -0.68685285 0.7013559 -1.1381369 -0.30596266
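Two details in the output above are worth a small sketch: the message about the value column, and ID_10 sorting right after ID_1. The version below assumes reshape2 again; the object name wide and the choice to fix the order through factor levels are illustrative, not part of the original post.

```r
# Assumption: reshape2 is installed and provides melt() and dcast().
library(reshape2)

set.seed(123)
dat <- data.frame(ID = paste0("ID_", 1:10),
                  y1 = rnorm(10), y2 = rnorm(10),
                  y3 = rnorm(10), y4 = rnorm(10))
re1 <- melt(dat, id.vars = "ID", variable.name = "Loc", value.name = "y")

# Set the level order so that ID_10 comes after ID_9 instead of after ID_1.
re1$ID <- factor(re1$ID, levels = paste0("ID_", 1:10))

# Naming value.var states which column fills the cells, so dcast()
# no longer has to guess and print the "Using ... as value column" message.
wide <- dcast(re1, ID ~ Loc, value.var = "y")
wide$ID
```

With the levels set this way, the rows of wide follow the same order as the original dat.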
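5. Command explanation
melt means "to fuse": it turns wide data into long data. In field-trial data, for example, the columns might be ID, Loc, rep1, rep2, rep3, where rep1, rep2, and rep3 hold the values of replicates 1, 2, and 3, and the data needs to be reshaped into four columns: ID, Loc, Rep, y. The melt command handles this. In the call below, dat stands for a data frame with those columns (not the dat from section 2), and the second argument lists the ID columns to keep: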
melt(dat,c("ID","Loc"))
> ex1 = data.frame(Cul = rep(1:10,2),Loc=rep(1:2,each=10),rep1=rnorm(20),rep2=rnorm(20),rep3=rnorm(20))
> head(ex1)
Cul Loc rep1 rep2 rep3
1 1 1 -0.71040656 0.1176466 0.7017843
2 2 1 0.25688371 -0.9474746 -0.2621975
3 3 1 -0.24669188 -0.4905574 -1.5721442
4 4 1 -0.34754260 -0.2560922 -1.5146677
5 5 1 -0.95161857 1.8438620 -1.6015362
6 6 1 -0.04502772 -0.6519499 -0.5309065
> ex1_re = melt(ex1,c("Cul","Loc"))
> head(ex1_re)
Cul Loc variable value
1 1 1 rep1 -0.71040656
2 2 1 rep1 0.25688371
3 3 1 rep1 -0.24669188
4 4 1 rep1 -0.34754260
5 5 1 rep1 -0.95161857
6 6 1 rep1 -0.04502772
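dcast goes the other way, from long data to wide data. To turn ex1_re back into its original shape, use dcast(ex1_re, Cul + Loc ~ variable). The column names to the left of ~ are kept as they are, the column to the right of ~ is spread out into new columns, and the value column, which is left out of the formula, supplies the cell values.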
> dcast(ex1_re,Cul+Loc~variable)
Cul Loc rep1 rep2 rep3
1 1 1 -0.71040656 0.11764660 0.7017843
2 1 2 -0.57534696 1.44455086 0.7877388
3 2 1 0.25688371 -0.94747461 -0.2621975
4 2 2 0.60796432 0.45150405 0.7690422
5 3 1 -0.24669188 -0.49055744 -1.5721442
6 3 2 -1.61788271 0.04123292 0.3322026
7 4 1 -0.34754260 -0.25609219 -1.5146677
8 4 2 -0.05556197 -0.42249683 -1.0083766
9 5 1 -0.95161857 1.84386201 -1.6015362
10 5 2 0.51940720 -2.05324722 -0.1194526
11 6 1 -0.04502772 -0.65194990 -0.5309065
12 6 2 0.30115336 1.13133721 -0.2803953
13 7 1 -0.78490447 0.23538657 -1.4617556
14 7 2 0.10567619 -1.46064007 0.5629895
15 8 1 -1.66794194 0.07796085 0.6879168
16 8 2 -0.64070601 0.73994751 -0.3724388
17 9 1 -0.38022652 -0.96185663 2.1001089
18 9 2 -0.84970435 1.90910357 0.9769734
19 10 1 0.91899661 -0.07130809 -1.2870305
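One way to confirm that melt followed by dcast loses nothing is to compare the result with the original table. The sketch below assumes reshape2; the seed and the names back and orig_sorted are illustrative. Because dcast returns the rows sorted by Cul and then Loc (as in the output above), the original is sorted the same way before the comparison.

```r
# Assumption: reshape2 is installed and provides melt() and dcast().
library(reshape2)

set.seed(123)  # illustrative seed; values differ from the post's output
ex1 <- data.frame(Cul = rep(1:10, 2), Loc = rep(1:2, each = 10),
                  rep1 = rnorm(20), rep2 = rnorm(20), rep3 = rnorm(20))

ex1_re <- melt(ex1, id.vars = c("Cul", "Loc"))
back <- dcast(ex1_re, Cul + Loc ~ variable, value.var = "value")

# Put the original rows in the same Cul-then-Loc order as the dcast() result.
orig_sorted <- ex1[order(ex1$Cul, ex1$Loc), ]
rownames(orig_sorted) <- NULL
rownames(back) <- NULL

# Compare the values only; row names and other attributes are ignored.
all.equal(back, orig_sorted, check.attributes = FALSE)
```

This works here because every Cul, Loc, variable combination occurs exactly once; with duplicated combinations dcast would need an aggregation function and the round trip would no longer be exact.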
This article is shared from the WeChat official account 育种数据分析之放飞自我 (R-breeding), author: