It can be done by using rollapply from the zoo package:
library(zoo)
cat = c("A", "A", "A", "A", "B", "B", "B", "B")
year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993)
value = c(2, 3, 5, 6, 8, 9, 4, 5)
df = data.frame(cat, year, value)
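# by() returns its results in group order, so this assignment relies on
# df already being sorted by cat (as it is here).
df$stdev <- unlist(by(df, df$cat, function(x) {
  # width=2 yields one value fewer than the group has rows; pad the first with NA
  c(NA, rollapply(x$value, width=2, sd))
}), use.names=FALSE)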
print(df)
##   cat year value     stdev
## 1   A 1990     2        NA
## 2   A 1991     3 0.7071068
## 3   A 1992     5 1.4142136
## 4   A 1993     6 0.7071068
## 5   B 1990     8        NA
## 6   B 1991     9 0.7071068
## 7   B 1992     4 3.5355339
## 8   B 1993     5 0.7071068
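The manual c(NA, ...) padding could probably be left to zoo itself. Here is a minimal sketch of that variation, not something from the original answer: it assumes rollapplyr() (the right-aligned wrapper around rollapply) with fill=NA, and uses base R's ave() so the result lines up with the rows regardless of their order. The name df2 is only for illustration.

```r
library(zoo)

cat <- c("A", "A", "A", "A", "B", "B", "B", "B")
year <- c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993)
value <- c(2, 3, 5, 6, 8, 9, 4, 5)
df2 <- data.frame(cat, year, value)  # df2 is a hypothetical name for this sketch

# rollapplyr() aligns each window to its right edge; fill = NA pads the
# incomplete leading window, so each group keeps its original length.
df2$stdev <- ave(df2$value, df2$cat,
                 FUN = function(x) rollapplyr(x, width = 2, FUN = sd, fill = NA))
print(df2)
```

This should give the same stdev column as the by() version, with the first row of each group set to NA.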
You can also do it with ddply if you'd rather use plyr functions than by():
library(plyr)
df$stdev <- ddply(df, .(cat), summarise,
                  stdev=c(NA, rollapply(value, width=2, sd)))$stdev
As a lark, I did a system.time comparison (run multiple times) of the above two methods and also the ave method pointed out by @thelatemail in the comment thread below this answer, starting each run with a "fresh" copy of the data frame.
df <- data.frame(cat, year, value)
system.time(df$stdev <- with(df, ave(value, cat, FUN=function(x) c(NA, rollapply(x, width=2, sd)))))
df <- data.frame(cat, year, value)
system.time(df$stdev <- unlist(by(df, df$cat, function(x) c(NA, rollapply(x$value, width=2, sd))), use.names=FALSE))
df <- data.frame(cat, year, value)
system.time(df$stdev <- ddply(df, .(cat), summarise, stdev=c(NA, rollapply(value, width=2, sd)))$stdev)
Both the ave and by methods take:
user system elapsed
0.002 0.000 0.002
and the ddply version takes:
user system elapsed
0.004 0.000 0.004
Not that speed is really an issue here, but it looks like the ave and by versions are the fastest of the three, at least on a data frame this small.
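Timings of a few milliseconds on eight rows are mostly noise, so a somewhat larger run may give a clearer picture. The following is only a rough sketch under stated assumptions: the data frame big, its size, the seed, the repeat count, and the helper roll_sd are all made up for illustration, and no particular result is claimed.

```r
library(zoo)

set.seed(1)  # hypothetical test data, not from the original question
n_groups <- 100
n_years <- 20
big <- data.frame(cat = rep(seq_len(n_groups), each = n_years),
                  year = rep(seq_len(n_years), times = n_groups),
                  value = rnorm(n_groups * n_years))

# Hypothetical helper: same rolling sd with a leading NA as in the answer above
roll_sd <- function(x) c(NA, rollapply(x, width = 2, sd))

# Repeat each method a few times so the elapsed time is less noisy
t_ave <- system.time(
  for (i in 1:5) res_ave <- with(big, ave(value, cat, FUN = roll_sd))
)
t_by <- system.time(
  for (i in 1:5) res_by <- unlist(by(big, big$cat, function(x) roll_sd(x$value)),
                                  use.names = FALSE)
)

# Elapsed seconds for each method; both should produce the same column
print(c(ave = t_ave[["elapsed"]], by = t_by[["elapsed"]]))
stopifnot(isTRUE(all.equal(res_ave, res_by)))
```

The ddply version could be timed the same way after loading plyr. The two results only match here because big is already sorted by cat, which the by() approach depends on.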