Apply function to rolling window in panel data in R

I'm trying to apply a function (say standard deviation) in a rolling window, by category:

I have the following data:

cat = c("A", "A", "A", "A", "B", "B", "B", "B") 
year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993) 
value = c(2, 3, 5, 6, 8, 9, 4, 5) 
df = data.frame(cat, year, value)

I would like to create a new column (say sd) that holds the standard deviation of value over a two-year window, computed separately for each cat.

Here's the result I'm thinking of:

[Image: table of the expected result]

Any advice on how to achieve this?

1 Answer

It can be done by using rollapply from the zoo package:

library(zoo)

cat = c("A", "A", "A", "A", "B", "B", "B", "B") 
year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993) 
value = c(2, 3, 5, 6, 8, 9, 4, 5) 
df = data.frame(cat, year, value)

df$stdev <- unlist(by(df, df$cat, function(x) {
  c(NA, rollapply(x$value, width=2, sd))
}), use.names=FALSE)

print(df)
##   cat year value     stdev
## 1   A 1990     2        NA
## 2   A 1991     3 0.7071068
## 3   A 1992     5 1.4142136
## 4   A 1993     6 0.7071068
## 5   B 1990     8        NA
## 6   B 1991     9 0.7071068
## 7   B 1992     4 3.5355339
## 8   B 1993     5 0.7071068
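
The manual `c(NA, ...)` padding above is needed because a width-2 window yields one fewer value than there are rows in each group. As a sketch of an alternative, zoo's `rollapplyr()` (the right-aligned variant of `rollapply()`) accepts `fill = NA` and pads the incomplete positions itself, which carries over to other window widths. The names `df2` and `roll_sd` are made up for this illustration and are not part of the original answer.

```r
library(zoo)

df2 <- data.frame(
  cat   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  year  = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993),
  value = c(2, 3, 5, 6, 8, 9, 4, 5)
)

# Hypothetical helper: right-aligned rolling sd over `width` rows.
# fill = NA pads the positions that do not yet have a full window.
roll_sd <- function(x, width = 2) {
  rollapplyr(x, width = width, FUN = sd, fill = NA)
}

# Apply per category; assumes df2 is already sorted by cat, then year
df2$stdev <- unlist(by(df2, df2$cat, function(g) roll_sd(g$value)),
                    use.names = FALSE)
print(df2)
```

With `width = 2` this should reproduce the `stdev` column printed above.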

You can also do it with ddply if you'd rather use plyr functions than by:

library(plyr)  # provides ddply() and .()

df$stdev <- ddply(df, .(cat), summarise, 
                  stdev=c(NA, rollapply(value, width=2, sd)))$stdev

As a lark, I did a system.time (multiple times) comparison of the above two methods and also the ave method pointed out by @thelatemail in the comment thread below this answer (starting with a "fresh" copy of the data frame).

df <- data.frame(cat, year, value)
system.time(df$stdev <- with(df, ave(value, cat, FUN=function(x) c(NA, rollapply(x, width=2, sd)))))

df <- data.frame(cat, year, value)
system.time(df$stdev <- unlist(by(df, df$cat, function(x) c(NA, rollapply(x$value, width=2, sd))), use.names=FALSE))

df <- data.frame(cat, year, value)
system.time(df$stdev <- ddply(df, .(cat), summarise, stdev=c(NA, rollapply(value, width=2, sd)))$stdev)

Both the ave and by methods take:

   user  system elapsed 
  0.002   0.000   0.002 

and the ddply version takes:

   user  system elapsed 
  0.004   0.000   0.004 

Not that speed is really an issue here, but on this small example the ave and by versions ran in about half the time of the ddply version.
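
Timing aside, there is one practical difference between the by and ave versions worth sketching. `unlist(by(...))` returns values in group order, so it only lines up with the rows when the data frame is sorted by cat, whereas `ave()` writes each group's result back to that group's original row positions. The example below uses rows interleaved by year to show this. To keep it dependency-free it uses a base-R rolling helper built on `embed()` instead of zoo; `df3` and `roll_sd_base` are hypothetical names introduced only for this illustration.

```r
# Same values as above, but rows ordered by year rather than grouped by cat
df3 <- data.frame(
  cat   = c("A", "B", "A", "B", "A", "B", "A", "B"),
  year  = c(1990, 1990, 1991, 1991, 1992, 1992, 1993, 1993),
  value = c(2, 8, 3, 9, 5, 4, 6, 5)
)

# Hypothetical helper: rolling sd over `width` consecutive values.
# embed() lays out each window as one matrix row; the first
# width - 1 positions have no full window and are set to NA.
roll_sd_base <- function(x, width = 2) {
  if (length(x) < width) return(rep(NA_real_, length(x)))
  c(rep(NA_real_, width - 1), apply(embed(x, width), 1, sd))
}

# ave() splits value by cat, applies FUN, and puts results back in row order
df3$stdev <- ave(df3$value, df3$cat, FUN = roll_sd_base)
print(df3)
```

The rows still need to be in year order within each cat for the window to mean "the previous two years"; `ave()` only takes care of mapping results back to the right rows.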
