Author: 手机用户2602940113 | Source: Internet | 2024-09-26 19:03
I've searched for similar problems and have a vague idea of what I should do: vectorize everything or use the apply() family. But I'm a beginner in R programming, and both approaches are quite confusing to me.
Here is my source code:
x <- rlnorm(100, 0, 1.6)
j = 0
k = 0
i = 0
h = 0
lambda <- rep(0, 200)
sum1 <- rep(0, 200)
constjk = 0
wj = 0
wk = 0
for (h in 1:200)
{
  lambda[h] = 2 + h/12.5
  N = ceiling(lambda[h]*max(x))
  for (j in 0:N)
  {
    # fraction of x values falling in (j/lambda, (j+1)/lambda]
    wj = (sum(x <= (j+1)/lambda[h]) - sum(x <= j/lambda[h]))/100
    for (k in 0:N)
    {
      constjk = dbinom(k, j + k, 0.5)
      wk = (sum(x <= (k+1)/lambda[h]) - sum(x <= k/lambda[h]))/100
      sum1[h] = sum1[h] + (lambda[h]/2)*constjk*wk*wj
    }
  }
}
Let me explain a bit. I want to collect 200 sum1 values (that's the first loop), and each sum1 value is the sum of (lambda[h]/2)*constjk*wk*wj over all j and k, hence the other two loops. The most tedious part is that N changes with h, so I have no idea how to vectorize the j-loop and the k-loop. Of course I can vectorize the h-loop with lambda <- seq() and N <- ceiling(), but that's the best I can do. Is there a way to further simplify the code?
2 Answers
5
Your code can be neatly vectorized with three nested sapply calls. It might be a bit hard to read for the untrained eye, but the essence is that instead of adding one value at a time to sum1[h], we calculate all the terms produced by the innermost loop in one go and sum them up.
Although this vectorized solution is faster than your triple for loop, the improvement is not dramatic. If you plan to use it many times, I suggest you implement it in C or Fortran (with regular for loops), which improves the speed a lot. Beware, though, that it has high time complexity and will scale badly with increasing values of lambda, ultimately reaching a point where it cannot be computed within reasonable time regardless of the implementation.
lambda <- 2 + 1:200/12.5
sum1 <- sapply(lambda, function(l){
  N <- ceiling(l*max(x))
  sum(sapply(0:N, function(j){
    wj <- (sum(x <= (j+1)/l) - sum(x <= j/l))/100
    sum(sapply(0:N, function(k){
      constjk <- dbinom(k, j + k, 0.5)
      wk <- (sum(x <= (k+1)/l) - sum(x <= k/l))/100
      l/2*constjk*wk*wj
    }))
  }))
})
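To get a feel for the scaling warning, here is a rough sketch that counts how many times the innermost expression is evaluated. The seed is my own assumption (the question does not set one), so the exact numbers depend on the sample drawn. Keep in mind that each evaluation also scans all of x several times.

```r
set.seed(42)                       # assumed seed, only to make the count reproducible
x <- rlnorm(100, 0, 1.6)
lambda <- 2 + 1:200 / 12.5
N <- ceiling(lambda * max(x))      # upper loop bound for each lambda

# The j-loop and the k-loop both run over 0:N, so one lambda value
# costs (N + 1)^2 evaluations of the innermost expression.
inner_evals <- (N + 1)^2
total_evals <- sum(inner_evals)

range(N)       # smallest and largest loop bound
total_evals    # total innermost evaluations over all 200 lambda values
```

Since N grows roughly linearly with lambda, the work per lambda grows roughly quadratically, which is why larger lambda values get expensive so quickly.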
Btw, you don't need to predefine variables like h, j, k, wj and wk. This is especially true when vectorizing, since assignments to them inside the functions fed to sapply create local variables with the same names that mask the ones you predefined (i.e. the predefined ones are ignored).
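A tiny illustration of that scoping behaviour (a toy example of my own, not taken from the question's code):

```r
wj <- 0                          # predefined in the calling environment

res <- sapply(1:3, function(j) {
  wj <- j * 10                   # creates a new local wj; the outer wj is untouched
  wj
})

res   # 10 20 30
wj    # still 0
```

The outer wj would only change if you used the superassignment operator <<- inside the function, which is usually best avoided.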
2
Let's wrap your simulation in a function and time it:
sim1 <- function(num=20){
  set.seed(42)
  x <- rlnorm(100,0,1.6)
  j = 0
  k = 0
  i = 0
  h = 0
  lambda <- rep(0,num)
  sum1 <- rep(0,num)
  constjk = 0
  wj = 0
  wk = 0
  for (h in 1:num)
  {
    lambda[h] = 2+h/12.5
    N = ceiling(lambda[h]*max(x))
    for (j in 0:N)
    {
      wj = (sum(x<=(j+1)/lambda[h])-sum(x<=j/lambda[h]))/100
      for (k in 0:N)
      {
        set.seed(42) #does not change the result (dbinom is deterministic)
        constjk = dbinom(k, j + k, 0.5)
        wk = (sum(x<=(k+1)/lambda[h])-sum(x<=k/lambda[h]))/100
        sum1[h] = sum1[h]+(lambda[h]/2)*constjk*wk*wj
      }
    }
  }
  sum1
}
system.time(res1 <- sim1())
# user system elapsed
# 5.4 0.0 5.4
Now let's make it faster:
sim2 <- function(num=20){
  set.seed(42) #to make it reproducible
  x <- rlnorm(100,0,1.6)
  h <- 1:num
  sum1 <- numeric(num)
  lambda <- 2+1:num/12.5
  N <- ceiling(lambda*max(x))
  #functions for wj and wk
  wjfun <- function(x,j,lambda,h){
    (sum(x<=(j+1)/lambda[h])-sum(x<=j/lambda[h]))/100
  }
  wkfun <- function(x,k,lambda,h){
    (sum(x<=(k+1)/lambda[h])-sum(x<=k/lambda[h]))/100
  }
  #function to calculate values of sum1
  fun1 <- function(N,h,x,lambda) {
    sum1 <- 0
    set.seed(42) #to make it reproducible
    #calculate constants using outer
    const <- outer(0:N[h],0:N[h],FUN=function(j,k) dbinom(k, j + k, 0.5))
    wk <- numeric(N[h]+1)
    #loop only once to calculate wk
    for (k in 0:N[h]){
      wk[k+1] <- (sum(x<=(k+1)/lambda[h])-sum(x<=k/lambda[h]))/100
    }
    for (j in 0:N[h])
    {
      wj <- (sum(x<=(j+1)/lambda[h])-sum(x<=j/lambda[h]))/100
      for (k in 0:N[h])
      {
        sum1 <- sum1+(lambda[h]/2)*const[j+1,k+1]*wk[k+1]*wj
      }
    }
    sum1
  }
  for (h in 1:num)
  {
    sum1[h] <- fun1(N,h,x,lambda)
  }
  sum1
}
system.time(res2 <- sim2())
#user system elapsed
#1.25 0.00 1.25
all.equal(res1,res2)
#[1] TRUE
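One further step that seems possible, offered as a sketch rather than a benchmarked solution: wj and wk come from the same formula, so they are entries of a single weight vector w, and the double sum over j and k becomes sum(const * outer(w, w)). The name sim3 is my own; the seed and parameters mirror sim1 and sim2. I have not timed it, so treat any speed-up as something to check with system.time on your machine.

```r
sim3 <- function(num = 20) {
  set.seed(42)                       # same seed as sim1/sim2
  x <- rlnorm(100, 0, 1.6)
  lambda <- 2 + 1:num / 12.5
  sapply(lambda, function(l) {
    N <- ceiling(l * max(x))
    # cum[m + 1] = number of x values <= m / l, for m = 0, ..., N + 1
    cum <- sapply(0:(N + 1), function(m) sum(x <= m / l))
    # w[m + 1] = fraction of x in (m / l, (m + 1) / l]; this is both wj and wk
    w <- diff(cum) / length(x)
    # const[j + 1, k + 1] = dbinom(k, j + k, 0.5)
    const <- outer(0:N, 0:N, function(j, k) dbinom(k, j + k, 0.5))
    # double sum over j and k of const * wj * wk, scaled by lambda / 2
    (l / 2) * sum(const * outer(w, w))
  })
}

res3 <- sim3()
# If res2 from sim2 is available, all.equal(res2, res3) lets you compare them
# (the terms are added in a different order, so compare with all.equal, not identical).
```

This builds two (N + 1) x (N + 1) matrices per lambda value, so memory use grows quadratically with N.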
Timings for @Backlin's code (with 20 iterations) for comparison:
# user system elapsed
# 3.30 0.00 3.29
If this is still too slow and you cannot or don't want to use another language, there is also the possibility of parallelization. As far as I can see, the outer loop is embarrassingly parallel, because each lambda value is processed independently of the others. There are some nice and easy packages for parallelization.
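As a minimal sketch of the parallel idea, here is one way it might look with mclapply from the parallel package that ships with R. The helper name sum1_for_lambda is hypothetical, its body follows the nested sapply approach from the first answer, and using 2 cores and only the first 8 lambda values are my own choices to keep the demo short. mclapply relies on forking, which is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# Hypothetical helper: computes one sum1 value for a single lambda (l)
sum1_for_lambda <- function(l, x) {
  N <- ceiling(l * max(x))
  sum(sapply(0:N, function(j) {
    wj <- (sum(x <= (j + 1) / l) - sum(x <= j / l)) / 100
    sum(sapply(0:N, function(k) {
      wk <- (sum(x <= (k + 1) / l) - sum(x <= k / l)) / 100
      l / 2 * dbinom(k, j + k, 0.5) * wk * wj
    }))
  }))
}

set.seed(42)
x <- rlnorm(100, 0, 1.6)
lambda <- 2 + 1:8 / 12.5           # first 8 lambda values only, to keep it quick

# Forking does not work on Windows, so use one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each lambda value is handled independently, so the work can be split across cores
sum1_par <- unlist(mclapply(lambda, sum1_for_lambda, x = x, mc.cores = n_cores))
```

On Windows, parLapply with a cluster created by makeCluster is the usual alternative. Whether the parallel version pays off depends on how expensive each lambda value is compared with the overhead of starting the workers.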