# 3 nested loops: Optimizing a simple simulation for speed

By : Wild Shovel
Date : October 23 2020, 08:10 PM
Hope these help. You can vectorize your inner loop by generating all n random integers at once with numpy (much faster), and get rid of all your if statements by using arithmetic instead of boolean logic. code :
``````while ...:
    # population changes by (-1, 0, +1, +2) for each alien
    n += np.random.randint(-1, 3, size=n).sum()
``````
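To make the contrast concrete, here is a minimal sketch (hypothetical `step_loop`/`step_vectorized` names, using numpy's newer `default_rng` API) comparing a per-alien Python loop against the single vectorized draw:

```python
import numpy as np

def step_loop(n, rng):
    # original style: one random draw and one Python-level addition per alien
    return n + sum(int(rng.integers(-1, 3)) for _ in range(n))

def step_vectorized(n, rng):
    # numpy draws all n deltas from {-1, 0, +1, +2} in one call, then sums them
    return n + int(rng.integers(-1, 3, size=n).sum())
```

Both draw from the same distribution; the vectorized form replaces n Python-level iterations with one C-level call, which is where the speedup comes from.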
``````from time import time
import numpy as np
from numpy.random import randint
from numba import njit, int32, prange

@njit('i4(i4)')
def simulate(pop_max):
    # simulate one population run; pulled into a function so it can be parallelized
    n = 1
    while 0 < n < pop_max:
        n += np.sum(randint(-1, 3, n))
    return n

@njit('i4[:](i4,i4)', parallel=True)
def solve(pop_max, iter_max):
    # this could be simplified to return just the ratio of populations that die
    # off vs. survive to pop_max, which would save some RAM
    # (though the speed is about the same)
    results = np.zeros(iter_max, dtype=int32)  # numba needs int32 here rather than a Python int
    for i in prange(iter_max):  # prange marks this loop as parallelizable
        results[i] = simulate(pop_max)
    return results

pop_max = 100
iter_max = 100000

t = time()
print(np.bincount(solve(pop_max, iter_max)) / iter_max)
print('time elapsed: ', time() - t)
``````

## Optimizing nested loops

By : Kevin
Date : March 29 2020, 07:55 AM
This will help. One type of optimization is loop unrolling. Occasionally the pipeline needs to stall because there is a lot of activity around obtaining the index, updating it, and storing it back in memory. This is probably the primary reason your multithreaded implementation didn't do well: all the threads were likely fighting over access to the index.
If you want to reattempt a multithreaded implementation, have each thread know its offset based on the thread count, and have each thread process a different remainder obtained via modulus division.
code :
``````thread 0 works on indices where (i*cols + j) % (thread count) == 0
(and so on for the remaining threads)
``````
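The same offset-and-stride partitioning can be sketched in Python (hypothetical `copy_strided`/`parallel_copy` helpers): each worker owns exactly the indices with a fixed remainder, so no two workers ever contend for the same index.

```python
from concurrent.futures import ThreadPoolExecutor

def copy_strided(src, dst, offset, stride):
    # this worker handles exactly the indices i with i % stride == offset
    for i in range(offset, len(src), stride):
        dst[i] = src[i]

def parallel_copy(src, workers=4):
    dst = [None] * len(src)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        for off in range(workers):
            ex.submit(copy_strided, src, dst, off, workers)
    # leaving the with-block waits for all workers to finish
    return dst
```

Because the index sets are disjoint, no locking on a shared index is needed at all.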
``````F2D* fDeepCopy(F2D* in)
{
    int i, j;
    F2D* out;
    int rows, cols;

    rows = in->height;
    cols = in->width;

    out = fMallocHandle(rows, cols);

    for(i=0; i<rows; i++) {
        // unroll in blocks of 4; stop before walking off the end
        int pads = (cols / 4) * 4;
        for(j = 0; j < pads; j += 4) {
            subsref(out,i,j)   = subsref(in,i,j);
            subsref(out,i,j+1) = subsref(in,i,j+1);
            subsref(out,i,j+2) = subsref(in,i,j+2);
            subsref(out,i,j+3) = subsref(in,i,j+3);
        }
        // process the remainder (note: the bound is cols, not pads)
        for(; j < cols; j++) {
            subsref(out,i,j) = subsref(in,i,j);
        }
    }
    return out;
}
``````
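The unrolling pattern above translates directly to any language; a Python sketch (hypothetical `unrolled_sum`) shows the blocks-of-four body plus the remainder loop, with the remainder bounded by the full length rather than the padded one:

```python
def unrolled_sum(xs):
    total = 0
    pads = (len(xs) // 4) * 4  # largest multiple of 4 that fits
    i = 0
    while i < pads:
        # four elements per iteration: less index bookkeeping per element
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    for j in range(pads, len(xs)):  # mop up the 0-3 leftover elements
        total += xs[j]
    return total
```

(In CPython this buys little, since the interpreter dominates; the pattern pays off in compiled languages where the loop counter traffic is a real cost.)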
``````vec_res.x = v1.x + v2.x;
vec_res.y = v1.y + v2.y;
vec_res.z = v1.z + v2.z;
vec_res.w = v1.w + v2.w;
``````
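The four scalar adds above are exactly what SIMD hardware collapses into one instruction; numpy exposes the same idea portably (illustrative values):

```python
import numpy as np

v1 = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
v2 = np.array([10.0, 20.0, 30.0, 40.0], dtype=np.float32)
vec_res = v1 + v2  # one vector operation instead of four scalar ones
```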
``````movaps xmm0, [v1]          ;xmm0 = v1.w | v1.z | v1.y | v1.x
addps xmm0, [v2]           ;xmm0 = v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
movaps [vec_res], xmm0
``````

## Speed up a Monte Carlo simulation with nested loop in R

By : JH P
Date : March 29 2020, 07:55 AM
This works around the issue. The following should be faster... but if you're locking up when A is large, that might be a memory issue, and the following is more memory-intensive. More information would be helpful: what banks is, what x and y are, where you get dea from, and what the purpose is.
Essentially all I've done is move as much as I can out of the inner loop. The shorter that is, the better off you'll be.
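The hoisting idea generalizes beyond R: anything that does not depend on the inner index should be computed once, outside the loop. A hypothetical Python sketch of the same move:

```python
import math

def scaled_slow(xs, k):
    out = []
    for x in xs:
        scale = math.log(k) + 1.0  # loop-invariant, yet recomputed every pass
        out.append(x * scale)
    return out

def scaled_fast(xs, k):
    scale = math.log(k) + 1.0      # hoisted out of the loop: computed once
    return [x * scale for x in xs]
```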
code :
``````A <- nrow(banks)
effm <- matrix(nrow = A, ncol = 2)
m <- 20
B <- 100
pb <- txtProgressBar(min = 0, max = A, style = 3)
for (a in 1:A) {
  x1 <- x[-a, ]
  y1 <- y[-a, ]
  theta <- numeric(B)
  xrefm <- x1[sample(1:nrow(x1), m * B, replace = TRUE), ]  # get all of your samples at once
  yrefm <- y1[sample(1:nrow(y1), m * B, replace = TRUE), ]
  deaX <- matrix(x[a, ], ncol = 3)
  deaY <- matrix(y[a, ], ncol = 3)

  for (i in 1:B) {
    theta[i] <- dea(deaX, deaY, RTS = 'vrs', ORIENTATION = 'graph',
                    xrefm[(1:m) + (i - 1) * m, ], yrefm[(1:m) + (i - 1) * m, ],
                    FAST = TRUE)
  }

  effm[a, 1] <- mean(theta)
  effm[a, 2] <- sd(theta) / sqrt(B)
  setTxtProgressBar(pb, a)
}
close(pb)
effm
``````

## Using apply for simulation instead of nested for loops

By : INeedHelp
Date : March 29 2020, 07:55 AM
This will help you. You can improve the speed of your function by using data.table. However, you would still have to use for loops (which is not a bad thing).
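For readers more at home in Python: the core data.table trick used below, a vectorized conditional update in place (`diff[cond, adopt := 1]`), has a direct pandas analogue (illustrative data):

```python
import pandas as pd

diff = pd.DataFrame({"prop": [0.1, 0.9, 0.4], "adopt": [0, 0, 0]})
# update only the rows matching the condition, in one vectorized pass
diff.loc[diff["prop"] > 0.5, "adopt"] = 1
```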
code :
``````library(data.table)
simdiffuse <- function(a, b, c, d) {

  endo <- 1 / a       # innovation endogenous effect
  endomacro <- 1 / b  # category endogenous effect
  appeal <- c         # innovation's ex ante appeal
  ninnov <- d         # number of innovations in category

  results <- data.table(catdensity = rep(0:ninnov, each = 25), t = 1:25,
                        endo = endo, endomacro = endomacro, appeal = appeal)

  for (cc in 0:ninnov) {
    diff <- data.table(prop = rnorm(1000), adopt = c(rep(1, 5), rep(0, 995)))
    for (tt in 1:25) {
      diff[, rr := rnorm(1, prop), by = "prop"]
      diff[appeal + mean(adopt) * endo + cc * endomacro > rr, adopt := 1]
    }
  }
  return(results)
}

results <- simdiffuse(.2, 20, -3, 60)
``````

## Simulation in R with nested loops runs slow

By : Diana María Aristizá
Date : March 29 2020, 07:55 AM
Does that help? Indeed, it would be more efficient to encode fertility with 0 and 1, and you could even use an integer matrix.
Anyhow, the code as it stands can be simplified a lot, so here is a vectorized solution, still using your data.frame:
code :
``````NextGen <- function(agent, N, S, A) {
  excess <- runif(N)
  v1 <- which(agent$fertility == "Trad" & excess < S)
  nextgen.agent <- agent[c(1:N, v1), ]
  nextgen.agent[c(v1, seq.int(N + 1, nrow(nextgen.agent))), "fertility"] <-
    ifelse(A > runif(length(v1) * 2), "Plan", "Trad")
  nextgen.agent
}
``````
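The `which()` + `ifelse()` pattern in `NextGen` has a close numpy counterpart, shown here on made-up data:

```python
import numpy as np

fertility = np.array(["Trad", "Plan", "Trad", "Trad"])
excess = np.array([0.1, 0.9, 0.5, 0.95])
S = 0.8
# R's which(fertility == "Trad" & excess < S) maps to np.where on a mask
v1 = np.where((fertility == "Trad") & (excess < S))[0]
# R's ifelse(A > runif(n), "Plan", "Trad") maps to the three-argument np.where
new_fert = np.where(np.array([0.9, 0.1]) > 0.5, "Plan", "Trad")
```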
``````agentDF <- data.frame(fertility = "Trad", lineage = 1:50, stringsAsFactors = FALSE)

# use the microbenchmark library to compare performance
microbenchmark::microbenchmark(
  base = {
    res1 <- NextGeneration(agentDF, 50, 0.8, 0.8) # note I fixed the two variable typos in your function
  },
  new = {
    res2 <- NextGen(agentDF, 50, 0.8, 0.8)
  },
  times = 100
)

## Unit: microseconds
##  expr      min        lq     mean    median       uq       max neval
##  base 1998.533 2163.8605 2446.561 2222.8200 2286.844 14413.173   100
##   new  282.032  304.1165  329.552  320.3255  348.488   467.217   100
``````

## Optimizing nested for-loops

By : dliang
Date : March 29 2020, 07:55 AM
I hope this helps you fix your issue. Here is a method that is about 10x faster than your reference code. It does nothing particularly clever, just pedestrian optimization.
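"Pedestrian optimization" here mostly means dropping from per-column pandas indexing to raw ndarray boolean arithmetic; a tiny hypothetical version of that move:

```python
import numpy as np

data = np.array([[1, 0, 1],
                 [1, 1, 1],
                 [0, 1, 1]], dtype=np.uint8)
# one vectorized pass over the raw array replaces a chain of pandas lookups
all_pos = data.all(axis=1)  # True where every column in the row is positive
```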
code :
``````import numpy as np
import pandas as pd
df = pd.DataFrame()

np.random.seed(1)

df["A"] = np.random.randint(2, size=10)
df["B"] = np.random.randint(2, size=10)
df["C"] = np.random.randint(2, size=10)
df["D"] = np.random.randint(2, size=10)

df["AB"] = np.random.randint(2, size=10)
df["AC"] = np.random.randint(2, size=10)
df["BC"] = np.random.randint(2, size=10)
df["BD"] = np.random.randint(2, size=10)
df["CD"] = np.random.randint(2, size=10)

ls = ["A", "B", "C", "D"]

def op():
    out = df.copy()
    for i, a in enumerate(ls):
        for j in range(i + 1, len(ls)):
            b = ls[j]
            for k in range(j + 1, len(ls)):
                c = ls[k]
                idx = a + b + c

                idx_abc = (out[a] > 0) & (out[b] > 0) & (out[c] > 0)
                sum_abc = out[idx_abc][a + b] + out[idx_abc][b + c] + out[idx_abc][a + c]

                out[a + b + c] = 0
                out.loc[sum_abc.index[sum_abc >= 2], a + b + c] = 99
    return out

import scipy.spatial.distance as ssd

def pp():
    data = df.values
    n = len(ls)
    d1, d2 = np.split(data, [n], axis=1)
    i, j = np.triu_indices(n, 1)
    d2 = d2 & d1[:, i] & d1[:, j]
    k, i, j = np.ogrid[:n, :n, :n]
    k, i, j = np.where((k < i) & (i < j))
    lu = ssd.squareform(np.arange(n * (n - 1) // 2))
    d3 = ((d2[:, lu[k, i]] + d2[:, lu[i, j]] + d2[:, lu[k, j]]) >= 2).view(np.uint8) * 99
    *triplets, = map("".join, combinations(ls, 3))
    out = df.copy()
    out[triplets] = pd.DataFrame(d3, columns=triplets)
    return out

from string import ascii_uppercase
from itertools import combinations, chain

def make(nl=8, nr=1000000, seed=1):
    np.random.seed(seed)
    letters = np.fromiter(ascii_uppercase, 'U1', nl)
    df = pd.DataFrame()
    for l in chain(letters, map("".join, combinations(letters, 2))):
        df[l] = np.random.randint(0, 2, nr, dtype=np.uint8)
    return letters, df

df1 = op()
df2 = pp()
assert (df1 == df2).all().all()

ls, df = make(8, 1000)

df1 = op()
df2 = pp()
assert (df1 == df2).all().all()

from timeit import timeit

print(timeit(op, number=10))
print(timeit(pp, number=10))

ls, df = make(26, 250000)
import time

t0 = time.perf_counter()
df2 = pp()
t1 = time.perf_counter()
print(t1 - t0)
``````
``````3.2022583668585867 # op 8 symbols, 1000 rows, 10 repeats
0.2772211490664631 # pp 8 symbols, 1000 rows, 10 repeats
12.412292044842616 # pp 26 symbols, 250,000 rows, single run
`````` 