Optimizing nested loops
By : Kevin
Date : March 29 2020, 07:55 AM
One type of optimization is loop unrolling. Occasionally the pipeline needs to stall because there is a lot of activity around obtaining the loop index, updating it, and storing it back in memory. This is probably the primary reason your multithreaded implementation didn't do well: all the threads were likely fighting over access to the shared index. If you want to reattempt a multithreaded implementation, have each thread know its offset based on the thread count, and have each thread process a different remainder found via modulus division. code :
thread 0 works on i*rows+j % (thread count) = 0
thread 1 works on i*rows+j % (thread count) = 1
(and so on)
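That partitioning scheme can be sketched in Python (the per-cell work and the dimensions here are placeholders, not from the original code): each thread walks only the flat indices matching its own remainder, so there is no shared loop counter to contend over.

```python
import threading

ROWS, COLS = 100, 100
NUM_THREADS = 4
results = [0] * NUM_THREADS  # one slot per thread, no shared counter

def worker(tid):
    # this thread owns exactly the flat indices where
    # (i * COLS + j) % NUM_THREADS == tid
    total = 0
    for flat in range(tid, ROWS * COLS, NUM_THREADS):
        i, j = divmod(flat, COLS)
        total += i + j        # placeholder for the real per-cell work
    results[tid] = total

threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results))
```

Each thread writes only to its own slot in `results`, so no locking is needed for the per-thread totals.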
F2D* fDeepCopy(F2D* in)
{
    int i, j;
    F2D* out;
    int rows, cols;

    rows = in->height;
    cols = in->width;

    out = fMallocHandle(rows, cols);

    for(i=0; i<rows; i++) {
        // unroll by 4; stop before walking off the last "4 long" block
        int pads = (cols / 4) * 4;
        for(j = 0; j < pads; j += 4) {
            subsref(out,i,j)   = subsref(in,i,j);
            subsref(out,i,j+1) = subsref(in,i,j+1);
            subsref(out,i,j+2) = subsref(in,i,j+2);
            subsref(out,i,j+3) = subsref(in,i,j+3);
        }
        // process the remaining cols % 4 columns
        for(; j < cols; j++) {
            subsref(out,i,j) = subsref(in,i,j);
        }
    }
    return out;
}
vec_res.x = v1.x + v2.x;
vec_res.y = v1.y + v2.y;
vec_res.z = v1.z + v2.z;
vec_res.w = v1.w + v2.w;
movaps xmm0, [v1] ;xmm0 = v1.w  v1.z  v1.y  v1.x
addps xmm0, [v2] ;xmm0 = v1.w+v2.w  v1.z+v2.z  v1.y+v2.y  v1.x+v2.x
movaps [vec_res], xmm0
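The scalar four-assignment add above is exactly what the `addps` instruction collapses into one operation. The same element-wise addition can be expressed with NumPy, which dispatches to SIMD instructions internally (a minimal sketch; the 4-element float arrays stand in for the x/y/z/w components):

```python
import numpy as np

# v1 and v2 play the role of the 4-component vectors above
v1 = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
v2 = np.array([10.0, 20.0, 30.0, 40.0], dtype=np.float32)

# one vectorized add replaces the four scalar assignments,
# just as one addps replaces four scalar adds
vec_res = v1 + v2
```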

speed up a monte carlo simulation with nested loop in R
By : JH P
Date : March 29 2020, 07:55 AM
The following should be faster... but if you're locking up when A is large, that might be a memory issue, and the following is more memory-intensive. More information would be helpful: what banks is, what x and y are, where you get dea from, and what the purpose is. Essentially all I've done is move as much as I can out of the inner loop. The shorter that is, the better off you'll be. code :
A <- nrow(banks)
effm <- matrix(nrow = A, ncol = 2)
m <- 20
B <- 100
pb <- txtProgressBar(min = 0, max = A, style = 3)
for (a in 1:A) {
  x1 <- x[a,]
  y1 <- y[a,]
  theta <- numeric(B)
  xrefm <- x1[sample(1:nrow(x1), m * B, replace = TRUE),]  # get all of your samples at once
  yrefm <- y1[sample(1:nrow(y1), m * B, replace = TRUE),]
  deaX <- matrix(x[a,], ncol = 3)
  deaY <- matrix(y[a,], ncol = 3)
  for (i in 1:B) {
    theta[i] <- dea(deaX, deaY, RTS = 'vrs', ORIENTATION = 'graph',
                    xrefm[(1:m) + (i - 1) * m,], yrefm[(1:m) + (i - 1) * m,], FAST = TRUE)
  }
  effm[a,1] <- mean(theta)
  effm[a,2] <- sd(theta) / sqrt(B)
  setTxtProgressBar(pb, a)
}
close(pb)
effm
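The core trick, drawing all m * B random samples once before the inner loop instead of sampling on every iteration, translates to any language. A minimal Python sketch of the same hoisting pattern (the array sizes and the mean standing in for the dea() call are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))
m, B = 20, 100

# slow pattern: sample inside the inner loop, B separate small draws
# fast pattern: draw all m * B row indices up front, then slice per iteration
idx = rng.integers(0, len(data), size=m * B)
samples = data[idx]                      # one big gather instead of B small ones

theta = np.empty(B)
for i in range(B):
    block = samples[i * m:(i + 1) * m]   # the i-th bootstrap sample
    theta[i] = block.mean()              # stand-in for the dea() call
```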

Using apply for simulation instead of nested for loops
By : INeedHelp
Date : March 29 2020, 07:55 AM
You can improve the speed of your function by using data.table. However, you would still have to use for loops (which is not a bad thing). code :
library(data.table)

simdiffuse <- function(a, b, c, d) {
  endo <- 1/a        # innovation endogenous effect
  endomacro <- 1/b   # category endogenous effect
  appeal <- c        # innovation's ex ante appeal
  ninnov <- d        # number of innovations in category
  results <- data.table(catdensity = rep(0:ninnov, each = 25), t = 1:25,
                        endo = endo, endomacro = endomacro, appeal = appeal,
                        adopt = as.numeric(NA))
  for (cc in 0:ninnov) {
    diff <- data.table(prop = rnorm(1000), adopt = c(rep(1, 5), rep(0, 995)))
    for (tt in 1:25) {
      results[catdensity == cc & t == tt, adopt := diff[, mean(adopt)]]
      diff[, rr := rnorm(1, prop), by = "prop"]
      diff[appeal + mean(adopt) * endo + cc * endomacro > rr, adopt := 1]
    }
  }
  return(results)
}

results <- simdiffuse(.2, 20, 3, 60)

simulation in R with nested loops run slow
By : Diana María Aristizá
Date : March 29 2020, 07:55 AM
Indeed, it would be more efficient to encode fertility with 0 and 1, and you could even use an integer matrix. Anyhow, the code as it stands can be simplified a lot, so here is a vectorized solution, still using your data.frame: code :
NextGen <- function(agent, N, S, A) {
  excess <- runif(N)
  v1 <- which(agent$fertility == "Trad" & excess < S)
  nextgen.agent <- agent[c(1:N, v1), ]
  nextgen.agent[c(v1, seq.int(N + 1, nrow(nextgen.agent))), "fertility"] <-
    ifelse(A > runif(length(v1) * 2), "Plan", "Trad")
  nextgen.agent
}

agentDF <- data.frame(fertility = "Trad", lineage = 1:50, stringsAsFactors = FALSE)

# use the microbenchmark library to compare performance
microbenchmark::microbenchmark(
  base = {
    res1 <- NextGeneration(agentDF, 50, 0.8, 0.8)  # note I fixed the two variable typos in your function
  },
  new = {
    res2 <- NextGen(agentDF, 50, 0.8, 0.8)
  },
  times = 100
)
## Unit: microseconds
##  expr      min        lq     mean    median       uq       max neval
##  base 1998.533 2163.8605 2446.561 2222.8200 2286.844 14413.173   100
##   new  282.032  304.1165  329.552  320.3255  348.488   467.217   100
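The same vectorization idea, replacing a per-agent loop with one boolean mask over the whole population, can be restated in NumPy using the 0/1 integer encoding suggested above (a hypothetical sketch of the pattern, not a port of the R function):

```python
import numpy as np

rng = np.random.default_rng(1)
N, S, A = 50, 0.8, 0.8

# 1 = "Trad", 0 = "Plan" -- fertility encoded as integers
fertility = np.ones(N, dtype=np.int8)

# one draw for the whole population replaces a per-agent loop
excess = rng.random(N)
reproducers = np.flatnonzero((fertility == 1) & (excess < S))
n_new = len(reproducers)

# each reproducer yields one offspring; parent and child each
# independently switch to "Plan" with probability A
switch = A > rng.random(2 * n_new)
fertility[reproducers] = np.where(switch[:n_new], 0, 1)
offspring = np.where(switch[n_new:], 0, 1).astype(np.int8)

next_gen = np.concatenate([fertility, offspring])
```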

Optimizing nested for-loops
By : dliang
Date : March 29 2020, 07:55 AM
Here is a method that is about 10x faster than your reference code. It does nothing particularly clever, just pedestrian optimization. code :
import numpy as np
import pandas as pd
import scipy.spatial.distance as ssd
from string import ascii_uppercase
from itertools import combinations, chain

df = pd.DataFrame()
np.random.seed(1)
df["A"] = np.random.randint(2, size=10)
df["B"] = np.random.randint(2, size=10)
df["C"] = np.random.randint(2, size=10)
df["D"] = np.random.randint(2, size=10)
df["AB"] = np.random.randint(2, size=10)
df["AC"] = np.random.randint(2, size=10)
df["AD"] = np.random.randint(2, size=10)
df["BC"] = np.random.randint(2, size=10)
df["BD"] = np.random.randint(2, size=10)
df["CD"] = np.random.randint(2, size=10)
ls = ["A", "B", "C", "D"]

def op():
    out = df.copy()
    for i, a in enumerate(ls):
        for j in range(i + 1, len(ls)):
            b = ls[j]
            for k in range(j + 1, len(ls)):
                c = ls[k]
                idx_abc = (out[a] > 0) & (out[b] > 0) & (out[c] > 0)
                sum_abc = out[idx_abc][a+b] + out[idx_abc][b+c] + out[idx_abc][a+c]
                out[a+b+c] = 0
                out.loc[sum_abc.index[sum_abc >= 2], a+b+c] = 99
    return out

def pp():
    data = df.values
    n = len(ls)
    d1, d2 = np.split(data, [n], axis=1)
    i, j = np.triu_indices(n, 1)
    d2 = d2 & d1[:, i] & d1[:, j]
    k, i, j = np.ogrid[:n, :n, :n]
    k, i, j = np.where((k < i) & (i < j))
    lu = ssd.squareform(np.arange(n * (n - 1) // 2))
    d3 = ((d2[:, lu[k, i]] + d2[:, lu[i, j]] + d2[:, lu[k, j]]) >= 2).view(np.uint8) * 99
    *triplets, = map("".join, combinations(ls, 3))
    out = df.copy()
    out[triplets] = pd.DataFrame(d3, columns=triplets)
    return out

def make(nl=8, nr=1000000, seed=1):
    np.random.seed(seed)
    letters = np.fromiter(ascii_uppercase, 'U1', nl)
    df = pd.DataFrame()
    for l in chain(letters, map("".join, combinations(letters, 2))):
        df[l] = np.random.randint(0, 2, nr, dtype=np.uint8)
    return letters, df

df1 = op()
df2 = pp()
assert (df1 == df2).all().all()

ls, df = make(8, 1000)
df1 = op()
df2 = pp()
assert (df1 == df2).all().all()

from timeit import timeit
print(timeit(op, number=10))
print(timeit(pp, number=10))

ls, df = make(26, 250000)
import time
t0 = time.perf_counter()
df2 = pp()
t1 = time.perf_counter()
print(t1 - t0)
3.2022583668585867 # op 8 symbols, 1000 rows, 10 repeats
0.2772211490664631 # pp 8 symbols, 1000 rows, 10 repeats
12.412292044842616 # pp 26 symbols, 250,000 rows, single run
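The least obvious step in pp() is the lu lookup table. ssd.squareform expands the condensed vector 0..n*(n-1)/2-1 into a symmetric matrix, so lu[i, j] maps an unordered pair of single-column indices to the position of that pair's column in d2. A small sketch to illustrate, with n = 4 matching the A/B/C/D columns above:

```python
import numpy as np
import scipy.spatial.distance as ssd

n = 4  # four single columns: A, B, C, D
# condensed pair positions 0..5 correspond to AB, AC, AD, BC, BD, CD
lu = ssd.squareform(np.arange(n * (n - 1) // 2))
print(lu)
# lu[i, j] gives the column offset of the pair (i, j):
print(lu[0, 1], lu[2, 3])  # prints: 0 5  (the "AB" and "CD" columns)
```

Indexing lu with the fancy index arrays k, i, j then gathers, for every triple (k, i, j), the three pair columns whose sum is tested against 2.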

