
3 nested loops: Optimizing a simple simulation for speed



By : Wild Shovel
Date : October 23 2020, 08:10 PM
You can vectorize your inner loop by generating n random integers all at once with numpy (much faster), and get rid of all your if statements by using arithmetic instead of boolean logic.
code :
while...: 
    #population changes by (-1, 0, +1, +2) for each alien
    n += np.random.randint(-1,3, size=n).sum()
from time import time
import numpy as np
from numpy.random import randint
from numba import njit, int32, prange

@njit('i4(i4)')
def simulate(pop_max): #move simulation of one population to a function for parallelization
    n = 1
    while 0 < n < pop_max:
        n += np.sum(randint(-1,3,n))
    return n

@njit('i4[:](i4,i4)', parallel=True)
def solve(pop_max, iter_max):
    # this could easily be simplified to return just the ratio of populations that die off vs. survive to pop_max,
    # which would save you some RAM (though the speed is about the same)
    results = np.zeros(iter_max, dtype=int32) #numba needs int32 here rather than python int
    for i in prange(iter_max): #prange specifies that this loop can be parallelized
        results[i] = simulate(pop_max)
    return results

pop_max = 100
iter_max = 100000

t = time()
print( np.bincount(solve(pop_max, iter_max))[0] / iter_max )
print('time elapsed: ', time()-t)
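
If numba is not available, the same "one draw per alien, then sum" idea can be sketched in plain NumPy. This is a minimal sketch, not the answer's benchmarked code: the `extinction_fraction` helper, the explicit `rng` argument, and the seed are my own assumptions for illustration.

```python
import numpy as np

def simulate(pop_max, rng):
    """Run one population until it dies out (0) or reaches pop_max."""
    n = 1
    while 0 < n < pop_max:
        # One draw per alien from {-1, 0, +1, +2}; the sum is the net change.
        n += int(rng.integers(-1, 3, size=n).sum())
    return n

def extinction_fraction(pop_max, trials, seed=0):
    """Fraction of independent populations that end at 0 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    died = sum(simulate(pop_max, rng) == 0 for _ in range(trials))
    return died / trials

if __name__ == "__main__":
    print(extinction_fraction(pop_max=100, trials=2000))
```

As a sanity check, treating each alien as leaving 0 to 3 descendants with equal probability gives a branching-process extinction probability of sqrt(2) - 1, about 0.414, so the printed fraction should land near that value.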


Optimizing nested loops



By : Kevin
Date : March 29 2020, 07:55 AM
One type of optimization is loop unrolling. Occasionally the pipeline needs to stall because there is a lot of activity around obtaining the index, updating it, and storing it back in memory. This is probably the primary reason your multithreaded implementation didn't do well: all the threads were probably fighting over access to the index.
If you want to reattempt a multithreaded implementation, have each thread know its "offset" based on the thread count, and have each thread process a different remainder discovered via modulus division.
code :
thread 0 works on cells where (i*cols + j) % (thread count) == 0
thread 1 works on cells where (i*cols + j) % (thread count) == 1
(and so on)
F2D* fDeepCopy(F2D* in)
{
    int i, j;
    F2D* out;
    int rows, cols;

    rows = in->height;
    cols = in->width;

    out = fMallocHandle(rows, cols);

    for(i=0; i<rows; i++) {
      // rewrite to ensure we don't walk off "4 long" pads
      int j = 0;
      int pads = (cols / 4)*4;
      for(; j < pads; j = j + 4) {
        subsref(out,i,j) = subsref(in,i,j);
        subsref(out,i,j+1) = subsref(in,i,j+1);
        subsref(out,i,j+2) = subsref(in,i,j+2);
        subsref(out,i,j+3) = subsref(in,i,j+3);
      }
      // process the remaining columns (cols is not always a multiple of 4)
      for(; j < cols; j++) {
        subsref(out,i,j) = subsref(in,i,j);
      }
    }
    return out;
}
vec_res.x = v1.x + v2.x;
vec_res.y = v1.y + v2.y;
vec_res.z = v1.z + v2.z;
vec_res.w = v1.w + v2.w;
movaps xmm0, [v1]          ;xmm0 = v1.w | v1.z | v1.y | v1.x 
addps xmm0, [v2]           ;xmm0 = v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x               
movaps [vec_res], xmm0
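
The modulus-based split described above can be sketched in Python as follows. This only illustrates how the work is partitioned so that no two workers share an index; the function names are hypothetical, and in CPython the global interpreter lock means this particular copy will not actually run faster with threads.

```python
from concurrent.futures import ThreadPoolExecutor

def copy_strided(src, dst, worker, n_workers):
    """Copy the flat indices whose remainder modulo n_workers equals worker."""
    for flat in range(worker, len(src), n_workers):
        dst[flat] = src[flat]

def deep_copy_parallel(grid, n_workers=4):
    """Copy a rectangular list of lists, one strided share per worker."""
    rows, cols = len(grid), len(grid[0])
    src = [value for row in grid for value in row]  # flat index = i*cols + j
    dst = [None] * len(src)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(copy_strided, src, dst, w, n_workers)
                   for w in range(n_workers)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return [dst[r * cols:(r + 1) * cols] for r in range(rows)]
```

Because each worker owns a disjoint set of indices, no shared loop counter needs to be read, updated, and written back, which is the contention the answer points at.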
speed up a monte carlo simulation with nested loop in R



By : JH P
Date : March 29 2020, 07:55 AM
The following should be faster, but if you're locking up when A is large, that might be a memory issue, and the following is more memory intensive. More information would be helpful: what banks, x, and y are, where you get dea from, and what the purpose is.
Essentially all I've done is try to move as much as I can out of the inner loop. The shorter that is, the better off you'll be.
code :
A <- nrow(banks)
effm <- matrix(nrow = A, ncol = 2)
m <- 20
B <- 100
pb <- txtProgressBar(min = 0,
                     max = A, style=3)
for(a in 1:A) {
  x1 <- x[-a,]
  y1 <- y[-a,]
  theta <- numeric(B)
  xrefm <- x1[sample(1:nrow(x1), m * B, replace=TRUE),] # get all of your samples at once
  yrefm <- y1[sample(1:nrow(y1), m * B, replace=TRUE),]
  deaX <- matrix(x[a,], ncol=3)
  deaY <- matrix(y[a,], ncol=3)

  for(i in 1:B){
    theta[i] <- dea(deaX, deaY, RTS = 'vrs', ORIENTATION = 'graph',
                   xrefm[(1:m) + (i-1) * m,], yrefm[(1:m) + (i-1) * m,], FAST=TRUE)
  }

  effm[a,1] <- mean(theta)
  effm[a,2] <- sd(theta) / sqrt(B)
  setTxtProgressBar(pb, a) 
}
close(pb)
effm 
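
The "get all of your samples at once" step carries over to NumPy as well. Below is a rough sketch under loud assumptions: a plain mean stands in for the `dea` call (which I cannot reproduce here), and `batched_bootstrap` is a made-up name. It draws all `m * B` indices in one call and reshapes them so each row is one replicate.

```python
import numpy as np

def batched_bootstrap(values, m, B, seed=0):
    """Draw B resamples of size m in one call and summarize them.

    The per-replicate statistic is a mean, used only as a stand-in for
    whatever estimator is applied to each block of m rows.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # All m * B indices at once; row i holds the indices of replicate i.
    idx = rng.integers(0, len(values), size=(B, m))
    theta = values[idx].mean(axis=1)
    estimate = theta.mean()
    std_error = theta.std(ddof=1) / np.sqrt(B)
    return theta, estimate, std_error
```

The trade-off is the one the answer mentions: the index array holds `m * B` entries at once, so memory use grows with both.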
Using apply for simulation instead of nested for loops



By : INeedHelp
Date : March 29 2020, 07:55 AM
You can improve the speed of your function by using data.table. However, you would still have to use for loops (which is not a bad thing).
code :
library(data.table)
simdiffuse <- function(a, b, c, d) {

  endo <- 1/a        # innovation endogenous effect
  endomacro <- 1/b   # category endogenous effect
  appeal <- c        # innovation's ex ante appeal
  ninnov <- d        # number of innovations in category 

  results <- data.table(catdensity = rep(0:ninnov, each = 25), t = 1:25, 
                        endo = endo, endomacro = endomacro, appeal = appeal, 
                        adopt = as.numeric(NA))    


  for (cc in 0:ninnov) {
    diff <- data.table(prop = rnorm(1000), adopt = c(rep(1,5), rep(0, 995)))
    for (tt in 1:25) {
      results[catdensity == cc & t == tt, adopt := diff[, mean(adopt)]]
      diff[, rr := rnorm(1, prop), by="prop"]
      diff[appeal + mean(adopt) * endo + cc * endomacro > rr, adopt := 1]
    }
  }
  return(results)
}

results <- simdiffuse(.2, 20, -3, 60)
simulation in R with nested loops run slow



By : Diana María Aristizá
Date : March 29 2020, 07:55 AM
Indeed, it would be more efficient to encode fertility with 0 and 1, and you could even have an integer matrix.
Anyhow, the code as it stands can be simplified a lot, so here is a vectorized solution, still using your data.frame:
code :
NextGen <- function(agent, N, S, A) {
  excess <- runif(N)
  v1 <- which(agent$fertility == "Trad" & excess < S)
  nextgen.agent <- agent[c(1:N, v1), ]
  nextgen.agent[c(v1, seq.int(N+1, nrow(nextgen.agent))), "fertility"] <- ifelse(A > runif(length(v1)*2), "Plan", "Trad")
  nextgen.agent
}
agentDF <- data.frame(fertility = "Trad", lineage = 1:50, stringsAsFactors = FALSE)

# use microbenchmark library to compare performance
microbenchmark::microbenchmark(
  base = {
    res1 <- NextGeneration(agentDF, 50, 0.8, 0.8) # note I fixed the two variable typos in your function
  }, 
  new = {
    res2 <- NextGen(agentDF, 50, 0.8, 0.8)
  }, 
  times = 100
)

## Unit: microseconds
## expr      min        lq     mean    median       uq       max neval
## base 1998.533 2163.8605 2446.561 2222.8200 2286.844 14413.173   100
##  new  282.032  304.1165  329.552  320.3255  348.488   467.217   100
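
For comparison, here is how the 0/1 encoding suggested above might look in NumPy. It is a sketch of my reading of `NextGen`, not a translation the author provided: `True` stands for "Trad", lineage is left out, and `next_gen` is a hypothetical name.

```python
import numpy as np

def next_gen(is_trad, S, A, rng):
    """One generation step on a boolean fertility array (True = "Trad")."""
    n = is_trad.size
    # "Trad" agents reproduce when their uniform draw falls below S.
    parents = np.flatnonzero(is_trad & (rng.random(n) < S))
    out = np.concatenate([is_trad, is_trad[parents]])
    # Parents and their new children are reassigned together:
    # "Plan" with probability A, otherwise "Trad".
    touched = np.concatenate([parents, np.arange(n, out.size)])
    out[touched] = rng.random(touched.size) >= A
    return out

rng = np.random.default_rng(0)
agents = np.ones(50, dtype=bool)  # everyone starts as "Trad"
agents = next_gen(agents, S=0.8, A=0.8, rng=rng)
```

A boolean array avoids the string comparisons entirely, which is the efficiency point made at the top of the answer.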
Optimizing nested for-loops



By : dliang
Date : March 29 2020, 07:55 AM
Here is a method that is about 10x faster than your reference code. It does nothing particularly clever, just pedestrian optimization.
code :
import numpy as np
import pandas as pd
df = pd.DataFrame()

np.random.seed(1)

df["A"] = np.random.randint(2, size=10)
df["B"] = np.random.randint(2, size=10)
df["C"] = np.random.randint(2, size=10)
df["D"] = np.random.randint(2, size=10)

df["AB"] = np.random.randint(2, size=10)
df["AC"] = np.random.randint(2, size=10)
df["AD"] = np.random.randint(2, size=10)
df["BC"] = np.random.randint(2, size=10)
df["BD"] = np.random.randint(2, size=10)
df["CD"] = np.random.randint(2, size=10)

ls = ["A", "B", "C", "D"]

def op():
    out = df.copy()
    for i, a in enumerate(ls):
        for j in range(i + 1, len(ls)):
            b = ls[j]
            for k in range(j + 1, len(ls)):
                c = ls[k]
                idx = a+b+c

                idx_abc = (out[a]>0) & (out[b]>0) & (out[c]>0)
                sum_abc = out[idx_abc][a+b] + out[idx_abc][b+c] + out[idx_abc][a+c]

                out[a+b+c]=0
                out.loc[sum_abc.index[sum_abc>=2], a+b+c] = 99
    return out

import scipy.spatial.distance as ssd

def pp():
    data = df.values
    n = len(ls)
    d1,d2 = np.split(data, [n], axis=1)
    i,j = np.triu_indices(n,1)
    d2 = d2 & d1[:,i] & d1[:,j]
    k,i,j = np.ogrid[:n,:n,:n]
    k,i,j = np.where((k<i)&(i<j))
    lu = ssd.squareform(np.arange(n*(n-1)//2))
    d3 = ((d2[:,lu[k,i]]+d2[:,lu[i,j]]+d2[:,lu[k,j]])>=2).view(np.uint8)*99
    *triplets, = map("".join, combinations(ls,3))
    out = df.copy()
    out[triplets] = pd.DataFrame(d3, columns=triplets)
    return out

from string import ascii_uppercase
from itertools import combinations, chain

def make(nl=8, nr=1000000, seed=1):
    np.random.seed(seed)
    letters = np.fromiter(ascii_uppercase, 'U1', nl)
    df = pd.DataFrame()
    for l in chain(letters, map("".join,combinations(letters,2))):
        df[l] = np.random.randint(0,2,nr,dtype=np.uint8)
    return letters, df

df1 = op()
df2 = pp()
assert (df1==df2).all().all()

ls, df = make(8,1000)

df1 = op()
df2 = pp()
assert (df1==df2).all().all()

from timeit import timeit

print(timeit(op,number=10))
print(timeit(pp,number=10))

ls, df = make(26,250000)
import time

t0 = time.perf_counter()
df2 = pp()
t1 = time.perf_counter()
print(t1-t0)
3.2022583668585867 # op 8 symbols, 1000 rows, 10 repeats
0.2772211490664631 # pp 8 symbols, 1000 rows, 10 repeats
12.412292044842616 # pp 26 symbols, 250,000 rows, single run
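
If the index arithmetic in `pp` is hard to follow, a middle ground is to keep the loop over triplets but do the per-row work on plain arrays. The sketch below is not the benchmarked code above and I have not timed it; `triplet_flags` is a hypothetical helper that takes a dict of column arrays rather than a DataFrame.

```python
from itertools import combinations
import numpy as np

def triplet_flags(cols, letters):
    """Return {triplet_name: array} with 99 where the rule fires, else 0.

    cols maps each single letter and each sorted letter pair to a 0/1 array.
    The rule mirrors op(): all three letters are set and at least two of the
    three pair columns are set.
    """
    cols = {name: np.asarray(col, dtype=np.int64) for name, col in cols.items()}
    flags = {}
    for a, b, c in combinations(letters, 3):
        present = (cols[a] > 0) & (cols[b] > 0) & (cols[c] > 0)
        pair_sum = cols[a + b] + cols[b + c] + cols[a + c]
        flags[a + b + c] = np.where(present & (pair_sum >= 2), 99, 0)
    return flags
```

With the DataFrame from the answer, something like `{name: df[name].to_numpy() for name in df.columns}` would build the input dict, and the result can be attached back as new columns.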