The looping isn't the problem, but appending lists to lists is very slow. To avoid this you can either allocate an array large enough for the data and shrink it afterwards (or copy the data into an array of exactly the size you need), or implement your function using std::vector. In this answer I use Numba because I'm not that experienced in performance-oriented Cython, but a Cython implementation should be straightforward. Numba also has a limited internal representation of lists and tuples, but I don't know whether the same is available in Cython. code :
import numpy as np
import numba as nb

@nb.njit()
def get_image_data_arr(image_data):
    # preallocate the worst-case size, shrink afterwards
    array_text = np.empty((image_data.shape[0]*image_data.shape[1], 2), dtype=np.int64)
    ii = 0
    for y in range(image_data.shape[0]):
        for x in range(image_data.shape[1]):
            if image_data[y, x] < 210:
                array_text[ii, 0] = x
                array_text[ii, 1] = y
                ii += 1
    return array_text[:ii, :]
@nb.njit()
def get_image_data(image_data):
    list_text = []
    for y in range(image_data.shape[0]):
        for x in range(image_data.shape[1]):
            if image_data[y, x] < 210:
                # appending lists
                list_text.append([x, y])
                # appending tuples
                #list_text.append((x, y))
    return list_text

# create some data
image_data = np.random.rand(1683*1240).reshape(1683, 1240)*255
image_data = image_data.astype(np.uint8)
get_image_data (pure Python):                   3.4 s
get_image_data (naive Numba, appending lists):  1.1 s
get_image_data (naive Numba, appending tuples): 0.3 s
get_image_data_arr:                             0.012 s
np.argwhere(image_data < 210):                  0.035 s
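For comparison, the np.argwhere one-liner from the timings above yields the same coordinates as the loop, with one caveat: it returns (row, col), i.e. (y, x) pairs, so the columns must be swapped to match the (x, y) layout used here. A minimal sketch on a small test image:

```python
import numpy as np

# small reproducible test image
rng = np.random.default_rng(0)
image_data = rng.integers(0, 255, size=(4, 5), dtype=np.uint8)

# pure-Python reference: the same double loop as above
expected = [[x, y]
            for y in range(image_data.shape[0])
            for x in range(image_data.shape[1])
            if image_data[y, x] < 210]

# vectorized: argwhere returns (y, x) pairs in row-major order,
# so flipping the columns gives (x, y) in the same order as the loop
coords = np.argwhere(image_data < 210)[:, ::-1]

print(coords.tolist() == expected)  # True
```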

Slow division in cython
By : Selvin
Date : March 29 2020, 07:55 AM
Firstly, you need to call the functions many (>1000) times and take an average of the time spent in each to get an accurate idea of how they differ. Calling each function once will not be accurate enough. Secondly, the time spent in the function is affected by other things, not just the loop with divisions. Calling a def (i.e. Python) function like this involves some overhead in passing and returning the arguments. Also, creating a NumPy array in the function takes time, so any differences between the loops in the two functions will be less obvious. code :
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.cdivision(True)
@cython.profile(True)
cdef double example1(double[:] xi, double[:] a, double[:] b, int D):
    cdef int k
    cdef double theSum = 0.0
    for k in range(D):
        theSum += (xi[k] - a[k]) / (b[k] - a[k])
    return theSum
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.profile(True)
@cython.cdivision(False)
cdef double example2(double[:] xi, double[:] a, double[:] b, int D):
    cdef int k
    cdef double theSum = 0.0
    for k in range(D):
        theSum += (xi[k] - a[k]) / (b[k] - a[k])
    return theSum
def testExamples():
    D = 100000
    x = np.random.rand(D)
    a = np.zeros(D)
    b = np.random.rand(D) + 1
    for i in range(10000):
        example1(x, a, b, D)
        example2(x, a, b, D)
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
 10000    1.546    0.000    1.546    0.000  test.pyx:26(example2)
 10000    0.002    0.000    0.002    0.000  test.pyx:11(example1)
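The averaging advice above can be sketched with the standard-library timeit module. This uses a plain-Python stand-in for the kernel (the real comparison would of course call the compiled Cython functions); repeat gives several independent runs of number calls each, and dividing the best run by number yields a stable per-call time:

```python
import timeit
import numpy as np

def example_py(xi, a, b, D):
    # plain-Python stand-in for the Cython kernel above
    s = 0.0
    for k in range(D):
        s += (xi[k] - a[k]) / (b[k] - a[k])
    return s

D = 1000
x = np.random.rand(D)
a = np.zeros(D)
b = np.random.rand(D) + 1

# run many times; take the best of 3 repeats and average over n calls
n = 100
t = min(timeit.repeat(lambda: example_py(x, a, b, D), number=n, repeat=3)) / n
print(f"{t*1e6:.1f} us per call")
```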

Slow multiprocessing with cython
By : Marina Lozovanu
Date : March 29 2020, 07:55 AM
Cython can incur translation costs if you move between C and Python types too often, which could contribute here. There's also the fact that the relative speedup from compilation is higher, which hides the multiprocessing overhead. One suggestion is to use nogil functions and see whether threading has a lower overhead.

vectorization of looping on an array from cython
By : Alessandro Magnolo
Date : March 29 2020, 07:55 AM
The only difference in the generated C code is that in inplace_addlocal the end variable for the loop is an int, while in inplace_add it's a Py_ssize_t. Since your loop counter is an int, the inplace_add version has additional overhead from casting between the two types every time the comparison is performed. code :
/* inplace_add: the loop bound __pyx_t_1 is a Py_ssize_t */
Py_ssize_t __pyx_t_1;
int __pyx_t_2;
int __pyx_t_3;
int __pyx_t_4;

__pyx_t_1 = (__pyx_v_a.shape[0]);
for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2 += 1) {
    __pyx_v_i = __pyx_t_2;

/* inplace_addlocal: the loop bound __pyx_t_1 is an int */
int __pyx_t_1;
int __pyx_t_2;
int __pyx_t_3;
int __pyx_t_4;

__pyx_v_n = (__pyx_v_a.shape[0]);
__pyx_t_1 = __pyx_v_n;
for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2 += 1) {
    __pyx_v_i = __pyx_t_2;

Why is this loop so slow in Cython?
By : JotaCê
Date : March 29 2020, 07:55 AM
You're appending to a bytes object with +=. That's really slow, since it has to copy the whole existing bytes object every time. Don't do that. A better option is to use a bytearray, and only build a bytes object from the bytearray at the end.
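The advice above can be sketched as follows (helper names are illustrative): appending to a bytearray grows the buffer in place with amortized O(1) cost, and a single bytes() conversion at the end replaces the quadratic copying:

```python
# slow: each += copies the whole existing bytes object
def build_bytes_slow(chunks):
    out = b""
    for c in chunks:
        out += c
    return out

# fast: a bytearray grows in place; convert to bytes once at the end
def build_bytes_fast(chunks):
    buf = bytearray()
    for c in chunks:
        buf += c
    return bytes(buf)

chunks = [b"ab", b"cd", b"ef"]
print(build_bytes_fast(chunks) == build_bytes_slow(chunks))  # True
```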

Alternative to looping? Vectorisation, cython?
By : Satyavara
Date : March 29 2020, 07:55 AM
Here's another option, which separates the calculation of the rates/years matrix and appends it to the input df afterwards. It still loops in the script itself (the loop is not "externalized" to some numpy/pandas function), but that should be fine for 5k rows, I'd guesstimate. code :
import pandas as pd
import numpy as np

# create the initial df without years/rates
df = pd.DataFrame({'Total': [100, 20, 30, 40, 10],
                   'Yr_to_Use': [2020, 2021, 2021, 2019, 2020],
                   'First_Year_Del': [5, 2, 7, 9, 10],
                   'Del_rate': [10, 5, 16, 18, 30]})

# get number of full rates + remainder
n, r = np.divmod(df['Total'] - df['First_Year_Del'], df['Del_rate'])

# get the year of the last rate considering all rows
max_year = np.max(n + r.astype(bool) + df['Yr_to_Use'])

# get the offsets for the start of delivery; year zero is 2019,
# so subtracting it lets you use the offset as an index
offset = df['Yr_to_Use'] - 2019

# get a year index; this determines the columns that will be created
yrs = np.arange(2019, max_year + 1)

# prepare an n*m array to hold the rates for all years, initialized with zeros
# (n: number of rows of the df, m: number of years where rates have to be paid)
out = np.zeros((df['Total'].shape[0], yrs.shape[0]))

# calculate the rates for each year and insert them into the output array
for i in range(df['Total'].shape[0]):
    # concatenate: year of the first rate, all yearly rates,
    # and a final rate if there was a remainder
    if r[i]:  # remainder is not zero, append it as well
        rates = np.concatenate([[df['First_Year_Del'][i]], n[i]*[df['Del_rate'][i]], [r[i]]])
    else:     # remainder is zero, skip it
        rates = np.concatenate([[df['First_Year_Del'][i]], n[i]*[df['Del_rate'][i]]])
    # insert the rates at the appropriate location of the output array
    out[i, offset[i]:offset[i] + rates.shape[0]] = rates

# add the years/rates matrix to the original df
df = pd.concat([df, pd.DataFrame(out, columns=yrs.astype(str))], axis=1, sort=False)
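The divmod step above is the core of the approach: it splits each Total (minus the first-year delivery) into n full rates plus a remainder r. A minimal standalone check on the same numbers, showing that the pieces reconstruct the totals:

```python
import numpy as np

total = np.array([100, 20, 30, 40, 10])
first = np.array([5, 2, 7, 9, 10])
rate = np.array([10, 5, 16, 18, 30])

# n full rates per row, plus remainder r (an extra partial year if r != 0)
n, r = np.divmod(total - first, rate)
print(n.tolist(), r.tolist())  # [9, 3, 1, 1, 0] [5, 3, 7, 13, 0]

# sanity check: first payment + full rates + remainder adds back up to the total
assert np.array_equal(first + n*rate + r, total)
```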

