I believe both of your summations can be removed, but I have only removed the easier one for the time being. The summation over the second dimension is trivial, since it only affects the A_k array: code :
% Sum over the second dimension once, outside the loop.
B_k = sum(A_k,2);
for k = 2:L
    i = 2:k;
    x(:,1,k+1) = x(:,1,k+1) + sum(bsxfun(@times,B_k(:,1,2:k),x(:,1,k+1-i)),3);
end

% Self-contained timing version, with x stored as a 2-D array.
L = 10000;
x = rand(4,L+1);
A_k = rand(4,4,L);
B_k = squeeze(sum(A_k,2)).';
tic
for k = 2:L
    ii = 1:k-1;
    x(:,k+1) = x(:,k+1) + diag(x(:,ii)*B_k(k+1-ii,:));
end
toc
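For readers more at home in NumPy, the `diag(x(:,ii)*B_k(k+1-ii,:))` step can be sketched as follows. This is an illustration, not a port of the answer: the names `X` and `B` and their shapes are assumptions. It shows that the diagonal of a matrix product equals a row-wise sum of products, so the full product never has to be formed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 6))   # stands in for x(:, ii): 4 rows, 6 selected columns
B = rng.random((6, 4))   # stands in for B_k(k+1-ii, :): 6 selected rows, 4 columns

# Direct translation: build the full 4x4 product, keep only its diagonal.
via_diag = np.diag(X @ B)

# Same numbers without the off-diagonal work: out[i] = sum_j X[i, j] * B[j, i].
via_einsum = np.einsum('ij,ji->i', X, B)
via_sum = np.sum(X * B.T, axis=1)
```

All three expressions give the same length-4 vector; only the first one computes the twelve off-diagonal entries that are then thrown away.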

Why is numpy's einsum faster than numpy's built in functions?
By : Wenjuan Luo
Date : March 29 2020, 07:55 AM
Now that numpy 1.8 is released, where according to the docs all ufuncs should use SSE2, I wanted to double check that Seberg's comment about SSE2 was valid. To perform the test, a new Python 2.7 install was created, and numpy 1.7 and 1.8 were compiled with icc using standard options on an AMD Opteron core running Ubuntu. code :
# Python 2.7 syntax, matching the test setup described above.
import numpy as np
import timeit

arr_1D = np.arange(5000, dtype=np.double)
arr_2D = np.arange(500**2, dtype=np.double).reshape(500, 500)
arr_3D = np.arange(500**3, dtype=np.double).reshape(500, 500, 500)

print 'Summation test:'
print timeit.timeit('np.sum(arr_3D)',
                    'import numpy as np; from __main__ import arr_1D, arr_2D, arr_3D',
                    number=5)/5
print timeit.timeit('np.einsum("ijk->", arr_3D)',
                    'import numpy as np; from __main__ import arr_1D, arr_2D, arr_3D',
                    number=5)/5
print '\n'

print 'Power test:'
print timeit.timeit('arr_3D*arr_3D*arr_3D',
                    'import numpy as np; from __main__ import arr_1D, arr_2D, arr_3D',
                    number=5)/5
print timeit.timeit('np.einsum("ijk,ijk,ijk->ijk", arr_3D, arr_3D, arr_3D)',
                    'import numpy as np; from __main__ import arr_1D, arr_2D, arr_3D',
                    number=5)/5
print '\n'

print 'Outer test:'
print timeit.timeit('np.outer(arr_1D, arr_1D)',
                    'import numpy as np; from __main__ import arr_1D, arr_2D, arr_3D',
                    number=5)/5
print timeit.timeit('np.einsum("i,k->ik", arr_1D, arr_1D)',
                    'import numpy as np; from __main__ import arr_1D, arr_2D, arr_3D',
                    number=5)/5
print '\n'

print 'Einsum test:'
print timeit.timeit('np.sum(arr_2D*arr_3D)',
                    'import numpy as np; from __main__ import arr_1D, arr_2D, arr_3D',
                    number=5)/5
print timeit.timeit('np.einsum("ij,oij->", arr_2D, arr_3D)',
                    'import numpy as np; from __main__ import arr_1D, arr_2D, arr_3D',
                    number=5)/5
print '\n'
Results with numpy 1.7 (seconds per call; NumPy expression first, einsum second):
Summation test:
0.172988510132
0.0934836149216

Power test:
1.93524689674
0.839519000053

Outer test:
0.130380821228
0.121401786804

Einsum test:
0.979052495956
0.126066613197

Results with numpy 1.8 (seconds per call; NumPy expression first, einsum second):
Summation test:
0.116551589966
0.0920487880707

Power test:
1.23683619499
0.815982818604

Outer test:
0.131808176041
0.127472200394

Einsum test:
0.781750011444
0.129271841049
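
Before reading too much into the timings, it helps to confirm that each pair of expressions computes the same thing. A minimal check, written for Python 3 and using much smaller arrays than the benchmark (the size `n = 20` is an arbitrary choice made here so that it runs instantly):

```python
import numpy as np

n = 20  # small on purpose; the benchmark above uses 500
arr_1D = np.arange(n, dtype=np.double)
arr_2D = np.arange(n**2, dtype=np.double).reshape(n, n)
arr_3D = np.arange(n**3, dtype=np.double).reshape(n, n, n)

# Each tuple pairs the NumPy expression with its einsum counterpart.
pairs = {
    'summation': (np.sum(arr_3D), np.einsum('ijk->', arr_3D)),
    'power': (arr_3D * arr_3D * arr_3D,
              np.einsum('ijk,ijk,ijk->ijk', arr_3D, arr_3D, arr_3D)),
    'outer': (np.outer(arr_1D, arr_1D), np.einsum('i,k->ik', arr_1D, arr_1D)),
    'einsum': (np.sum(arr_2D * arr_3D), np.einsum('ij,oij->', arr_2D, arr_3D)),
}

results = {name: bool(np.allclose(a, b)) for name, (a, b) in pairs.items()}
print(results)
```

Note that in the last pair `arr_2D` broadcasts against the trailing two axes of `arr_3D`, which is what the subscripts `ij,oij->` spell out explicitly.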

Why is numpy's einsum slower than numpy's builtin functions?
By : jokke009
Date : March 29 2020, 07:55 AM
I've usually gotten good performance out of numpy's einsum function (and I like its syntax). @Ophion's answer to this question shows that, for the cases tested, einsum consistently outperforms the "builtin" functions (sometimes by a little, sometimes by a lot). But I just encountered a case where einsum is much slower. Consider the equivalent functions func_dot, func_einsum and func_einsum2, whose timings appear below.

You can have the best of both worlds: code :
def func_dot_einsum(C, X):
    # Let dot handle the matrix product, then reduce row-wise with einsum.
    Y = X.dot(C)
    return np.einsum('ij,ij->i', Y, X)
In [7]: %timeit func_dot(C, X)
10 loops, best of 3: 31.1 ms per loop
In [8]: %timeit func_einsum(C, X)
10 loops, best of 3: 105 ms per loop
In [9]: %timeit func_einsum2(C, X)
10 loops, best of 3: 43.5 ms per loop
In [10]: %timeit func_dot_einsum(C, X)
10 loops, best of 3: 21 ms per loop
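
The definitions of `func_dot`, `func_einsum` and `func_einsum2` timed above are not shown here. The versions below are a guess at what they look like, reconstructed from their names and from `func_dot_einsum`; treat them, and the array shapes, as assumptions. The sketch only checks that all four variants agree on random inputs.

```python
import numpy as np

def func_dot(C, X):
    # Assumed: matrix product via dot, then a row-wise sum of products.
    Y = X.dot(C)
    return np.sum(Y * X, axis=1)

def func_einsum(C, X):
    # Assumed: the whole computation as one three-operand einsum.
    return np.einsum('ik,km,im->i', X, C, X)

def func_einsum2(C, X):
    # Assumed: the same computation split into two einsum calls.
    A = np.einsum('ik,km->im', X, C)
    return np.einsum('im,im->i', A, X)

def func_dot_einsum(C, X):
    # From the answer: dot for the matrix product, einsum for the reduction.
    Y = X.dot(C)
    return np.einsum('ij,ij->i', Y, X)

rng = np.random.default_rng(0)
C = rng.random((20, 20))       # hypothetical sizes
X = rng.random((1000, 20))
reference = func_dot(C, X)
others = [f(C, X) for f in (func_einsum, func_einsum2, func_dot_einsum)]
```

Each entry of `others` should match `reference` to floating-point tolerance, so any timing difference comes from how the work is carried out, not from what is computed.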
# What np.einsum('ij,ij->i', a, b) computes, written as explicit loops:
for i in range(I):
    out[i] = 0
    for j in range(J):
        out[i] += a[i, j] * b[i, j]
# A three-operand einsum and its broadcasting equivalent:
np.einsum('ij,jk,ik->i', a, b, c)
np.sum(a[:, :, None] * b[None, :, :] * c[:, None, :], axis=(1, 2))
In [29]: a, b, c = np.random.rand(3, 100, 100)
In [30]: %timeit np.einsum('ij,jk,ik->i', a, b, c)
100 loops, best of 3: 2.41 ms per loop
In [31]: %timeit np.sum(a[:, :, None] * b[None, :, :] * c[:, None, :], axis=(1, 2))
100 loops, best of 3: 12.3 ms per loop
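# Direct loop form of np.einsum('ij,jk,ik->i', a, b, c):
for i in range(I):
    out[i] = 0
    for j in range(J):
        for k in range(K):
            out[i] += a[i, j] * b[j, k] * c[i, k]
# Same result with a[i, j] factored out of the innermost loop:
for i in range(I):
    out[i] = 0
    for j in range(J):
        temp = 0
        for k in range(K):
            temp += b[j, k] * c[i, k]
        out[i] += a[i, j] * temp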
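
The two loop nests above can be run directly to confirm that they match `np.einsum('ij,jk,ik->i', a, b, c)`. A small sketch, with the function names `naive_loops` and `factored_loops` and the array sizes chosen here purely for illustration:

```python
import numpy as np

def naive_loops(a, b, c):
    # Two multiplications per innermost iteration: 2*I*J*K in total.
    I, J = a.shape
    K = b.shape[1]
    out = np.zeros(I)
    for i in range(I):
        for j in range(J):
            for k in range(K):
                out[i] += a[i, j] * b[j, k] * c[i, k]
    return out

def factored_loops(a, b, c):
    # a[i, j] is applied once per (i, j): I*J*K + I*J multiplications.
    I, J = a.shape
    K = b.shape[1]
    out = np.zeros(I)
    for i in range(I):
        for j in range(J):
            temp = 0.0
            for k in range(K):
                temp += b[j, k] * c[i, k]
            out[i] += a[i, j] * temp
    return out

rng = np.random.default_rng(0)
a, b, c = rng.random((3, 8, 8))
expected = np.einsum('ij,jk,ik->i', a, b, c)
```

Both functions return the same vector as `expected`; the factored version simply does roughly half as many multiplications.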

numpy einsum with '...'
By : SANJIB MAJUMDER
Date : November 18 2020, 03:42 PM
Yep, it's a bug. It was fixed in this pull request: https://github.com/numpy/numpy/pull/4099 This was only merged a month ago, so it'll be a while before it makes it to a stable release. code :
np.einsum('...ij,j...->i...', A, x)
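
To make the ellipsis notation concrete, here is a small sketch of what this subscript string computes on a NumPy release that includes the fix. The shapes of `A` and `x` are assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 4))

# x is a plain vector: the ellipsis stands for zero extra axes.
x_vec = rng.random(4)
y_vec = np.einsum('...ij,j...->i...', A, x_vec)   # shape (3,)

# x carries one trailing axis: the ellipsis passes it through to the output.
x_mat = rng.random((4, 5))
y_mat = np.einsum('...ij,j...->i...', A, x_mat)   # shape (3, 5)
```

In both cases the result matches `A @ x`, since `j` is the only index that is summed away.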

What is the equivalent of Matlabs differentiate function in numpy?
By : W.Willi
Date : March 29 2020, 07:55 AM

Why is numpy.dot much faster than numpy.einsum?
By : thompsonja
Date : March 29 2020, 07:55 AM
einsum parses the index string, then constructs an nditer object and uses that to perform a sum-of-products iteration. It has special cases where the indexes just perform axis swaps, and sums ('ii->i'). It may also have special cases for 2 and 3 variables (as opposed to more). But it does not make any attempt to invoke external libraries. I worked out a pure Python work-alike, but with more focus on the parsing than on the calculation special cases.
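
The sum-of-products iteration described above can be sketched in plain Python. This is neither the author's work-alike nor NumPy's implementation: `tiny_einsum` is a hypothetical helper written here, it accepts only the explicit `->` form with single-letter indices (no ellipsis, no broadcasting), and it loops in Python where NumPy would use `nditer`.

```python
import itertools
import numpy as np

def tiny_einsum(subscripts, *operands):
    """Naive einsum: explicit '->' form only, no ellipsis, no broadcasting."""
    inputs, output = subscripts.replace(' ', '').split('->')
    input_labels = inputs.split(',')
    operands = [np.asarray(op) for op in operands]

    # Map every index letter to its length, checking consistency.
    sizes = {}
    for labels, op in zip(input_labels, operands):
        if len(labels) != op.ndim:
            raise ValueError('subscript/dimension mismatch for ' + labels)
        for letter, n in zip(labels, op.shape):
            if sizes.setdefault(letter, n) != n:
                raise ValueError('size mismatch for index ' + letter)

    # Output letters first, then the letters that get summed away.
    summed = [l for l in sizes if l not in output]
    letters = list(output) + summed
    out = np.zeros([sizes[l] for l in output])

    # Visit every combination of index values and accumulate the product.
    for values in itertools.product(*(range(sizes[l]) for l in letters)):
        idx = dict(zip(letters, values))
        term = 1.0
        for labels, op in zip(input_labels, operands):
            term *= op[tuple(idx[l] for l in labels)]
        out[tuple(idx[l] for l in output)] += term
    return out

a = np.arange(12.0).reshape(3, 4)
b = np.arange(20.0).reshape(4, 5)
product = tiny_einsum('ij,jk->ik', a, b)     # matrix product
diagonal = tiny_einsum('ii->i', b[:, :4])    # diagonal of a square matrix
```

Every operand element is touched once per index combination, with no call into BLAS or any other external library, which is consistent with a tuned `np.dot` being able to outrun this style of iteration on plain matrix products.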

