# PyTorch - applying attention efficiently

By : Srinu Sairi
Date : October 24 2020, 08:10 PM
For clarity: I assume we only really care about vectorizing the for loop. What is the shape of `x`? Assuming `x` is 2-dimensional, I have the following code, where `v1` executes your loop and `v2` is a vectorized version:
```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.randn(3, 6)

def v1():
    for i in range(1, x.size(0)):
        prev = x[:i]
        curr = x[i].view(1, -1)

        prod = torch.mm(curr, prev.t())
        attn = prod # same shape
        context = torch.mm(attn, prev)
        print(context)

def v2():
    # we're going to unroll the loop by vectorizing over the new,
    # 0-th dimension of `x`. We repeat it as many times as there
    # are iterations in the for loop
    repeated = x.unsqueeze(0).repeat(x.size(0), 1, 1)

    # we're looking to build a `prevs` tensor such that
    # prevs[i, x, y] == prev[x, y] at i-th iteration of the loop in v1,
    # up to 0-padding necessary to make them all the same size.
    # We need to build a higher-dimensional equivalent of torch.triu
    xs = torch.arange(x.size(0)).reshape(1, -1, 1)
    zs = torch.arange(x.size(0)).reshape(-1, 1, 1)
    prevs = torch.where(zs < xs, torch.tensor(0.), repeated)

    # this is an equivalent of the above iteration starting at 1
    prevs = prevs[:-1]
    currs = x[1:]

    # a batched matrix multiplication
    prod = torch.matmul(currs, prevs.transpose(1, 2))
    attn = prod # same shape
    context = torch.matmul(attn, prevs)
    # equivalent of a higher dimensional torch.diagonal
    contexts = torch.einsum('iij->ij', (context))
    print(contexts)

print(x)

print('\n------ v1 -------\n')
v1()
print('\n------ v2 -------\n')
v2()
```

## How to access attention values in attention decoder of seq2seq_model to plot bleu score

By : Rafia Qutab
Date : March 29 2020, 07:55 AM
You first need to save references to those tensors in a Python list, then pass that list to `session.run`. The result is a list containing the NumPy values of those tensors.

## Implementing Luong Attention in PyTorch

By : Brainary
Date : March 29 2020, 07:55 AM
This version works, and it closely follows the definition of Luong attention (general). The main difference from the version in the question is the separation of `embedding_size` and `hidden_size`, which, after some experimentation, appears to be important for training. Previously I made both the same size (256), which caused trouble for learning: the network seemed able to learn only half the sequence.
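For reference, the "general" variant is usually written roughly as follows. The notation is an assumption borrowed from the usual presentation of Luong attention, not from the answer itself: $h_t$ is the current decoder hidden state, $\bar h_s$ are the encoder outputs, and $W_a$, $W_c$, $W_s$ loosely correspond to `self.attn`, `self.Whc` and `self.Ws` in the code (which also carry bias terms and concatenate in the order `[hidden, context]`).

$$
\begin{aligned}
\operatorname{score}(h_t, \bar h_s) &= h_t^\top W_a \bar h_s \\
a_t(s) &= \frac{\exp\big(\operatorname{score}(h_t, \bar h_s)\big)}{\sum_{s'} \exp\big(\operatorname{score}(h_t, \bar h_{s'})\big)} \\
c_t &= \sum_s a_t(s)\, \bar h_s \\
\tilde h_t &= \tanh\big(W_c\,[c_t; h_t]\big) \\
p(y_t \mid y_{<t}, x) &= \operatorname{softmax}\big(W_s \tilde h_t\big)
\end{aligned}
$$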
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# DEVICE is referenced below; pick the GPU when one is available
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class EncoderRNN(nn.Module):
    def __init__(self, input_size, embedding_size, hidden_size,
                 num_layers=1, bidirectional=False, batch_size=1):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bidirectional = bidirectional
        self.batch_size = batch_size

        self.embedding = nn.Embedding(input_size, embedding_size)

        self.gru = nn.GRU(embedding_size, hidden_size, num_layers,
                          bidirectional=bidirectional)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        return output, hidden

    def initHidden(self):
        directions = 2 if self.bidirectional else 1
        return torch.zeros(
            self.num_layers * directions,
            self.batch_size,
            self.hidden_size,
            device=DEVICE
        )

class AttnDecoderRNN(nn.Module):
    def __init__(self, embedding_size, hidden_size, output_size, dropout_p=0):
        super(AttnDecoderRNN, self).__init__()
        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p

        self.embedding = nn.Embedding(
            num_embeddings=output_size,
            embedding_dim=embedding_size
        )
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(embedding_size, hidden_size)
        self.attn = nn.Linear(hidden_size, hidden_size)
        # hc: [hidden, context]
        self.Whc = nn.Linear(hidden_size * 2, hidden_size)
        # s: softmax
        self.Ws = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        gru_out, hidden = self.gru(embedded, hidden)

        # hidden is (1, 1, hidden_size); [0] drops the layer dimension
        # because torch.mm only accepts 2-D tensors
        attn_prod = torch.mm(self.attn(hidden)[0], encoder_outputs.t())
        attn_weights = F.softmax(attn_prod, dim=1)
        context = torch.mm(attn_weights, encoder_outputs)

        # hc: [hidden: context]
        hc = torch.cat([hidden[0], context], dim=1)
        out_hc = torch.tanh(self.Whc(hc))
        output = F.log_softmax(self.Ws(out_hc), dim=1)

        return output, hidden, attn_weights
```

## How can I get data from the buffer efficiently onto the GPU in PyTorch?

By : user2899997
Date : March 29 2020, 07:55 AM
You can call `.cuda()` on the loaded samples and push them onto a Python `Queue`, then consume those GPU samples from the queue in another thread.
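A minimal sketch of that producer/consumer pattern, under a few assumptions: `buffer_samples` is a hypothetical stand-in for whatever the buffer yields, the queue size of 2 is arbitrary, and the code falls back to the CPU so it also runs on a machine without a GPU. Here the background thread does the transfer and the main thread consumes.

```python
import queue
import threading

import torch

# Assumption: fall back to the CPU so the sketch also runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def producer(samples, out_queue):
    """Move each sample to `device` and hand it over through the queue."""
    for sample in samples:
        out_queue.put(sample.to(device, non_blocking=True))
    out_queue.put(None)  # sentinel: no more samples

# Hypothetical stand-in for samples loaded from the buffer.
buffer_samples = [torch.randn(4, 8) for _ in range(5)]

# Bounded queue, so the producer cannot run arbitrarily far ahead.
gpu_queue = queue.Queue(maxsize=2)
worker = threading.Thread(
    target=producer, args=(buffer_samples, gpu_queue), daemon=True
)
worker.start()

consumed = []
while True:
    batch = gpu_queue.get()
    if batch is None:
        break
    consumed.append(batch)  # a training step would use `batch` here
worker.join()
```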
This is how the Ape-X implementation in Ray manages concurrent data loading for TensorFlow.

## How to visualize attention LSTM using keras-self-attention package?

By : xl factor
Date : March 29 2020, 07:55 AM
One approach is to fetch the outputs of `SeqSelfAttention` for a given input and organize them so as to display predictions per channel (see below). For something more advanced, have a look at the iNNvestigate library (usage examples included).
Update: I can also recommend See RNN, a package I wrote.
```python
from keras.layers import Input, Dense, LSTM, Flatten, concatenate
from keras.models import Model
from keras_self_attention import SeqSelfAttention
import keras.backend as K
import matplotlib.pyplot as plt
import numpy as np

ipt   = Input(shape=(240,4))
x     = LSTM(60, activation='tanh', return_sequences=True)(ipt)
x     = SeqSelfAttention(return_attention=True)(x)
x     = concatenate(x)
x     = Flatten()(x)
out   = Dense(1, activation='sigmoid')(x)
model = Model(ipt,out)
# train_on_batch needs a compiled model; the optimizer choice is arbitrary
model.compile('adam', 'binary_crossentropy')

X = np.random.rand(10,240,4) # dummy data
Y = np.random.randint(0,2,(10,1)) # dummy labels
model.train_on_batch(X, Y)

outs = get_layer_outputs(model, 'seq', X[0:1], 1)
outs_1 = outs[0]  # attention layer features
outs_2 = outs[1]  # attention weights

show_features_1D(model,'lstm',X[0:1],max_timesteps=100,equate_axes=False,show_y_zero=False)
show_features_1D(model,'lstm',X[0:1],max_timesteps=100,equate_axes=True, show_y_zero=True)
show_features_2D(outs_2[0])  # [0] for 2D since 'outs_2' is 3D
```
```python
def show_features_1D(model=None, layer_name=None, input_data=None,
                     prefetched_outputs=None, max_timesteps=100,
                     max_col_subplots=10, equate_axes=False,
                     show_y_zero=True, channel_axis=-1,
                     scale_width=1, scale_height=1, dpi=76):
    if prefetched_outputs is None:
        # first (and only) matching layer; shape (1, timesteps, channels)
        layer_outputs = get_layer_outputs(model, layer_name, input_data, 1)[0]
    else:
        layer_outputs = prefetched_outputs
    n_features    = layer_outputs.shape[channel_axis]

    # use the largest divisor of n_features up to max_col_subplots
    for _int in range(1, max_col_subplots+1):
        if (n_features/_int).is_integer():
            n_cols = int(n_features/_int)
    n_rows = int(n_features/n_cols)

    fig, axes = plt.subplots(n_rows,n_cols,sharey=equate_axes,dpi=dpi)
    fig.set_size_inches(24*scale_width,16*scale_height)

    subplot_idx = 0
    for row_idx in range(axes.shape[0]):
        for col_idx in range(axes.shape[1]):
            subplot_idx += 1
            # [0] selects the single sample in the batch
            feature_output = layer_outputs[0][:,subplot_idx-1]
            feature_output = feature_output[:max_timesteps]
            ax = axes[row_idx,col_idx]

            if show_y_zero:
                ax.axhline(0,color='red')
            ax.plot(feature_output)

            ax.axis(xmin=0,xmax=len(feature_output))
            ax.axis('off')

            ax.annotate(str(subplot_idx),xy=(0,.99),xycoords='axes fraction',
                        weight='bold',fontsize=14,color='g')
    if equate_axes:
        y_new = []
        for row_axis in axes:
            y_new += [np.max(np.abs([col_axis.get_ylim() for
                                     col_axis in row_axis]))]
        y_new = np.max(y_new)
        for row_axis in axes:
            [col_axis.set_ylim(-y_new,y_new) for col_axis in row_axis]
    plt.show()
```
```python
def show_features_2D(data, cmap='bwr', norm=None,
                     scale_width=1, scale_height=1):
    if norm is not None:
        vmin, vmax = norm
    else:
        vmin, vmax = None, None  # scale automatically per min-max of 'data'

    plt.imshow(data, cmap=cmap, vmin=vmin, vmax=vmax)
    plt.xlabel('Timesteps', weight='bold', fontsize=14)
    plt.ylabel('Attention features', weight='bold', fontsize=14)
    plt.colorbar(fraction=0.046, pad=0.04)  # works for any size plot

    plt.gcf().set_size_inches(8*scale_width, 8*scale_height)
    plt.show()
```
```python
def get_layer_outputs(model, layer_name, input_data, learning_phase=1):
    outputs   = [layer.output for layer in model.layers if layer_name in layer.name]
    layers_fn = K.function([model.input, K.learning_phase()], outputs)
    return layers_fn([input_data, learning_phase])
```
```python
from keras_self_attention import SeqWeightedAttention

ipt   = Input(batch_shape=(10,240,4))
x     = LSTM(60, activation='tanh', return_sequences=True)(ipt)
x     = SeqWeightedAttention(return_attention=True)(x)
x     = concatenate(x)
out   = Dense(1, activation='sigmoid')(x)
model = Model(ipt,out)
# train_on_batch needs a compiled model; the optimizer choice is arbitrary
model.compile('adam', 'binary_crossentropy')

X = np.random.rand(10,240,4) # dummy data
Y = np.random.randint(0,2,(10,1)) # dummy labels
model.train_on_batch(X, Y)

outs = get_layer_outputs(model, 'seq', X, 1)
outs_1 = outs[0][0] # additional index since using batch_shape
outs_2 = outs[1][0]

plt.hist(outs_1, bins=500); plt.show()
plt.hist(outs_2, bins=500); plt.show()
```

## How to solve size mismatch of Multi Head Attention in pytorch?

By : Kakerdu
Date : March 29 2020, 07:55 AM
It looks like the code expects query, key, and value to have the same dimensions, so skipping the transpose fixes the issue:
```python
query_ = X
key_ = X
value_ = X
```
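The question's own attention module is not shown, so the following is only a sketch that assumes the built-in `torch.nn.MultiheadAttention` with its default `(sequence, batch, embedding)` layout. The sizes are made up for illustration. It shows that passing the same untransposed tensor as query, key, and value runs without a size mismatch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions, not taken from the question).
seq_len, batch_size, embed_dim, num_heads = 5, 2, 16, 4

# Default layout for nn.MultiheadAttention is (sequence, batch, embedding).
X = torch.randn(seq_len, batch_size, embed_dim)

mha = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=num_heads)

# Self-attention: query, key and value are the same, untransposed tensor.
query_ = X
key_ = X
value_ = X

attn_output, attn_weights = mha(query_, key_, value_)

print(attn_output.shape)   # same shape as X
print(attn_weights.shape)  # (batch, target_len, source_len), averaged over heads
```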