MACHINE LEARNING

APPLICATION OF SUPERVISED LEARNING

DEEP LEARNING

Question
Which of these parts of the self-attention operation are calculated by passing inputs through an MLP?
A. Values
B. Keys
C. Queries
D. Word Embeddings
Explanation: 

Detailed explanation-1: Steps in calculating attention: 1. Take the query vector for a word and compute its dot product with the transpose of the key vector of each word in the sequence, including the word itself. This gives the attention score (attention weight). 2. Divide each of these results by the square root of the dimension of the key vector.
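
These two steps can be sketched in a few lines of NumPy (a minimal illustration; the sequence length, vector dimension, and array names are assumed, not taken from the question):

    import numpy as np

    d_k = 64                     # assumed dimension of the key/query vectors
    Q = np.random.randn(5, d_k)  # query vectors for a 5-token sequence (illustrative)
    K = np.random.randn(5, d_k)  # key vectors for the same sequence

    # Step 1: dot product of each query with the transpose of every key
    scores = Q @ K.T             # shape (5, 5): one score per query-key pair

    # Step 2: divide by the square root of the key-vector dimension
    scaled_scores = scores / np.sqrt(d_k)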

Detailed explanation-2: This quadratic complexity comes from the self-attention mechanism: Attention(Q, K, V) = softmax(QK⊤ / √d_k) V.
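
A minimal sketch of the full operation, including the softmax over the scaled scores and the weighted sum of the value vectors (the function name, shapes, and projection matrices below are assumptions made for illustration):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                            # scaled query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax
        return weights @ V                                          # weighted sum of value vectors

    # In self-attention, Q, K and V are all projections of the same input X
    X = np.random.randn(5, 64)                                      # 5 token embeddings (illustrative)
    W_q, W_k, W_v = (np.random.randn(64, 64) for _ in range(3))
    out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)

The (n × n) scores matrix, one entry per query-key pair, is what gives the mechanism its quadratic cost in the sequence length.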

Detailed explanation-3: Attention can be computed between the input and output elements (general attention) or within the input elements themselves (self-attention).
