APPLICATION OF SUPERVISED LEARNING
DEEP LEARNING
Question
Which of these parts of the self-attention operation are calculated by passing inputs through an MLP?

Values
Keys
Queries
Word Embeddings
Explanation:
Detailed explanation-1: Steps to calculate attention: 1. Take the query vector for a word and compute its dot product with the key vector of each word in the sequence, including itself. This is the attention score, or attention weight. 2. Divide each of the results by the square root of the dimension of the key vector.
Detailed explanation-2: This quadratic complexity comes from the self-attention mechanism: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V.
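The steps and formula above can be sketched directly in NumPy. This is a minimal illustration with randomly generated Q, K, and V matrices (the shapes and seed are assumptions for the example, not from the original):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    # Step 1: dot product of each query with every key -> attention scores
    # Step 2: divide by the square root of the key dimension
    scores = Q @ K.T / np.sqrt(d_k)              # shape (n, n)
    # Row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the value vectors
    return weights @ V

# Toy example: 3 tokens, d_k = 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)
```

The (n, n) score matrix is what gives self-attention its quadratic cost in sequence length.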
Detailed explanation-3: Attention can be computed between the input and output elements (general attention) or within the input elements alone (self-attention).
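Regarding the question itself: in the standard Transformer, the queries, keys, and values are each produced by passing the input word embeddings through a learned projection (in the original architecture this is a single linear layer; some descriptions loosely call it an MLP). A minimal sketch, with all matrices randomly initialized purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 8, 4
# 3 input word embeddings (assumed shapes for the example)
X = rng.standard_normal((3, d_model))

# Learned projection matrices (random here, trained in practice)
W_q = rng.standard_normal((d_model, d_k))
W_k = rng.standard_normal((d_model, d_k))
W_v = rng.standard_normal((d_model, d_k))

# Queries, keys, and values are all computed from the same inputs
Q, K, V = X @ W_q, X @ W_k, X @ W_v
print(Q.shape, K.shape, V.shape)
```

The word embeddings themselves come from an embedding lookup table, not from a projection of other inputs.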