APPLICATION OF SUPERVISED LEARNING
DEEP LEARNING
Question
Which of these parts of the self-attention operation are calculated by passing inputs through an MLP?

Values
Keys
Queries
Word Embeddings
Explanation:
Detailed explanation-1: Steps to calculate attention: 1. Take the query vector for a word and compute its dot product with the key vector of each word in the sequence, including itself. This is the attention score, or attention weight. 2. Divide each of the results by the square root of the dimension of the key vector.
Detailed explanation-2: This quadratic complexity comes from the self-attention mechanism: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V.
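The steps and formula above can be sketched directly in NumPy. This is a minimal illustration with randomly generated Q, K, and V matrices (the shapes and seed are assumptions for the example, not from the original):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    # Step 1: dot product of each query with every key -> attention scores
    # Step 2: divide by the square root of the key dimension
    scores = Q @ K.T / np.sqrt(d_k)              # shape (n, n)
    # Row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the value vectors
    return weights @ V

# Toy example: 3 tokens, d_k = 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)
```

The (n, n) score matrix is what gives self-attention its quadratic cost in sequence length.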
Detailed explanation-3: Attention can be computed between the input and output elements (general attention) or within the input elements alone (self-attention).
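Regarding the question itself: in the standard Transformer, the queries, keys, and values are each produced by passing the input word embeddings through a learned projection (in the original architecture this is a single linear layer; some descriptions loosely call it an MLP). A minimal sketch, with all matrices randomly initialized purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 8, 4
# 3 input word embeddings (assumed shapes for the example)
X = rng.standard_normal((3, d_model))

# Learned projection matrices (random here, trained in practice)
W_q = rng.standard_normal((d_model, d_k))
W_k = rng.standard_normal((d_model, d_k))
W_v = rng.standard_normal((d_model, d_k))

# Queries, keys, and values are all computed from the same inputs
Q, K, V = X @ W_q, X @ W_k, X @ W_v
print(Q.shape, K.shape, V.shape)
```

The word embeddings themselves come from an embedding lookup table, not from a projection of other inputs.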