MACHINE LEARNING

APPLICATION OF SUPERVISED LEARNING

DEEP LEARNING

Question
True or False: In practice we often use multi-headed self-attention. This intuitively works because it is more useful to learn multiple simpler transformations into distinct sub-spaces than one complicated transformation into a richer sub-space.
A
True
B
False
C
Either A or B
D
None of the above
Explanation: The statement is true (choice A). You might imagine a situation in which our word embedding somehow captures whether a word is a noun or a verb. It might then be easier to learn long-range noun-noun and verb-verb relations with two distinct, simpler subspaces (i.e., 2 attention heads) than with one richer subspace, where the MLP would have to disentangle these notions. Additionally, separate heads may make it easier for the transformed value space to amplify certain semantically distinct properties that subsequent layers can leverage.

Detailed explanation-1: Multi-head attention is a module for attention mechanisms that runs an attention mechanism several times in parallel. The independent attention outputs are then concatenated and linearly transformed into the expected dimension.
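
As a rough illustration of that description, the following is a minimal NumPy sketch of multi-head self-attention. The function names, toy dimensions, and the absence of masking and dropout are simplifying assumptions for this example, not the implementation of any particular library.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Scaled dot-product self-attention for one head.
    # x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (seq_len, seq_len)
    return softmax(scores) @ v                     # (seq_len, d_head)

def multi_head_self_attention(x, heads, w_o):
    # Run each head (conceptually in parallel), concatenate the outputs,
    # then linearly project back to d_model with w_o.
    outputs = [self_attention(x, w_q, w_k, w_v) for w_q, w_k, w_v in heads]
    return np.concatenate(outputs, axis=-1) @ w_o  # (seq_len, d_model)

# Toy usage: 2 heads, each projecting into its own smaller subspace.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
w_o = rng.normal(size=(n_heads * d_head, d_model))
x = rng.normal(size=(seq_len, d_model))
print(multi_head_self_attention(x, heads, w_o).shape)  # (4, 8)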

Detailed explanation-2: A suggested advantage of multi-head attention is training stability, since it needs fewer layers than single-head attention when using the same number of subspaces.

Detailed explanation-3: Self-attention, sometimes called intra-attention, is an attention mechanism that relates different positions of a single sequence in order to compute a representation of the sequence.
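
For reference, the scaled dot-product attention computed inside each such head (as defined in "Attention Is All You Need", Vaswani et al., 2017) is

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

where Q, K, and V are the query, key, and value projections of the sequence and d_k is the key dimension used for scaling.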
