APPLICATION OF SUPERVISED LEARNING
DEEP LEARNING
Question
Why do transformers need positional encodings?
A. Because the dot-product attention operation is agnostic to token ordering
B. To increase robustness to adversarial attacks in the token embedding space
C. Either A or B
D. None of the above

Correct answer: A. The soft dot-product attention operation is equivariant to the ordering of the tokens in the sequence. As such, if we want transformers to pay attention to absolute and relative positions, we can either add this information directly to the input tokens or add it within the layer. The former is more straightforward.
Detailed explanation-1: The positional embedding therefore does not interfere with the word-embedding information, and adding the two together is a more efficient way of injecting positional information than concatenating them.
Detailed explanation-2: Because transformers do not process data in order at the encoder (input) layer, they do not understand order on their own (which RNNs understand by design). To solve this, transformers use positional encoding to maintain order in the output (discussed further in Section 3.1) [13].
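Detailed explanation-1 and -2 describe adding a positional signal to the word embeddings rather than concatenating one. As an illustrative sketch that is not part of the original explanation (the sinusoidal scheme and the NumPy helper below are one common choice, assumed here), the positional encoding can be generated and summed element-wise with toy word embeddings, leaving the embedding dimension unchanged:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    # Each pair of dimensions shares a frequency: 10000^(2i / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                   # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dims: cosine
    return pe

# Toy word embeddings for a 5-token sentence with d_model = 8 (made-up values).
word_embeddings = np.random.randn(5, 8)
# Positional information is *added*, so the shape stays (5, 8);
# concatenation would instead widen the representation.
inputs = word_embeddings + sinusoidal_positional_encoding(5, 8)
print(inputs.shape)  # (5, 8)
```

Because the sum keeps the (seq_len, d_model) shape, the positional signal rides along with the word embedding instead of widening it, which is the efficiency point made in Detailed explanation-1.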
Detailed explanation-3: The self-attention mechanism relates different positions of a single sequence in order to compute a representation of that same sequence. It is instrumental in machine reading, abstractive summarization, and even image-description generation.
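To connect this back to the correct answer, the sketch below (assumptions: NumPy, a single head with queries = keys = values, made-up toy embeddings) shows that soft dot-product self-attention is equivariant to token order: permuting the input rows only permutes the output rows, so without positional encodings the layer has no notion of absolute position.

```python
import numpy as np

def soft_dot_product_attention(x: np.ndarray) -> np.ndarray:
    """Single-head self-attention with queries = keys = values = x."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                       # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ x                                  # (seq_len, d)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))                        # 4 toy token embeddings
perm = np.array([2, 0, 3, 1])                           # an arbitrary reordering

out = soft_dot_product_attention(tokens)
out_permuted = soft_dot_product_attention(tokens[perm])

# Equivariance: attention over the shuffled tokens equals the shuffled outputs,
# which is why positional information must be injected explicitly.
print(np.allclose(out_permuted, out[perm]))             # True
```

The printed True is exactly the property named in choice A: the operation itself carries no information about where a token sits in the sequence, so that information must be added at the input or inside the layer.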