MACHINE LEARNING

APPLICATION OF SUPERVISED LEARNING

DEEP LEARNING

Question
Why do we add (or concatenate etc.) position encodings to Transformer Inputs?
A
Because the dot-product attention operation is agnostic to token ordering
B
To increase robustness to adversarial attacks in the token embedding space
C
Either A or B
D
None of the above
Explanation: The correct choice is A. The soft dot-product attention operation is equivariant to the ordering of the tokens in the sequence. As such, if we want transformers to pay attention to absolute and relative positions, we can either add this information directly to the input tokens or inject it within the layers; the former is more straightforward.
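
Below is a minimal sketch (NumPy, single head, with Q = K = V and no learned projections, all purely illustrative) of this equivariance: permuting the input tokens simply permutes the attention outputs in the same way, so without positional encodings the model has no notion of token order.

```python
# Minimal demo that soft dot-product attention is permutation-equivariant:
# shuffling the input tokens just shuffles the output rows.
import numpy as np

def soft_attention(x):
    """Self-attention with Q = K = V = x, purely for illustration."""
    scores = x @ x.T / np.sqrt(x.shape[-1])                            # (n, n) similarity scores
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)   # row-wise softmax
    return weights @ x                                                 # (n, d) attended outputs

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))          # 5 tokens, embedding dimension 8
perm = rng.permutation(5)            # an arbitrary reordering of the tokens

out = soft_attention(x)
out_perm = soft_attention(x[perm])   # attend over the shuffled sequence

# Outputs are identical up to the same permutation: no positional signal survives.
print(np.allclose(out[perm], out_perm))   # True
```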

Detailed explanation-1: The positional embedding does not interfere with the word embedding information, and adding the two together is a more efficient way of injecting positional information than concatenating them.
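
As a small sketch (NumPy; the function name and dimensions are just examples), here is the sinusoidal positional encoding from the original Transformer paper, comparing addition, which keeps the model dimension unchanged, with concatenation, which doubles it:

```python
# Sinusoidal positional encoding added element-wise to the word embeddings,
# so the model dimension stays d_model; concatenation would instead grow
# every downstream weight matrix.
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    positions = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    div = np.power(10000.0, np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)
    pe[:, 1::2] = np.cos(positions / div)
    return pe

seq_len, d_model = 10, 16
word_emb = np.random.default_rng(0).normal(size=(seq_len, d_model))
pos_emb = sinusoidal_positional_encoding(seq_len, d_model)

added = word_emb + pos_emb                                  # shape stays (10, 16)
concatenated = np.concatenate([word_emb, pos_emb], axis=-1) # shape grows to (10, 32)

print(added.shape, concatenated.shape)   # (10, 16) (10, 32)
```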

Detailed explanation-2: Because transformers do not process data sequentially at the encoder (input) layer, they do not understand order on their own (which RNNs capture by design). To solve this, transformers use positional encodings to preserve order information [13].

Detailed explanation-3: Self-attention relates different positions of a single sequence to compute a representation of that same sequence. It is instrumental in machine reading, abstractive summarization, and even image description generation.
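
As a toy illustration (NumPy, no learned projections; the names are hypothetical), the attention weight matrix below makes this explicit: entry (i, j) is how strongly position i draws on position j when building its new representation.

```python
# Toy self-attention weights over a single sequence: weights[i, j] is how much
# position i attends to position j, which is how the mechanism "relates
# different positions" of the same sequence.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))                     # 4 tokens, embedding dimension 8

scores = x @ x.T / np.sqrt(x.shape[-1])         # (4, 4) pairwise similarities
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax

print(np.round(weights, 2))      # each row sums to 1: a distribution over positions
representation = weights @ x     # every position becomes a weighted mix of all positions
```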
