APPLICATION OF SUPERVISED LEARNING
DEEP LEARNING
Question: How can transformers be applied to image generation?
A. Treating neighboring pixels as an input sequence and autoregressing on a next-pixel-prediction problem
B. Learning to map sequences of flattened noise vectors (matching image size) to images
C. Either A or B
D. None of the above

Answer: A (see slides). Image Transformers achieved state-of-the-art image generation on ImageNet by posing image generation as an auto-regressive sequence problem. A similar proposal, using VQ-VAEs to learn discretized latent spaces as inputs to transformers, has also been successful (see DALL-E).
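As a concrete illustration of option A, below is a minimal PyTorch sketch that flattens images into sequences of intensity tokens and trains a small decoder-only transformer on next-pixel prediction. The model sizes, class name, and 8x8 toy images are illustrative assumptions; the actual Image Transformer uses local self-attention and far larger models.

import torch
import torch.nn as nn

class NextPixelTransformer(nn.Module):
    # Decoder-only transformer over a flattened pixel sequence; each pixel
    # intensity (0-255) is treated as one discrete token.
    def __init__(self, n_pixels=64, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.tok = nn.Embedding(256, d_model)       # one embedding per intensity
        self.pos = nn.Embedding(n_pixels, d_model)  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 256)         # logits over next intensity

    def forward(self, pixels):  # pixels: (B, T) integers in [0, 255]
        T = pixels.size(1)
        x = self.tok(pixels) + self.pos(torch.arange(T, device=pixels.device))
        causal = nn.Transformer.generate_square_subsequent_mask(T)
        return self.head(self.blocks(x, mask=causal))  # (B, T, 256)

model = NextPixelTransformer()
imgs = torch.randint(0, 256, (8, 64))  # eight flattened 8x8 toy images
logits = model(imgs[:, :-1])           # predict pixel t+1 from pixels <= t
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 256), imgs[:, 1:].reshape(-1))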
Detailed explanation-1: The Vision Transformer (ViT) applies the transformer architecture from natural language processing to computer vision. Internally, the transformer learns by measuring the relationships between pairs of input tokens.
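To make the "relationships between pairs of input tokens" concrete, here is a small sketch, under assumed patch and embedding sizes, of how an image becomes patch tokens and how one single-head self-attention step scores every pair of those tokens.

import torch

def patchify(img, patch=4):  # img: (B, C, H, W) -> (B, n_patches, C*patch*patch)
    B, C, H, W = img.shape
    p = img.unfold(2, patch, patch).unfold(3, patch, patch)
    return p.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)

img = torch.randn(1, 3, 32, 32)
tokens = patchify(img)                  # (1, 64, 48): 64 patch tokens
w_qkv = torch.nn.Linear(48, 3 * 48, bias=False)
q, k, v = w_qkv(tokens).chunk(3, dim=-1)
scores = torch.softmax(q @ k.transpose(-2, -1) / 48 ** 0.5, dim=-1)
out = scores @ v  # scores[0, i, j]: learned relationship of token i to token j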
Detailed explanation-2: The decoder is autoregressive: it begins with a start token and takes the list of previous outputs as input, together with the encoder outputs that contain the attention information from the input sequence. The decoder stops decoding when it generates an end token.
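The loop below is a minimal greedy-decoding sketch of that behavior; the model interface and the start/end token ids are assumptions for illustration.

import torch

def greedy_decode(model, encoder_out, start_id=1, end_id=2, max_len=50):
    generated = torch.tensor([[start_id]])      # begin with the start token
    for _ in range(max_len):
        logits = model(encoder_out, generated)  # (1, T, vocab_size)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_id], dim=1)
        if next_id.item() == end_id:            # stop on the end token
            break
    return generated[0, 1:]                     # drop the start token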
Detailed explanation-3: In the Transformer architecture, the decoder, on the right half, receives the output of the encoder together with the decoder output from the previous time step to generate the output sequence.
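That data flow maps directly onto PyTorch's built-in decoder stack, sketched here with illustrative sizes: it consumes both the encoder output (the "memory") and the embeddings of the outputs generated so far.

import torch
import torch.nn as nn

d_model = 64
layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=2)

memory = torch.randn(1, 10, d_model)  # encoder output for 10 source tokens
tgt = torch.randn(1, 5, d_model)      # embeddings of the 5 outputs so far
tgt_mask = nn.Transformer.generate_square_subsequent_mask(5)
out = decoder(tgt, memory, tgt_mask=tgt_mask)  # (1, 5, d_model) features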