APPLICATION OF SUPERVISED LEARNING
DEEP LEARNING
Question: How can transformers be applied to image generation?
A. Treating neighboring pixels as an input sequence and autoregressing on a next-pixel-prediction problem
B. Learning to map sequences of flattened noise vectors (matching image size) to images
C. Either A or B
D. None of the above

Answer: A (see slides). Image Transformers achieved state-of-the-art image generation on ImageNet by posing image generation as an auto-regressive sequence problem. A similar proposal, using VQ-VAEs to learn discretized latent spaces as inputs to transformers, has also been successful (see DALL-E).
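As a concrete illustration of option A, below is a minimal PyTorch sketch that flattens images into sequences of intensity tokens and trains a small decoder-only transformer on next-pixel prediction. The model sizes, class name, and 8x8 toy images are illustrative assumptions; the actual Image Transformer uses local self-attention and far larger models.

import torch
import torch.nn as nn

class NextPixelTransformer(nn.Module):
    # Decoder-only transformer over a flattened pixel sequence; each pixel
    # intensity (0-255) is treated as one discrete token.
    def __init__(self, n_pixels=64, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.tok = nn.Embedding(256, d_model)       # one embedding per intensity
        self.pos = nn.Embedding(n_pixels, d_model)  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 256)         # logits over next intensity

    def forward(self, pixels):  # pixels: (B, T) integers in [0, 255]
        T = pixels.size(1)
        x = self.tok(pixels) + self.pos(torch.arange(T, device=pixels.device))
        causal = nn.Transformer.generate_square_subsequent_mask(T)
        return self.head(self.blocks(x, mask=causal))  # (B, T, 256)

model = NextPixelTransformer()
imgs = torch.randint(0, 256, (8, 64))  # eight flattened 8x8 toy images
logits = model(imgs[:, :-1])           # predict pixel t+1 from pixels <= t
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 256), imgs[:, 1:].reshape(-1))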
Detailed explanation-1: The Vision Transformer (ViT) applies the transformer architecture from natural language processing to computer vision. Internally, the transformer learns by measuring the relationships between pairs of input tokens.
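To make the "relationships between pairs of input tokens" concrete, here is a small sketch, under assumed patch and embedding sizes, of how an image becomes patch tokens and how one single-head self-attention step scores every pair of those tokens.

import torch

def patchify(img, patch=4):  # img: (B, C, H, W) -> (B, n_patches, C*patch*patch)
    B, C, H, W = img.shape
    p = img.unfold(2, patch, patch).unfold(3, patch, patch)
    return p.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)

img = torch.randn(1, 3, 32, 32)
tokens = patchify(img)                  # (1, 64, 48): 64 patch tokens
w_qkv = torch.nn.Linear(48, 3 * 48, bias=False)
q, k, v = w_qkv(tokens).chunk(3, dim=-1)
scores = torch.softmax(q @ k.transpose(-2, -1) / 48 ** 0.5, dim=-1)
out = scores @ v  # scores[0, i, j]: learned relationship of token i to token j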
Detailed explanation-2: The decoder is autoregressive: it begins with a start token and takes the list of previous outputs as input, together with the encoder outputs that contain the attention information from the input sequence. The decoder stops decoding when it generates an end token.
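The loop below is a minimal greedy-decoding sketch of that behavior; the model interface and the start/end token ids are assumptions for illustration.

import torch

def greedy_decode(model, encoder_out, start_id=1, end_id=2, max_len=50):
    generated = torch.tensor([[start_id]])      # begin with the start token
    for _ in range(max_len):
        logits = model(encoder_out, generated)  # (1, T, vocab_size)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_id], dim=1)
        if next_id.item() == end_id:            # stop on the end token
            break
    return generated[0, 1:]                     # drop the start token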
Detailed explanation-3: In the Transformer architecture, the decoder, on the right half, receives the output of the encoder together with the decoder output from the previous time step to generate the output sequence.
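That data flow maps directly onto PyTorch's built-in decoder stack, sketched here with illustrative sizes: it consumes both the encoder output (the "memory") and the embeddings of the outputs generated so far.

import torch
import torch.nn as nn

d_model = 64
layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=2)

memory = torch.randn(1, 10, d_model)  # encoder output for 10 source tokens
tgt = torch.randn(1, 5, d_model)      # embeddings of the 5 outputs so far
tgt_mask = nn.Transformer.generate_square_subsequent_mask(5)
out = decoder(tgt, memory, tgt_mask=tgt_mask)  # (1, 5, d_model) features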