Question

In a Transformer decoder, what is the purpose of the masked self-attention layer?

a. Assign weights to relevant parts of the input sequence.
b. None of these
c. Generate a representation of the entire output sequence.
d. Allow the model to "attend" to previously generated tokens.

Knowee AI · Accepted Answer

The purpose of the masked self-attention layer in a Transformer decoder is to allow the model to "attend" to previously generated tokens (option d). A mask restricts each position in the output sequence to itself and the positions before it, so the model cannot see future tokens during training. This matches the conditions during prediction, when later tokens have not been generated yet. This way, the model learns to generate the next token based only on the previous ones.
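To make the masking concrete, here is a minimal sketch in plain Python of how a causal mask might be applied to attention scores before the softmax. The function names `causal_mask` and `masked_softmax` and the score values are made up for illustration; real implementations work on batched tensors and also scale the scores and multiply the weights by value vectors, which is left out here.

```python
import math

def causal_mask(n):
    """n x n matrix: True where query position i may attend to key position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax in which disallowed positions receive zero weight."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Future positions are set to -inf, so exp() maps them to exactly 0.
        masked = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        peak = max(masked)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical raw attention scores for a 4-token output sequence.
scores = [
    [0.5, 1.2, -0.3, 0.8],
    [0.1, 0.4, 0.9, -1.0],
    [1.5, -0.2, 0.3, 0.7],
    [0.0, 0.6, -0.5, 1.1],
]
weights = masked_softmax(scores, causal_mask(len(scores)))

for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row is the attention distribution for one position. Everything to the right of the diagonal is zero, so the first token can attend only to itself and the last token can attend to all four.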
