#### 8.2.4 Other

- [M] An autoencoder is a neural network that learns to copy its input to its output. When would this be useful?
- Self-attention.
- [E] What’s the motivation for self-attention?
- [E] Why would you choose a self-attention architecture over RNNs or CNNs?
- [M] Why would you need multi-headed attention instead of just one head for attention?
- [M] How would changing the number of heads in multi-headed attention affect the model’s performance?

- Transfer learning
- [E] You want to build a classifier to predict sentiment in tweets but you have very little labeled data (say 1000). What do you do?
- [M] What’s gradual unfreezing? How might it help with transfer learning?

- Bayesian methods.
- [M] How do Bayesian methods differ from the mainstream deep learning approach?
- [M] How are the pros and cons of Bayesian neural networks compared to the mainstream neural networks?
- [M] Why do we say that Bayesian neural networks are natural ensembles?

- GANs.
- [E] What do GANs converge to?
- [M] Why are GANs so hard to train?