8.2.2 Computer vision
- [M] For neural networks that work with images, such as VGG-19 and InceptionNet, you often see visualizations of the types of features each filter captures. How are these visualizations created? (A sketch of the basic mechanics follows this list.)
    Hint: check out the Distill post on Feature Visualization.
- Filter size.
    - [M] How are your model’s accuracy and computational efficiency affected when you decrease or increase its filter size? (See the filter-size sketch after this list.)
    - [E] How do you choose the ideal filter size?
- [M] Convolutional layers are also known as “locally connected.” Explain what this means.
- [M] When we use CNNs for text data, what would the number of channels be for the first conv layer?
- [E] What is the role of zero padding?
- [E] Why do we need upsampling? How can it be done?
- [M] What does a 1x1 convolutional layer do? (See the channel-mixing sketch after this list.)
- Pooling.
    - [E] What happens when you use max-pooling instead of average pooling?
    - [E] When should we use one instead of the other?
    - [E] What happens when pooling is removed completely?
    - [M] What happens if we replace a 2 x 2 max pool layer with a conv layer of stride 2? (A pooling comparison sketch follows this list.)
- [M] When we replace a normal convolutional layer with a depthwise separable convolutional layer, the number of parameters can go down. How does this happen? Give an example to illustrate this. (See the depthwise separable comparison after this list.)
- [M] Can you use a base model trained on ImageNet (image size 256 x 256) for an object classification task on images of size 320 x 360? How? (See the adaptive-pooling sketch after this list.)
- [H] How can a fully-connected layer be converted to a convolutional layer? (A weight-reshaping sketch follows this list.)
- [H] What are the pros and cons of FFT-based convolution and Winograd-based convolution? (An FFT-convolution check follows this list.)
    Hint: read Fast Algorithms for Convolutional Neural Networks (Andrew Lavin and Scott Gray, 2015).
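
The sketches below are optional companions to several of the questions above; each is a minimal, self-contained example for checking an answer empirically, with illustrative layer sizes rather than any particular model's.

For the feature-visualization question, the core mechanic is activation maximization: gradient ascent on the input image so that a chosen filter fires as strongly as possible. The toy untrained CNN, the filter index, and the optimizer settings here are assumptions; real visualizations start from a pretrained network such as VGG-19 and add the regularizers described in the Distill post.

```python
# Activation maximization: optimize a random input so it maximizes the mean
# activation of one filter. The two-layer untrained CNN is only a stand-in
# to show the mechanics, not a real visualization target.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
)
model.eval()

img = torch.randn(1, 3, 64, 64, requires_grad=True)  # start from noise
optimizer = torch.optim.Adam([img], lr=0.05)
filter_idx = 7  # which filter of the last conv layer to visualize (arbitrary)

for _ in range(200):
    optimizer.zero_grad()
    activation = model(img)[0, filter_idx]   # feature map of the chosen filter
    loss = -activation.mean()                # negative mean -> gradient ascent
    loss.backward()
    optimizer.step()
# `img` now (roughly) shows the pattern this filter responds to most strongly.
```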
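
For the filter-size question, a sketch of how parameter count (and, proportionally, FLOPs per output location) scales with kernel size; the 64-to-128 channel sizes are arbitrary.

```python
# Parameter counts for different filter sizes with channels held fixed.
import torch.nn as nn

for k in (1, 3, 5, 7):
    conv = nn.Conv2d(64, 128, kernel_size=k, padding=k // 2)
    n_params = sum(p.numel() for p in conv.parameters())
    print(f"{k}x{k} filter: {n_params:,} parameters")
# Grows roughly as k^2: 8,320 / 73,856 / 204,928 / 401,536.
# Compute per output pixel scales the same way, in exchange for a larger
# receptive field per layer.
```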
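
For the 1x1 convolution question, a sketch showing that a 1x1 conv mixes information across channels at each spatial location without touching spatial structure; the channel counts are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 32, 32)            # 256-channel feature map
reduce = nn.Conv2d(256, 64, kernel_size=1)  # cheap channel reduction
y = reduce(x)
print(y.shape)   # torch.Size([1, 64, 32, 32]) - spatial size unchanged
print(sum(p.numel() for p in reduce.parameters()))  # 16448 = 256*64 weights + 64 biases
```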
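
For the pooling questions, a sketch comparing max pooling, average pooling, and a stride-2 convolution as a learned replacement for a 2 x 2 max pool; shapes and channel counts are illustrative.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 16, 16)

max_pool = nn.MaxPool2d(kernel_size=2)               # keeps the strongest activation per window
avg_pool = nn.AvgPool2d(kernel_size=2)               # keeps the mean activation per window
strided  = nn.Conv2d(8, 8, kernel_size=2, stride=2)  # learned downsampling

print(max_pool(x).shape, avg_pool(x).shape, strided(x).shape)
# All three produce [1, 8, 8, 8]; only the strided conv has trainable weights:
print(sum(p.numel() for p in strided.parameters()))  # 8*8*2*2 + 8 = 264
```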
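
For the depthwise separable convolution question, a parameter-count comparison between a standard 3x3 convolution and a depthwise-plus-pointwise factorization; 64-to-128 channels is an arbitrary example.

```python
import torch.nn as nn

standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise: one 3x3 filter per channel
    nn.Conv2d(64, 128, kernel_size=1),                       # pointwise: mixes channels
)

def count(m):
    return sum(p.numel() for p in m.parameters())

print(count(standard), count(separable))  # 73856 vs 8960 -> roughly 8x fewer parameters
```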
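
For the question about reusing an ImageNet-trained model on 320 x 360 images, one common option (besides simply resizing or cropping the inputs to the expected size) is to keep the convolutional backbone, which is size-agnostic, and put adaptive pooling before the classifier head so the flattened feature vector has a fixed length. The tiny backbone below is a stand-in, not a pretrained model.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(   # pretend this is the pretrained conv stack
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # -> [N, 128, 1, 1] for any input resolution
    nn.Flatten(),
    nn.Linear(128, 10),        # new task-specific classifier
)

for size in [(256, 256), (320, 360)]:
    x = torch.randn(1, 3, *size)
    print(size, head(backbone(x)).shape)  # both give torch.Size([1, 10])
```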
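
For the fully-connected-to-convolutional question, a sketch showing that an FC layer acting on a flattened 512 x 7 x 7 feature map is numerically equivalent to a 7x7 convolution with the same weights reshaped; the 512/7/4096 sizes echo the classic VGG head but are assumptions here.

```python
import torch
import torch.nn as nn

fc = nn.Linear(512 * 7 * 7, 4096)
conv = nn.Conv2d(512, 4096, kernel_size=7)

# Copy the FC weights into the conv kernel: (4096, 512*7*7) -> (4096, 512, 7, 7).
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 512, 7, 7)
out_fc = fc(x.flatten(1))        # shape [1, 4096]
out_conv = conv(x).flatten(1)    # shape [1, 4096]
print(torch.allclose(out_fc, out_conv, atol=1e-4))  # True
# As a conv layer it can now slide over larger inputs, producing a spatial grid of scores.
```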
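
For the FFT vs. Winograd question, a small SciPy check that FFT-based convolution reproduces direct convolution up to floating-point error; the trade-offs themselves (FFT tends to pay off for large filters, while Winograd-style algorithms reduce the number of multiplications for small filters such as 3x3) are what the Lavin and Gray paper analyzes.

```python
import numpy as np
from scipy.signal import convolve2d, fftconvolve

image = np.random.rand(64, 64)
kernel = np.random.rand(11, 11)

direct = convolve2d(image, kernel, mode="valid")    # spatial-domain convolution
via_fft = fftconvolve(image, kernel, mode="valid")  # pointwise product in the frequency domain

print(np.allclose(direct, via_fft))  # True: same result, different arithmetic cost
```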