8.2.2 Computer vision

  1. [M[ For neural networks that work with images like VGG-19, InceptionNet, you often see a visualization of what type of features each filter captures. How are these visualizations created?

    Hint: check out this Distill post on Feature Visualization.

  2. Filter size.
    1. [M] How are your model’s accuracy and computational efficiency affected when you decrease or increase its filter size?
    2. [E] How do you choose the ideal filter size?
  3. [M] Convolutional layers are also known as “locally connected.” Explain what it means.
  4. [M] When we use CNNs for text data, what would the number of channels be for the first conv layer?
  5. [E] What is the role of zero padding?
  6. [E] Why do we need upsampling? How to do it?
  7. [M] What does a 1x1 convolutional layer do?
  8. Pooling.
    1. [E] What happens when you use max-pooling instead of average pooling?
    2. [E] When should we use one instead of the other?
    3. [E] What happens when pooling is removed completely?
    4. [M] What happens if we replace a 2 x 2 max pool layer with a conv layer of stride 2?
  9. [M] When we replace a normal convolutional layer with a depthwise separable convolutional layer, the number of parameters can go down. How does this happen? Give an example to illustrate this.
  10. [M] Can you use a base model trained on ImageNet (image size 256 x 256) for an object classification task on images of size 320 x 360? How?
  11. [H] How can a fully-connected layer be converted to a convolutional layer?
  12. [H] Pros and cons of FFT-based convolution and Winograd-based convolution.

    Hint: Read Fast Algorithms for Convolutional Neural Networks (Andrew Lavin and Scott Gray, 2015)

results matching ""

    No results matching ""