6.2 Complexity and numerical analysis

Most recent breakthroughs in machine learning have come from bigger models that require massive memory and computational power, so it’s important to know not only how to implement a model but also how to scale it. Scaling a model requires estimating its memory requirements and computational cost, as well as mitigating numerical instability during training and serving. Here are some questions that can be asked to evaluate your understanding of numerical stability and scalability.

Matrix multiplication
1. [E] You have three matrices: $A \in \mathbb{R}^{100 \times 5}$, $B \in \mathbb{R}^{5 \times 200}$, $C \in \mathbb{R}^{200 \times 20}$, and you need to calculate the product $ABC$. In what order would you perform your multiplication and why?
2. [M] Now you need to calculate the product of $N$ matrices $A_1 A_2 \cdots A_N$. How would you determine the order in which to perform the multiplication?
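
One way to reason about both matrix multiplication questions is to count scalar multiplications. The sketch below assumes the standard cost model in which multiplying a $p \times q$ matrix by a $q \times r$ matrix takes $pqr$ scalar multiplications, and uses the classic dynamic-programming approach to the matrix chain ordering problem. The function names `matrix_chain_order` and `parenthesize` are illustrative, not taken from any library.

```python
def matrix_chain_order(dims):
    """Return (min_cost, split) for the chain A_1 ... A_N.

    dims has length N + 1; matrix A_i (1-indexed) has shape
    dims[i - 1] x dims[i]. Cost is counted in scalar multiplications,
    assuming the naive p*q*r cost for a (p x q) @ (q x r) product.
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    # Solve sub-chains of increasing length.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            # Try every position k at which to split the chain i..j.
            for k in range(i, j):
                c = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k
    return cost[0][n - 1], split


def parenthesize(split, i, j):
    """Render the optimal ordering for matrices i..j (0-indexed)."""
    if i == j:
        return f"A{i + 1}"
    k = split[i][j]
    return f"({parenthesize(split, i, k)}{parenthesize(split, k + 1, j)})"


# Shapes from the first question: A is 100x5, B is 5x200, C is 200x20.
min_cost, split = matrix_chain_order([100, 5, 200, 20])
order = parenthesize(split, 0, 2)
```

For the three matrices above, $(AB)C$ costs $100 \cdot 5 \cdot 200 + 100 \cdot 200 \cdot 20 = 500{,}000$ scalar multiplications, while $A(BC)$ costs $5 \cdot 200 \cdot 20 + 100 \cdot 5 \cdot 20 = 30{,}000$, so the sketch selects $A(BC)$. The dynamic program runs in $O(N^3)$ time and $O(N^2)$ space.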
[E] What are some of the causes for numerical instability in deep learning?
[E] In many machine learning techniques (e.g. batch norm), we often see a small term $\epsilon$ added to the calculation. What’s the purpose of that term?
[E] What made GPUs popular for deep learning? How do they compare to TPUs?
[M] What does it mean when we say a problem is intractable?
[H] What are the time and space complexities of doing backpropagation on a recurrent neural network?
[H] Is knowing a model’s architecture and its hyperparameters enough to calculate the memory requirements for that model?
[H] Your model works fine on a single GPU but gives poor results when you train it on 8 GPUs. What might be the cause of this? What would you do to address it?
[H] What benefits do we get from reducing the precision of our model? What problems might we run into? How would you solve these problems?
[H] How would you calculate the average of 1M floating-point numbers with minimal loss of precision?
[H] How should we implement batch normalization if a batch is spread out over multiple GPUs?
[M] Given the following code snippet, what might be a problem with it? How would you improve it? Hint: this is an actual question asked on StackOverflow.

import numpy as np

def within_radius(a, b, radius):
    if np.linalg.norm(a - b) < radius:
        return 1
    return 0

def make_mask(volume, roi, radius):
    mask = np.zeros(volume.shape)
    for x in range(volume.shape[0]):
        for y in range(volume.shape[1]):
            for z in range(volume.shape[2]):
                mask[x, y, z] = within_radius((x, y, z), roi, radius)
    return mask
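
One possible direction for the last question, offered as a sketch and not as the definitive answer: the triple Python loop calls `np.linalg.norm` once per voxel, which is slow for large volumes. The version below computes all distances at once with NumPy broadcasting. It assumes `radius` is non-negative, so comparing squared distances is equivalent to comparing distances, and the name `make_mask_vectorized` is introduced here for illustration.

```python
import numpy as np


def make_mask_vectorized(volume, roi, radius):
    """Build the same 0/1 mask as make_mask without Python-level loops.

    Assumes radius >= 0, so that dist**2 < radius**2 matches dist < radius.
    """
    # Open (sparse) index grids with shapes (X, 1, 1), (1, Y, 1), (1, 1, Z);
    # broadcasting combines them into a full (X, Y, Z) array on demand.
    x, y, z = np.ogrid[:volume.shape[0], :volume.shape[1], :volume.shape[2]]
    dist_sq = (x - roi[0]) ** 2 + (y - roi[1]) ** 2 + (z - roi[2]) ** 2
    # Cast to float to match the dtype returned by np.zeros in the original.
    return (dist_sq < radius ** 2).astype(float)


# Small usage example with an arbitrary volume shape and region of interest.
volume = np.zeros((6, 7, 8))
mask = make_mask_vectorized(volume, roi=(2, 3, 4), radius=2.5)
```

Skipping the square root also avoids one floating-point operation per voxel, although results at the exact boundary of the sphere may differ from the original by rounding when `roi` or `radius` are not exactly representable.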
