8.2.3 Reinforcement learning

🌳 Tip 🌳
To refresh your knowledge of deep RL, check out Spinning Up in Deep RL (OpenAI).

  1. [E] Explain the explore vs. exploit tradeoff with examples (a minimal ε-greedy sketch follows this list).
  2. [E] How would a finite or infinite horizon affect our algorithms?
  3. [E] Why do we need the discount term in objective functions? (The discounted return is written out after this list.)
  4. [E] Fill in the empty circles using the minimax algorithm (a minimax sketch follows this list).

    (Figure: Minimax algorithm)
  5. [M] Fill in the alpha and beta values as you traverse the minimax tree from left to right (the sketch after this list also shows alpha-beta pruning).

    (Figure: Alpha-beta pruning)
  6. [E] Given a policy, derive the reward function.

  7. [M] What are the pros and cons of on-policy vs. off-policy methods? (A SARSA vs. Q-learning target comparison follows this list.)
  8. [M] What’s the difference between model-based and model-free? Which one is more data-efficient?
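
For question 1, one way to make the explore vs. exploit tradeoff concrete is a minimal ε-greedy multi-armed bandit. The arm means, ε value, and function name below are illustrative assumptions, not part of the original question.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Sketch of an epsilon-greedy agent on a Gaussian bandit.

    true_means: hypothetical per-arm mean rewards (unknown to the agent).
    With probability epsilon the agent explores a random arm;
    otherwise it exploits the arm with the highest estimated value.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # running average reward per arm
    counts = [0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)                   # noisy reward
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return estimates, total_reward

# With epsilon=0 the agent can lock onto whichever arm looked best early;
# with epsilon=1 it never exploits what it has learned.
print(epsilon_greedy_bandit([0.2, 0.5, 0.8]))
```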
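For question 3, the standard discounted return makes the role of the discount factor explicit; this is standard notation rather than anything taken from the original text.

```latex
% Discounted return from step t with discount factor 0 <= \gamma < 1:
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
% With bounded rewards |R_t| <= R_{\max}, the infinite-horizon sum stays finite:
\lvert G_t \rvert \le \frac{R_{\max}}{1 - \gamma}
```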
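For questions 4 and 5, here is a small minimax implementation with alpha-beta pruning on a hand-built game tree. The tree values and function name are assumptions (they mirror the classic textbook example), not the trees shown in the figures above.

```python
import math

def minimax(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of `node`, pruning branches with alpha-beta.

    A node is either a number (leaf utility) or a list of child nodes.
    `alpha` is the best value the maximizer can guarantee so far,
    `beta` the best the minimizer can guarantee; once alpha >= beta,
    the remaining siblings cannot change the result and are skipped.
    """
    if not isinstance(node, list):          # leaf: return its utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, minimax(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:               # beta cutoff
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, minimax(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:               # alpha cutoff
                break
        return value

# A made-up depth-2 tree: the root is a max node over three min nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, maximizing=True))       # -> 3
```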
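For question 7, the on-policy vs. off-policy distinction can be made concrete by comparing the bootstrap targets of SARSA (on-policy) and Q-learning (off-policy). The helper below is a hypothetical sketch for illustration.

```python
def td_targets(reward, next_q_values, next_action, gamma=0.99):
    """Compare SARSA and Q-learning bootstrap targets for one transition.

    next_q_values: Q(s', a') for every action a' in the next state.
    next_action:   the action the behavior policy actually takes in s'.
    SARSA evaluates the policy being followed (uses the sampled next action);
    Q-learning evaluates the greedy policy regardless of what is followed.
    """
    sarsa_target = reward + gamma * next_q_values[next_action]   # on-policy
    q_learning_target = reward + gamma * max(next_q_values)      # off-policy
    return sarsa_target, q_learning_target

# Example: in state s', the behavior policy explores the suboptimal action 0.
print(td_targets(reward=1.0, next_q_values=[0.2, 0.9], next_action=0))
# -> roughly (1.198, 1.891): SARSA's target reflects the exploratory action,
#    while Q-learning's target uses the greedy max.
```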
