#### 8.2.3 Reinforcement learning

> **Tip:** To refresh your knowledge of deep RL, check out Spinning Up in Deep RL (OpenAI).

- [E] Explain the explore vs exploit tradeoff with examples.
- [E] How would a finite or infinite horizon affect our algorithms?
- [E] Why do we need the discount term for objective functions?
- [E] Fill in the empty circles using the minimax algorithm.
- [M] Fill in the alpha and beta values as you traverse the minimax tree from left to right.
- [E] Given a policy, derive the reward function.

- [M] What are the pros and cons of on-policy vs. off-policy methods?
- [M] What’s the difference between model-based and model-free? Which one is more data-efficient?
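As a study aid for the explore vs. exploit question, here is a minimal ε-greedy bandit sketch, the textbook example of that tradeoff: with probability ε the agent explores a random arm, otherwise it exploits the best estimate so far. The function names and the running-average update are illustrative, not from any particular library.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon, explore a random arm;
    otherwise exploit the arm with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def update(q_values, counts, arm, reward):
    """Incremental (running-average) update of the arm's value estimate."""
    counts[arm] += 1
    q_values[arm] += (reward - q_values[arm]) / counts[arm]
```

Setting ε = 0 recovers pure exploitation; annealing ε toward 0 over time is a common way to explore early and exploit late.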
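For the discount-term question, it helps to compute a discounted return directly: with γ < 1, an infinite stream of bounded rewards sums to a finite value, and near-term rewards are weighted more heavily than distant ones. A one-line sketch:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over a reward sequence.

    With gamma < 1, even an infinite sequence of bounded rewards
    converges (geometric series), which keeps objectives well-defined
    in the infinite-horizon setting.
    """
    return sum(gamma**t * r for t, r in enumerate(rewards))
```

For a constant reward of 1 and γ = 0.5, the infinite-horizon return converges to 1 / (1 − γ) = 2.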
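The original minimax questions refer to tree diagrams that are not reproduced here, but the traversal itself can be sketched. The following is an illustrative minimax with alpha–beta pruning over a nested-list tree (leaves are numbers, internal nodes are lists of children); the representation is a simplifying assumption for the exercise, not a standard API.

```python
def minimax(node, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning over a nested-list game tree."""
    if not isinstance(node, list):  # leaf: a terminal utility value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, minimax(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # beta cutoff: opponent won't allow this branch
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, minimax(child, alpha, beta, True))
        beta = min(beta, value)
        if beta <= alpha:  # alpha cutoff: maximizer already has better
            break
    return value

# Max root over two min nodes: min(3, 5) = 3 and min(2, 9) = 2, so the root is 3.
tree = [[3, 5], [2, 9]]
```

When traversing left to right, the alpha and beta bounds tighten as siblings are evaluated; in the example above, the leaf 9 is pruned because beta (2) already falls below alpha (3).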
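For the on-policy vs. off-policy question, the one-step updates of SARSA (on-policy) and Q-learning (off-policy) show the distinction concretely: SARSA's target uses the action the behavior policy actually took, while Q-learning's target uses the greedy action regardless of behavior. A minimal sketch with `Q` as a dict mapping state to a list of action values (an assumed representation):

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    """On-policy: the target bootstraps from the action actually taken (a2)."""
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
    """Off-policy: the target bootstraps from the greedy action in s2,
    independent of what the behavior policy did."""
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
```

The single changed term in the target is the entire on-policy/off-policy distinction at this level: off-policy methods can learn from data generated by a different (e.g. exploratory or logged) policy, which is also why off-policy and model-based approaches tend to reuse data more efficiently.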