To maximize the total reward it receives over time.
To take actions randomly.
To take actions based on complete information.
To minimize the total reward it receives over time.
All courses were automatically generated using OpenAI's GPT-3.