To take actions based on complete information.
To maximize the total reward it receives over time.
To minimize the total reward it receives over time.
To take actions randomly.
All courses were automatically generated using OpenAI's GPT-3. Your feedback helps us improve as we cannot manually review every course. Thank you!