Commit f0d07563 authored by Carlos Riquelme

Readme changes.

parent 013910f8
@@ -41,7 +41,7 @@ the process if we use algorithm **A** is as follows:
```
At time t = 1, ..., T:
-1. Observe new context: X<sub>t</sub>
+1. Observe new context: X_t
2. Choose action: a_t = A.action(X_t)
3. Observe reward: r_t
4. Update internal state of the algorithm: A.update((X_t, a_t, r_t))
......
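
For reference, the loop touched by this hunk can be sketched in Python. This is a minimal illustration, not the repository's code: only the `action(X_t)` and `update((X_t, a_t, r_t))` calls come from the README text. The `EpsilonGreedy` class, the `run` helper, and `reward_fn` are hypothetical names, and the stand-in algorithm ignores the context for brevity.

```python
import random


class EpsilonGreedy:
    """Hypothetical stand-in for algorithm A (ignores the context)."""

    def __init__(self, num_actions, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.counts = [0] * num_actions   # pulls per action
        self.means = [0.0] * num_actions  # running mean reward per action

    def action(self, context):
        # Explore with probability epsilon, otherwise pick the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.means))
        return max(range(len(self.means)), key=lambda a: self.means[a])

    def update(self, observation):
        # observation is the (X_t, a_t, r_t) triple from step 4.
        _context, a, r = observation
        self.counts[a] += 1
        self.means[a] += (r - self.means[a]) / self.counts[a]


def run(algo, contexts, reward_fn):
    """Run the four-step loop once per context; return the total reward."""
    total = 0.0
    for x_t in contexts:              # 1. observe new context
        a_t = algo.action(x_t)        # 2. choose action
        r_t = reward_fn(x_t, a_t)     # 3. observe reward
        algo.update((x_t, a_t, r_t))  # 4. update internal state
        total += r_t
    return total
```

Any object exposing `action` and `update` with these shapes could be passed to `run` in place of the stand-in.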