Readme changes.

abfa50a4 · Carlos Riquelme · e0ef14fb · abfa50a4
Commit abfa50a4 authored Jul 23, 2018 by Carlos Riquelme

Showing with 2 additions and 1 deletion

research/deep_contextual_bandits/README.md +2 -1

--- a/research/deep_contextual_bandits/README.md
+++ b/research/deep_contextual_bandits/README.md
@@ -63,7 +63,8 @@ while attempting to incur low cost. Informally speaking, we assume the expected
 reward is given by some function
 **E**[r<sub>t</sub> | X<sub>t</sub>, a<sub>t</sub>] = f(X<sub>t</sub>, a<sub>t</sub>).
 Unfortunately, function **f** is unknown, as otherwise we could just choose the
-action with highest expected value: ![equation](https://latex.codecogs.com/gif.download?%5Cinline%20a_t%5E*%20%3D%20%5Carg%20%5Cmax_i%20f%28X_t%2C%20a_i%29).
+action with highest expected value:
+a<sub>t</sub><sup>*</sup> = arg max<sub>i</sub> f(X<sub>t</sub>, a<sub>t</sub>).
 The idea behind Thompson Sampling is based on keeping a posterior distribution
 ![equation](https://latex.codecogs.com/gif.download?%5Cinline%20%5Cpi_t) over functions in some family ![equation](https://latex.codecogs.com/gif.download?%5Cinline%20f%20%5Cin%20F) after observing the first