@@ -60,7 +60,9 @@ beneficial personalized action under some metric (the reward).
Thompson Sampling is a meta-algorithm that chooses an action for the contextual
bandit in a statistically efficient manner, simultaneously finding the best arm
while attempting to incur low cost. Informally speaking, we assume the expected
reward is given by some function **f** of the context and the chosen action. Unfortunately, **f** is unknown, as otherwise we could just choose the