@@ -60,7 +60,9 @@ beneficial personalized action under some metric (the reward).
...
@@ -60,7 +60,9 @@ beneficial personalized action under some metric (the reward).
Thompson Sampling is a meta-algorithm that chooses an action for the contextual
Thompson Sampling is a meta-algorithm that chooses an action for the contextual
bandit in a statistically efficient manner, simultaneously finding the best arm
bandit in a statistically efficient manner, simultaneously finding the best arm
while attempting to incur low cost. Informally speaking, we assume the expected
while attempting to incur low cost. Informally speaking, we assume the expected
reward is given by some function . Unfortunately, function **f** is unknown, as otherwise we could just choose the