The idea behind Thompson Sampling is based on keeping a posterior distribution
π<sub>t</sub> over functions in some family f∈F after observing the first
*t-1* datapoints. Then, at time *t*, we sample one potential explanation of
the underlying process: f<sub>t</sub>∼π<sub>t</sub>, and act optimally (i.e., greedily)
*according to f<sub>t</sub>*. In other words, we choose the action with the highest value under f<sub>t</sub>. Finally, we update our posterior distribution with the new collected