The idea behind Thompson Sampling is to maintain a posterior distribution
π<sub>t</sub> over functions in some family f∈F after observing the first
*t-1* datapoints. Then, at time *t*, we sample one potential explanation of
the underlying process: f<sub>t</sub> ∼ π<sub>t</sub>, and act optimally (i.e., greedily)
*according to f<sub>t</sub>*. In other words, we choose a<sub>t</sub> = arg max<sub>a</sub> f<sub>t</sub>(X<sub>t</sub>, a). Finally, we update our posterior distribution with the new collected