@@ -201,8 +201,8 @@ The Deep Bayesian Bandits library includes the following algorithms (see the
neural networks (or more generally, models) that map contexts to rewards
consists in randomly perturbing a point estimate trained by Stochastic
Gradient Descent on the data. The Parameter-Noise algorithm uses a heuristic
to control the amount of noise σ<sub>t</sub><sup>2</sup> it adds independently to the
parameters representing a neural network: θ<sub>t</sub><sup>'</sup> = θ<sub>t</sub> + ε, where
ε ∼ N(0, σ<sub>t</sub><sup>2</sup> I). After using θ<sub>t</sub><sup>'</sup> for decision making,
the subsequent SGD training steps resume from the unperturbed θ<sub>t</sub>. The key hyperparameters to set