@@ -201,8 +201,8 @@ The Deep Bayesian Bandits library includes the following algorithms (see the
neural networks (or more generally, models) that map contexts to rewards,
consists in randomly perturbing a point estimate trained by Stochastic
Gradient Descent on the data. The Parameter-Noise algorithm uses a heuristic
to control the amount of noise σ<sub>t</sub><sup>2</sup> it adds independently to the
parameters representing a neural network: θ<sub>t</sub><sup>'</sup> = θ<sub>t</sub> + ε where ε ∼ N(0, σ<sub>t</sub><sup>2</sup> I).
After using θ<sub>t</sub><sup>'</sup> for decision making, the following SGD
training steps start again from θ<sub>t</sub>. The key hyperparameters to set