Parameter Description | Value |
---|---|
Gaussian proposal variance (σ²) | 0.1 |
learning rate (for ℓ < 500) | 10⁻⁴ |
learning rate (for ℓ > 500) | 5 · 10⁻⁵ |
hidden layer size | 2Kℓ |
number of hidden layers | 2 |
output activation | tanh |
network optimizer | Adam (Kingma & Ba, 2015) |
batch size (N) | 100 |
number of iterations | 10,000 |
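
For reference, the settings in the table could be collected into a single configuration object along the following lines. This is only a minimal sketch: the class name `SamplerConfig` and its field names are hypothetical, the hidden layer size is read as the product 2·K·ℓ, and because the table does not say which learning rate applies at exactly ℓ = 500, the sketch assumes the smaller rate there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerConfig:
    """Hypothetical container for the hyperparameters in the table.

    K and ell are problem-dependent and must be supplied by the caller.
    """
    K: int
    ell: int
    proposal_variance: float = 0.1   # Gaussian proposal variance (sigma^2)
    num_hidden_layers: int = 2
    output_activation: str = "tanh"
    optimizer: str = "Adam"
    batch_size: int = 100            # N
    num_iterations: int = 10_000

    @property
    def learning_rate(self) -> float:
        # The table gives 1e-4 for ell < 500 and 5e-5 for ell > 500.
        # ell == 500 is not covered; using 5e-5 there is an assumption.
        return 1e-4 if self.ell < 500 else 5e-5

    @property
    def hidden_size(self) -> int:
        # Hidden layer size, reading the table entry as the product 2 * K * ell.
        return 2 * self.K * self.ell


# Example with illustrative (not source-provided) values of K and ell.
cfg = SamplerConfig(K=2, ell=100)
print(cfg.learning_rate, cfg.hidden_size)
```

Deriving the learning rate and hidden size from `K` and `ell` keeps the two ℓ-dependent rows of the table in one place instead of requiring them to be set by hand for each problem size.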