weight_optimizer – Selection of weight optimizers¶
Description¶
A weight optimizer is an algorithm that adjusts the synaptic weights in a network during training to minimize the loss function and thus improve the network’s performance on a given task.
Such an optimizer is an essential part of plasticity rules like e-prop plasticity.
Currently two weight optimizers are implemented: gradient descent and the Adam optimizer.
In gradient descent [1] the weights are optimized via:

\[
W_t = W_{t-1} - \eta \, g_t
\]
where \(\eta\) denotes the learning rate and \(g_t\) the gradient of the current time step \(t\).
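As a minimal illustration of this update rule, a single gradient descent step can be written in plain Python (this is a sketch of the formula above, not the NEST implementation):

```python
def gradient_descent_step(w, g_t, eta=1e-4):
    """One gradient descent update: W_t = W_{t-1} - eta * g_t."""
    return w - eta * g_t

# Example: one update of a 0.5 pA weight with the default learning rate
w = gradient_descent_step(w=0.5, g_t=0.2)  # -> 0.49998
```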
In the Adam scheme [2] the weights are optimized via:

\[
\begin{align*}
m_t &= \beta_1 \, m_{t-1} + (1 - \beta_1) \, g_t \\
v_t &= \beta_2 \, v_{t-1} + (1 - \beta_2) \, g_t^2 \\
\alpha_t &= \eta \, \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \\
W_t &= W_{t-1} - \alpha_t \, \frac{m_t}{\sqrt{v_t} + \hat{\epsilon}}
\end{align*}
\]

where \(m_t\) and \(v_t\) denote the first and second moment estimates with initial values \(m_0 = v_0 = 0\), and \(\beta_1\) and \(\beta_2\) the corresponding exponential decay rates.
Note that, for comparability, the implementation follows the TensorFlow implementation [3]. The TensorFlow implementation deviates from [2] in that it assumes \(\hat{\epsilon} = \epsilon \sqrt{ 1 - \beta_2^t }\) to be constant, whereas [2] assumes \(\epsilon = \hat{\epsilon} \sqrt{ 1 - \beta_2^t }\) to be constant.
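The following standalone sketch mirrors this TensorFlow-style formulation, adding the constant \(\hat{\epsilon}\) directly to \(\sqrt{v_t}\); it illustrates the equations above and is not the NEST source code:

```python
def adam_step(w, g_t, m, v, t, eta=1e-4, beta_1=0.9, beta_2=0.999, epsilon_hat=1e-7):
    """One Adam update in the TensorFlow-style formulation (constant epsilon_hat)."""
    m = beta_1 * m + (1.0 - beta_1) * g_t        # first moment estimate
    v = beta_2 * v + (1.0 - beta_2) * g_t ** 2   # second moment raw estimate
    alpha_t = eta * (1.0 - beta_2 ** t) ** 0.5 / (1.0 - beta_1 ** t)  # bias-corrected step size
    w = w - alpha_t * m / (v ** 0.5 + epsilon_hat)
    return w, m, v

# Example: two consecutive updates starting from m = v = 0 at t = 1
w, m, v = 0.5, 0.0, 0.0
for t, g_t in enumerate([0.2, -0.1], start=1):
    w, m, v = adam_step(w, g_t, m, v, t)
```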
When optimize_each_step is set to True, the weights are optimized at every time step. If set to False, optimization occurs once per spike, resulting in a significant speed-up. For gradient descent, both settings yield the same results under exact arithmetic; however, small numerical differences may be observed due to floating point precision. For the Adam optimizer, only setting optimize_each_step to True precisely implements the algorithm as described in [2]. The impact of this setting on learning performance may vary depending on the task.
Parameters¶
The following parameters can be set in the status dictionary.
Common optimizer parameters

| Parameter | Unit | Math equivalent | Default | Description |
|---|---|---|---|---|
| batch_size | | | 1 | Size of batch |
| eta | | \(\eta\) | 1e-4 | Learning rate |
| optimize_each_step | | | True | If True, optimize the weights at every time step; if False, only once per spike |
| Wmax | pA | \(W_{ji}^\text{max}\) | 100.0 | Maximal value for synaptic weight |
| Wmin | pA | \(W_{ji}^\text{min}\) | -100.0 | Minimal value for synaptic weight |
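As a usage sketch, these parameters are typically passed as a nested optimizer dictionary in the defaults of a plastic synapse model; the model name eprop_synapse_bsshslm_2020 is taken from the e-prop tutorials listed below and serves only as an example here:

```python
import nest

# Sketch: keep the default gradient descent optimizer and set the common
# optimizer parameters via the synapse defaults (model name is an example
# taken from the e-prop tutorials).
nest.SetDefaults(
    "eprop_synapse_bsshslm_2020",
    {
        "optimizer": {
            "type": "gradient_descent",  # default optimizer
            "batch_size": 1,
            "eta": 1e-4,                 # learning rate
            "optimize_each_step": True,  # optimize at every time step
            "Wmin": -100.0,              # pA
            "Wmax": 100.0,               # pA
        }
    },
)
```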
Gradient descent parameters (default optimizer)

| Parameter | Unit | Math equivalent | Default | Description |
|---|---|---|---|---|
| type | | | "gradient_descent" | Optimizer type |
Adam optimizer parameters

| Parameter | Unit | Math equivalent | Default | Description |
|---|---|---|---|---|
| type | | | "adam" | Optimizer type |
| beta_1 | | \(\beta_1\) | 0.9 | Exponential decay rate for first moment estimate |
| beta_2 | | \(\beta_2\) | 0.999 | Exponential decay rate for second moment estimate |
| epsilon | | \(\epsilon\) | 1e-7 | Small constant for numerical stability |
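Analogously, a sketch selecting the Adam optimizer with the parameters above (same assumption about the synapse model name as in the previous sketch):

```python
import nest

# Sketch: switch the optimizer to Adam and set its parameters.
nest.SetDefaults(
    "eprop_synapse_bsshslm_2020",
    {
        "optimizer": {
            "type": "adam",
            "eta": 1e-4,
            "beta_1": 0.9,    # decay rate of the first moment estimate
            "beta_2": 0.999,  # decay rate of the second moment estimate
            "epsilon": 1e-7,  # constant for numerical stability
        }
    },
)
```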
The following state variables evolve during simulation.
Adam optimizer state variables for individual synapses

| State variable | Unit | Math equivalent | Initial value | Description |
|---|---|---|---|---|
| m | | \(m\) | 0.0 | First moment estimate |
| v | | \(v\) | 0.0 | Second moment raw estimate |
References¶

[1] https://en.wikipedia.org/wiki/Gradient_descent

[2] Kingma DP, Ba JL (2015). Adam: A method for stochastic optimization. Proceedings of the International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.1412.6980

[3] https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam
See also¶
Examples using this model¶
Tutorial on learning to accumulate evidence with e-prop after Bellec et al. (2020)
Tutorial on learning to generate a lemniscate with e-prop after Bellec et al. (2020)
Tutorial on learning to generate handwritten text with e-prop after Bellec et al. (2020)
Tutorial on learning to generate sine waves with e-prop after Bellec et al. (2020)