weight_optimizer – Selection of weight optimizers

Description

A weight optimizer is an algorithm that adjusts the synaptic weights in a network during training to minimize the loss function and thus improve the network’s performance on a given task.

This method is an essential part of plasticity rules like e-prop plasticity.

Currently two weight optimizers are implemented: gradient descent and the Adam optimizer.

In gradient descent [1] the weights are optimized via:

\[ W_t = W_{t-1} - \eta g_t \,, \]

where \(\eta\) denotes the learning rate and \(g_t\) the gradient of the current time step \(t\).
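
As a plain illustration of this update rule (not the NEST source code), a single gradient descent step can be written in a few lines of Python; the weight and gradient values below are arbitrary:

    import numpy as np

    def gradient_descent_step(w, g, eta=1e-4):
        """Single gradient descent update: W_t = W_{t-1} - eta * g_t."""
        return w - eta * g

    # Example: one update for a small set of synaptic weights
    w = np.array([10.0, -5.0, 2.5])   # current weights (pA)
    g = np.array([0.3, -0.1, 0.05])   # gradients at time step t
    w = gradient_descent_step(w, g)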

In the Adam scheme [2] the weights are optimized via:

\[\begin{split}
m_0 &= 0, \quad v_0 = 0, \quad t = 1 \,, \\
m_t &= \beta_1 m_{t-1} + \left( 1 - \beta_1 \right) g_t \,, \\
v_t &= \beta_2 v_{t-1} + \left( 1 - \beta_2 \right) g_t^2 \,, \\
\alpha_t &= \eta \frac{ \sqrt{ 1 - \beta_2^t } }{ 1 - \beta_1^t } \,, \\
W_t &= W_{t-1} - \alpha_t \frac{ m_t }{ \sqrt{v_t} + \hat{\epsilon} } \,.
\end{split}\]

Note that the implementation follows the TensorFlow implementation [3] for comparability. The TensorFlow implementation deviates from [2] in that it treats \(\hat{\epsilon} = \epsilon \sqrt{ 1 - \beta_2^t }\) as constant, whereas [2] treats \(\epsilon = \hat{\epsilon} \sqrt{ 1 - \beta_2^t }\) as constant.
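
The following Python sketch mirrors these equations for a single weight, adding \(\epsilon\) directly to \(\sqrt{v_t}\) in the role of the constant \(\hat{\epsilon}\), as in the TensorFlow-style convention above. It illustrates the update rule and is not the NEST implementation itself:

    import numpy as np

    def adam_step(w, g, m, v, t, eta=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
        """One Adam update of weight w for gradient g at step t (t starts at 1)."""
        m = beta_1 * m + (1.0 - beta_1) * g            # first moment estimate
        v = beta_2 * v + (1.0 - beta_2) * g**2         # second moment raw estimate
        alpha_t = eta * np.sqrt(1.0 - beta_2**t) / (1.0 - beta_1**t)
        w = w - alpha_t * m / (np.sqrt(v) + epsilon)   # epsilon plays the role of constant epsilon-hat
        return w, m, v

    # m and v start at zero; t counts the updates, starting at 1
    w, m, v = 10.0, 0.0, 0.0
    for t, g in enumerate([0.3, -0.2, 0.1], start=1):
        w, m, v = adam_step(w, g, m, v, t)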

When optimize_each_step is set to True, the weights are optimized at every time step. If set to False, optimization occurs once per spike, resulting in a significant speed-up. For gradient descent, both settings yield the same results under exact arithmetic; however, small numerical differences may be observed due to floating point precision. For the Adam optimizer, only setting optimize_each_step to True precisely implements the algorithm as described in [2]. The impact of this setting on learning performance may vary depending on the task.
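
For gradient descent, the equivalence of the two settings can be checked with a toy example (ignoring weight clipping to Wmin and Wmax): applying the per-step updates one by one, or applying a single update with the accumulated gradient, yields the same weight up to floating point rounding.

    import numpy as np

    eta = 1e-4
    gradients = np.array([0.3, -0.2, 0.1, 0.05])  # per-time-step gradients between two spikes

    # optimize_each_step = True: update at every time step
    w_each_step = 10.0
    for g in gradients:
        w_each_step -= eta * g

    # optimize_each_step = False: one update when the next spike is processed
    w_per_spike = 10.0 - eta * gradients.sum()

    print(np.isclose(w_each_step, w_per_spike))  # True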

Parameters

The following parameters can be set in the status dictionary.

Common optimizer parameters

Parameter          | Unit | Math equivalent       | Default | Description
batch_size         |      |                       | 1       | Size of batch
eta                |      | \(\eta\)              | 1e-4    | Learning rate
optimize_each_step |      |                       | True    | If True, optimize the weights at every time step; if False, only when a spike is processed
Wmax               | pA   | \(W_{ji}^\text{max}\) | 100.0   | Maximal value for synaptic weight
Wmin               | pA   | \(W_{ji}^\text{min}\) | -100.0  | Minimal value for synaptic weight

Gradient descent parameters (default optimizer)

Parameter | Unit | Math equivalent | Default            | Description
type      |      |                 | "gradient_descent" | Optimizer type
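
Since gradient descent is the default, selecting it explicitly might look like the following sketch. The synapse model name eprop_synapse_bsshslm_2020 is an assumption (the optimizer is configured through the synapse model of the plasticity rule, e.g. e-prop); consult the documentation of the synapse model actually used for the exact interface.

    import nest

    # Assumed synapse model name; the "optimizer" dictionary follows the
    # parameter tables in this section.
    nest.SetDefaults(
        "eprop_synapse_bsshslm_2020",
        {"optimizer": {"type": "gradient_descent", "eta": 1e-4, "batch_size": 1}},
    )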

Adam optimizer parameters

Parameter | Unit | Math equivalent | Default | Description
type      |      |                 | "adam"  | Optimizer type
beta_1    |      | \(\beta_1\)     | 0.9     | Exponential decay rate for first moment estimate
beta_2    |      | \(\beta_2\)     | 0.999   | Exponential decay rate for second moment estimate
epsilon   |      | \(\epsilon\)    | 1e-7    | Small constant for numerical stability
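
Switching to the Adam optimizer and setting its parameters could then look as follows. As above, the synapse model name is an assumption, and the dictionary keys follow the parameter tables in this section.

    import nest

    optimizer_params = {
        "type": "adam",
        "eta": 1e-4,
        "beta_1": 0.9,
        "beta_2": 0.999,
        "epsilon": 1e-7,
        "optimize_each_step": True,  # True: exact Adam as in [2]; False: faster, once per spike
        "Wmin": -100.0,
        "Wmax": 100.0,
    }
    nest.SetDefaults("eprop_synapse_bsshslm_2020", {"optimizer": optimizer_params})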

The following state variables evolve during simulation.

Adam optimizer state variables for individual synapses

State variable | Unit | Math equivalent | Initial value | Description
m              |      | \(m\)           | 0.0           | First moment estimate
v              |      | \(v\)           | 0.0           | Second moment raw estimate

References

See also

Examples using this model