Gated recurrent unit


Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al.[1] The GRU is like a long short-term memory (LSTM) with a forget gate,[2] but has fewer parameters than the LSTM, as it lacks an output gate.[3] The GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of the LSTM.[4][5] GRUs have been shown to perform better on certain smaller and less frequent datasets.[6][7]

Architecture

There are several variations on the full gated unit, with gating done using the previous hidden state and the bias in various combinations, and a simplified form called the minimal gated unit.[8]

The operator ⊙ denotes the Hadamard product in the following.

Fully gated unit

Gated Recurrent Unit, fully gated version

Initially, for t = 0, the output vector is h_0 = 0.

z_t = σ_g(W_z x_t + U_z h_{t-1} + b_z)
r_t = σ_g(W_r x_t + U_r h_{t-1} + b_r)
ĥ_t = φ_h(W_h x_t + U_h (r_t ⊙ h_{t-1}) + b_h)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ ĥ_t

Variables

  • x_t: input vector
  • h_t: output vector
  • ĥ_t: candidate activation vector
  • z_t: update gate vector
  • r_t: reset gate vector
  • W, U and b: parameter matrices and vector

Activation functions

  • σ_g: The original is a sigmoid function.
  • φ_h: The original is a hyperbolic tangent.

Alternative activation functions are possible, provided that σ_g(x) ∈ [0, 1].
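
To make the update rule concrete, the following is a minimal NumPy sketch of one time step of the fully gated unit, assuming the original sigmoid and tanh activations; the function and parameter names (gru_cell, W_z, U_z, b_z, and so on) are illustrative and not taken from any particular library.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, params):
    # x_t: input vector, shape (d_in,)
    # h_prev: previous output vector h_{t-1}, shape (d_h,)
    # params: dict of parameter matrices W_*, U_* and bias vectors b_*
    z_t = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev + params["b_z"])  # update gate
    r_t = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev + params["b_r"])  # reset gate
    # candidate activation: the reset gate scales the previous state elementwise
    h_hat = np.tanh(params["W_h"] @ x_t + params["U_h"] @ (r_t * h_prev) + params["b_h"])
    # interpolate between the previous state and the candidate
    return (1.0 - z_t) * h_prev + z_t * h_hat

# Usage: start from h_0 = 0 and feed each output back in as h_{t-1}.
# h = np.zeros(d_h)
# for x_t in sequence:
#     h = gru_cell(x_t, h, params)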

Diagrams of the Type 1, Type 2 and Type 3 variants

Alternate forms can be created by changing z_t and r_t, as illustrated in the sketch after this list:[9]

  • Type 1, each gate depends only on the previous hidden state and the bias.
    z_t = σ_g(U_z h_{t-1} + b_z)
    r_t = σ_g(U_r h_{t-1} + b_r)
  • Type 2, each gate depends only on the previous hidden state.
    z_t = σ_g(U_z h_{t-1})
    r_t = σ_g(U_r h_{t-1})
  • Type 3, each gate is computed using only the bias.
    z_t = σ_g(b_z)
    r_t = σ_g(b_r)
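
As a rough sketch of how the three variants restrict the gate computations, the hypothetical helper below switches between the full gates and the Type 1–3 simplifications; the candidate activation and output equations stay exactly as in the fully gated unit, and the variant labels and names here are illustrative only.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_gates(x_t, h_prev, p, variant="full"):
    # Only the gate equations change between variants.
    if variant == "full":       # input, previous hidden state and bias
        z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])
        r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])
    elif variant == "type1":    # previous hidden state and bias only
        z_t = sigmoid(p["U_z"] @ h_prev + p["b_z"])
        r_t = sigmoid(p["U_r"] @ h_prev + p["b_r"])
    elif variant == "type2":    # previous hidden state only
        z_t = sigmoid(p["U_z"] @ h_prev)
        r_t = sigmoid(p["U_r"] @ h_prev)
    else:                       # "type3": bias only
        z_t = sigmoid(p["b_z"])
        r_t = sigmoid(p["b_r"])
    return z_t, r_t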

Minimal gated unit

The minimal gated unit (MGU) is similar to the fully gated unit, except that the update and reset gate vectors are merged into a single forget gate. This also implies that the equation for the output vector must be changed:[10]

f_t = σ_g(W_f x_t + U_f h_{t-1} + b_f)
ĥ_t = φ_h(W_h x_t + U_h (f_t ⊙ h_{t-1}) + b_h)
h_t = (1 - f_t) ⊙ h_{t-1} + f_t ⊙ ĥ_t

Variables

  • x_t: input vector
  • h_t: output vector
  • ĥ_t: candidate activation vector
  • f_t: forget vector
  • W, U and b: parameter matrices and vector
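
For comparison, here is a minimal NumPy sketch of one time step of the minimal gated unit, mirroring the equations above; the names (mgu_cell, W_f, U_f, b_f) are again illustrative.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_cell(x_t, h_prev, params):
    # A single forget gate f_t takes over the roles of both the update
    # and the reset gate of the fully gated unit.
    f_t = sigmoid(params["W_f"] @ x_t + params["U_f"] @ h_prev + params["b_f"])  # forget gate
    h_hat = np.tanh(params["W_h"] @ x_t + params["U_h"] @ (f_t * h_prev) + params["b_h"])  # candidate
    return (1.0 - f_t) * h_prev + f_t * h_hat  # new output vector h_t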

References

  1. ^ Cho, Kyunghyun; van Merrienboer, Bart; Bahdanau, Dzmitry; Bengio, Yoshua (2014). "On the Properties of Neural Machine Translation: Encoder-Decoder Approaches". arXiv:1409.1259.
  2. ^ Felix Gers; Jürgen Schmidhuber; Fred Cummins (1999). "Learning to Forget: Continual Prediction with LSTM". Proc. ICANN'99, IEE, London. 1999: 850–855. doi:10.1049/cp:19991218. ISBN 0-85296-721-7.
  3. ^ "Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML". Wildml.com. 2015-10-27. Retrieved May 18, 2016.
  4. ^ Ravanelli, Mirco; Brakel, Philemon; Omologo, Maurizio; Bengio, Yoshua (2018). "Light Gated Recurrent Units for Speech Recognition". IEEE Transactions on Emerging Topics in Computational Intelligence. 2 (2): 92–102. arXiv:1803.10225. doi:10.1109/TETCI.2017.2762739. S2CID 4402991.
  5. ^ Su, Yuanhang; Kuo, Jay (2019). "On extended long short-term memory and dependent bidirectional recurrent neural network". Neurocomputing. 356: 151–161. arXiv:1803.01686. doi:10.1016/j.neucom.2019.04.044. S2CID 3675055.
  6. ^ Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
  7. ^ Gruber, N.; Jockisch, A. (2020), "Are GRU cells more specific and LSTM cells more sensitive in motive classification of text?", Frontiers in Artificial Intelligence, 3: 40, doi:10.3389/frai.2020.00040, PMC 7861254, PMID 33733157, S2CID 220252321
  8. ^ Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
  9. ^ Dey, Rahul; Salem, Fathi M. (2017-01-20). "Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks". arXiv:1701.05923 [cs.NE].
  10. ^ Heck, Joel; Salem, Fathi M. (2017-01-12). "Simplified Minimal Gated Unit Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE].