Hi,
In the paper "Layer Normalization", Section 3.1, layer normalization for an RNN is applied to the sum of the weighted input and the weighted hidden state. In the lngru code, however, it seems LN is applied only to the weighted hidden state rather than to the sum.
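To make sure I'm describing the discrepancy correctly, here is a minimal NumPy sketch of the two variants I mean, simplified to a plain recurrent pre-activation rather than the full GRU gates (`layer_norm`, `Wx`, `Wh` are illustrative names, not the actual code's):

```python
import numpy as np

def layer_norm(z, eps=1e-5):
    # Normalize across the feature dimension, as in the paper
    # (gain/bias parameters omitted for brevity).
    mu = z.mean(axis=-1, keepdims=True)
    sigma = z.std(axis=-1, keepdims=True)
    return (z - mu) / (sigma + eps)

rng = np.random.default_rng(0)
d = 4
Wx = rng.standard_normal((d, d))
Wh = rng.standard_normal((d, d))
x = rng.standard_normal(d)
h = rng.standard_normal(d)

# Variant A (paper, Sec. 3.1): LN over the sum of both pre-activations.
a_paper = layer_norm(Wx @ x + Wh @ h)

# Variant B (what the lngru code appears to do): LN on the weighted
# hidden term only, with the weighted input added afterwards.
a_code = Wx @ x + layer_norm(Wh @ h)

print(np.linalg.norm(a_paper - a_code))
```

In general these two give different results, since in variant B the input term bypasses the normalization entirely.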
Is there a particular reason for this choice?
Thanks.