Friday, April 20, 2018

RNN (Recurrent Neural Networks) and LSTM (Long short-term memory)

RNN (Recurrent Neural Networks)

In the diagram below, a chunk of neural network, A, looks at some input x_t and outputs a value h_t. A loop allows information to be passed from one step of the network to the next.


These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren’t all that different from a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.

This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They’re the natural neural network architecture to use for such data.


One of the appeals of RNNs is the idea that they might be able to connect previous information to the present task, such as using previous video frames to inform the understanding of the present frame.

All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer.
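In code, a single step of this repeating module can be sketched in a few lines of NumPy. The weight names (W_xh, W_hh, b_h) and the toy sizes below are my own illustrative assumptions, not taken from any particular library:

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One step of a vanilla RNN: a single tanh layer combining the
    # current input x_t with the previous hidden state h_prev.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy sizes (assumed): 3-dimensional input, 5-dimensional hidden state.
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((5, 3)) * 0.1
W_hh = rng.standard_normal((5, 5)) * 0.1
b_h = np.zeros(5)

h = np.zeros(5)                            # initial hidden state
for x_t in rng.standard_normal((4, 3)):    # a short input sequence of 4 steps
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # the whole state is recomputed each step

Note that the entire hidden state is recomputed from scratch at every step, which is exactly the limitation discussed below.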


In theory, RNNs are absolutely capable of handling “long-term dependencies,” cases where the gap between the relevant information and the point where it is needed grows large. Sadly, in practice, RNNs don’t seem to be able to learn them.


Limitations of RNNs:
Recurrent Neural Networks work just fine when we are dealing with short-term dependencies.

An RNN remembers things for only short durations: if we need the information again after a short while, it may still be available, but once a lot of new information has been fed in, the earlier information gets lost somewhere.

In order to add new information, an RNN transforms the existing information completely by applying a function. Because of this, the entire information is modified as a whole; there is no distinction between ‘important’ and ‘not so important’ information.

This issue can be resolved by applying a slightly tweaked version of RNNs – the Long Short-Term Memory Networks. 



LSTM (Long short-term memory)
LSTMs are a special kind of recurrent neural network which, for many tasks, work much better than the standard version.
In LSTMs, remembering information for long periods of time is practically the default behavior.


LSTMs also have this chain-like structure, but the repeating module is different. Instead of having a single neural network layer, there are four, interacting in a very special way:

In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied, with the copies going to different locations.

A typical LSTM network is composed of different memory blocks called cells (the rectangles that we see in the image). There are two states that are transferred to the next cell: the cell state and the hidden state. The memory blocks are responsible for remembering things, and manipulations of this memory are done through three major mechanisms, called gates.

The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions.

The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates. Gates are a way to optionally let information through.
LSTMs make only small modifications to the information, through multiplications and additions, as it flows along the cell state. This way, LSTMs can selectively remember or forget things. The information at a particular cell state has three different dependencies: the previous cell state, the previous hidden state and the input at the current time step.
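As a rough sketch of how these gates interact (the variable names, shapes and the dictionary layout below are my own assumptions, not taken from any particular framework), one LSTM step in NumPy could look like this:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One LSTM step. W, U, b are dicts keyed by 'f', 'i', 'o', 'g' for the
    # forget gate, input gate, output gate and candidate values.
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # how much of the old cell state to keep
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # how much new information to write
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # how much of the cell state to expose
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])  # candidate new information
    c_t = f * c_prev + i * g        # the "conveyor belt": only multiplications and additions
    h_t = o * np.tanh(c_t)          # hidden state passed on to the next cell
    return h_t, c_t

# Toy sizes (assumed): 3-dimensional input, 5-dimensional cell/hidden state.
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((5, 3)) * 0.1 for k in 'fiog'}
U = {k: rng.standard_normal((5, 5)) * 0.1 for k in 'fiog'}
b = {k: np.zeros(5) for k in 'fiog'}

h, c = np.zeros(5), np.zeros(5)
for x_t in rng.standard_normal((4, 3)):
    h, c = lstm_step(x_t, h, c, W, U, b)

The new cell state c_t depends on exactly those three things: the previous cell state (through the forget gate), and the previous hidden state and current input (through the input gate and the candidate values).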
 



Summary
LSTMs were a big step in what we can accomplish with RNNs.
What I’ve described so far is a pretty normal LSTM. But not all LSTMs are the same as the above. In fact, it seems like almost every paper involving LSTMs uses a slightly different version.

In comparisons with RTRL (Real-Time Recurrent Learning) and BPTT (Back-Propagation Through Time), LSTM leads to many more successful runs and learns much faster.
LSTM also solves complex, artificial long time lag tasks that have never been solved by previous recurrent network algorithms.

The Long Short-Term Memory recurrent neural network has the promise of learning long sequences of observations.
It seems a perfect match for time series forecasting, and in fact, it may be.
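As an illustration of that match, here is a minimal sketch of one-step-ahead forecasting with an LSTM, using Keras as one possible framework (not something this post prescribes); the data is random and the window length, layer size and training settings are arbitrary assumptions, not recommendations:

import numpy as np
from tensorflow import keras

# Assumed toy problem: predict the next value from a window of 10 past values.
X = np.random.rand(200, 10, 1)   # 200 windows, 10 time steps, 1 feature
y = np.random.rand(200, 1)       # the value that follows each window

model = keras.Sequential([
    keras.Input(shape=(10, 1)),  # (time steps, features)
    keras.layers.LSTM(32),       # 32 LSTM units
    keras.layers.Dense(1),       # one-step-ahead prediction
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)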

LSTMs are a very promising solution to sequence and time series related problems. However, the one disadvantage I find with them is the difficulty of training them. A lot of time and system resources go into training even a simple model. But that is just a hardware constraint!

LSTMs are really good, but they still face issues on some problems, so other methods have also been developed after LSTMs, such as the GRU (Gated Recurrent Unit).
For a quick GRU overview, see: https://datascience.stackexchange.com/questions/14581/when-to-use-gru-over-lstm
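For a quick feel of the difference, a GRU merges the cell state and hidden state into one and uses only two gates, an update gate and a reset gate. The sketch below uses the same assumed naming style as the LSTM step above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    # One GRU step. W, U, b are dicts keyed by 'z' (update gate),
    # 'r' (reset gate) and 'h' (candidate state).
    z = sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])              # how much to update
    r = sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])              # how much past state to use
    h_tilde = np.tanh(W['h'] @ x_t + U['h'] @ (r * h_prev) + b['h'])  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                           # blend old state and candidate

With fewer gates and no separate cell state, a GRU has fewer parameters than an LSTM, which is part of why it is often faster to train.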


