Visual Reinforcement Learning with Imagined Goals

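The core idea of this paper is to use a variational autoencoder (VAE) with a Gaussian latent distribution to map images into another space, called the latent space. The encoder's output is used as both the state and the goal. Distances in this latent space are more meaningful than Euclidean distances between raw images. The benefits of using a VAE are as follows: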

Provides a space where distances are more meaningful, which allows a well-structured reward function (e.g. the distance between encodings).
Inputs to the reinforcement learning network are structured (this point is not clear to me).
New goals can be sampled from the VAE's latent prior, allowing automated synthetic goal creation during training so that the goal-conditioned policy can practice diverse behaviors.
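
To make the first and third points concrete, here is a minimal sketch in plain Python. It assumes the encoder returns the mean and log-variance of a diagonal Gaussian and that the prior is a unit Gaussian. The function names (reparameterize, latent_reward, sample_imagined_goal) are made up for illustration and are not taken from the paper's code.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) for a diagonal Gaussian."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def latent_reward(z_state, z_goal):
    """Reward as the negative Euclidean distance between two latent encodings."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(z_state, z_goal)))

def sample_imagined_goal(latent_dim, rng):
    """Sample a goal from the assumed unit-Gaussian prior p(z) = N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

rng = random.Random(0)
z_goal = sample_imagined_goal(4, rng)             # an "imagined" goal
z_state = reparameterize([0.0] * 4, [0.0] * 4, rng)  # a sampled state encoding
r = latent_reward(z_state, z_goal)                # always <= 0, and 0 at the goal
```

The reward is zero only when the state encoding coincides with the goal encoding, so maximizing it pushes the policy toward the goal in latent space.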

The algorithm proceeds as follows:

1) State observations are collected by random exploration of the environment (a random policy gathers the observations).
2) A variational autoencoder is trained on these observations.
3) Latent encodings for each state are obtained from the variational autoencoder (this gives the states and goals in the latent space).
4) (goal, state) encodings are sampled from the existing set (i.e. sample (s, a, r, s', g) tuples).
5) A reinforcement learning algorithm is trained on the latent encodings (any Q-learning-based method works).
6) Repeat steps 4–5 with the following conditions:
6.1) Periodically retrain the autoencoder on newly collected image observations (the goals change as different states are visited).
6.2) Generate new goals by feeding goal images through the variational autoencoder.
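
The data flow of the first four steps can be sketched as below. This is only a toy illustration under loud assumptions: the "images" are random lists of floats, encode is a hypothetical stand-in for a trained VAE encoder (it just averages pixel pairs), actions are random placeholders, and the p_prior mixing probability is an arbitrary choice for the sketch. Training the VAE and the Q-learning-based learner is left out.

```python
import math
import random

def encode(image):
    """Hypothetical stand-in for the VAE encoder: averages adjacent pixel pairs."""
    return [(image[i] + image[i + 1]) / 2.0 for i in range(0, len(image), 2)]

def reward(z, z_goal):
    """Negative Euclidean distance in latent space."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(z, z_goal)))

def collect_random_images(n, size, rng):
    """Random exploration, faked here with random 'images'."""
    return [[rng.random() for _ in range(size)] for _ in range(n)]

def build_transitions(images, rng, latent_dim, p_prior=0.5):
    """Encode consecutive observations and attach a goal to each transition."""
    latents = [encode(img) for img in images]
    transitions = []
    for z, z_next in zip(latents, latents[1:]):
        action = rng.choice([-1, 0, 1])  # placeholder for the action taken
        if rng.random() < p_prior:
            # imagined goal drawn from an assumed unit-Gaussian prior
            z_goal = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        else:
            # goal taken from the existing set of encoded observations
            z_goal = rng.choice(latents)
        # the reward is measured on the next state, one possible convention
        transitions.append((z, action, reward(z_next, z_goal), z_next, z_goal))
    return transitions

rng = random.Random(0)
images = collect_random_images(5, 4, rng)
replay = build_transitions(images, rng, latent_dim=2)  # list of (s, a, r, s', g)
```

Each entry of replay is one (s, a, r, s', g) tuple in latent space, which is what a goal-conditioned off-policy learner would consume.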