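The core idea of this paper is to use a Variational Autoencoder (VAE) with a Gaussian distribution to map images into another space. The output of the encoder is used as both the state and the goal. Distances in this encoded space, called the latent space, are a better metric than Euclidean distance on the raw images. The benefits of using a Variational Autoencoder are: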
- Provides a space where distances are more meaningful, and thus allows the use of a well-structured reward function (e.g. the distance between encodings)
- Inputs to the reinforcement learning network are structured (I do not fully understand this point)
- New states can be sampled from the decoder output, which allows synthetic goals to be created automatically during training so that the goal-conditioned policy can practice reaching diverse goals
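The first benefit can be made concrete with a small sketch of a distance-based reward in latent space. This is only an illustration: the function name `latent_distance_reward`, the use of negative Euclidean distance as the reward form, and the example vectors are assumptions, not details taken from the paper.

```python
import math

def latent_distance_reward(z_state, z_goal):
    """Negative Euclidean distance between two latent vectors (assumed reward form)."""
    if len(z_state) != len(z_goal):
        raise ValueError("latent vectors must have the same dimension")
    return -math.sqrt(sum((s - g) ** 2 for s, g in zip(z_state, z_goal)))

# Hypothetical encoder outputs for a state image and a goal image.
z_state = [0.0, 0.0]
z_goal = [3.0, 4.0]

# The reward grows (toward 0) as the encoded state approaches the encoded goal.
print(latent_distance_reward(z_state, z_goal))  # -5.0
```

The reward is largest (zero) when the encoded state coincides with the encoded goal, which is why a space with meaningful distances matters here.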
The algorithm proceeds as follows:
1. State observations are collected by exploring the environment with a random policy.
2. A variational autoencoder (VAE) is trained on these observations.
3. Latent encodings of each state are obtained from the VAE, giving the states and goals in the latent space.
4. (goal, state) encodings are sampled from the existing set, i.e. tuples (s, a, r, s', g).
5. A reinforcement learning algorithm is trained on the latent encodings (any Q-learning-based method works).
6. Repeat steps 4–5 with the following additions:
   - 6.1) Periodically retrain the autoencoder on the newly generated images (the VAE is retrained intermittently because the goals vary with the states being visited).
   - 6.2) Generate new goals by feeding goal images through the VAE.
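The data flow of these steps can be sketched as below, following the order in which they are listed above. This is a toy illustration under strong assumptions: observations are 2-D points rather than images, `ToyEncoder` is a hypothetical stand-in that only standardizes its input (a real VAE would be a trained neural network), actions are random placeholders, and the Q-learning update is omitted. All names are invented for illustration.

```python
import math
import random

def collect_random_observations(n, rng):
    """Random exploration. Observations are toy 2-D points, not images."""
    return [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]

class ToyEncoder:
    """Placeholder for the VAE encoder: standardizes each dimension."""

    def fit(self, observations):
        n, dims = len(observations), len(observations[0])
        self.mean = [sum(o[d] for o in observations) / n for d in range(dims)]
        self.std = [
            # Fall back to 1.0 if a dimension has zero variance.
            math.sqrt(sum((o[d] - self.mean[d]) ** 2 for o in observations) / n) or 1.0
            for d in range(dims)
        ]
        return self

    def encode(self, obs):
        return [(x - m) / s for x, m, s in zip(obs, self.mean, self.std)]

def latent_reward(z, z_goal):
    """Negative Euclidean distance in latent space (assumed reward form)."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(z, z_goal)))

def build_transitions(observations, encoder, rng):
    """Encode consecutive observations and pair each with an encoded goal."""
    transitions = []
    for obs, next_obs in zip(observations[:-1], observations[1:]):
        z, z_next = encoder.encode(obs), encoder.encode(next_obs)
        # A goal is produced by passing a previously seen observation through the encoder.
        z_goal = encoder.encode(rng.choice(observations))
        action = rng.choice([0, 1, 2, 3])  # placeholder random action
        reward = latent_reward(z_next, z_goal)
        transitions.append((z, action, reward, z_next, z_goal))
    return transitions

rng = random.Random(0)
observations = collect_random_observations(200, rng)            # random exploration
encoder = ToyEncoder().fit(observations)                         # train the encoder
replay_buffer = build_transitions(observations, encoder, rng)    # latent (s, a, r, s', g)

for iteration in range(3):
    batch = rng.sample(replay_buffer, 32)   # sample (s, a, r, s', g) tuples
    # A Q-learning style update on `batch` would go here; omitted in this sketch.
    observations.extend(collect_random_observations(50, rng))
    encoder.fit(observations)               # periodic retraining on new observations
    replay_buffer = build_transitions(observations, encoder, rng)  # regenerate goals
```

Rebuilding the buffer after each refit keeps the stored encodings consistent with the current encoder, which is one simple way to handle the fact that the latent space shifts whenever the autoencoder is retrained.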
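Copyright notice: this is an original article by liyaohhh, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.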