I have an open question about Prioritized Experience Replay from [Schaul15]. From my experiments, it seems that an equation in the publication is wrong, but maybe I’m overlooking something. I’d appreciate input.
To introduce ourselves to reinforcement learning with Deep Q-Networks (DQN), we’ll visit a standard OpenAI Gym problem, CartPole. We’ll cover deeper RL theory in a later post, but let’s get our hands dirty first to build some intuition!
To kick off a series of posts on OpenAI’s Gym environments, I’ll cover some light bootstrapping to get us up and running quickly. I promise it will be short and sweet! I’ll refer back to this post from later entries in the series.
The complete series can be found at the bottom of this post, and the latest version of the GitHub repo can be found here.