Or, as I like to call it, “The black magic of discount factors.” I first discovered this problem thanks to my supervisor, Prof. Proutiere, while we were going over my proposal for the second lab session of the Reinforcement Learning course (EL2805) that we taught at KTH in fall 2020. The problem is serious enough that it affects most Deep Reinforcement Learning algorithms, including A3C [7]…