Nov 23, 2020

You wrote: "We then use those Q-Values to decide on our policy π (where a policy is simply each action’s probability to be selected)".

As I understand it, Q-Learning (or DQN) does not give you action probabilities. It gives you a Q-Value for each action.

Is there any way to get probabilities then?
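One common way to get probabilities is to derive them from the Q-Values with a fixed rule. This may or may not be what the article intends. Two standard rules are ε-greedy and Boltzmann (softmax) exploration. Below is a minimal sketch that assumes the Q-Values for a single state are a plain Python list. The function names and the example values are hypothetical and for illustration only.

```python
import math

def softmax_policy(q_values, temperature=1.0):
    """Boltzmann (softmax) policy: a higher Q-Value gives a higher probability."""
    # Subtracting the max keeps exp() from overflowing and does not change the result.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def epsilon_greedy_policy(q_values, epsilon=0.1):
    """Epsilon-greedy: epsilon is spread uniformly, the rest goes to the greedy action."""
    n = len(q_values)
    probs = [epsilon / n] * n
    # On ties, max() returns the first best action.
    best = max(range(n), key=lambda a: q_values[a])
    probs[best] += 1.0 - epsilon
    return probs

q = [1.0, 2.0, 0.5]  # hypothetical Q-Values for three actions in one state
print(softmax_policy(q, temperature=1.0))
print(epsilon_greedy_policy(q, epsilon=0.3))
```

Lowering `temperature` makes the softmax policy approach the greedy one. Raising it moves the policy toward uniform.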

Written by Piotr Niewiński