Nov 23, 2020
You wrote: "We then use those Q-Values to decide on our policy π (where a policy is simply each action’s probability to be selected)".
As I understand it, Q-Learning (or DQN) does not give you probabilities of actions, only a Q-Value for each action.
Is there any way to get probabilities then?
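
For context, two common ways to turn Q-Values into action probabilities are a softmax (Boltzmann) policy and an epsilon-greedy policy. Below is a minimal sketch of both, assuming the Q-Values for a single state are given as a plain list of floats. The function names and the default `temperature` and `epsilon` values are illustrative assumptions, not taken from the article.

```python
import math


def softmax_policy(q_values, temperature=1.0):
    """Boltzmann policy: P(a) is proportional to exp(Q(a) / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    max_q = max(q_values)
    exps = [math.exp((q - max_q) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def epsilon_greedy_policy(q_values, epsilon=0.1):
    """Epsilon-greedy policy expressed as explicit action probabilities."""
    n = len(q_values)
    # Index of the greedy action (first one wins on ties).
    best = max(range(n), key=lambda a: q_values[a])
    # Every action gets epsilon / n; the greedy action gets the remaining mass.
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


if __name__ == "__main__":
    q = [1.0, 3.0, 2.0]  # hypothetical Q-Values for three actions
    print(softmax_policy(q, temperature=1.0))
    print(epsilon_greedy_policy(q, epsilon=0.1))
```

A lower temperature pushes the softmax towards the greedy action, while a higher one makes it closer to uniform. Note that these probabilities describe how actions are selected from the Q-Values; they are not something Q-Learning itself learns.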