
I admittedly don’t remember much about RL (I took an introductory class on it 5 years ago), but I do remember building a Pacman-playing agent where we gradually adjusted the discount factor and the learning rate (alpha) over the course of training, which sped training up a bit. Is that not something that could be done here? 6-7 hours seems like a lot!
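To make the idea concrete, here is a rough sketch of tabular Q-learning where alpha and the discount are changed on a schedule during training. It is only an illustration under my own assumptions: the comment doesn't say which way the factors were moved, so this assumes alpha decays and the discount ramps up, both linearly per episode. The toy corridor environment and the names `linear_schedule` and `q_learning_chain` are hypothetical stand-ins, not the Pacman setup or whatever the 6-7 hour training run actually uses.

```python
import random


def linear_schedule(start, end, step, total_steps):
    """Linearly interpolate from start to end over total_steps, then hold at end."""
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return start + frac * (end - start)


def q_learning_chain(n_states=6, episodes=300, seed=0,
                     alpha_start=0.5, alpha_end=0.05,
                     gamma_start=0.5, gamma_end=0.95,
                     epsilon=0.1):
    """Tabular Q-learning on a hypothetical toy corridor.

    Action 1 moves right, action 0 moves left; stepping into the last
    state gives reward 1 and ends the episode. Alpha decays and the
    discount (gamma) grows linearly across episodes (assumed schedules).
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for ep in range(episodes):
        # Assumed schedules: shrink the learning rate, lengthen the horizon.
        alpha = linear_schedule(alpha_start, alpha_end, ep, episodes)
        gamma = linear_schedule(gamma_start, gamma_end, ep, episodes)
        s = 0
        for _ in range(100):  # cap on steps per episode
            # Epsilon-greedy action selection; greedy ties go right.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == goal else 0.0
            # Standard Q-learning target; no bootstrap from the terminal state.
            target = r if s2 == goal else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if s == goal:
                break
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning_chain()):
        print(state, [round(v, 3) for v in values])
```

The rough intuition is that a large alpha early on lets value estimates move quickly, a small one later keeps them from bouncing around, and a small discount early makes short-horizon rewards easier to learn before the horizon is stretched. Note that changing the discount also changes what the agent is optimizing, so whether this actually saves hours in a given setup is something you'd have to measure.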