Am I correct in understanding that current RL techniques are equivalent to learning a finite automaton? (That is, Markov decision processes require behavior to depend only on the current state, and there are only finitely many states.)

Is there research on learning a pushdown automaton?
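
To make the distinction concrete, here is a minimal toy sketch of what I mean. It is my own illustration, not an RL algorithm or any library's API, and every function name in it is hypothetical. It contrasts a controller whose behavior depends only on one of finitely many states with one that also has a stack, using balanced parentheses as the task and assuming the input contains only `(` and `)`.

```python
from typing import Dict, List, Tuple

Table = Dict[Tuple[int, str], int]


def make_bounded_depth_table(max_depth: int) -> Table:
    """Build a finite transition table that tracks nesting depth up to max_depth.

    States 0..max_depth are depths; state max_depth + 1 is a rejecting sink.
    (Hypothetical helper, for illustration only.)
    """
    sink = max_depth + 1
    table: Table = {}
    for depth in range(max_depth + 1):
        # Opening past the deepest tracked level falls into the sink.
        table[(depth, "(")] = depth + 1 if depth < max_depth else sink
        # Closing at depth 0 is an error, so it also falls into the sink.
        table[(depth, ")")] = depth - 1 if depth > 0 else sink
    table[(sink, "(")] = sink
    table[(sink, ")")] = sink
    return table


def run_finite(table: Table, word: str) -> bool:
    """Finite-state controller: the next state depends only on (state, symbol)."""
    state = 0
    for symbol in word:
        state = table[(state, symbol)]
    return state == 0


def run_pushdown(word: str) -> bool:
    """Pushdown controller: one control state plus an unbounded stack."""
    stack: List[str] = []
    for symbol in word:
        if symbol == "(":
            stack.append(symbol)
        elif not stack:
            return False  # closing bracket with nothing left to match
        else:
            stack.pop()
    return not stack


table = make_bounded_depth_table(max_depth=2)
print(run_finite(table, "(())"))    # depth 2 fits in the finite table
print(run_finite(table, "((()))"))  # depth 3 exceeds it
print(run_pushdown("((()))"))       # the stack handles any depth
```

With a fixed number of states, the table-driven controller can only track nesting up to a fixed depth, while the stack-based one handles any depth. That gap is what I am asking about.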