“Computational-Statistical Gaps in Reinforcement Learning”
Speaker: Gaurav Mahajan (UCSD)
Abstract: A fundamental assumption in the theory of reinforcement learning is “RL with linear function approximation”. Under this assumption, the optimal value function (Q*, V*, or both) can be expressed as a linear combination of finitely many known basis functions. Even though the empirical benefits of linear function approximation were observed as early as 1963, only recently has a series of works designed sample-efficient algorithms for this setting. These works posed an important open problem: can we design polynomial-time algorithms for this setting? In this talk, I will go over recent progress on this open problem: a surprising computational-statistical gap in reinforcement learning. Even though we have algorithms with polynomial sample complexity, under the standard hardness assumption (NP != RP) there are no polynomial-time algorithms for this setting. I will start by going over a few algorithmic ideas for designing sample-efficient algorithms in RL, and then show how to build hard MDPs satisfying the linear function approximation assumption from hard 3-SAT instances. I will end the talk by discussing a few open problems in RL and sequence modelling.
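For readers unfamiliar with the setting, the linear realizability assumption in the abstract can be sketched in a few lines: the learner knows a feature map phi(s, a) but not the weight vector theta, and the assumption is that the optimal value function lies in the span of the features. The feature map and weights below are hypothetical, chosen only for illustration.

```python
import numpy as np

# Linear function approximation assumption (sketch):
# Q*(s, a) = theta . phi(s, a) for some unknown weight vector theta,
# where phi is a known feature map of d basis functions.

def phi(state, action):
    """Known feature map: three hypothetical basis functions."""
    return np.array([state, action, state * action], dtype=float)

# Unknown ground-truth weights the learner must (statistically) recover.
theta = np.array([0.5, -1.0, 2.0])

def q_star(state, action):
    """Optimal action-value under the linear realizability assumption."""
    return float(theta @ phi(state, action))
```

The statistical results mentioned in the abstract say theta can be estimated from polynomially many samples; the computational-statistical gap says no polynomial-time algorithm can exploit this structure in the worst case (assuming NP != RP).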
Remote presentation only.
Join from PC, Mac, Linux, iOS or Android: https://yale.zoom.us/j/94359913798
Or Telephone: 203-432-9666 (2-ZOOM if on-campus) or 646 568 7788
One Tap Mobile: +12034329666,,94359913798# US (Bridgeport)
Meeting ID: 943 5991 3798
International numbers available: https://yale.zoom.us/u/ac1Gq3KLWp