Towards a Theory for Sample-efficient Reinforcement Learning with Rich Observations
How can we tractably solve sequential decision making problems where the learning agent receives rich observations? We begin with a new model called Contextual Decision Processes (CDPs) for studying such problems, and show that it…
Super-Human AI for Strategic Reasoning
Poker has been a challenge problem in game theory, operations research, and artificial intelligence for decades. As a game of imperfect information, it involves obstacles not present in games like chess and go, and requires…
Fully Online Matching
We introduce a fully online model of maximum cardinality matching in which all vertices arrive online. On the arrival of a vertex, its incident edges to previously-arrived vertices are revealed. Each vertex has a deadline…
LSH-Sampling Breaks the Computation Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation
Stochastic Gradient Descent or SGD is the most popular algorithm for large-scale optimization. In SGD, the gradient is estimated by uniform sampling with sample size one. There have been several results that show better gradient…
Anthropomorphising AI Is an Impediment to a Stable Society
Computation, unlike mathematics, is a physical process that takes time, energy, and space. Humans have dominated this planet’s ecosystem by learning to share and consolidate the outcome of their computation in an unprecedented way. Now…
The Matching Problem in General Graphs is in Quasi-NC
We show that the perfect matching problem in general graphs is in Quasi-NC. That is, we give a deterministic parallel algorithm which runs in polylogarithmic time on quasi-polynomially many processors. The result is obtained by…