Research Roadmap

See the GitHub projects for more.

CER

extend: reproduce CER with Replay memory for all algos and envs
new: combine CER with PER (Prioritized Experience Replay) as CPER and compare results
Exponentially decay sampling from replay (old OpenAI Lab memory ideas)
Research: SIL (Self-Imitation Learning) with CER and PER
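A minimal sketch of the sampling rules listed above, not the lab's memory class. It assumes a circular buffer of `size` slots where `latest_idx` is the newest transition. CER always appends that newest transition to the batch; passing `priorities` draws the rest of the batch PER-style (proportional to p_i^alpha), which is one possible way to build CPER. `age_decay_priorities` is a hypothetical reading of "exponentially decay sampling", where weight = decay**age. All function and parameter names are hypothetical, and PER importance-sampling weights are omitted.

```python
import numpy as np


def age_decay_priorities(size, latest_idx, decay=0.99):
    """Hypothetical age-based weights: weight = decay**age.

    Ages are counted backwards from latest_idx in a circular buffer,
    so the newest transition has age 0 and weight 1.
    """
    ages = (latest_idx - np.arange(size)) % size
    return decay ** ages


def sample_cer_indices(size, latest_idx, batch_size, priorities=None, alpha=0.6, rng=None):
    """Sample replay indices, always including the latest transition (CER).

    If priorities is given, the other batch_size - 1 indices are drawn with
    probability p_i**alpha / sum_k p_k**alpha (PER-style proportional sampling).
    Importance-sampling correction weights are not computed here.
    """
    rng = rng if rng is not None else np.random.default_rng()
    probs = None
    if priorities is not None:
        scaled = np.asarray(priorities, dtype=float) ** alpha
        probs = scaled / scaled.sum()
    # uniform (or prioritized) draw for all but one slot of the batch
    idxs = rng.choice(size, size=batch_size - 1, replace=True, p=probs)
    # CER: append the newest transition so every batch contains it
    return np.append(idxs, latest_idx)
```

For the age-decay variant, pass `age_decay_priorities(size, latest_idx)` as `priorities` with `alpha=1.0`, so the decay factor alone sets the sampling distribution.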

Hybrid Policy

try Q-learning methods with non-argmax action sampling from their outputs, like in PG
try PG/AC methods with a Boltzmann/epsilon-greedy policy
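A minimal sketch of the two hybrid action-selection rules above, assuming the network outputs arrive as a 1-D array of per-action values. `boltzmann_action` samples from a softmax over Q-values with temperature `tau` instead of taking the argmax; `epsilon_greedy_action` applies epsilon-greedy to a PG/AC policy's action scores. Both function names and defaults are hypothetical.

```python
import numpy as np


def boltzmann_action(q_values, tau=1.0, rng=None):
    """Sample an action from softmax(q / tau) instead of argmax(q)."""
    rng = rng if rng is not None else np.random.default_rng()
    z = np.asarray(q_values, dtype=float) / tau
    z -= z.max()  # subtract the max for numerical stability before exp
    e = np.exp(z)
    probs = e / e.sum()
    return int(rng.choice(len(probs), p=probs))


def epsilon_greedy_action(scores, epsilon=0.1, rng=None):
    """With probability epsilon pick a uniform random action, else the top-scoring one."""
    rng = rng if rng is not None else np.random.default_rng()
    scores = np.asarray(scores)
    if rng.random() < epsilon:
        return int(rng.integers(len(scores)))
    return int(np.argmax(scores))
```

As `tau` approaches 0 the Boltzmann rule recovers greedy argmax selection, and large `tau` approaches uniform sampling, so the temperature plays a role similar to epsilon.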

Multitask with hydra

generalize hydra architecture to non-Q algos
run experiments on hydra algos
set up more multitask environments:
- strategy: solve a, b, a-a, b-b, a-b, a*b
- cartpole, 2dball
- cartpole, acrobot
- cartpole, gridworld
- lunar, acrobot
- 3dball
- gridworld
- Arthur’s cartpole inside gridworld, i.e. a*b
Hydra experiment: compile a list of basic motor skills we think are crucial, and train each head-tail pair on disjoint tasks so each one masters its skill individually. Then switch training to composite tasks and let the network master those. An auxiliary network might be needed to prevent forgetting.
Canonical experiments: get results for all implemented algorithms on cartpole, lunar, mountain car, acrobot, gridworld, 2D and 3D ball, plus 2-3 more
NN architecture: heads, tails, restricted body connections, weight sharing across multiple bodies
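A minimal numpy sketch of the head/body/tail layout, assuming one head per task input, a shared body fed by the concatenated head outputs, and one tail per task output. `HydraMLP` and its parameters are hypothetical names, not the lab's network class; restricted body connections and multi-body weight sharing are not modeled, and all task inputs are assumed to share the same batch size.

```python
import numpy as np


class HydraMLP:
    """Hypothetical Hydra sketch: per-task heads, one shared body, per-task tails."""

    def __init__(self, in_dims, out_dims, hid=64, seed=0):
        rng = np.random.default_rng(seed)

        def init(fan_in, fan_out):
            # scaled Gaussian init so activations stay in a reasonable range
            return rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))

        self.heads = [init(d, hid) for d in in_dims]
        self.body = init(hid * len(in_dims), hid)
        self.tails = [init(hid, d) for d in out_dims]

    def forward(self, xs):
        # each head embeds its own task's state; the body sees the concatenation
        h = np.concatenate([np.tanh(x @ w) for x, w in zip(xs, self.heads)], axis=-1)
        z = np.tanh(h @ self.body)
        # each tail reads the shared representation and emits its task's outputs
        return [z @ w for w in self.tails]
```

For example, `HydraMLP([4, 8], [2, 4])` could pair a cartpole-sized input and output with a second environment's, so both tasks train the same body.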

Tobias’s intuitive theory research

check in on their env again for intuitive physics

Regularization

Use Regularization from OpenAI's Requests for Research 2.0 as an example that directly tackles the reproducibility question. The lab provides code, data, and run instructions: here's what we did, here are the results, and here's how to run it yourself. Stepping back, that is why it's reproducible: you already see everything and can rerun it yourself without extra work.

OpenAI Retro contest

Apparently there's still room for improvement: the theoretical max score is 10k, but the top score is only 4692. https://blog.openai.com/first-retro-contest-retrospective/

Other competitions

crowdAI prosthetics https://www.crowdai.org/challenges/nips-2018-ai-for-prosthetics-challenge
crowdAI VizDoom https://www.crowdai.org/challenges/visual-doom-ai-competition-2018-track-1
crowdAI Minecraft https://www.crowdai.org/challenges/marlo-2018
NIPS Pommerman https://www.pommerman.com/competitions
GECCO GVG-AI http://www.gvgai.net/

Misc

Robust control, noisy matrix
Fake rollout data training like supervised
Multitask and architecture
Correspondence
Mutual Information ::Chong, see notebook::
NN Frankenstein
introspection vs reward signal
GA, neuroevolution
semantic grounding research; maybe do a brain dump and a research paper: ::Douwe Kiela::
curriculum learning
rewardless model-based learning, like AlphaGo
capacity measure
homeostasis of NN
self-reward instead of human design
meta learning using data vs human ingenuity
focus on architecture design too, have memory.
implement the env distribution distance idea
implement the optimality/capacity for fitness idea.
Multihead
Breadboard dynamic graph
