
Featured Course | MIPT Launches an Advanced Deep Reinforcement Learning Course [AI Frontier Technology]


Follow: Decision Intelligence and Machine Learning, distilled AI essentials

Author | DeepRL

Source | https://deeppavlov.ai

Reported by | Deep Reinforcement Learning Lab

The Moscow Institute of Physics and Technology (MIPT) has launched an advanced reinforcement learning course. It focuses on recent research progress in deep reinforcement learning, covering exploration strategies, imitation and inverse RL, hierarchical RL, evolutionary strategies in RL, distributed RL, RL for combinatorial optimization, multi-agent RL, large-scale RL, multi-task and transfer RL, and memory mechanisms in RL. It is well worth studying.

Part 1: Lectures

RL#1: 13.02.2020: Exploration in RL

Sergey Ivanov

  • Random Network Distillation [1]

  • Intrinsic Curiosity Module [2,3]

  • Episodic Curiosity through Reachability [4]

RL#2: 20.02.2020: Imitation and Inverse RL

Just Heuristic

  • Imitation Learning [5]

  • Inverse RL [6,7]

  • Learning from Human Preferences [8]

RL#3: 27.02.2020: Hierarchical Reinforcement Learning

Petr Kuderov

  • A framework for temporal abstraction in RL [9]

  • The Option-Critic Architecture [10]

  • FeUdal Networks for Hierarchical RL [11]

  • Data-Efficient Hierarchical RL [12]

  • Meta Learning Shared Hierarchies [13] 

RL#4: 5.03.2020: Evolutionary Strategies in RL

Evgenia Elistratova

  • A framework for temporal abstraction in reinforcement learning [14]

  • Improving Exploration in Evolution Strategies for Deep RL [15]

  • Paired Open-Ended Trailblazer (POET) [16]

  • Sim-to-Real: Learning Agile Locomotion For Quadruped Robots [17]

RL#5: 12.03.2020: Distributional Reinforcement Learning

Pavel Shvechikov

  • A Distributional Perspective on RL [18]

  • Distributional RL with Quantile Regression [19]

  • Implicit Quantile Networks for Distributional RL [20]

  • Fully Parameterized Quantile Function for Distributional RL [21]

RL#6: 19.03.2020: RL for Combinatorial Optimization

Taras Khakhulin

  • RL for Solving the Vehicle Routing Problem [22]

  • Attention, Learn to Solve Routing Problems! [23]

  • Learning Improvement Heuristics for Solving the Travelling Salesman Problem [24]

  • Learning Combinatorial Optimization Algorithms over Graphs [25]

RL#7: 26.03.2020: RL as Probabilistic Inference

Pavel Termichev

  • RL and Control as Probabilistic Inference: Tutorial and Review [26]

  • RL with Deep Energy-Based Policies [27]

  • Soft Actor-Critic [28]

  • Variational Bayesian RL with Regret Bounds [29]

RL#8: 9.04.2020: Multi-Agent Reinforcement Learning

Sergey Sviridov

  • Stabilising Experience Replay for Deep Multi-Agent RL [30]

  • Counterfactual Multi-Agent Policy Gradients [31]

  • Value-Decomposition Networks For Cooperative Multi-Agent Learning [32]

  • Monotonic Value Function Factorisation for Deep Multi-Agent RL [33]

  • Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments [34]

RL#9: 16.04.2020: Model-Based Reinforcement Learning

Evgeny Kashin

  • DL for Real-Time Atari Game Play Using Offline MCTS Planning [35]

  • Mastering Chess and Shogi by Self-Play with a General RL Algorithm [36]

  • World Models [37]

  • Model-Based RL for Atari [38]

  • Learning Latent Dynamics for Planning from Pixels [39] 

RL#10: 23.04.2020: Reinforcement Learning at Scale

Aleksandr Panin

  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour [40]

  • HOGWILD!: A Lock-Free Approach to Parallelizing SGD [41]

  • GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [42]

  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [43]

  • Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts [44]

RL#11: 30.04.2020: Multitask & Transfer RL

Dmitry Nikulin

  • Universal Value Function Approximators [45]

  • Hindsight Experience Replay [46]

  • PathNet: Evolution Channels Gradient Descent in Super Neural Networks [47]

  • Progressive Neural Networks [48]

  • Learning an Embedding Space for Transferable Robot Skills [49]

RL#12: 07.05.2020: Memory in Reinforcement Learning

Artyom Sorokin

  • Recurrent Experience Replay in Distributed RL [50]

  • AMRL: Aggregated Memory For RL [51]

  • Unsupervised Predictive Memory in a Goal-Directed Agent [52]

  • Stabilizing Transformers for RL [53]

  • Model-Free Episodic Control [54]

  • Neural Episodic Control [55]

RL#13: 14.05.2020: Distributed RL in the Wild

Sergey Kolesnikov

  • Asynchronous Methods for Deep RL [56]

  • IMPALA: Scalable Distributed DRL with Importance Weighted Actor-Learner Architectures [57]

  • Distributed Prioritized Experience Replay [58]

  • Making Efficient Use of Demonstrations to Solve Hard Exploration Problems [59]

  • SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference [60]

Part 2: Projects

【1】Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives (Hierarchical RL)

Implement the paper on the test environment of your choice.

【2】 HIRO with Hindsight Experience Replay (Hierarchical RL)

Add Hindsight Experience Replay to the HIRO algorithm. Compare with HIRO.
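As a sketch of what adding HER involves, the core relabelling step (the "future" goal-sampling strategy) might look like the following; the transition-dict keys and function names here are my own assumptions, not from the HIRO or HER papers:

```python
import random

def her_relabel(episode, reward_fn, k=4, rng=None):
    """Hindsight relabelling with the 'future' strategy: for each step,
    sample up to k goals from states actually achieved later in the
    episode and recompute the reward under each substituted goal.

    episode: list of dicts with keys 'obs', 'action', 'achieved', 'goal'
    reward_fn(achieved, goal) -> reward under the substituted goal
    Returns extra relabelled transitions to add to the replay buffer.
    """
    rng = rng or random.Random(0)
    extra = []
    for t, step in enumerate(episode):
        future = episode[t:]
        for _ in range(min(k, len(future))):
            new_goal = rng.choice(future)['achieved']
            extra.append({**step,
                          'goal': new_goal,
                          'reward': reward_fn(step['achieved'], new_goal)})
    return extra

# Toy usage: sparse reward of 1 only when the achieved state equals the goal.
ep = [{'obs': i, 'action': 0, 'achieved': i, 'goal': 99} for i in range(5)]
relabelled = her_relabel(ep, lambda a, g: 1.0 if a == g else 0.0, k=2)
```

In HIRO the natural place for this is the high-level replay buffer, relabelling the subgoals the low-level policy was asked to reach.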

【3】 Meta Learning Shared Hierarchies on PyTorch (Hierarchical RL)

Implement the paper in PyTorch (the authors' implementation uses TensorFlow). Check its results on a test environment of your choice (not from the paper).

【4】Fast deep Reinforcement learning using online adjustments from the past (Memory in RL)

Try to reproduce the paper or implement the algorithm on a different environment.

Bonus points:
* Comparison with the NEC or a basic DRL algorithm;
* Ablation study.

【5】Episodic Reinforcement Learning with Associative Memory (Memory in  RL)

Try to reproduce the paper or implement the algorithm on a different environment.

Bonus points:
* Comparison with the NEC or a basic DRL algorithm;
* Ablation study.

【6】Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization (Inverse RL)

Implement the algorithm and test it on Atari games. Compare results with common baselines.

【7】Non-Monotonic Sequential Text Generation on TF/chainer (Imitation Learning)

Implement the paper in TensorFlow or Chainer.

【8】Evolution Strategies as a Scalable Alternative to Reinforcement Learning (Evolution Strategies)

Implement the algorithm and test it on ViZDoom or gym-minigrid. Compare results with available baselines.
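Before scaling to ViZDoom, the core update can be checked on a toy objective. A minimal sketch of the ES score-function estimator, with simple fitness normalisation standing in for the paper's rank shaping (function and variable names are mine):

```python
import numpy as np

def es_step(theta, fitness, pop_size=50, sigma=0.1, lr=0.02, rng=None):
    """One Evolution Strategies update: perturb parameters with Gaussian
    noise, evaluate returns, and follow the score-function gradient estimate."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((pop_size, theta.size))
    returns = np.array([fitness(theta + sigma * e) for e in eps])
    advantages = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalise fitness
    grad = (advantages[:, None] * eps).mean(axis=0) / sigma           # gradient estimate
    return theta + lr * grad

# Toy check: maximise -||theta - 3||^2; theta should drift toward 3.
rng = np.random.default_rng(1)
theta = np.zeros(5)
for _ in range(300):
    theta = es_step(theta, lambda t: -np.sum((t - 3.0) ** 2), rng=rng)
```

The key property to preserve in a real implementation is that only scalar returns cross process boundaries, which is what makes ES so easy to parallelise.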

【9】Improving Exploration in Evolution Strategies for DRL via a Population of Novelty-Seeking Agents (Evolution Strategies)

Implement the algorithm and test it on ViZDoom or gym-minigrid. Compare results with available baselines.

【10】Comparative study of intrinsic motivations (Exploration in RL)

Using MountainCar-v0 compare:

1) curiosity on forward dynamics model loss;
2) curiosity on inverse dynamics model loss;
3) ICM;
4) RND.
Bonus points:
* Add motivation for off-policy RL algorithm (e.g. DQN or QR-DQN);
* Try MountainCarContinuous-v0.
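A minimal sketch of variant 1, curiosity as forward-dynamics prediction error: here a linear model stands in for the usual neural network, and the class and method names are my own, not from the papers:

```python
import numpy as np

class ForwardCuriosity:
    """Intrinsic reward = prediction error of a learned forward dynamics
    model. The agent is rewarded for transitions the model cannot yet
    predict, so the bonus decays as a transition becomes familiar."""

    def __init__(self, obs_dim, act_dim, lr=0.1, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = 0.01 * rng.standard_normal((obs_dim, obs_dim + act_dim))
        self.lr = lr

    def reward_and_update(self, obs, act_onehot, next_obs):
        x = np.concatenate([obs, act_onehot])
        err = next_obs - self.W @ x              # prediction error
        self.W += self.lr * np.outer(err, x)     # one SGD step on squared error
        return 0.5 * float(err @ err)            # intrinsic reward (pre-update)

# A repeated deterministic transition: the bonus shrinks with experience.
cur = ForwardCuriosity(obs_dim=3, act_dim=2)
obs, act, nxt = np.eye(3)[0], np.eye(2)[0], np.eye(3)[1]
rewards = [cur.reward_and_update(obs, act, nxt) for _ in range(100)]
```

Variant 2 swaps the prediction target (predict the action from obs and next_obs), and ICM combines both with a learned feature encoder; the comparison on MountainCar-v0 is about which error signal drives the cart up the hill fastest.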

【11】Solving Unity Pyramids (Exploration in RL)

Try to reproduce this experiment using any intrinsic motivation you like.

【12】RND Exploratory Behavior (Exploration in RL)

There was a study of exploratory behaviors for curiosity-based intrinsic motivation. Choose any environment, e.g. some Atari game, and discover exploratory behavior of RND.
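To make the signal under study concrete, here is a toy sketch of the RND mechanism with linear networks standing in for the paper's convolutional ones (all names are hypothetical): the bonus is large on novel observations and collapses on familiar ones, and the project asks what behaviour this induces.

```python
import numpy as np

class RNDBonus:
    """Random Network Distillation, linear toy version: a frozen random
    'target' embedding and a 'predictor' trained to match it. Prediction
    error is large on novel observations and shrinks on familiar ones."""

    def __init__(self, obs_dim, emb_dim=8, lr=0.05, rng=None):
        rng = rng or np.random.default_rng(0)
        self.target = rng.standard_normal((emb_dim, obs_dim))  # fixed, never trained
        self.pred = np.zeros((emb_dim, obs_dim))               # trained online
        self.lr = lr

    def bonus_and_update(self, obs):
        err = self.target @ obs - self.pred @ obs
        self.pred += self.lr * np.outer(err, obs)   # one SGD step toward the target
        return 0.5 * float(err @ err)

# Familiar observations stop paying; a novel one pays again.
rnd = RNDBonus(obs_dim=3)
seen, novel = np.eye(3)[0], np.eye(3)[2]
bonuses = [rnd.bonus_and_update(seen) for _ in range(300)]
novel_bonus = rnd.bonus_and_update(novel)
```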

【13】 Learning Improvement Heuristics for Solving the Travelling Salesman Problem (RL for Combinatorial Opt.)

Implement the paper on any combinatorial optimization problem you like. Compare with available solvers.

【14】Dynamic Attention Model for Vehicle Routing Problems (RL for Combinatorial Opt.)

Implement the paper on any combinatorial optimization problem you like. Compare with available solvers.

【15】Variational RL with Regret Bounds (Variational RL)

Try to reproduce the K-learning algorithm from the paper. Pick a finite discrete environment of your choice. Use this paper as an addition to the main one.

Bonus points:
* Compare with exact version of soft actor-critic or soft q-learning from here. Hint: use message-passing algorithm;
* Propose approximate K-learning algorithm with the use of function approximators (neural networks).

Part 3: Course Resources

Course homepage: https://deeppavlov.ai/rl_course_2020

Bilibili: https://www.bilibili.com/video/av668428103/

Youtube:

https://www.youtube.com/playlist?list=PLt1IfGj6-_-eXjZDFBfnAhAJmCyX227ir

Contact and Collaboration

Please add WeChat ID yan_kylin_phenix, noting your name + affiliation + field + location. Serious inquiries only.

