
Featured Course | MIPT Launches an Advanced Deep Reinforcement Learning Course [AI Frontier Technology]


Follow: Decision Intelligence and Machine Learning, distilled AI essentials

Author | DeepRL

Source | https://deeppavlov.ai

Reported by | Deep Reinforcement Learning Lab

The Moscow Institute of Physics and Technology (MIPT) has launched an advanced reinforcement learning course. It focuses on recent research progress in deep reinforcement learning, covering exploration strategies, imitation and inverse RL, hierarchical RL, evolutionary strategies in RL, distributed RL, RL for combinatorial optimization, multi-agent RL, large-scale RL, multi-task and transfer RL, and memory mechanisms in RL. It is well worth studying.

Part 1: Lectures

RL#1: 13.02.2020: Exploration in RL

Sergey Ivanov

  • Random Network Distillation [1]

  • Intrinsic Curiosity Module [2,3]

  • Episodic Curiosity through Reachability [4]

RL#2: 20.02.2020: Imitation and Inverse RL

Just Heuristic

  • Imitation Learning [5]

  • Inverse RL [6,7]

  • Learning from Human Preferences [8]

RL#3: 27.02.2020: Hierarchical Reinforcement Learning

Petr Kuderov

  • A framework for temporal abstraction in RL [9]

  • The Option-Critic Architecture [10]

  • FeUdal Networks for Hierarchical RL [11]

  • Data-Efficient Hierarchical RL [12]

  • Meta Learning Shared Hierarchies [13] 

RL#4: 5.03.2020: Evolutionary Strategies in RL

Evgenia Elistratova

  • A framework for temporal abstraction in reinforcement learning [14]

  • Improving Exploration in Evolution Strategies for Deep RL [15]

  • Paired Open-Ended Trailblazer (POET) [16]

  • Sim-to-Real: Learning Agile Locomotion For Quadruped Robots [17]

RL#5: 12.03.2020: Distributional Reinforcement Learning

Pavel Shvechikov

  • A Distributional Perspective on RL [18]

  • Distributional RL with Quantile Regression [19]

  • Implicit Quantile Networks for Distributional RL [20]

  • Fully Parameterized Quantile Function for Distributional RL [21]

RL#6: 19.03.2020: RL for Combinatorial Optimization

Taras Khakhulin

  • RL for Solving the Vehicle Routing Problem [22]

  • Attention, Learn to Solve Routing Problems! [23]

  • Learning Improvement Heuristics for Solving the Travelling Salesman Problem [24]

  • Learning Combinatorial Optimization Algorithms over Graphs [25]

RL#7: 26.03.2020: RL as Probabilistic Inference

Pavel Termichev

  • RL and Control as Probabilistic Inference: Tutorial and Review [26]

  • RL with Deep Energy-Based Policies [27]

  • Soft Actor-Critic [28]

  • Variational Bayesian RL with Regret Bounds [29]

RL#8: 9.04.2020: Multi-Agent Reinforcement Learning

Sergey Sviridov

  • Stabilising Experience Replay for Deep Multi-Agent RL [30]

  • Counterfactual Multi-Agent Policy Gradients [31]

  • Value-Decomposition Networks For Cooperative Multi-Agent Learning [32]

  • Monotonic Value Function Factorisation for Deep Multi-Agent RL [33]

  • Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments [34]

RL#9: 16.04.2020: Model-Based Reinforcement Learning

Evgeny Kashin

  • DL for Real-Time Atari Game Play Using Offline MCTS Planning [35]

  • Mastering Chess and Shogi by Self-Play with a General RL Algorithm [36]

  • World Models [37]

  • Model-Based RL for Atari [38]

  • Learning Latent Dynamics for Planning from Pixels [39] 

RL#10: 23.04.2020: Reinforcement Learning at Scale

Aleksandr Panin

  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour [40]

  • HOGWILD!: A Lock-Free Approach to Parallelizing SGD [41]

  • GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [42]

  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [43]

  • Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts [44]

RL#11: 30.04.2020: Multitask & Transfer RL

Dmitry Nikulin

  • Universal Value Function Approximators [45]

  • Hindsight Experience Replay [46]

  • PathNet: Evolution Channels Gradient Descent in Super Neural Networks [47]

  • Progressive Neural Networks [48]

  • Learning an Embedding Space for Transferable Robot Skills [49]

RL#12: 07.05.2020: Memory in Reinforcement Learning

Artyom Sorokin

  • Recurrent Experience Replay in Distributed RL [50]

  • AMRL: Aggregated Memory For RL [51]

  • Unsupervised Predictive Memory in a Goal-Directed Agent [52]

  • Stabilizing Transformers for RL [53]

  • Model-Free Episodic Control [54]

  • Neural Episodic Control [55]

RL#13: 14.05.2020: Distributed RL in the Wild

Sergey Kolesnikov

  • Asynchronous Methods for Deep RL [56]

  • IMPALA: Scalable Distributed DRL with Importance Weighted Actor-Learner Architectures [57]

  • Distributed Prioritized Experience Replay [58]

  • Making Efficient Use of Demonstrations to Solve Hard Exploration Problems [59]

  • SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference [60]

Part 2: Projects

【1】Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives (Hierarchical RL)

Implement the paper on the test environment of your choice.

【2】 HIRO with Hindsight Experience Replay (Hierarchical RL)

Add Hindsight Experience Replay to the HIRO algorithm. Compare with HIRO.
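As a sketch of what adding HER involves, the core relabelling step (the "future" goal-sampling strategy) might look like the following; the transition-dict keys and function names here are my own assumptions, not from the HIRO or HER papers:

```python
import random

def her_relabel(episode, reward_fn, k=4, rng=None):
    """Hindsight relabelling with the 'future' strategy: for each step,
    sample up to k goals from states actually achieved later in the
    episode and recompute the reward under each substituted goal.

    episode: list of dicts with keys 'obs', 'action', 'achieved', 'goal'
    reward_fn(achieved, goal) -> reward under the substituted goal
    Returns extra relabelled transitions to add to the replay buffer.
    """
    rng = rng or random.Random(0)
    extra = []
    for t, step in enumerate(episode):
        future = episode[t:]
        for _ in range(min(k, len(future))):
            new_goal = rng.choice(future)['achieved']
            extra.append({**step,
                          'goal': new_goal,
                          'reward': reward_fn(step['achieved'], new_goal)})
    return extra

# Toy usage: sparse reward of 1 only when the achieved state equals the goal.
ep = [{'obs': i, 'action': 0, 'achieved': i, 'goal': 99} for i in range(5)]
relabelled = her_relabel(ep, lambda a, g: 1.0 if a == g else 0.0, k=2)
```

In HIRO the natural place for this is the high-level replay buffer, relabelling the subgoals the low-level policy was asked to reach.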

【3】 Meta Learning Shared Hierarchies on PyTorch (Hierarchical RL)

Implement the paper in PyTorch (the authors' implementation uses TensorFlow). Check its results on a test environment of your choice (not from the paper).

【4】Fast deep Reinforcement learning using online adjustments from the past (Memory in RL)

Try to reproduce the paper or implement the algorithm on a different environment.

Bonus points:
* Comparison with the NEC or a basic DRL algorithm;
* Ablation study.

【5】Episodic Reinforcement Learning with Associative Memory (Memory in  RL)

Try to reproduce the paper or implement the algorithm on a different environment.

Bonus points:
* Comparison with the NEC or a basic DRL algorithm;
* Ablation study.

【6】Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization (Inverse RL)

Implement the algorithm and test it on Atari games. Compare results with common baselines.

【7】Non-Monotonic Sequential Text Generation on TF/chainer (Imitation Learning)

Implement the paper in TensorFlow or Chainer.

【8】Evolution Strategies as a Scalable Alternative to Reinforcement Learning (Evolution Strategies)

Implement the algorithm and test it on ViZDoom or gym-minigrid. Compare results with available baselines.
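Before scaling to ViZDoom, the core update can be checked on a toy objective. A minimal sketch of the ES score-function estimator, with simple fitness normalisation standing in for the paper's rank shaping (function and variable names are mine):

```python
import numpy as np

def es_step(theta, fitness, pop_size=50, sigma=0.1, lr=0.02, rng=None):
    """One Evolution Strategies update: perturb parameters with Gaussian
    noise, evaluate returns, and follow the score-function gradient estimate."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((pop_size, theta.size))
    returns = np.array([fitness(theta + sigma * e) for e in eps])
    advantages = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalise fitness
    grad = (advantages[:, None] * eps).mean(axis=0) / sigma           # gradient estimate
    return theta + lr * grad

# Toy check: maximise -||theta - 3||^2; theta should drift toward 3.
rng = np.random.default_rng(1)
theta = np.zeros(5)
for _ in range(300):
    theta = es_step(theta, lambda t: -np.sum((t - 3.0) ** 2), rng=rng)
```

The key property to preserve in a real implementation is that only scalar returns cross process boundaries, which is what makes ES so easy to parallelise.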

【9】Improving Exploration in Evolution Strategies for DRL via a Population of Novelty-Seeking Agents (Evolution Strategies)

Implement the algorithm and test it on ViZDoom or gym-minigrid. Compare results with available baselines.

【10】Comparative study of intrinsic motivations (Exploration in RL)

Using MountainCar-v0 compare:

1) curiosity on forward dynamics model loss;
2) curiosity on inverse dynamics model loss;
3) ICM;
4) RND.
Bonus points:
* Add motivation for off-policy RL algorithm (e.g. DQN or QR-DQN);
* Try MountainCarContinuous-v0.
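A minimal sketch of variant 1, curiosity as forward-dynamics prediction error: here a linear model stands in for the usual neural network, and the class and method names are my own, not from the papers:

```python
import numpy as np

class ForwardCuriosity:
    """Intrinsic reward = prediction error of a learned forward dynamics
    model. The agent is rewarded for transitions the model cannot yet
    predict, so the bonus decays as a transition becomes familiar."""

    def __init__(self, obs_dim, act_dim, lr=0.1, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = 0.01 * rng.standard_normal((obs_dim, obs_dim + act_dim))
        self.lr = lr

    def reward_and_update(self, obs, act_onehot, next_obs):
        x = np.concatenate([obs, act_onehot])
        err = next_obs - self.W @ x              # prediction error
        self.W += self.lr * np.outer(err, x)     # one SGD step on squared error
        return 0.5 * float(err @ err)            # intrinsic reward (pre-update)

# A repeated deterministic transition: the bonus shrinks with experience.
cur = ForwardCuriosity(obs_dim=3, act_dim=2)
obs, act, nxt = np.eye(3)[0], np.eye(2)[0], np.eye(3)[1]
rewards = [cur.reward_and_update(obs, act, nxt) for _ in range(100)]
```

Variant 2 swaps the prediction target (predict the action from obs and next_obs), and ICM combines both with a learned feature encoder; the comparison on MountainCar-v0 is about which error signal drives the cart up the hill fastest.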

【11】Solving Unity Pyramids (Exploration in RL)

Try to reproduce this experiment using any intrinsic motivation you like.

【12】RND Exploratory Behavior (Exploration in RL)

There was a study of exploratory behaviors for curiosity-based intrinsic motivation. Choose any environment, e.g. some Atari game, and discover exploratory behavior of RND.
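To make the signal under study concrete, here is a toy sketch of the RND mechanism with linear networks standing in for the paper's convolutional ones (all names are hypothetical): the bonus is large on novel observations and collapses on familiar ones, and the project asks what behaviour this induces.

```python
import numpy as np

class RNDBonus:
    """Random Network Distillation, linear toy version: a frozen random
    'target' embedding and a 'predictor' trained to match it. Prediction
    error is large on novel observations and shrinks on familiar ones."""

    def __init__(self, obs_dim, emb_dim=8, lr=0.05, rng=None):
        rng = rng or np.random.default_rng(0)
        self.target = rng.standard_normal((emb_dim, obs_dim))  # fixed, never trained
        self.pred = np.zeros((emb_dim, obs_dim))               # trained online
        self.lr = lr

    def bonus_and_update(self, obs):
        err = self.target @ obs - self.pred @ obs
        self.pred += self.lr * np.outer(err, obs)   # one SGD step toward the target
        return 0.5 * float(err @ err)

# Familiar observations stop paying; a novel one pays again.
rnd = RNDBonus(obs_dim=3)
seen, novel = np.eye(3)[0], np.eye(3)[2]
bonuses = [rnd.bonus_and_update(seen) for _ in range(300)]
novel_bonus = rnd.bonus_and_update(novel)
```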

【13】 Learning Improvement Heuristics for Solving the Travelling Salesman Problem (RL for Combinatorial Opt.)

Implement the paper on any combinatorial optimization problem you like. Compare with available solvers.

【14】Dynamic Attention Model for Vehicle Routing Problems (RL for Combinatorial Opt.)

Implement the paper on any combinatorial optimization problem you like. Compare with available solvers.

【15】Variational RL with Regret Bounds (Variational RL)

Try to reproduce the K-learning algorithm from the paper. Pick a finite discrete environment of your choice. Use this paper as an addition to the main one.

Bonus points:
* Compare with exact version of soft actor-critic or soft q-learning from here. Hint: use message-passing algorithm;
* Propose approximate K-learning algorithm with the use of function approximators (neural networks).

Part 3: Course Resources

Course homepage: https://deeppavlov.ai/rl_course_2020

Bilibili: https://www.bilibili.com/video/av668428103/

Youtube:

https://www.youtube.com/playlist?list=PLt1IfGj6-_-eXjZDFBfnAhAJmCyX227ir

Contact and Collaboration

Please add WeChat ID yan_kylin_phenix, noting your name + affiliation + field + location. Serious inquiries only.

