Pointer Networks
Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. "Pointer Networks", Conference on Neural Information Processing Systems (2015): 2692-2700. arXiv:1506.03134.
Sequence-to-Sequence Model
Problem description: for example, for
Content Based Input Attention
Ptr-Net
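A minimal sketch of the pointing step, assuming the additive attention form from Vinyals et al., in which scores u_j = v^T tanh(W1 e_j + W2 d) are passed through a softmax and used directly as a distribution over input positions. The function name, array shapes, and random weights are illustrative assumptions, not taken from these notes.

```python
import numpy as np

def pointer_attention(enc_states, dec_state, W1, W2, v):
    """Return a probability distribution over the n input positions.

    enc_states: (n, d) encoder states e_j
    dec_state:  (d,)   current decoder state d
    W1, W2:     (h, d) projection matrices (hypothetical shapes)
    v:          (h,)   scoring vector
    """
    # u_j = v^T tanh(W1 e_j + W2 d), computed for all j at once
    scores = np.tanh(enc_states @ W1.T + dec_state @ W2.T) @ v
    # Numerically stable softmax over input positions
    scores = scores - scores.max()
    probs = np.exp(scores)
    return probs / probs.sum()

rng = np.random.default_rng(0)
n, d, h = 5, 4, 8
enc = rng.normal(size=(n, d))
dec = rng.normal(size=d)
W1 = rng.normal(size=(h, d))
W2 = rng.normal(size=(h, d))
v = rng.normal(size=h)

probs = pointer_attention(enc, dec, W1, W2, v)
pointed_index = int(np.argmax(probs))  # the input position being "pointed" at
```

Unlike ordinary content-based attention, the softmax output is not used to blend encoder states into a context vector; it is itself the output distribution, so the output vocabulary grows with the input length.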
Combinatorial Optimization by Graph Pointer Networks and Hierarchical Reinforcement Learning
Ma Qiang, Ge Suwen, He Danyang, Thaker Darshan, and Drori Iddo. "Combinatorial Optimization by Graph Pointer Networks and Hierarchical Reinforcement Learning", arXiv preprint arXiv:1911.04936 (2019).
Reinforcement Learning for TSP
The expected reward is:
Hierarchical RL for TSP
The
Hierarchical Policy Gradient
The
For this
GPN Architecture
Encoder
It consists of two parts,
Graph Embedding Layer
A standard
Let
Experiments
Small-scale problems:
Large-scale problems:
Pointerformer: Deep Reinforced Multi-Pointer Transformer for the Traveling Salesman Problem
Yan Jin, Yuandong Ding, Xuanhao Pan, Kun He, Li Zhao, Tao Qin, Lei Song, and Jiang Bian. "Pointerformer: Deep Reinforced Multi-Pointer Transformer for the Traveling Salesman Problem", CoRR abs/2304.09407 (2023): 8132-8140.
The objective is to maximize the reward:
Reversible residual network based encoder
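Feature augmentation is applied, mapping each node from
A reversible residual network is used. Unlike a standard residual network, it does not need to store the activations of every residual layer in order to compute gradients during backpropagation.
This is because, from the output embeddings,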
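The reversibility idea can be sketched as follows, assuming the usual two-stream coupling y1 = x1 + F(x2), y2 = x2 + G(y1). Here F and G are small stand-in functions chosen for illustration; the sub-layers actually used in Pointerformer are not specified in these notes, and all names below are hypothetical.

```python
import numpy as np

def make_layer(rng, dim):
    """Build a toy sub-layer x -> tanh(x W); a stand-in for F or G."""
    W = rng.normal(scale=0.1, size=(dim, dim))
    return lambda x: np.tanh(x @ W)

def rev_forward(x1, x2, F, G):
    """Reversible residual block: two coupled residual updates."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2, F, G):
    """Recover the block inputs from its outputs, undoing the updates in reverse order."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

rng = np.random.default_rng(0)
dim = 8
F, G = make_layer(rng, dim), make_layer(rng, dim)
x1 = rng.normal(size=(3, dim))
x2 = rng.normal(size=(3, dim))

y1, y2 = rev_forward(x1, x2, F, G)
r1, r2 = rev_inverse(y1, y2, F, G)
reconstruction_error = max(np.abs(r1 - x1).max(), np.abs(r2 - x2).max())
```

Because the inputs of each block can be recomputed from its outputs, a backward pass only needs the final layer's activations, trading extra computation for memory that does not grow with depth.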
Multi-pointer network based decoder
Enhanced Context Embedding
A Multi-pointer Network
At each step,
A modified REINFORCE algorithm
Using
For a
For one containing