23rd Annual Congress of the Société Française de Recherche Opérationnelle et d'Aide à la Décision

sciencesconf.org:roadef2022:378344

Reinforcement Learning-based Large Neighborhood Search Approach to Dock Assignment and Truck Scheduling in Crossdocks

Shahin Gelareh 1, Rahimeh Neamatian Monemi 3, 2, Roozbeh Sanaei 4

1 : Université d'Artois

IUT Béthune

3 : IESEG

IESEG School of Management

2 : Predictim Globe

Predictim Globe

4 : SUTD - MIT International Design Centre

Reinforcement learning approaches to combinatorial optimization aim to replace the hand-crafted heuristics of conventional algorithms with data-driven agents that tune their parameters according to the rewards and penalties they receive for moves that improve or worsen the given objectives. However, this strategy is often difficult to apply in the presence of intricate side constraints, and it cannot exploit the strengths of state-of-the-art commercial solvers. A recently proposed remedy is to use large neighborhood search to leverage conventional solvers as a generic black-box subroutine: the agent iteratively fixes a subset of variables and optimizes the rest through the black-box solver. In this contribution, we investigate how the choice of deep neural architecture and reinforcement learning algorithm, together with their respective hyper-parameters, affects the course of optimization in terms of convergence time and optimality of the final solution. In particular, we consider three major deep reinforcement learning algorithms, namely deep Q-learning (DQN), REINFORCE, and actor-critic policy gradient methods, as well as multi-layered and convolutional neural networks, and we study how these algorithms differ in their neighborhood selection behavior for a cross-docking optimization problem.
We focus on solving instances of the dock assignment and truck scheduling problem arising in crossdocks.
As this problem is solved on a regular basis at every such facility, we assume that each day's instance is generated from the same statistical distribution. We therefore learn from historical data and expect unseen (future) instances to follow a similar distribution.
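
The fix-and-reoptimize loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the problem is reduced to a toy one-to-one truck-to-dock assignment with an integer cost matrix, the black-box solver is replaced by brute-force enumeration over the freed trucks (`solve_subproblem`), and the learned agent is replaced by a simple epsilon-greedy rule over neighborhood sizes rather than DQN, REINFORCE, or an actor-critic method. The names `lns`, `solve_subproblem`, and `total_cost` are hypothetical.

```python
import itertools
import random


def total_cost(cost, assignment):
    """Cost of assigning truck t to dock assignment[t]."""
    return sum(cost[t][d] for t, d in enumerate(assignment))


def solve_subproblem(cost, assignment, free_trucks):
    """Stand-in for the black-box solver (assumption: brute force).

    Re-optimizes the freed trucks over the docks they currently occupy,
    keeping every other truck fixed at its current dock.
    """
    free_docks = [assignment[t] for t in free_trucks]
    best_perm = tuple(free_docks)
    best_cost = sum(cost[t][d] for t, d in zip(free_trucks, free_docks))
    for perm in itertools.permutations(free_docks):
        c = sum(cost[t][d] for t, d in zip(free_trucks, perm))
        if c < best_cost:
            best_perm, best_cost = perm, c
    new_assignment = list(assignment)
    for t, d in zip(free_trucks, best_perm):
        new_assignment[t] = d
    return new_assignment


def lns(cost, iterations=200, sizes=(2, 3, 4), epsilon=0.2, seed=0):
    """Large neighborhood search with an epsilon-greedy neighborhood-size rule."""
    rng = random.Random(seed)
    n = len(cost)
    assignment = list(range(n))  # initial feasible solution: truck i -> dock i
    current = total_cost(cost, assignment)
    history = [current]
    value = {k: 0.0 for k in sizes}  # running mean reward per neighborhood size
    count = {k: 0 for k in sizes}
    for _ in range(iterations):
        # Placeholder for a learned policy: explore with probability epsilon,
        # otherwise pick the size with the best average reward so far.
        if rng.random() < epsilon:
            k = rng.choice(sizes)
        else:
            k = max(sizes, key=lambda s: value[s])
        free_trucks = rng.sample(range(n), k)  # variables left free this round
        candidate = solve_subproblem(cost, assignment, free_trucks)
        new = total_cost(cost, candidate)
        reward = current - new  # improvement; never negative in this sketch
        count[k] += 1
        value[k] += (reward - value[k]) / count[k]
        assignment, current = candidate, new
        history.append(current)
    return assignment, current, history
```

Calling `lns(cost)` on a square integer cost matrix with at least four rows returns the final assignment, its cost, and the cost after each iteration. Because the subproblem solver always considers the current arrangement of the freed trucks, the cost trace cannot increase; a real solver with a time limit or a richer neighborhood definition would not necessarily have this property.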

Type	:	Article
Topics	:	Session "RO et apprentissage" (OR and learning) of the DAAO cross-cutting action
Keywords	:	Logistics ; Cross Docking ; Assignment and Scheduling ; Reinforcement Learning ; Large Neighborhood Search
