A Memory-oriented Optimization Approach to Reinforcement Learning on FPGA-based Embedded Systems.

Publication: Contribution to book/conference proceedings › Conference proceedings paper › Contributed › Peer-reviewed

Abstract

Reinforcement Learning (RL) is the machine learning method that has come closest to exhibiting human-like learning. While Deep RL is becoming increasingly popular for complex applications such as AI-based gaming, it has a high implementation cost in terms of both power and latency. Q-Learning, on the other hand, is a much simpler method, which makes it more feasible for implementation on resource-constrained embedded systems for control and navigation. However, the optimal policy search in Q-Learning is a compute-intensive and inherently sequential process, and a software-only implementation may not be able to satisfy the latency and throughput constraints of such applications. To this end, we propose a novel accelerator design with multiple design trade-offs for implementing Q-Learning on FPGA-based SoCs. Specifically, we analyze the various stages of the Epsilon-Greedy algorithm for RL and propose a novel microarchitecture that reduces latency by optimizing the memory accesses during each iteration. Consequently, we present multiple designs that provide varying trade-offs between performance, power dissipation, and resource utilization of the accelerator. With the proposed approach, we report considerable improvement in throughput with lower resource utilization over state-of-the-art design implementations.
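
For context, the sketch below is a minimal software-level rendering of the algorithm the abstract refers to: tabular Q-Learning with Epsilon-Greedy action selection, written in plain C++. It is an illustrative baseline under assumed parameters (NUM_STATES, NUM_ACTIONS, alpha, gamma, epsilon, and the toy environment in main are all invented for demonstration), not the paper's accelerator design. It does, however, make the per-iteration memory pattern visible: an argmax read over one Q-table row during action selection, a max read over the next state's row during the update, and one write-back.

// Minimal tabular Q-Learning with an epsilon-greedy policy (software
// baseline for illustration; sizes and hyperparameters are assumptions).
#include <algorithm>
#include <array>
#include <cstddef>
#include <random>

constexpr std::size_t NUM_STATES  = 16;  // assumed state-space size
constexpr std::size_t NUM_ACTIONS = 4;   // assumed action-space size

using QTable = std::array<std::array<float, NUM_ACTIONS>, NUM_STATES>;

// Action selection: explore with probability epsilon, otherwise take the
// argmax over one Q-table row (a full row read every iteration).
std::size_t epsilonGreedy(const QTable& q, std::size_t state,
                          float epsilon, std::mt19937& rng) {
    std::uniform_real_distribution<float> coin(0.0f, 1.0f);
    if (coin(rng) < epsilon) {
        std::uniform_int_distribution<std::size_t> pick(0, NUM_ACTIONS - 1);
        return pick(rng);  // explore: random action
    }
    std::size_t best = 0;
    for (std::size_t a = 1; a < NUM_ACTIONS; ++a)
        if (q[state][a] > q[state][best]) best = a;
    return best;  // exploit: greedy action
}

// Q-value update: read the next state's row to find its maximum, then
// write back the updated entry for (state, action).
void qUpdate(QTable& q, std::size_t state, std::size_t action,
             float reward, std::size_t nextState,
             float alpha, float gamma) {
    float maxNext = q[nextState][0];
    for (std::size_t a = 1; a < NUM_ACTIONS; ++a)
        maxNext = std::max(maxNext, q[nextState][a]);
    q[state][action] +=
        alpha * (reward + gamma * maxNext - q[state][action]);
}

int main() {
    std::mt19937 rng(42);
    QTable q{};  // zero-initialized Q-table
    std::size_t state = 0;
    // Toy transition/reward rule, invented purely to exercise the loop.
    for (int step = 0; step < 100; ++step) {
        std::size_t action = epsilonGreedy(q, state, 0.1f, rng);
        std::size_t nextState = (state + action + 1) % NUM_STATES;
        float reward = (nextState == NUM_STATES - 1) ? 1.0f : 0.0f;
        qUpdate(q, state, action, reward, nextState, 0.5f, 0.9f);
        state = nextState;
    }
    return 0;
}

Note how both routines scan a complete Q-table row on every step, and how action selection must complete before the update can begin; this sequential, table-access-dominated loop is the traffic that the abstract's memory-oriented microarchitecture targets on the FPGA.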

Details

Original language: English
Title: GLSVLSI 2021 - Proceedings of the 2021 Great Lakes Symposium on VLSI
Pages: 339-346
Number of pages: 8
Publication status: Published - 22 June 2021
Peer-review status: Yes

External IDs

Scopus: 85109211240

Keywords

  • energy-efficient computing, FPGA, hardware accelerators, high-level synthesis, memory-centric computing