Reinforcement Learning for UAV Adaptive Conflict Detection and Resolution with VFR-dominated Aircraft

Mingchuan Luo; Thomas Zeh; Hannes Braßel; Martin Lindner; Hartmut Fricke

doi:10.23919/NTCA68808.2026.11524178

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

Mingchuan Luo - Chair of Air Transport Technology and Logistics (Author)
Thomas Zeh - Chair of Air Transport Technology and Logistics (Author)
Hannes Braßel - Chair of Air Transport Technology and Logistics (Author)
Martin Lindner - Chair of Air Transport Technology and Logistics (Author)
Hartmut Fricke - Chair of Air Transport Technology and Logistics (Author)

Abstract

Integrating Unmanned Aerial Vehicles (UAVs) into Visual Flight Rules (VFR) airspace requires Conflict Detection and Resolution (CDR) mechanisms that tolerate uncertainty in pilot behavior. This paper proposes a Reinforcement Learning (RL) framework based on Proximal Policy Optimization (PPO) in which the agent learns to avoid collisions with crewed aircraft by interacting with its environment. We model the CDR problem as a Markov Decision Process (MDP) in which a highly agile UAV must generate efficient actions and safety-prioritized maneuvers against stochastic VFR-dominated aircraft. The UAV is modeled as a quaternion-based 6-degree-of-freedom rigid body to capture its attitude and trajectory, and the VFR-dominated aircraft as a 3-degree-of-freedom point mass with envelope-consistent speed. Observations comprise the flight state of the UAV and the VFR-dominated aircraft (such as position, distance, and heading) along with contextual cues (i.e., risk warning indicators and recent avoidance maneuvers). Using PPO, the agent learns a policy that balances hard safety constraints with stable avoidance control. A safety-prioritized reward function severely penalizes separation violations and time spent in the risk zone. The trained agent produces consistent, collision-free trajectories and responds effectively to sudden intrusions without manual reprogramming. The findings demonstrate RL's applicability to CDR in mixed airspace: the technique adapts to the uncertainties of VFR-dominated aircraft and reliably resolves conflicts, suggesting that low-altitude airspace operations can be automated.
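
The abstract does not give the form of the safety-prioritized reward, so the following is only a minimal sketch of how such a reward might be shaped, not the authors' function. The radii, penalty magnitudes, the linear risk-zone ramp, the progress term, and the names RewardConfig and safety_reward are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    """Hypothetical parameters; none of these values come from the paper."""
    separation_radius_m: float = 150.0   # assumed minimum separation
    risk_radius_m: float = 500.0         # assumed outer edge of the risk zone
    violation_penalty: float = -100.0    # assumed large penalty for loss of separation
    risk_penalty_per_step: float = -1.0  # assumed per-step cost inside the risk zone
    progress_weight: float = 0.01        # assumed small bonus for progress toward the goal


def safety_reward(distance_m: float, progress_m: float,
                  cfg: RewardConfig = RewardConfig()) -> float:
    """Return a per-step reward in which safety terms dominate efficiency terms.

    distance_m: current distance between the UAV and the nearest aircraft.
    progress_m: distance gained toward the goal during this step.
    """
    # Separation violation: the large penalty overrides every other term.
    if distance_m < cfg.separation_radius_m:
        return cfg.violation_penalty

    # Efficiency term: small reward for making progress.
    reward = cfg.progress_weight * progress_m

    # Risk zone: charged every step, growing linearly as the intruder gets closer.
    if distance_m < cfg.risk_radius_m:
        depth = (cfg.risk_radius_m - distance_m) / (
            cfg.risk_radius_m - cfg.separation_radius_m
        )
        reward += cfg.risk_penalty_per_step * (1.0 + depth)

    return reward
```

Because the risk-zone cost is charged on every step, lingering near an intruder accumulates penalty, while the much larger violation penalty keeps a loss of separation from ever being traded for progress.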

Details

Original language	English
Title	2026 New Trends in Civil Aviation (NTCA)
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
Pages	65-73
ISBN (electronic)	978-80-01-07450-3
ISBN (print)	979-8-3315-7779-7
Publication status	Published - Apr. 2026
Peer-review status	Yes

External IDs

ORCID	/0009-0008-9640-3248/work/214455750
ORCID	/0009-0005-7833-7169/work/214455772
ORCID	/0000-0002-1118-3047/work/214455843

Keywords

Research priority areas of TU Dresden

Energy, Mobility and Environment