








INSTITUTE OF NUCLEAR AND PARTICLE PHYSICS

# Development of an FPGA Implementation of Convolutional Neural Networks for Signal Processing for the Liquid-Argon Calorimeter at ATLAS

#### Nick Fritzsche

Institute of Nuclear and Particle Physics (IKTP), TU Dresden

23 March 2022, DPG Spring Meeting, Heidelberg

## The ATLAS Detector at the LHC



https://cds.cern.ch/record/1295244

#### The Large Hadron Collider (LHC)

- 27 km circular collider at CERN/Geneva
- achievements: Higgs boson discovery, quark-gluon plasma studies, and many more
- 25 ns spacing between proton bunches (40 MHz)

#### The ATLAS Detector

- inner detector with tracking system
- calorimeters
- muon spectrometer



https://cds.cern.ch/record/1095924

## Signal Readout of the ATLAS LAr Calorimeter



#### LAr Calorimeter

- absorber material (Pb, Cu, W) and electrodes in accordion geometry
- liquid argon in between as the active medium

#### Signals

- energy deposits produce a triangular pulse
- pulse is shaped into a bipolar pulse and digitized
- amplitude is proportional to the deposited energy
- energy reconstruction using the Optimal Filter: $E_t = \sum_i w_i \cdot S_{t-i}$
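
The Optimal Filter formula above is a weighted sum over the most recent digitized samples. A minimal Python sketch, assuming the weights $w_i$ are already known (the function name and the example numbers are illustrative and not taken from the ATLAS software):

```python
def optimal_filter(samples, weights):
    """Apply E_t = sum_i w_i * S_{t-i} to a sequence of digitized samples.

    Output starts at the first index t for which all samples S_{t-i}
    exist, so it is len(weights) - 1 entries shorter than the input.
    """
    n_weights = len(weights)
    energies = []
    for t in range(n_weights - 1, len(samples)):
        energy = sum(w * samples[t - i] for i, w in enumerate(weights))
        energies.append(energy)
    return energies


# Illustrative values only: two weights applied to four samples.
example = optimal_filter([0, 1, 2, 3], [1.0, 0.5])
```

Each output uses only the current and earlier samples, so the filter can run as a sliding window on the incoming data stream.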

## High Luminosity LHC




## Digital Signal Processing on Field Programmable Gate Arrays

#### Field Programmable Gate Arrays (FPGAs)

- integrated circuit configurable by the designer after manufacturing
- reconfigurable hardware allows testing of different firmware
- target device: Intel Stratix 10



https://newsroom.intel.com/editorials/intels-stratix-10-fpga-supporting-smart-connected-revolution

https://indico.cern.ch/event/773049/contributions/3474297

#### Advantages of Implementation on FPGAs

- real-time data processing at high clock frequencies (100 MHz to 1 GHz)
- parallel data processing
- signal processing algorithm can be reconfigured

#### Requirements


- signal processing algorithm should keep the number of parameters to a minimum ($\approx$ 50-100)
- target core frequency for Phase II data processing: 480 MHz = 12 $\times$ 40 MHz
- minimize latency (< 150 ns): results are input to the trigger system
- meet the resource limitations of the FPGA
- $\longrightarrow$ set pipeline registers as a compromise between the factors above
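
The numbers in the requirements above translate directly into a clock-cycle budget. A short Python sketch of that arithmetic, assuming the latency limit is counted entirely in core-clock cycles (the constant and function names are illustrative):

```python
BUNCH_CROSSING_MHZ = 40   # LHC bunch-crossing rate
CORE_CLOCK_MHZ = 480      # target core frequency for Phase II
MAX_LATENCY_NS = 150      # latency limit for input to the trigger


def cycles_per_bunch_crossing(core_mhz, bc_mhz):
    """Core-clock cycles available between two bunch crossings."""
    return core_mhz // bc_mhz


def latency_budget_cycles(max_latency_ns, core_mhz):
    """Whole core-clock cycles that fit into the latency limit.

    ns * MHz = 1e-9 s * 1e6 / s = 1e-3 cycles, hence the division by 1000.
    """
    return (max_latency_ns * core_mhz) // 1000


cycles_per_bc = cycles_per_bunch_crossing(CORE_CLOCK_MHZ, BUNCH_CROSSING_MHZ)
budget = latency_budget_cycles(MAX_LATENCY_NS, CORE_CLOCK_MHZ)
print(cycles_per_bc, budget)
```

Under this assumption there are 12 core-clock cycles per bunch crossing and a budget of 72 cycles. Every added pipeline register consumes one of those cycles, which is the compromise named in the last bullet.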

## Energy Reconstruction by Convolutional Neural Networks

#### Our Approach

2-step convolutional neural network for energy reconstruction






#### Connection Subcomponent

- general connection component for all operations between neighboring layers of the ANN
  - configurable number of inputs, number of outputs, and activation functions
- multi-layer network component chains *connection* instances


- able to implement different kernel sizes, numbers of feature maps, and dilated CNNs
- configured from a JSON file produced after training
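
The idea of chaining one generic connection component, configured from a JSON file, can be illustrated in software. The following Python sketch is a behavioral model only. The JSON layout, the field names, and the weight values are hypothetical, since the slides do not specify the actual file format or firmware interface.

```python
import json

# Hypothetical configuration; weights are indexed [output map][input map][tap].
CONFIG_JSON = """
{
  "layers": [
    {"kernel_size": 3, "dilation": 1, "activation": "relu",
     "weights": [[[0.5, -1.0, 0.5]], [[1.0, 0.0, 0.0]]],
     "biases": [0.0, 0.0]},
    {"kernel_size": 2, "dilation": 2, "activation": "linear",
     "weights": [[[1.0, 1.0], [0.5, 0.5]]],
     "biases": [0.0]}
  ]
}
"""

ACTIVATIONS = {"relu": lambda v: max(0.0, v), "linear": lambda v: v}


def connection(inputs, layer):
    """One layer-to-layer connection: dilated 1D convolution plus activation.

    inputs is a list of feature maps, each a list of samples.
    """
    k, d = layer["kernel_size"], layer["dilation"]
    act = ACTIVATIONS[layer["activation"]]
    span = (k - 1) * d  # samples needed before the first valid output
    outputs = []
    for w_out, bias in zip(layer["weights"], layer["biases"]):
        fmap = []
        for t in range(span, len(inputs[0])):
            acc = bias
            for w_in, x in zip(w_out, inputs):
                for i in range(k):
                    acc += w_in[i] * x[t - i * d]
            fmap.append(act(acc))
        outputs.append(fmap)
    return outputs


def run_network(samples, config):
    """Chain connection instances, one per configured layer."""
    maps = [list(samples)]
    for layer in config["layers"]:
        maps = connection(maps, layer)
    return maps


result = run_network([0, 1, 2, 3, 4, 5, 6], json.loads(CONFIG_JSON))
```

Changing kernel size, dilation, or the number of feature maps only changes the configuration, not the `connection` function, which mirrors the reuse described in the bullets above.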



## Optimization of calculations: DSP Chain

#### DSP Chain

- DSPs are chained to accumulate products over the whole kernel
- input port timing is matched so that data from multiple cells can be processed in one DSP chain instance
- weights are loaded from RAM without recompilation
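
How a chain of multiply-accumulate stages sums the products over a whole kernel can be shown with a cycle-by-cycle software model. This is a sketch under assumptions: each stage has one output register, the input sample is broadcast to all stages, and the weights sit in reverse order along the chain. It does not model the actual Stratix 10 DSP block configuration or the handling of multiple cells.

```python
def dsp_chain(samples, weights):
    """Behavioral model of chained multiply-accumulate stages.

    Stage k computes weight * input + registered output of stage k - 1.
    With weights stored in reverse order along the chain, the last
    stage outputs y_t = sum_i weights[i] * samples[t - i].
    """
    n_stages = len(weights)
    chain_regs = [0.0] * n_stages  # one output register per stage
    outputs = []
    for x in samples:  # one loop iteration models one clock cycle
        new_regs = []
        for k in range(n_stages):
            chain_in = chain_regs[k - 1] if k > 0 else 0.0
            new_regs.append(weights[n_stages - 1 - k] * x + chain_in)
        chain_regs = new_regs  # all registers update on the same clock edge
        outputs.append(chain_regs[-1])
    return outputs


def direct_sum(samples, weights):
    """Reference: evaluate the kernel sum directly, treating missing samples as zero."""
    return [
        sum(w * samples[t - i] for i, w in enumerate(weights) if t - i >= 0)
        for t in range(len(samples))
    ]


chain_out = dsp_chain([1, 2, 3, 4], [1.0, 0.5])
```

In this model the weights are ordinary data, so they can be replaced between runs without touching the chain itself, which is the software analogue of loading weights from RAM.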






https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stratix-10/ug-s10-dsp.pdf

## FPGA Resource Usage

Process 384 to 512 calorimeter cells per FPGA  $\rightarrow$  In total 400 to 550 FPGAs needed

|                                       | Single channel:<br>3-Conv CNN | Single channel:<br>4-Conv CNN | Time-multiplexed:<br>3-Conv CNN | Time-multiplexed:<br>4-Conv CNN |
|---------------------------------------|-------------------------------|-------------------------------|---------------------------------|---------------------------------|
| Multiplicity                          |                               |                               | 6                               | 6                               |
| Frequency<br>F<sub>max</sub> [MHz]    | 493                           | 480                           | 344                             | 334                             |
| Latency<br>clk<sub>core</sub> cycles  | 62                            | 58                            | 81                              | 62                              |
| Max. channels                         |                               |                               | 390                             | 352                             |
| Resource usage:<br>DSPs               | 46<br>0.8%                    | 42<br>0.7%                    | 46<br>0.8%                      | 42<br>0.7%                      |
| Resource usage:<br>adaptive logic modules | 5684                      | 5702<br>0.6%                  | 14235<br>1.5%                   | 15627<br>1.7%                   |

$\longrightarrow$ Resource-efficient with short latency, but low execution frequency

- integrate CNNs in data processing core of LAr signal processor firmware
- output interfaces to trigger, readout and monitoring path




https://gitlab.cern.ch/atlas-lar-be-firmware/LASP/LASP-doc/

#### CNNs for Energy Reconstruction

- electronics of the ATLAS LAr calorimeters will be upgraded for the HL-LHC by the end of 2028
- harsh environment with up to 200 pile-up events
- energy reconstruction by convolutional neural networks shows promising results
- CNN implementation on FPGA successful
- next steps:
  - integrate CNNs in LAr signal processor data core firmware
  - perform hardware tests in test bench
  - create a high-level synthesis implementation and compare performance