%*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
%*-*-*-*-*-*-*-HEADER-*-*-*-*-*-*-*-*
%*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
\documentclass{ctuthesis}
\ctusetup{
xdoctype = B,
xfaculty = F3,
mainlanguage = english,
titlelanguage = english,
title-english = {Application of Machine Learning for the Higgs Boson Mass Reconstruction Using ATLAS Data},
title-czech = {Aplikace strojového učení pro odhad hmotnosti Higgsova bosonu z dat detektoru ATLAS},
department-english = {Department of Cybernetics},
author = {Adam Herold},
supervisor = {prof. Dr. Ing. Jan Kybic},
supervisor-specialist = {doc. Dr. André Sopczak},
day = 4,
month = 01,
year = 2022,
keywords-czech = {CERN, ATLAS, Higgsův boson, rekonstrukce hmotnosti, neuronové sítě},
keywords-english = {CERN, ATLAS, Higgs boson, Mass reconstruction, Neural networks},
fieldofstudy-english = {Cybernetics and Robotics},
fieldofstudy-czech = {Kybernetika a robotika},
specification-file = {zadani.pdf},
}
\ctuprocess
\usepackage[sorting=none]{biblatex} %Imports biblatex package
\usepackage{cancel}
\usepackage{multirow}
\usepackage{array}
\usepackage{makecell}
\usepackage{bm}
\usepackage{amsbsy}
\usepackage{siunitx}
\usepackage[obeyspaces]{url}
\renewcommand\theadalign{tr}
%\renewcommand\theadfont{\bfseries}
\renewcommand\theadfont{\normalsize}
\renewcommand\theadgape{\Gape[4pt]}
\renewcommand\cellgape{\Gape[4pt]}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\newcommand{\PreserveBackslash}[1]{\let\temp=\\#1\let\\=\temp}
\newcolumntype{C}[1]{>{\PreserveBackslash\centering}p{#1}}
\newcolumntype{R}[1]{>{\PreserveBackslash\raggedleft}p{#1}}
\newcolumntype{L}[1]{>{\PreserveBackslash\raggedright}p{#1}}
\usepackage{float}
\floatstyle{plaintop}
\restylefloat{table}
\addbibresource{bibliography.bib}
\usepackage{algpseudocode}
\usepackage{seqsplit}
%*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
%*-*-*-*-*-*-*-MANDATORY STUFF-*-*-*-*-*-*-*-*
%*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
\ctutemplateset{maketitle twocolumn default}{
\begin{twocolumnfrontmatterpage}
\ctutemplate{twocolumn.thanks}
\ctutemplate{twocolumn.declaration}
\ctutemplate{twocolumn.abstract.in.titlelanguage}
\ctutemplate{twocolumn.abstract.in.secondlanguage}
\ctutemplate{twocolumn.tableofcontents}
\ctutemplate{twocolumn.listoffigures}
\end{twocolumnfrontmatterpage}
}
\begin{abstract-english}
This thesis deals with the reconstruction of the mass of the Higgs boson decaying in the $2lSS + 1 \tau _{had}$ channel in the $t\bar{t}H$ production. Based on the reconstructed mass, the goal is to separate the signal from background productions such as the $t\bar{t}Z$.
The data created by the full ATLAS detector simulation are used to develop two neural networks. First, a classification neural network that organizes the data by assigning detected particles to corresponding positions in the channel.
Second, a regression neural network that reconstructs the mass of the Higgs boson. The developed neural network is then tested on different data selections and is shown to outperform the Missing Mass Calculator technique.
Finally, the neural network is tested on real ATLAS data.
\end{abstract-english}
\begin{abstract-czech}
V této práci se zabýváme rekonstrukcí hmotnosti Higgsova bosonu v rozpadovém kanálu $2lSS + 1 \tau _{had}$ v produkci $t\bar{t}H$. Na základě rekonstruované hmotnosti separujeme signál od pozadí, kterým je například produkce $t\bar{t}Z$.
Na datech ze simulace detektoru ATLAS vyvineme dvě neuronové sítě. Nejprve klasifikační neuronovou síť, která data uspořádává přiřazením částic do jednotlivých pozic v kanále.
Poté neuronovou síť, která rekonstruuje hmotnost Higgsova bosonu. Tuto síť testujeme na různých selekcích dat a ukazujeme, že dosahuje lepších výsledků než technika Missing Mass Calculator.
Na závěr je proveden test na skutečných datech z detektoru ATLAS.
\end{abstract-czech}
\begin{thanks}
Thank you to my supervisor, Dr. André Sopczak, who gave me a great deal of his time and advice, for which I am grateful. Thank you to my supervisor, prof. Jan Kybic, who consulted with me on my work. And thank you to my family, especially my mom, whom I love dearly.
\end{thanks}
\begin{declaration}
I declare that the presented work was developed independently and that I have listed all sources of information used within it in accordance with the methodical instructions for observing the ethical principles in the preparation of university theses.
\medskip
Prague, \monthinlanguage{title} \ctufield{day}, \ctufield{year}
\vspace*{3cm}
Prohlašuji, že jsem předloženou práci vypracoval samostatně a že jsem uvedl veškeré použité informační zdroje v souladu s Metodickým pokynem o dodržování etických principů při přípravě vysokoškolských závěrečných prací.
\medskip
V Praze, \ctufield{day}.~\monthinlanguage{second}~\ctufield{year}
\end{declaration}
%*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
%*-*-*-*-*-*-*-MAIN PART OF THESIS-*-*-*-*-*-*-*-*
%*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
\begin{document}
%pkg-biblatex = true
\maketitle
%*-*-*-*-*-*-*-INTRODUCTION-*-*-*-*-*-*-*-*
\chapter*{Introduction}
In the ATLAS detector, the Higgs boson can be produced alongside a pair of top quarks. As the Higgs boson is short-lived, it decays before it can be detected \cite{higgs_3}. The decay products of the Higgs boson include visible particles such as quark jets and leptons but also the undetectable neutrinos, which make the reconstruction of the Higgs boson and its mass a challenging task.
While the Higgs boson mass is known to be $125.18 \pm 0.16$ GeV \cite{W_Z_decay}, reconstructing its mass can help us separate events in which it is created from background events in which a Z boson, a W boson or other particles are produced instead of the Higgs boson.
The Higgs boson, alongside the two top quarks, can decay in many different channels, and we will be focusing on a particular one — the $2lSS + 1 \tau _{had}$ channel, in which two same-charged leptons and one hadronic tau candidate are produced. With the decay narrowed down, we will first assign the detected jets and leptons to the Higgs or one of the top quarks to give the data structure. Then we reconstruct the mass from the organized data.
Both of these tasks will be done using machine learning — in particular, neural networks. Our goal will be to develop such neural networks that will allow us to reconstruct the mass of the Higgs boson and separate it from the background.
%*-*-*-*-*-*-*-THEORETICAL BACKGROUND-*-*-*-*-*-*-*-*
\chapter{Theoretical background}
\section{CERN}
CERN (from French \emph{Conseil Européen pour la Recherche Nucléaire}\footnote{In English \emph{European Council for Nuclear Research}}) is an organization focused on research in fundamental physics most notably through the usage of their world-class particle accelerator facilities \cite{cern1}. It was established in the 1950s and since then has been a great contributor to the world of physics and science \cite{cern2}.
In 2008 the Large Hadron Collider (LHC) started up and to this day it remains the largest and most powerful particle accelerator in the world \cite{lhc1}. It consists of a two-ring hadron accelerator and collider built in a 27 km long tunnel and is designed for proton-beam collisions with a centre-of-mass energy of 14 TeV \cite{lhc2}. The schematic of the LHC is in Figure \ref{lhc_schematic}.
\begin{figure}[h]
\centering{
\resizebox{100mm}{!}{\includegraphics{images/lhc_schematic.png}}
\caption[Schematic of the LHC]
{Schematic of the LHC \par \small Schematic showing detectors CMS, LHCb, ATLAS and ALICE. Also showing other CERN accelerators — the Proton Synchrotron (PS) and the Super Proton Synchrotron (SPS). Figure modified from source \cite{lhc_schematic}.}
\label{lhc_schematic}
}
\end{figure}
\subsection{ATLAS}
There are eight experiments operating at the LHC, focusing on different particles and using different detectors. The two largest experiments are ATLAS (A Toroidal LHC Apparatus) and CMS (Compact Muon Solenoid), both being independent general-purpose detectors \cite{atlas1}.
The ATLAS detector is 44 meters long and 25 meters in diameter. Around 1 billion proton-proton collisions (events) occur each second inside the ATLAS detector. Each event produces multiple particles which are then detected by one of the many sensors of the detector. These measured events are then filtered by a hardware and a software trigger to a rate of around 2000 \emph{interesting} events per second. The ATLAS detector measures properties of the particles such as their direction, momentum, charge, energy and type \cite{atlas2}.
%\textcolor{red}{Tady můžou být detailněji rozebrány části ATLAS detektoru. https://atlas.cern/resources/fact-sheets}
\begin{figure}[h]
\centering{
\resizebox{125mm}{!}{\includegraphics{images/atlas_detector.png}}
\caption[Model of the ATLAS detector with its distinct layers]
{Model of the ATLAS detector with its distinct layers \par \small Source \cite{atlas_image}.}
\label{atlas_detector_schematic}
}
\end{figure}
\section{Standard Model Particles}
"\emph{The Standard Model of particle physics is the theory used to describe the interactions of fundamental particles (or fermions) and fundamental forces (which are conveyed by particles called bosons)} \cite{standard_model}."
The Standard Model further divides fermions into quarks and leptons; each fermion also has an antimatter counterpart with opposite charge but otherwise identical properties.
%\textcolor{red}{Možnost odstavce o historii, nejúspěšnější model, není plně ověřený, nezahrnuje %gravitaci...https://home.cern/science/physics/standard-model}
\begin{figure}[h]
\centering{
\resizebox{120mm}{!}{\includegraphics{images/standard_model.png}}
\caption[Particles of the Standard Model]{Particles of the Standard Model \par\small Source \cite{standard_model_image}.}
\label{standard_model_image}
}
\end{figure}
\subsection{Quarks}
\label{top_decay}
There are six different flavors of quarks: up (u) and down (d), charm (c) and strange (s), top (t) and bottom (b). In this thesis, we will differentiate between top, bottom and other quarks (abbreviated as \emph{non-b} quarks).
The top quark is heavy enough to decay into a W boson and a b quark, which is the dominant channel. The W boson then decays either into a pair of quarks or a lepton and a neutrino \cite[p. 638]{pdg_review}:
$$t \rightarrow W^{+} \: b \rightarrow q \: \overline{q}' \: b$$
$$t \rightarrow W^{+} \: b \rightarrow \ell^{+} \: \nu_{\ell} \: b$$
Quarks are never observed directly in the detector. Instead, they are detected as sprays of hadrons called \emph{jets}. Besides quarks, gluons are another source of jets. Discriminating between quark and gluon jets is a complex task and a focus of studies at ATLAS and CMS \cite{jets_1}\cite{jets_2}.
\subsection{Leptons}
In the Standard Model, we differentiate between six leptons: electron (e), muon ($\mu$), tau ($\tau$) and their corresponding neutrinos ($\nu$). In this thesis, we will use a different nomenclature, in which the term lepton (symbol $\ell$) refers only to the two light leptons, the electron and the muon.
Much like quarks, taus are also not detected directly, as they have a short lifetime ($2.8\times 10^{-13}$ seconds) and decay into a tau neutrino and a virtual W boson, which then decays either \emph{leptonically}:
\begin{equation}\tau^{-} \rightarrow W^{-}\: \nu_{\tau} \rightarrow \ell^{-} \:\overline{\nu}_{\ell}\: \nu_{\tau}
\end{equation}
or \emph{hadronically}:
\begin{align}
\begin{split}
\tau^{-} &\rightarrow W^{-}\: \nu_{\tau} \rightarrow h^{-} \:\nu_{\tau} \\
\tau^{-} &\rightarrow W^{-}\: \nu_{\tau} \rightarrow h^{-}\:h^{+}\:h^{-} \:\nu_{\tau}
\end{split}
\end{align}
where $h$ is a hadron \cite{taus_1}\cite{taus_2}. In the case of the leptonic decay, it is a non-trivial task to associate the detected lepton to either the tau decay or a different decay process (e.g. leptonic decay of a top quark).
\subsection{Neutrinos}
Neutrinos do not have charge, are nearly massless and very hard to detect, as they only interact weakly \cite{neutrinos}. In ATLAS, neutrinos are not detected and, as such, are a source of missing energy in the detected decay process.
Because of this missing energy, it is a challenging task to reconstruct the mass of any particle with a neutrino as one of its decay products.
\subsection{Representation of Particles}
\label{particles_representation}
In this thesis, particles will be represented in two interchangeable ways, both of them being a vector of four values that fully describes the particle kinematics.
First is the ($p_T$, $\eta$, $\phi$, $E$)$^T$ vector, where $p_T = |\vec{p}_T| = |(p_X, p_Y)^T|$ is the transverse momentum, $\eta$ is pseudorapidity, $\phi$ is azimuthal angle and $E$ is energy. This is the representation in which the data of the particles is stored.
Second is the momentum and energy vector ($p_X$, $p_Y$, $p_Z$, $E$)$^T$, also called the four-vector, where ($p_X$, $p_Y$, $p_Z$)$^T$ is the momentum of the particle in Cartesian coordinates. The variables in this representation will be used in the NN. This representation also has one considerable advantage: momentum and energy obey the conservation laws. This allows us to add the four-vectors of child particles together to obtain their parent particle. For example, we can write the following equation for the Higgs boson:
\begin{equation}
H = \tau^+ + \tau^-,
\end{equation}
where $H$, $\tau^+$ and $\tau^-$ are the four-vectors representing those particles.
The relations allowing us to switch between the two mentioned representations without losing information are the equations \cite[p.26]{decay_channel_image}
\begin{equation}
\begin{aligned}
p_X = p_T\cdot \cos \phi, \\
p_Y = p_T\cdot \sin \phi, \\
p_Z = p_T\cdot \sinh \eta.
\end{aligned}
\label{goniometrix}
\end{equation}
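For illustration, the conversion can be written as the following minimal Python sketch (an illustrative example only, not the thesis code; the function name and the use of NumPy are assumptions):
\begin{verbatim}
import numpy as np

def to_four_vector(pt, eta, phi, energy):
    # Convert (pT, eta, phi, E) to (pX, pY, pZ, E)
    # using the conversion equations above.
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    return np.array([px, py, pz, energy])
\end{verbatim}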
\subsection{Invariant mass and angular distance}
\label{mass_deltar_equations}
There are two more particle characteristics we will be using — invariant mass ($m_0$) and angular distance ($\Delta R$).
The invariant mass is the mass of a particle that is independent of the reference frame in which its momentum and energy are measured \cite{invariant_mass}. It is calculated from the momentum and energy as \cite[p.26]{decay_channel_image}
\begin{equation}
\label{masses_equation}
m_{0} = \sqrt{E^2 - p_x^2 - p_y^2 - p_z^2}.
\end{equation}
Approximate invariant masses of notable particles are in Table \ref{masses} \cite{W_Z_decay}.
\begin{table}[h]
\begin{ctucolortab}
\begin{tabular}{ R{4cm} R{4cm} }
\toprule
Particle &Invariant mass \\
\midrule
Higgs boson &125.18 GeV \\
Z boson &91.19 GeV \\
W boson &80.38 GeV \\
Tau &1.78 GeV \\
Neutrinos &0.00 GeV \\
\bottomrule
\end{tabular}
\end{ctucolortab}
\caption{Approximate invariant masses of notable particles}
\label{masses}
\end{table}
The angular distance between two particles is the angle between their momentum vectors. It is calculated from the difference between their respective $\eta$ and $\phi$ as \cite[p.22]{decay_channel_image}
\begin{equation}
\Delta R = \sqrt{(\Delta \eta)^2 + (\Delta \phi)^2}
\end{equation}
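As a small illustration (a sketch under the same assumptions as above, not the thesis code), both quantities can be computed as:
\begin{verbatim}
import numpy as np

def invariant_mass(p):
    # p = (pX, pY, pZ, E); invariant mass from the equation above
    px, py, pz, energy = p
    return np.sqrt(energy**2 - px**2 - py**2 - pz**2)

def delta_r(eta1, phi1, eta2, phi2):
    # angular distance between two particles;
    # the phi difference is wrapped into [-pi, pi)
    dphi = (phi1 - phi2 + np.pi) % (2 * np.pi) - np.pi
    return np.sqrt((eta1 - eta2)**2 + dphi**2)
\end{verbatim}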
%The transverse energy can be computed from the transverse momentum vector and the mass of a particle %as \cite{transverse_energy} \begin{equation}
% \vec{E_T} = E\frac{\vec{p}_T}{|\vec{p}|}
%\end{equation}
\subsection{Missing transverse energy}
The missing transverse energy is a characteristic of a whole event. It stems from the fact that the two protons which collide in the detector travel along the beam pipe and thus have non-zero momentum only along the z-axis
\[\vec{p_p} = (0,0,p_Z)^T \;\;\;\; p_Z \neq 0.\]
After their collision the total momentum has to be conserved, meaning that the sum of the momentum vectors of all the particles that are created in the collision has to be equal to the sum of the two proton momentum vectors
\[\sum_{created} \vec{p} = \vec{p_{p_1}} + \vec{p_{p_2}}.\]
In reality, when we sum the momentum vectors of all particles detected in an event, the sum usually has non-zero x and y components:
\[ \sum_{detected} \vec{p} = (p_X,p_Y,p_Z)^T \;\;\;\; p_X,p_Y \neq 0.\]
This can be attributed to particles that escaped the detector undetected — notably the undetectable neutrinos. We then define the missing transverse energy as \cite{missing_transverse_energy}
\begin{equation}
\cancel{\mathbf{E}}_T = (E_{T_X}, E_{T_Y})^T = -\sum_{detected} \vec{p}_T,
\label{met_equation}
\end{equation}
where $detected$ symbolizes the set of all detected particles.
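A minimal sketch of this definition (illustrative only; the input layout is an assumption):
\begin{verbatim}
import numpy as np

def missing_et(detected_pt):
    # detected_pt: array of shape (N, 2) holding the (pX, pY)
    # of all N detected particles in an event
    return -np.sum(detected_pt, axis=0)   # (E_TX, E_TY)
\end{verbatim}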
\section{Higgs Boson}
The Brout-Englert-Higgs (BEH) mechanism was proposed in 1964 to explain why the carriers of the weak nuclear interaction — the W and Z bosons — have mass, while in theory they should be massless. This mechanism required a new, yet-to-be-discovered field and its associated particle — the Higgs boson \cite{higgs}.
On 4 July 2012, the existence of the Higgs boson was confirmed by the ATLAS and CMS experiments at CERN, when a new particle with a mass of around 125 GeV was observed \cite{higgs}.
Since then, more experiments and studies have been carried out to further explore its properties.
\subsection{Production and Decay Channels}
\label{signal_background}
As our primary focus is the reconstruction of the Higgs boson mass, we will be distinguishing between \emph{signal} and \emph{background} events. Signal events are the ones in which a Higgs boson is produced, and background events are the ones in which a Z boson, a W boson or other particles are produced instead.
\subsubsection{Signal events}
The Higgs boson is produced in the LHC mainly through gluon fusion (ggF) or vector boson fusion \cite{higgs_2}. Along with the Higgs, other particles are often produced. In this thesis, we focus on the case where the Higgs is produced together with a pair of top quarks (\emph{$t\overline{t}H$ production}):
$$gg \rightarrow t\: \overline{t} \: H.$$
This production was first observed in 2018 \cite{ttH_observation}.
The Higgs boson has a very short lifetime ($1.6\times 10^{-22}$ seconds), thus it decays before it can be detected \cite{higgs_3}. As the Higgs boson has a large invariant mass (approx. 125 GeV), it can decay into a pair of bosons or a pair of fermions, for example \cite{higgs_2}:
\begin{align}
\begin{split}
H &\rightarrow b\: \overline{b}, \\
H &\rightarrow W\: W, \\
H &\rightarrow \tau\: \tau.
\end{split}
\end{align}
The last mentioned channel will be the one we will be focusing on.
As the taus also decay (leptonically or hadronically), we will narrow our decay channel even further to a channel with two same-charge leptons and a hadronically decaying tau (the $2\ell SS + 1 \tau_{had}$ channel) in its final state\footnote{Final state particles are the ones that are detected by ATLAS.}.
Lastly, we will narrow down the decay of the top quarks to the \emph{lepton+jets} case (Sec. \ref{top_decay} — one top decays into a pair of quarks and the other one into a lepton and a neutrino, both top decays also include a b quark):
\begin{equation}
\label{tt_decay}
t\:\overline{t} \rightarrow W^{+} \: b \: W^{-} \: \overline{b} \rightarrow q \: \overline{q}' \: b \: \ell'^{-} \: \overline{\nu}_{\ell} \: \overline{b}.
\end{equation}
An example\footnote{There can be slight differences on a case by case basis, such as a permutation of the top and anti-top pair decay or the permutation of the positively and negatively charged tau pair decay.} Feynman diagram of this channel is shown in Fig. \ref{decay_channel}.
\begin{figure}[h]
\centering{
\resizebox{126mm}{!}{\includegraphics{images/decay_channel.png}}
\caption[Diagram of the $2\ell SS + 1 \tau_{had}$ decay channel]{Diagram of the $2\ell SS + 1 \tau_{had}$ decay channel \par \small The final state particles on the right side of the diagram are the ones detected, except for the undetectable neutrinos. Figure modified from source \cite[p.23]{decay_channel_image}.}
\label{decay_channel}
}
\end{figure}
\subsubsection{Background events}
The Z and W bosons are produced in quark-antiquark annihilations in the LHC \cite{W_Z_production}. Similarly to the Higgs boson, they can also be produced alongside a pair of top quarks (the \emph{$t\overline{t}Z$} and \emph{$t\overline{t}W$} productions):
\begin{align}
\begin{split}
q + \overline{q} &\rightarrow t\: \overline{t} \: Z, \\
q + \overline{q} &\rightarrow t\: \overline{t} \: W.
\end{split}
\end{align}
Another similarity between the Higgs boson and the W and Z bosons is their short lifetime (approx. $3 \times 10^{-25}$ seconds), which means they also decay before they can be detected.
The Z boson can decay into a lepton-lepton pair, a neutrino-neutrino pair, a tau-tau pair or into hadrons:
\begin{align}
\begin{split}
Z &\rightarrow \ell\: \overline{\ell}, \\
Z &\rightarrow \nu\: \overline{\nu}, \\
Z &\rightarrow \tau\: \tau, \\
Z &\rightarrow \text{hadrons}.
\end{split}
\end{align}
The W boson, on the other hand, mainly decays into a lepton-neutrino pair, a tau-neutrino pair or into hadrons \cite{W_Z_decay}:
\begin{align}
\begin{split}
W &\rightarrow \ell\: \nu_{\ell}, \\
W &\rightarrow \tau\: \nu_{\tau}, \\
W &\rightarrow \text{hadrons}.
\end{split}
\end{align}
The important decay mode here is the tau-tau pair, which is available for the Higgs and Z bosons but not for the W boson. This means that the diagram in Fig. \ref{decay_channel} can also be used to describe the decay of the $t\overline{t}Z$ production (with H replaced by Z in the diagram), but it cannot be used for the $t\overline{t}W$.
Other background productions exist, such as $t\overline{t}$. All these background productions can decay in the $2\ell SS + 1 \tau_{had}$ channel and as such cannot be easily separated from the signal. The separation of signal and background will be part of our task.
\subsection{Missing Mass Calculator}
\label{sec:mmc}
Methods for reconstruction of the $\tau\: \tau$ mass exist, and we will focus on one of them — the Missing Mass Calculator (MMC) — which outperforms other common methods \cite[p.18]{mmc_paper}.
The technique first assumes perfect detector resolution and no neutrinos outside of the $\tau\: \tau$ decay. We then have eight unknowns: $p_X$, $p_Y$, $p_Z$ and $m$\footnote{Which is another representation of a particle, with mass instead of energy, similar to the ones described in Sec. \ref{particles_representation}.} for the invisible product of each of the two taus. For the hadronically decaying tau the invisible product is just one neutrino, so we can set its $m=0$, reducing the eight unknowns to seven (for the case where one tau decays hadronically and the other leptonically). For these seven unknowns we have the following four momentum and mass conservation equations \cite[p.5-6]{mmc_paper}:
\begin{align}
\begin{split}
\cancel{E}_{T_X} = p_{mis_1} \sin\theta_{mis_1} \cos \phi_{mis_1} + p_{mis_2} \sin \theta_{mis_2} \cos\phi_{mis_2}, \\
\cancel{E}_{T_Y} = p_{mis_1} \sin\theta_{mis_1} \sin \phi_{mis_1} + p_{mis_2} \sin \theta_{mis_2} \sin\phi_{mis_2}, \\
M^2_{\tau_1} = m^2_{mis_1} + m^2_{vis_1} + 2\sqrt{p^2_{vis_1}+m^2_{vis_1}}\sqrt{p^2_{mis_1}+m^2_{mis_1}} \\- 2 p_{vis_1}p_{mis_1}\cos \Delta\theta_{vm_1}, \\
M^2_{\tau_2} = m^2_{mis_2} + m^2_{vis_2} + 2\sqrt{p^2_{vis_2}+m^2_{vis_2}}\sqrt{p^2_{mis_2}+m^2_{mis_2}} \\- 2 p_{vis_2}p_{mis_2}\cos \Delta\theta_{vm_2},
\end{split}
\label{mmc_eqs}
\end{align}
where $mis$ and $vis$ symbolize the invisible and visible tau products respectively, $M_\tau = 1.777$ GeV as per Table \ref{masses} and $\Delta\theta_{vm_i}$ is the angular distance between the visible and invisible product of the $i$-th tau \cite[p.6]{mmc_paper}.
With seven unknowns and four equations, this is an under-constrained system and as such it does not have one exact solution. From all possible solutions, the MMC chooses the most likely one. It finds it with the help of additional information, such as "...\emph{the expected angular distance between the neutrino(s) and the visible decays products of the $\tau$ lepton}." \cite[p.6]{mmc_paper} The probability density function of such angular distance is obtained from simulated data \cite[p.7]{mmc_paper}.
The MMC will serve as a comparison to our mass reconstruction method. A very important thing to note is that the assumption of no neutrinos outside of the $\tau\: \tau$ decay is not satisfied in our decay channel (a neutrino is coming from the anti-top branch in Fig. \ref{decay_channel}). This results in a drop in efficiency of the MMC, but it does not make it unusable, as the MMC tries to mitigate the effects of resolution in the measurement of $\cancel{E}_T$ \cite[p.10]{mmc_paper} and the outside neutrino could be viewed as a source of larger resolution.
Lastly, the MMC reconstructs the $\tau\: \tau$ mass, therefore it is only applicable to events with:
\begin{align}
\begin{split}
Z &\rightarrow \tau\: \tau, \\
H &\rightarrow \tau\: \tau,
\end{split}
\end{align}
and not at all to the $t\overline{t}W$ or $t\overline{t}$ production.
\section{Artificial neural networks}
\emph{"Artificial neural networks are popular machine learning techniques that simulate the mechanism of learning in biological organisms."} \cite[p.2]{data_augmentation} The building stone of a neural network (NN) is a neuron. In the artificial neural network (ANN) the neuron is represented by a computational unit, which takes weighted signals as input \cite[p.3]{data_augmentation}, processes them and outputs another signal, which can then serve as an input for other neurons, creating a network. The processing of the inputs will in our case be their addition and subsequent use of an activation function:
\begin{equation}
y = \Phi(\mathbf{w}^T \mathbf{x}).
\end{equation}
To create a NN, the neurons are formed into layers. Besides an input and output layer, there will be other intermediate hidden layers \cite[p.5]{data_augmentation}. In the NNs we will be using, the hidden layers will always be fully connected layers, meaning that each neuron takes as an input the output of each neuron in the preceding layer:
\begin{equation}
y_{i_j} = \Phi(\mathbf{w}_j^T \mathbf{y}_{i-1}),
\end{equation}
where $i$ is the index of the layer and $j$ is the index of the neuron in the layer.
Neural networks can be used for different purposes. The two that are relevant for us are the regression neural network (rNN\footnote{Not to be confused with the recurrent neural network.}), which predicts one or multiple numerical values (e.g. the mass of a particle), and the classification neural network (CNN), which predicts to which of several known categories the input belongs (e.g. categorization of a particle into quarks and leptons).
\subsection{Activation functions}
Activation functions are used at the output of neurons. There are many different options; the ones we will be using are the following (also illustrated in Fig. \ref{fig:activation_functions}).
\begin{itemize}
\item Linear function calculated as \cite[p.13]{data_augmentation}
\begin{equation}
\Phi(v) = v.
\end{equation}
\item Rectified linear unit function (ReLU) calculated as \cite[p.14]{data_augmentation}
\begin{equation}
\Phi(v) = \max\{v,0\}.
\end{equation}
\item Sigmoid function calculated as \cite[p.13]{data_augmentation}
\begin{equation}
\Phi(v) = \frac{1}{1+e^{-v}}.
\end{equation}
\end{itemize}
\begin{figure}[h]
\centering{
\resizebox{127mm}{!}{\includegraphics{images/activations.pdf}}
\caption{Activation functions}
\label{fig:activation_functions}
}
\end{figure}
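For illustration, a minimal NumPy sketch of a single neuron using each of these activation functions follows (an illustrative sketch only, not the implementation used in this thesis):
\begin{verbatim}
import numpy as np

def linear(v):
    return v

def relu(v):
    return np.maximum(v, 0.0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def neuron(weights, inputs, activation=relu):
    # one neuron: weighted sum of the inputs
    # followed by the activation function
    return activation(np.dot(weights, inputs))
\end{verbatim}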
\subsection{Loss function}
In the training process of a neural network, two phases can be distinguished. First is the forward phase, when the inputs are processed by the neural network to produce outputs.
In the second — backward — phase, the loss function takes in the outputs of the neural network (i.e. the predictions) and the desired outputs (i.e. the truth) and calculates a score that quantifies the quality of performance of the neural network. The neural network then backpropagates by computing the gradient of the loss function with respect to the weights of the layers. Finally, the weights are updated to minimize the loss \cite[p.22]{data_augmentation}\cite{backpropagation}:
\begin{equation}
\mathbf{w}' = \mathbf{w} - \alpha \cdot (\frac{\partial \mathcal{L}}{\partial \mathbf{w}})^T,
\end{equation}
where $\alpha$ is the learning rate.
The choice of the loss function defines to some extent the functionality of the neural network. %For example, an rNN and a CNN could have the same architecture except for the loss function, which would make them work as expected.
For regression, a typical loss function is the mean squared error (MSE) \cite[p.176]{data_augmentation}:
\begin{equation}
\mathcal{L} = \frac{1}{n}\sum_{k=1}^{n}({y}_k-{\hat{y}}_k)^2.
\end{equation}
For classification into one of two classes (binary classification), the binary cross entropy is used \cite{binary_cross_entropy}:
\begin{equation}
\mathcal{L} = -\frac{1}{n}\sum_{k=1}^{n} \left[ {y}_k\cdot \log(\hat{y}_k) + (1-y_k)\cdot \log(1-\hat{y}_k) \right].
\label{bce}
\end{equation}
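A minimal NumPy sketch of the two loss functions (illustrative only, not the thesis implementation):
\begin{verbatim}
import numpy as np

def mse(y_true, y_pred):
    # mean squared error, used for regression
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # binary cross entropy; eps avoids log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred))
\end{verbatim}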
%*-*-*-*-*-*-*-DATA-*-*-*-*-*-*-*-*
\chapter{Data}
\section{Analysis levels}
\label{analysis_levels}
When it comes to the signal and background productions, there are two different levels of data on which we can study these events:
\begin{itemize}
\item \textbf{Real data} measured by the ATLAS detector.
\item \textbf{Generated data} produced by a program (e.g. the Monte Carlo generator PYTHIA \cite{pythia}). It can be further split into two levels:
\begin{itemize}
\item \textbf{Event generator data} (also called \emph{truth} data). It contains full information ($p_T$, $\eta$, $\phi$ and $E$) of each particle from the event as well as its \emph{children} and \emph{parent}\footnote{Children meaning particles it decays into and parent meaning the particle from which it came.} relations with other particles.
\item \textbf{Full ATLAS detector simulation} level of data. At this level, a program takes a generated event and aims to produce data similar to how it would be measured by the real ATLAS detector. The effects this has on the data will be presented in the next section. It is this level of data on which we will study the reconstruction of the Higgs boson mass before applying it to the real data.
\end{itemize}
\end{itemize}
\section{Detector effects}
\label{detector_effects}
The data produced by the detector simulation contains less information than we have on the event generator level. This is caused by different effects that are taking place in the detector.
The first group of effects stems from the physical nature of the decay process:
\begin{itemize}
\item We only detect final state particles (i.e. particles without children in Fig. \ref{decay_channel}).
\item Neutrinos are not detected.
\item We do not know the parents of the detected particles. That is, if we detect two leptons in an event, we do not know which lepton is coming from the \emph{Higgs branch}\footnote{Higgs branch meaning decay products of the Higgs and their further decay products.} and which is coming from the \emph{top branch}.
\item Quarks are detected as jets.
\item More than four jets can be detected, because of gluon jets.
\item Fewer than four jets can be detected, because two or more jets can overlap and be detected as one.
\end{itemize}
The second group consists of effects associated with the imperfection of the detector:
\begin{itemize}
\item There are no sensors in the beam pipe\footnote{The pipe through which the beam of protons travels in Fig. \ref{atlas_detector_schematic}.} and particles produced in the decay can escape the detector's sensors in this direction.
\item Particles in the detector can overlap with each other and be misidentified, or one object can be identified as multiple particles. To avoid the latter, an overlap-removal procedure, in which only certain particles are kept, is applied as part of the detector simulation data processing \cite{overlap_removal}. In our case this is therefore taken care of, but it still remains a source of additional uncertainty and error between the event generator and detector simulation data.
\item Detector resolution defined by the ATLAS online glossary as the "\emph{measure of the accuracy of a detector measurement, e.g. of energy or spatial position}" \cite{detector_resolution}.
\end{itemize}
These effects present obstacles in the process of mass reconstruction. The above is not necessarily a complete list, rather a list of effects which were encountered during the work on this thesis.
\section{ROOT}
ROOT is a framework for working with data, developed at CERN. Natively, it can be used to write and run C++ programs with the built-in Cling interpreter, but it can also be used with Python through the PyROOT library. It can be used for storing and accessing data in a tree-like structure, for plotting and for other data processing \cite{about_root}.
An example of the structure of data stored in a ROOT file is shown in Figure \ref{ROOT_structure}.
\begin{figure}[h]
\centering{
\resizebox{115mm}{!}{\includegraphics{images/root_diagram.pdf}}
\caption{Root file diagram}
\label{ROOT_structure}
}
\end{figure}
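A minimal PyROOT sketch of reading events from such a tree (the file name, tree name and branch name below are placeholders, not the actual names used in the provided datasets):
\begin{verbatim}
import ROOT

f = ROOT.TFile.Open("events.root")   # placeholder file name
tree = f.Get("nominal")              # placeholder tree name

for event in tree:                   # loop over the entries (events)
    # branches are accessed as attributes of the entry
    print(event.nJets)               # placeholder branch name
f.Close()
\end{verbatim}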
\section{Provided datasets and selections}
\label{datasets_selections}
In this thesis, two datasets provided by the ATLAS collaboration were used.
The first one contains data of the full ATLAS detector simulation (\emph{detector available}) as well as the event generator (\emph{truth}) level. It consists of multiple ROOT files, each containing events from either the $t\overline{t}H$, $t\overline{t}Z$, $t\overline{t}W$ or $t\overline{t}$ production. The total number of available events in each of the named productions is in the \emph{All events} column of Table \ref{num_events}. The individual files of these productions are then further separated based on the decay of the top quarks — as mentioned in Section \ref{signal_background}, we will be mostly using the data of the \emph{lepton+jets} decay (Eq. \ref{tt_decay}) of the top quarks.
The events in the dataset were generated with a Monte Carlo generator and the detector simulation is based on Geant4 \cite{geant}.
As discussed in Section \ref{signal_background}, in this thesis we are focusing on the $2lSS + 1 \tau _{had}$ decay channel. Each event contains a detector available boolean variable indicating whether the event belongs to this specific channel. This variable is computed by requiring the event data to meet certain conditions, and as it relies on the detected objects, it is not always correct. In addition to that, we will be making a selection of events with at least three detected jets and at least one detected b jet, which is also based on detector available variables. Making these selections makes the data we work with more consistent.
The number of events after the application of the selections is in Table \ref{num_events}.
\begin{table}[h]
\begin{ctucolortab}
\begin{tabular}{ R{2.5cm} | R{2.5cm} R{2.5cm} R{3.2cm} }
\toprule
Production &All events &Selected events &Percentage selected \\
\midrule
$t\overline{t}H$ &1 055 628 &73 741 &6.99\% \\
$t\overline{t}Z$ &1 894 217 &32 108 &1.69\% \\
$t\overline{t}W$ &614 984 &13 295 &2.16\% \\
$t\overline{t}$ &252 225 &6 027 &2.39\% \\
\midrule
Total &3 819 054 &125 171 &3.28\% \\
\bottomrule
\end{tabular}
\end{ctucolortab}
\caption[Number of events]{Number of events \par\small Number of all events for each production and number of selected events. The selection requires the event to have the $2lSS + 1 \tau _{had}$ decay channel tag and to have at least three detected jets with at least 1 b jet. These selections are based on detector available variables.}
\label{num_events}
\end{table}
In the next chapter additional selections based on truth information will be introduced on top of the ones mentioned here.
%The numbers in \textcolor{red}{the second row} of Table \ref{num_events} are equal to the number of events we will be working with as we do not require any further criteria to be met.
The selected events were further split into three separate datasets for training, validation and testing of the neural networks. The ratio of the split was 0.8, 0.1 and 0.1 respectively.
The second dataset contains real ATLAS data and as such it cannot be separated into the different productions; instead, it contains all of them. What can still be used are the detector available selections of the $2lSS + 1 \tau _{had}$ channel and of the required number of three detected jets with one b jet.
The number of expected events of the different productions with this selection can be approximated and the values are listed in Table \ref{production_ratios}.
\begin{table}[h]
\begin{ctucolortab}
\begin{tabular}{ R{4cm} R{4cm} }
\toprule
Production &Events \\
\midrule
$t\overline{t}H$ &22.8 \\
$t\overline{t}Z$ &18.2 \\
$t\overline{t}W$ &24.8 \\
$t\overline{t}$ &22.2 \\
Other &17.9 \\
\midrule
Total &105.9 \\
\bottomrule
\end{tabular}
\end{ctucolortab}
\caption[Number of expected events of productions in real ATLAS data]{Number of expected events of productions in real ATLAS data \par\small The values were provided by my supervisor.}
\label{production_ratios}
\end{table}
The real data was produced in the years 2015-2018. The number of events obtained each year is in Table \ref{real_data_years}.
\begin{table}[h]
\begin{ctucolortab}
\begin{tabular}{ R{4cm} R{4cm} }
\toprule
Year of production &Events \\
\midrule
2015 &\num{4.04e+06} \\
2016 &\num{4.04e+07} \\
2017 &\num{5.28e+07} \\
2018 &\num{6.93e+07} \\
\midrule
Total &\num{1.63e+08} \\
\bottomrule
\end{tabular}
\end{ctucolortab}
\caption[Number of real ATLAS data events obtained each year]{Number of real ATLAS data events obtained each year}
\label{real_data_years}
\end{table}
\subsection{Used variables}
Here is an overview of the ROOT variables we will be using in the mass reconstruction and the particle assignment process. The chosen variables are closely related to the $2lSS + 1 \tau _{had}$ decay channel, specifically to the final state particles in this decay (see Fig. \ref{decay_channel}).
The detector available variables we will be using are the following:
\begin{itemize}
\item The $2lSS + 1 \tau _{had}$ decay channel tag and variables indicating the number of detected jets and b jets.
\item Four-vectors of up to eight jets (the number differs in each event based on how many were detected).
\item For each jet, a \emph{b-tag} which indicates how likely it is that the jet comes from a b quark, based on characteristics such as "\emph{large mass}" or "\emph{significant lifetime}" \cite{b-tags}.
\item Four-vectors of two leptons.
\item Four-vector of a hadronically decayed tau.
\item The decay mode of the hadronically decaying tau (either 1-prong or 3-prong\footnote{Decays into 1 or 3 charged particles.}).
\item Missing transverse energy characterized by its energy and azimuthal angle.
\item The sum of the total visible transverse energy (a scalar variable).
\end{itemize}
Variables listed above are stored as scalars (e.g. each four-vector is stored in the form of four separate scalar variables).
The truth data, on the other hand, includes information on each particle occurring in an event. It is stored in the form of eight vectors per event, containing $p_T$, $\eta$, $\phi$, $E$, ID, particle type, and the parent and children relations. The last of these is a vector of vectors, as a particle can decay into multiple particles. Each particle is represented by an index at which the vectors can be accessed to obtain the particle's information.
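As an illustration of how this index-based structure can be traversed, a short sketch follows (the attribute names pt, eta, phi, e and children are placeholders for the actual vector names in the dataset):
\begin{verbatim}
def children_four_momenta(event, i):
    # return the (pT, eta, phi, E) of all children of
    # the particle stored at index i; the vector names
    # here are placeholders for the actual branch names
    return [(event.pt[c], event.eta[c],
             event.phi[c], event.e[c])
            for c in event.children[i]]
\end{verbatim}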
%*-*-*-*-*-*-*-METHODS-*-*-*-*-*-*-*-*
\chapter{Proposed methods}
\section{Task at hand}
\label{task_at_hand}
The task at hand is to reconstruct the mass of the Higgs boson from the data produced by the ATLAS detector simulation on a per-event basis. First, the data has to be preprocessed — jets and leptons have to be assigned to their corresponding positions in the decay, so that the data is organized. Figure \ref{pipeline_main} shows a simplified diagram of the described task.
\begin{figure}[h]
\centering{
\resizebox{110mm}{!}{\includegraphics{images/pipeline_main.pdf}}
\caption{Simplified diagram of the task pipeline}
\label{pipeline_main}
}
\end{figure}
For both the particle assignment and the mass reconstruction a neural network will be used. For the former, we will use a classification-NN-based approach inspired by the paper on jet-parton assignment in $t\overline{t}H$ events \cite{parton_assignment}; for the latter, a regression NN. For the rNN, three loss functions will be developed, each representing a different approach to the mass reconstruction.
We will be training and testing the NNs on three selections of the simulated data, which are based on truth information and are illustrated in Figure \ref{data_selections}. More precisely, the particle assignment NN will be trained only on the \emph{Narrow selection} in (a) in the figure, because it requires a consistent structure of the decay across all events, which is precisely what the Narrow selection ensures. A mass reconstruction NN will then be trained and tested on each of the three selections.
Selections (a) and (b) are chosen so that the decay of the $t\overline{t}H$ and $t\overline{t}Z$ is narrowed down, which lets us rely on the exact structure of the channel (Fig. \ref{decay_channel}) and assign all the jets and leptons. In addition to that, the \emph{Additional backgrounds selection} in (b) also contains the $t\overline{t}W$ and $t\overline{t}$ backgrounds, to which the top pair decay and Higgs/Z boson decay cuts cannot be applied, since their decay channels are different (e.g. the W never decays into $\tau\;\tau$), so for these we take all the data.
The \emph{Real data selection} in (c) simulates the real ATLAS data structure, where the productions cannot be distinguished, therefore we take all of the data, without any truth-based selections.
All three of these sets will be using the common selection of requiring the $2lSS + 1 \tau _{had}$ channel tag to be true and at least three detected jets with one b jet, which has been discussed in Sec. \ref{datasets_selections}. This selection will also be applied to the real ATLAS data, on which we will test the NN trained on dataset (c) from the figure.
\begin{figure}[h]
\centering{
%\resizebox{115mm}{!}{\includegraphics{images/data_selections_1.pdf}}
%\resizebox{115mm}{!}{\includegraphics{images/data_selections_2.pdf}}
%\resizebox{115mm}{!}{\includegraphics{images/data_selections_3.pdf}}
\resizebox{115mm}{!}{\includegraphics{images/data_selections_updated.pdf}}
\caption[Three data selections used with the NNs]{Three data selections used with NNs \par\small In red is the selected data and in black are the cuts made for each selection. The \emph{Narrow selection} is in (a) as described in Section \ref{datasets_selections}. In (b) additional background is added in the form of $t\overline{t}W$ and $t\overline{t}$ productions. Selection simulating the real ATLAS data is in (c), where truth information cannot be used, therefore the data cannot be separated on the conditions in the figure. The selection of $2lSS + 1 \tau _{had}$ channel and at least three jets with one b jet detected is used with all three (not shown in the picture).}
\label{data_selections}
}
\end{figure}
The number of events for the Narrow selection in Figure \ref{data_selections} is in Table \ref{narrow_events}. The number of events without the top pair and Higgs/Z boson decay selection applied (i.e. the Real data selection in the figure) has already been stated in the column \emph{Selected events} of Table \ref{num_events}, but is repeated here in Table \ref{narrow_events} under the column \emph{Real data selection}.
\begin{table}[h]
\begin{ctucolortab}
\begin{tabular}{ R{2.5cm} | R{2.5cm} R{2.8cm} R{3.2cm} }
\toprule
Production &Real data selection &Narrow selection &Percentage narrow \\
\midrule
$t\overline{t}H$ &73 741 &18 124 &24.58\% \\
$t\overline{t}Z$ &32 108 &10 886 &33.90\% \\
$t\overline{t}W$ &13 295 &- &- \\
$t\overline{t}$ &6 027 &- &- \\
\midrule
Total &125 171 &29 010 &23.18\% \\
\bottomrule
\end{tabular}
\end{ctucolortab}
\caption[Number of events after selection]{Number of events after selection \par\small The \emph{Real data selection} consists of all events with the $2lSS + 1 \tau _{had}$ channel and at least three jets with one b jet. The \emph{Narrow selection} is obtained by applying the requirement of the $\tau\;\tau$ Higgs (or Z) decay and the lepton+jets $t\bar{t}$ decay on top of it. The Real data selection only uses detector available variables, while the Narrow selection requires truth information.}
\label{narrow_events}
\end{table}
\subsection{Data extraction code}
The Python code used for data extraction from ROOT ntuples to a format fit for the particle assignment NN is in the directory \path{/source_code/root_data_extraction}. It produces all three of the selections in Fig. \ref{data_selections}.
In the same directory is also the Python script for extraction of selected events from the real ATLAS dataset.
\section{Data augmentation}
\label{sec:data_aug}
Data augmentation is a technique used in machine learning to avoid overfitting and to achieve better generalization of a NN on a dataset \cite[p.335]{data_augmentation}. It is commonly used in convolutional neural networks by altering (e.g. rotating, translating or squeezing) an image used as an input for the network. The principle of generating altered data that could plausibly occur in the original dataset (i.e. that follows the original data distribution) can be transferred to our case as well.
We can make use of the rotational symmetry of the detector about the beam pipe — the azimuthal symmetry \cite{symmetry}. While the symmetry is not perfect, as the detector is not fully homogeneous, it is still a valuable tool which brings us the benefits mentioned above.
\begin{figure}[h]
\centering{
\resizebox{80mm}{!}{\includegraphics{images/rot_sym.png}}
\caption[Picture of the detector with a coordinate system]
{Diagram of the detector with a coordinate system \par\small Source of the image is Fig. 4.5 in \cite{rotational_symmetry_diagram}.}
\label{rot_sym}
}
\end{figure}
Before being used as an input for the NN, each event will have its particles rotated about the beam pipe axis. This way the overall number of events stays the same, but the events differ across epochs. In practice this is achieved by changing the azimuthal angle of the particles by the same random value:
\begin{equation}
\bm{\phi}' = \bm{\phi}+\bm{\Delta\phi},
\end{equation}
where $\bm{\phi}$ is a vector of the azimuthal angles of the original particles in an event, $\bm{\phi}'$ is the vector of the azimuthal angles of augmented particles and $\bm{\Delta\phi} = (\Delta\phi,\Delta\phi,...)^T$ is a vector with the repeated value $\Delta\phi \in [0,2\pi)$, which is randomly generated for each event.
As we immediately input the $\phi'$ into a goniometric function to compute the momentum (Eq. \ref{goniometrix}), we do not require $\phi' \in [0,2\pi)$.
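A minimal sketch of this augmentation step (illustrative only, assuming the particles of one event are given as NumPy arrays of their $p_T$, $\eta$ and $\phi$):
\begin{verbatim}
import numpy as np

def augment_event(pt, eta, phi, rng=np.random.default_rng()):
    # rotate all particles of one event by a common
    # random azimuthal angle and return their momenta
    dphi = rng.uniform(0.0, 2.0 * np.pi)   # one random shift per event
    phi_rot = phi + dphi                   # same shift for every particle
    px = pt * np.cos(phi_rot)
    py = pt * np.sin(phi_rot)
    pz = pt * np.sinh(eta)
    return px, py, pz
\end{verbatim}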
\section{Particle assignment neural network}
\label{sec:p_a_big}
For the reconstruction of the Higgs boson mass we will be using a regression neural network. As input for the network, we want to use data organized in a way that distinguishes between two particles of the same type coming from different parent particles. Another way to put it is that we want to pair the positions in the decay diagram (Fig. \ref{decay_channel}) with the corresponding detected particles, or that we want to assign each jet and lepton to one of the three decay branches (i.e. the top, anti-top and Higgs (or Z) branch).
The information that would allow us to do this (i.e. the child-parent relations between particles) is unavailable in the detector, as discussed in Sec. \ref{detector_effects}. This means that to organize the data of an event, we have to choose one out of the many possible ways the particles can be assigned to their positions. In this thesis, we call this process the particle assignment or particle association.
There are two types of particles that have to be assigned — jets and leptons. For the task of particle assignment, our goal is to create a program which takes the raw ROOT data of an event as input and outputs data with jets and leptons ordered according to their positions.
The proposed approach is a classification neural network, which for each event takes in each possible permutation of jets and leptons at different positions and outputs the respective probabilities of each of the positions being assigned correctly. The most likely assignment is then chosen from the permutations as the one with the largest product of the respective probabilities.
This process is schematized in Figure \ref{pipeline_p_a}, which expands on Figure \ref{pipeline_main}.
The inspiration for using permutations as input for the neural network comes from the paper \cite{parton_assignment} mentioned in Section \ref{task_at_hand}.
\begin{figure}[h]
\centering{
\resizebox{125mm}{!}{\includegraphics{images/pipeline_particle_ass.pdf}}
\caption[Diagram of the particle assignment process]
{Diagram of the particle assignment process \par\small First, all possible permutations of the assignment of leptons and jets are generated from the ROOT data of an event. The permutations are then processed by the (trained) NN, which assigns a score vector with five values to each permutation. The permutation with the highest product of the individual values is then chosen as the best particle assignment of the event. The event is then ready to have its mass reconstructed.}
\label{pipeline_p_a}
}
\end{figure}
\subsection{Lepton and jet permutations}
Looking at the decay diagram (Fig. \ref{decay_channel}) we distinguish two lepton positions — one lepton originating from the tau and one lepton originating from the top. Together with the fact that the detector usually detects exactly two leptons, this gives us two possible permutations for each event.
For the jets we distinguish three positions. Two of them are the b jets coming from the top and the anti-top, respectively. The third one is the sum of the two non-b jets (effectively the W boson that the non-b jets come from, which is why we call it the top W); we do not distinguish between these two jets, as their positions are interchangeable\footnote{We however still keep their separate four-vectors so as to not lose any information.}. The number of jet permutations can be quite substantial, depending on the number of detected jets (up to eight).
The number of permutations $P$ of an event with two leptons and $n$ detected jets is
\begin{equation}
P = 2\cdot\frac{n(n-1)}{2}\cdot(n-2)(n-3),
\label{eq:perms}
\end{equation}
where $\frac{n(n-1)}{2}$ is the number of combinations for the W jet pair, $(n-2)(n-3)$ is the number of permutations of the two b jets and the leading factor of 2 corresponds to the two possible lepton assignments.
The numbers of permutations of an event for different numbers of jets, obtained by applying Eq. \ref{eq:perms}, are in Table \ref{num_permutations}.
\begin{table}[h]
\begin{ctucolortab}
\begin{tabular}{ R{4.4cm} | R{1cm} R{1cm} R{1cm} R{1cm} R{1cm} }
\toprule
Jets & 4 & 5 & 6 &7 &8 \\
Permutations & 24 & 120 & 360 & 840 & 1680 \\
\bottomrule
\end{tabular}
\end{ctucolortab}
\caption{Number of event permutations in relation to number of jets}
\label{num_permutations}
\end{table}
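A simplified sketch of the particle assignment described above — generating the permutations of Eq. \ref{eq:perms} and keeping the one with the largest product of the NN output probabilities (illustrative only; model stands for a trained classification NN with a predict method and build_features for the feature construction of Sec. \ref{assignment_features}, both placeholders):
\begin{verbatim}
import itertools
import numpy as np

def best_assignment(jets, leptons, model, build_features):
    # score every jet/lepton permutation with the CNN and
    # keep the one with the largest product of probabilities
    n = len(jets)
    best, best_score = None, -np.inf
    for leps in itertools.permutations(leptons, 2):
        for w_idx in itertools.combinations(range(n), 2):
            rest = [i for i in range(n) if i not in w_idx]
            for b_top, b_antitop in itertools.permutations(rest, 2):
                perm = (leps, jets[b_top], jets[b_antitop],
                        (jets[w_idx[0]], jets[w_idx[1]]))
                probs = model.predict(build_features(perm))
                score = np.prod(probs)
                if score > best_score:
                    best, best_score = perm, score
    return best
\end{verbatim}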
The total number of permutations, generated from the events of selection (a) from Figure \ref{data_selections}, which are the events we will be training and testing the particle assignment NN on, is in Table \ref{tab:tot_permutations}.
\begin{table}[H]
\begin{ctucolortab}
\begin{tabular}{ R{3.0cm} | R{3.0cm} R{3.0cm} }
\toprule
Production &Events &Permutations \\
\midrule
$t\overline{t}H$ &18 124 &4 294 080 \\
$t\overline{t}Z$ &10 886 &4 000 152 \\
\midrule
Total &29 010 &8 294 232 \\
\bottomrule
\end{tabular}
\end{ctucolortab}
\caption[Number of permutations]{Number of permutations \par\small The number of permutations generated from the events separated into the two distinct productions. The number of events stated is identical to the numbers in column \emph{Narrow selection} in Table \ref{narrow_events}.}
\label{tab:tot_permutations}
\end{table}
\subsection{Applicability}
\label{sec:apllicab}
As the particle assignment relates closely to the exact structure of the decay, it is important to note that it can only be applied to the Narrow selection from Fig. \ref{data_selections}. This selection will be used for the training and testing of the CNN.
Events of the other productions ($t\overline{t}W$ and $t\overline{t}$) and the differently decaying events of $t\overline{t}H$ and $t\overline{t}Z$ (e.g. both top quarks can decay leptonically; the Higgs can decay into a pair of W bosons instead of taus etc. — these are the events we have removed by using the Narrow selection) will not be used in the evaluation of the CNN. But as we will use them in the mass reconstruction, these events will, in the end, also go through the particle assignment, even though it will be ineffective for them.
This is an unavoidable issue, because there are too many different productions and ways for them to decay to have one unified way of particle assignment. A possible different approach would be organizing the particles in an entirely different way than by their positions in the decay. We have decided on the described approach because our main focus is the Narrow selection.
\subsection{Features}
\label{assignment_features}
There are 70 features on the input of the particle assignment NN. Most of them are the momenta and energies of the final state particles (i.e. their four-vectors), together with their masses and angular distances calculated from these four-vectors (the equations for these calculations are given in Section \ref{mass_deltar_equations}). The full list of variables, grouped by their physical nature, with the particles named after their origin particle (e.g. the anti-top lepton is the lepton originating from the anti-top), is as follows.
\begin{itemize}
\item Four-vectors ($p_X$, $p_Y$, $p_Z$, $E$)$^T$ of the detected particles, namely the top b jet, the anti-top b jet, both top non-b jets, the hadronically decaying tau from the Higgs boson\footnote{By this we do not mean the tau itself, but rather the hadrons coming from the tau, which are detected.} (Higgs tau), the Higgs boson lepton (Higgs lepton) and the anti-top lepton.
\item Four-vectors of the intermediate particles of the decay, created by adding together selected particles from the previous item: specifically the top W boson (sum of the top non-b jets), the top quark (sum of the top b jet and the top W boson), the visible part of the Higgs boson (sum of the Higgs tau and the Higgs lepton) and the visible part of the anti-top quark (sum of the anti-top b jet and the anti-top lepton).
\item The mass of each detected particle or intermediate particle mentioned above.
\item The angular distance between selected pairs of particles, the focus being primarily on the particles that are being assigned (the jets and the leptons). The pairs are the Higgs tau and Higgs lepton, the Higgs tau and anti-top lepton, the anti-top b jet and Higgs lepton, the anti-top b jet and anti-top lepton, the two top non-b jets, the top b jet and top W boson and, finally, the top quark and the anti-top quark.
\item For each of the four jets, a \emph{b-tag}, which indicates how likely it is that the jet comes from a b quark.
\item Missing transverse energy characterized by its $\cancel{E}_{T_X}$ and $\cancel{E}_{T_Y}$ components.
\item The scalar sum of the total visible transverse energy and the scalar sum of the transverse energy of all detected jets.
\item The number of jets with energy over 25 GeV.
\item Lastly, the decay mode of the hadronically decaying tau.
\end{itemize}
The choice of features was inspired by the thesis of Petr Urban \cite{decay_channel_image} and by the MMC, as the last three items on the list are variables also used there. We will use the same features in the mass reconstruction NN as well.
It should also be emphasized that the names of the particles in the above list refer to the positions, not to the actual particles assigned to them, because finding the correct particles to assign to the positions is the task at hand.
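To illustrate how the derived features (masses and angular distances) are obtained from the four-vectors, the following Python sketch uses the standard definitions; the helper names are ours and the actual analysis code may differ (the exact equations are given in Section \ref{mass_deltar_equations}):
\begin{verbatim}
import math

def invariant_mass(p):
    # p = (px, py, pz, E); mass from the relativistic energy-momentum relation
    px, py, pz, E = p
    m2 = E**2 - px**2 - py**2 - pz**2
    return math.sqrt(max(m2, 0.0))

def delta_r(p1, p2):
    # angular distance in the eta-phi plane (assumes non-zero transverse momentum)
    def eta_phi(p):
        px, py, pz, _ = p
        return math.asinh(pz / math.hypot(px, py)), math.atan2(py, px)
    eta1, phi1 = eta_phi(p1)
    eta2, phi2 = eta_phi(p2)
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)
\end{verbatim}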
\subsection{Labels}
Each permutation is labeled with a vector
\begin{equation}
\begin{aligned}
l &= (a_{b1}, a_{b2}, a_{W}, a_{l1}, a_{l2}), \\
a_{p} &= \left\{
\begin{array}{ll}
1, & \text{if particle at position $p$ is assigned correctly,}\\
0, & \text{if particle at position $p$ is assigned incorrectly,}\\
\end{array}
\right.
\end{aligned}
\label{labels_equation}
\end{equation}
$$\text{where } p \in \{b1, b2, W, l1, l2\} \text{ is one of the five positions we are assigning to.}$$
The label for a permutation is created by comparing the assigned particle to the correct particle for that position, which is known from the truth information available in the dataset. As the particles are not detected perfectly, the detected and true particle have to be compared using some criterion to determine whether they can be paired and the detected particle assigned to the position.
The criterion used to decide whether the assignment is correct is the angular distance between the two \cite[p.6]{parton_assignment}:
\begin{equation}
\begin{aligned}
a_{l}= \left\{
\begin{array}{ll}
1, & dist(l_{true},l_{assigned}) \leq 0.12, \\
0, & dist(l_{true},l_{assigned}) > 0.12, \\
\end{array}
\right. \\
a_{b}= \left\{
\begin{array}{ll}
1, & dist(b_{true},b_{assigned}) \leq 0.32, \\
0, & dist(b_{true},b_{assigned}) > 0.32, \\
\end{array}
\right. \\
a_{q}= \left\{
\begin{array}{ll}
1, & dist(q_{true},q_{assigned}) \leq 0.32, \\
0, & dist(q_{true},q_{assigned}) > 0.32, \\
\end{array}
\right.
\end{aligned}
\end{equation}
$$\text{where }dist(p_1,p_2) \text{ symbolizes the angular distance between two particles.}$$
As the W position is the sum of the two non-b jets (each denoted $q$), we compare each of the two jets to its true counterpart, and if both meet the distance condition, the W is labeled as correctly assigned.
The threshold values on the right-hand side were obtained from experimental results, as values that separate the two distributions of correctly and incorrectly assigned particles, such as in Fig. \ref{deltar_separation}. These distributions become apparent once we plot the distances between all possible b jet combinations; the separation value is then selected.
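A minimal Python sketch of this labelling logic, assuming the delta_r helper from the previous sketch and dictionaries mapping position names to assigned and true four-vectors (all names are illustrative; the possible swap of the two interchangeable non-b jets is ignored for brevity):
\begin{verbatim}
# thresholds taken from the experimentally determined separation values
LEPTON_CUT, BJET_CUT, QJET_CUT = 0.12, 0.32, 0.32

def label_permutation(assigned, true):
    a_b1 = int(delta_r(assigned["b1"], true["b1"]) <= BJET_CUT)
    a_b2 = int(delta_r(assigned["b2"], true["b2"]) <= BJET_CUT)
    # the W position counts as correct only if both non-b jets are matched
    a_W = int(delta_r(assigned["q1"], true["q1"]) <= QJET_CUT
              and delta_r(assigned["q2"], true["q2"]) <= QJET_CUT)
    a_l1 = int(delta_r(assigned["l1"], true["l1"]) <= LEPTON_CUT)
    a_l2 = int(delta_r(assigned["l2"], true["l2"]) <= LEPTON_CUT)
    return [a_b1, a_b2, a_W, a_l1, a_l2]
\end{verbatim}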
\begin{figure}[h]
\centering{
\resizebox{85mm}{!}{\includegraphics{images/deltar_1.pdf}}
\caption[Delta R separation of correctly and incorrectly paired b jets]
{Delta R separation of correctly and incorrectly paired b jets \par\small The data used were a combination of $t\overline{t}H$ and $t\overline{t}Z$ events. All possible pairs were formed for each event and their angular distances were calculated. By plotting the distances in a histogram, a value separating the two distributions that become apparent can be chosen.}
\label{deltar_separation}
}
\end{figure}
\subsection{Architecture and hyper-parameters}
\label{sec:pa_architecture}
The inspiration for the architecture stems from a paper on jet-parton assignment \cite{parton_assignment}. Changes were made to the output, and the exact architecture and hyper-parameters were adjusted for our task.
The NN consists of multiple fully connected layers with ReLU activation functions. Dropout layers and data augmentation are used to reduce overfitting on the training data. Furthermore, L2 regularization was used on all weights \cite[p.182]{data_augmentation}:
\begin{equation}
\mathcal{L}' = \mathcal{L} + \lambda \cdot \sum_{i=0}^{d} w_i^2,
\end{equation}
where $\mathcal{L}$ is the original loss function (in this case the binary cross entropy), $\mathcal{L}'$ is the loss function with regularization, $\lambda$ is the regularization parameter and $\sum_i w_i^2$ is the sum of the squared weights.
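For illustration only, a regularized loss of this form could be computed as follows (a minimal NumPy sketch; the names are ours):
\begin{verbatim}
import numpy as np

def l2_regularized_loss(base_loss, weight_matrices, lam):
    # L' = L + lambda * sum of squared weights over all layers
    penalty = sum(np.sum(w ** 2) for w in weight_matrices)
    return base_loss + lam * penalty
\end{verbatim}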
Skip connections are used to accelerate the learning process \cite{parton_assignment}. A diagram of the NN architecture is shown in Figure \ref{p_a_schematics}.
\begin{figure}[h]
\centering{
\resizebox{100mm}{!}{\includegraphics{images/particle_assignment_schematics.pdf}}
\caption{The particle assignment neural network diagram}
\label{p_a_schematics}
}
\end{figure}
The exact specifications of the NN are in Table \ref{p_a_specifications}.
\begin{table}[h]
\begin{ctucolortab}
\begin{tabular}{ R{4.2cm} | R{4.2cm} }
\toprule
Number of inputs & 71 \\
Number of outputs & 5 \\
Learning rate & 0.0003 \\
Optimizer & Adam \\
Dropout rate & 0.2 \\
Dense layer neurons & 500 \\
Hidden layers activations & ReLU \\
Output layer activation & Sigmoid \\
\bottomrule
\end{tabular}
\end{ctucolortab}
\caption{Specifications of the particle assignment NN}
\label{p_a_specifications}
\end{table}
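The following Keras sketch builds a network consistent with Table \ref{p_a_specifications}; the number of hidden layers, the placement of the skip connections and dropout, and the regularization strength are illustrative assumptions, not values taken from the actual training code:
\begin{verbatim}
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_assignment_nn(n_inputs=71, n_outputs=5, n_hidden=4,
                        units=500, dropout=0.2, l2_lambda=1e-4):
    inputs = tf.keras.Input(shape=(n_inputs,))
    x = layers.Dense(units, activation="relu",
                     kernel_regularizer=regularizers.l2(l2_lambda))(inputs)
    for _ in range(n_hidden - 1):
        y = layers.Dense(units, activation="relu",
                         kernel_regularizer=regularizers.l2(l2_lambda))(x)
        y = layers.Dropout(dropout)(y)
        x = layers.Add()([x, y])  # skip connection
    outputs = layers.Dense(n_outputs, activation="sigmoid",
                           kernel_regularizer=regularizers.l2(l2_lambda))(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0003),
                  loss="binary_crossentropy")
    return model
\end{verbatim}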
Learning rate step decay was used to make the training process smoother \cite[p.136]{data_augmentation}:
\begin{equation}
\alpha_t = \alpha_0 \cdot 0.99^{t},
\end{equation}
where $t$ is the epoch number and $\alpha_0$ is the initial learning rate.
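In Keras, such a schedule can be attached to the training loop with a callback, for example (a sketch assuming the model from the previous listing):
\begin{verbatim}
import tensorflow as tf

INITIAL_LR = 0.0003

def step_decay(epoch, lr):
    # alpha_t = alpha_0 * 0.99^t; the current-lr argument is accepted but unused
    return INITIAL_LR * 0.99 ** epoch

lr_callback = tf.keras.callbacks.LearningRateScheduler(step_decay)
# model.fit(x_train, y_train, epochs=..., callbacks=[lr_callback])
\end{verbatim}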
The loss function is the binary cross entropy (Eq. \ref{bce}). Since we output five values, we calculate it for each output and the loss is then their weighted mean. The mean has to be weighted because the five outputs each have a different ratio between the total number of zeros and ones in their respective labels across all samples (imbalanced classification). For the leptons, where in the vast majority\footnote{Not always, because detector effects can (very rarely) cause, for example, one of the leptons to be detected poorly.} of cases there is one correct and one incorrect permutation, the classification can be considered balanced. For the jets, on the other hand, there are usually multiple contenders but only one of them is correct, which leads to more zero labels than ones.
This imbalance of the label classes (zero corresponding to an incorrect assignment and one to a correct assignment) has to be offset, for which class weights in the binary cross entropy are used. These are calculated from the exact ratio of the classes for each output separately (Table \ref{Tab:p_a_weights}):
\begin{equation}
\begin{aligned}
b_p^0 &= \sum_{k=1}^n (1-a_{p_k}), \\
b_p^1 &= \sum_{k=1}^n a_{p_k}, \\
w_p^0 &= \frac{2b_p^1}{b_p^0+b_p^1}, \\
w_p^1 &= \frac{2b_p^0}{b_p^0+b_p^1}, \\
\end{aligned}
\end{equation}
where $n$ is the number of training samples (permutations), $a_{p_k}$ is the label of position $p$ of the $k$-th sample, $b_p^0$ is a helper variable equal to the number of labels equal to 0, and $w_p^0$ is the weight of class 0 for the position $p$ being assigned to. Each class is weighted by the relative frequency of the opposite class, so that the minority class receives the larger weight, and the equations are designed so that the two class weights sum to two:
\begin{equation}
w_p^0 + w_p^1 = 2, \;\;\;\; \forall p.
\end{equation}
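The resulting weights are listed in Table \ref{Tab:p_a_weights}. A minimal NumPy sketch of this weight computation, and of the weighted binary cross entropy built from it, could look as follows (names are illustrative):
\begin{verbatim}
import numpy as np

def class_weights(labels):
    # labels: array of shape (n_samples, 5) holding the label vectors l
    n_ones = labels.sum(axis=0)              # b^1, correct assignments per position
    n_zeros = labels.shape[0] - n_ones       # b^0, incorrect assignments per position
    w0 = 2.0 * n_ones / (n_ones + n_zeros)   # class 0 weight, one value per position
    w1 = 2.0 * n_zeros / (n_ones + n_zeros)  # class 1 weight, one value per position
    return w0, w1

def weighted_bce(y_true, y_pred, w0, w1, eps=1e-7):
    # class-weighted binary cross entropy, averaged over samples and the five outputs
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    terms = -(w1 * y_true * np.log(y_pred)
              + w0 * (1.0 - y_true) * np.log(1.0 - y_pred))
    return terms.mean()
\end{verbatim}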
\begin{table}[h]
\begin{ctucolortab}
\begin{tabular}{ R{3.0cm} | R{3.0cm} R{3.0cm} }
\toprule
Position & Class 0 weight & Class 1 weight \\
\midrule
Top b jet & 0.28 & 1.72 \\
Anti-top b jet & 0.29 & 1.71 \\
Top W & 0.07 & 1.93 \\
Tau lepton & 0.98 & 1.02 \\
Anti-top lepton & 0.99 & 1.01 \\
\bottomrule
\end{tabular}
\end{ctucolortab}
\caption{Weights used in the loss function of the particle assignment NN}
\label{Tab:p_a_weights}
\end{table}
\subsection{Particle assignment code}
The Jupyter notebook for the training of the particle assignment NN is at \path{/source_code/particle_assignment/particle_assignment_}\linebreak\path{training.ipynb}. The code trains on the data of the Narrow selection (Fig. \ref{data_selections}) extracted from the ROOT ntuples.
To choose the best permutation for each event, and thus process the data for the mass reconstruction, the Jupyter notebook at \path{/source_code/particle_assignment/particle_assignment_training.ipynb} can be used.
A trained NN is also included in the directory \path{/source_code/trained_NN_models}.
\section{Mass reconstruction neural network}
\label{sec:mass_reco}
As stated in Sec. \ref{task_at_hand}, a regression neural network will be used for the mass reconstruction. The NN will be trained and tested on the three selections of Fig. \ref{data_selections}, with the data processed by the particle assignment NN. The particle assignment is only effective for the Narrow selection from the figure. For the other two, wider selections we will use the same particle assignment NN, although it will be mostly ineffective, as the decay channels of the events added by these wider selections are different. In the approach we have chosen, we do not have a better method for assigning the particles of these added events.
The MMC (Sec. \ref{sec:mmc}) will serve as a comparison for the NN. An MMC library from \emph{Athena} (described as ``the \emph{ATLAS Experiment's main offline software repository}'' \cite{athena_git}) has been provided by the supervisor. The library has been used in our script with the \emph{2015 calibration set}, and the data used with the MMC were the same as for the NN, which means they were also processed by our particle assignment NN (the MMC requires only the assignment of leptons).
The C++ code for our implementation of a script that uses the MMC library is in the directory \path{/source_code/MMC}, together with the Jupyter notebooks for evaluating the reconstructed mass data of the MMC.
\subsection{Mass reconstruction goal}
The goal of the mass reconstruction is to predict the mass of the desired particle. For the signal ($t\overline{t}H$) events this particle is the Higgs boson; for the background events it is the Z boson for $t\overline{t}Z$ and the W boson for $t\overline{t}W$, while for $t\overline{t}$ we will ideally predict a zero mass, as there is no particle to be reconstructed\footnote{We could reconstruct the leptonically decaying top quark, but that would be inconsistent with the other productions, where we could also reconstruct the top quark but do not, as we reconstruct a different particle.}.
The values to which we will compare the predicted masses will be calculated from the truth information present in the simulated dataset. The masses calculated this way are not exactly equal to the known constant masses, but they have a negligible variance, which is a property of the dataset we are working with. Another, almost equivalent, approach would be to take the constant invariant masses from Table \ref{masses}.
\subsection{Loss function}
\label{sec:loss_functions}
We propose multiple possible approaches, each requiring a different loss function to be used with the NN. When choosing an approach (and its loss function), we have to consider several things.
We expect the NN to distinguish between the signal and the background, and we expect the reconstructed masses to be close to their actual values.
There is also the question of which background productions we want the NN to be able to process. The MMC only works well with the $t\overline{t}H$ and $t\overline{t}Z$ productions, but in the real ATLAS data there are also other productions, such as $t\overline{t}W$ and $t\overline{t}$.
The loss functions will use different labels, but each will be able to produce the reconstructed mass from its output.
\subsubsection{MMC-inspired loss}
Inspired by the MMC, this loss function incorporates the equations for the invariant mass of the reconstructed particles and the MET. The NN outputs the predicted four-vectors of the four neutrinos. The masses of the neutrinos, and also of the particles reconstructed by adding the neutrinos to the visible particles (e.g. the neutrino and the lepton coming from the anti-top branch, added together, make the anti-top W boson), are then calculated with Eq. \ref{masses_equation}. With four neutrinos, two W bosons, two taus, one Higgs and one top, this gives us ten predicted masses.