Added files to dev branch
matthewbellis committed Apr 7, 2020
1 parent 9700071 commit 3b874c2
Showing 37 changed files with 403 additions and 1 deletion.
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1 +1 @@
_build
site/
1 change: 1 addition & 0 deletions PUSH.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
mkdocs gh-deploy
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# cms-open-data-guide
MkDocs documentation for the open data guide
8 changes: 8 additions & 0 deletions docs/about.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# About

This guide points you to places


## Contributors

## Contact
4 changes: 4 additions & 0 deletions docs/analysis/backgrounds/qcdestimation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# QCD Estimation

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/analysis/backgrounds/techniques.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Techniques

!!! Warning
    This page is under construction
72 changes: 72 additions & 0 deletions docs/analysis/datasim/collisiondata.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
# Collision Data

!!! Warning
    This page is under construction

The CMS collision data is organized in primary datasets (PD).
All CMS open data primary datasets can be found with [this search](http://opendata.cern.ch/search?page=1&size=20&type=Dataset&subtype=Collision&experiment=CMS).

The dataset name consists of three parts separated by "/", e.g.:

```/TauPlusX/Run2011A-12Oct2013-v1/AOD```

The first part indicates the primary dataset contents (```TauPlusX```), the second part is the data-taking era (```Run2011A```) and reprocessing (```12Oct2013```), and the last one indicates the data format (```AOD```).
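
The three-part structure lends itself to a small parser. The following is only a sketch: the function name `parse_collision_dataset` and the dictionary keys are made up for illustration, and it assumes the middle field always has the `era-reprocessing-version` layout of the example above.

```python
def parse_collision_dataset(name):
    """Split a collision dataset name into labelled parts (illustrative helper)."""
    parts = name.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three '/'-separated parts in {name!r}")
    primary_dataset, processing, data_format = parts
    # Assumed layout of the middle field: <era>-<reprocessing>-<version>
    era, reprocessing, version = processing.split("-", 2)
    return {
        "primary_dataset": primary_dataset,
        "era": era,
        "reprocessing": reprocessing,
        "version": version,
        "format": data_format,
    }


info = parse_collision_dataset("/TauPlusX/Run2011A-12Oct2013-v1/AOD")
print(info["primary_dataset"], info["era"], info["reprocessing"], info["format"])
```

A name that does not follow this layout raises a `ValueError` instead of being silently mis-parsed.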


## Dataset contents

The primary dataset definition is centered around physics objects (SingleMu, Jet, Tau, etc.).
Events selected by High Level Trigger (HLT) paths with similar physics content or use
are mostly directed to the same PD. [This guide](http://opendata.cern.ch/docs/cms-guide-trigger-system)
gives an overview of the CMS trigger system.
Besides requirements on the physics content, the organisation of the primary
datasets has to satisfy constraints related to data processing and handling,
such as keeping the average event rate approximately uniform across the
different PDs, and keeping the event rate above 10 Hz and below 200 Hz. (relevant?)

Each CMS collision dataset comes with a brief description of its contents and
a full listing of all HLT trigger streams that may be included in the dataset.
Instructions for finding the exact definitions and parameters of the
HLT triggers can be found in
[Guide to the CMS Trigger System](http://opendata.cern.ch/docs/cms-guide-trigger-system) under "*HLT Trigger Path definitions*".

Since a given event can pass more than one HLT path, it
can be included in more than one primary dataset.
The overall overlap between the PDs was around 25-35% during Run1, and
it must be taken into account when combining events from different datasets in an analysis.
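
One common way to account for this overlap is to identify each event by its (run, luminosity section, event number) triple and keep only the first occurrence when looping over several PDs. The sketch below assumes the identifiers are already available as plain tuples; the function name `unique_events` and the sample numbers are invented for illustration and do not come from any real dataset.

```python
def unique_events(*datasets):
    """Yield each (run, lumi, event) triple once across all given datasets."""
    seen = set()
    for dataset in datasets:
        for event_id in dataset:
            if event_id in seen:
                continue  # already taken from an earlier primary dataset
            seen.add(event_id)
            yield event_id


# Made-up identifiers: one event appears in both primary datasets.
single_mu = [(1, 10, 100), (1, 10, 101), (1, 11, 200)]
tau_plus_x = [(1, 10, 101), (1, 12, 300)]

combined = list(unique_events(single_mu, tau_plus_x))
```

Keeping the identifiers in a set makes the membership check cheap, at the cost of holding all of them in memory; for very large selections a different strategy may be needed.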

## Data taking and reprocessing

One year of data taking is divided into several "eras", labelled RunA, RunB, etc.
According to the CMS data policy, 50% of the data is published after the embargo period,
followed by the full release within 10 years. Currently available are:

* Run2010A and Run2010B
* Run2011A
* Run2012B and Run2012C

The data are reprocessed several times; the version made public is the last complete reprocessing available at the time of the release.

## Data format

The data format in use for Run1 data is Analysis Object Data (AOD).
A brief description of data formats can be found in the introductory
[About CMS](http://opendata.cern.ch/docs/about-cms) under "*Primary and simulated datasets*".


## Else (FIXME)

To consider:

- mention json files for validated runs/LS
- mention condition data and GT
- Integrated luminosity here or in a separate chapter?


Refs:

- G. Franzoni, [Dataset definition for CMS operations and physics analyses](https://cds.cern.ch/record/1976679/files/CR2014_311.pdf)




4 changes: 4 additions & 0 deletions docs/analysis/datasim/eventgeneration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Event Generation

!!! Warning
    This page is under construction
47 changes: 47 additions & 0 deletions docs/analysis/datasim/mcsimulations.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# Monte Carlo Simulations

!!! Warning
    This page is under construction


A set of simulated data (Monte Carlo - MC) corresponding to the collision data
is made available. All directly available MC datasets can be found with
[this search](http://opendata.cern.ch/search?page=1&size=20&type=Dataset&subtype=Simulated&experiment=CMS).
Furthermore, a large amount of MC, thought to be of less frequent use, is available on demand
and is included in [search results](http://opendata.cern.ch/search?page=1&size=20&type=Dataset&experiment=CMS&subtype=Simulated&ondemand=True)
if the "*include on-demand datasets*" option is selected.

MC datasets are searchable by [categories](http://opendata.cern.ch/docs/simulated-dataset-categories),
which can be found under "Filter by category" in the left bar of the search page.

The dataset name consists of three parts separated by ```/``` e.g.:

```/DYToMuMu_M-15To50_Tune4C_8TeV-pythia8/Summer12_DR53X-PU_S10_START53_V19-v1/AODSIM```

The first part indicates the simulated physics process (```DYToMuMu```),
some of the production parameters (```M-15To50_Tune4C```), the collision energy (```8TeV```),
and the event generator used in the processing chain (```pythia8```). [CMS simulated datasets names](http://opendata.cern.ch/docs/cms-simulated-dataset-names)
gives more details on the naming.
The second part is the production campaign (```Summer12_DR53X```), the [pile-up](http://opendata.cern.ch/docs/cms-guide-pileup-simulation)
profile (```PU_S10```) and the processing [conditions](http://opendata.cern.ch/docs/cms-guide-for-condition-database) (```START53_V19```),
and the last one indicates the data format (```AODSIM```).
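
The breakdown above can be sketched in Python. This is an illustration tuned to the single example name on this page, not a general parser: the function name `parse_mc_dataset`, the dictionary keys, and the splitting rules (generator after the last `-`, pile-up label as the first two `_`-separated tokens of the middle sub-field) are all assumptions, and other MC names may not follow them.

```python
import re


def parse_mc_dataset(name):
    """Break the example MC dataset name into labelled parts (illustrative only)."""
    process_field, campaign_field, data_format = name.strip("/").split("/")

    # Assumption: the generator follows the last "-" of the first field.
    description, generator = process_field.rsplit("-", 1)
    tokens = description.split("_")
    # Assumption: the collision energy is the token that looks like "<number>TeV".
    energy = next((t for t in tokens if re.fullmatch(r"\d+TeV", t)), None)

    # Assumption: <campaign>-<pileup>_<conditions>-<version>
    campaign, pileup_and_conditions, version = campaign_field.split("-")
    pc_tokens = pileup_and_conditions.split("_")

    return {
        "process": tokens[0],
        "parameters": [t for t in tokens[1:] if t != energy],
        "energy": energy,
        "generator": generator,
        "campaign": campaign,
        "pileup": "_".join(pc_tokens[:2]),
        "conditions": "_".join(pc_tokens[2:]),
        "version": version,
        "format": data_format,
    }


mc_info = parse_mc_dataset(
    "/DYToMuMu_M-15To50_Tune4C_8TeV-pythia8/Summer12_DR53X-PU_S10_START53_V19-v1/AODSIM"
)
```

For anything beyond a quick look, the naming page linked above remains the reference for how the fields are composed.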

## Dataset contents
The dataset naming reflects the contents of the dataset, and the actual generator parameters
with which the dataset contents have been defined can be
found as explained under "*Finding the generator parameters*" in the
[CMS Monte Carlo production overview](http://opendata.cern.ch/docs/cms-mc-production-overview).
## Processing
[CMS Monte Carlo production overview](http://opendata.cern.ch/docs/cms-mc-production-overview)
briefly describes the steps in the MC production chain.
## Data format
The data format in use for Run1 MC data is Analysis Object Data (AODSIM).
A brief description of data formats can be found in the
introductory [About CMS](http://opendata.cern.ch/docs/about-cms) under "*Primary and simulated datasets*".
19 changes: 19 additions & 0 deletions docs/analysis/do.csh
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#foreach file(luminosity.md )
# Create the section directories (-p: no error if one already exists)
foreach dir(backgrounds datasim interpretation selection systematics)

mkdir -p $dir

end

foreach file(backgrounds/qcdestimation.md backgrounds/techniques.md datasim/collisiondata.md datasim/eventgeneration.md datasim/mcsimulations.md interpretation/limits.md interpretation/stats.md selection/objectid.md selection/objects.md selection/triggers.md systematics/lumiuncertain.md systematics/mcuncertain.md systematics/objectsuncertain.md systematics/pileupuncertain.md)

echo $file

# Write a stub page: a title line plus an "under construction" admonition
echo "# " $file > $file
echo '\!\!\! Warning' >> $file
echo "    This page is under construction" >> $file

end



4 changes: 4 additions & 0 deletions docs/analysis/interpretation/limits.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Upper-limit calculations

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/analysis/interpretation/stats.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Statistics

!!! Warning
    This page is under construction
6 changes: 6 additions & 0 deletions docs/analysis/luminosity.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# Luminosity

!!! Warning
    This page is under construction

## Sub-heading
4 changes: 4 additions & 0 deletions docs/analysis/selection/objectid.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Object ID

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/analysis/selection/objects.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Objects

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/analysis/selection/triggers.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Triggers

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/analysis/systematics/lumiuncertain.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Luminosity Uncertainty

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/analysis/systematics/mcuncertain.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# MC Uncertainty

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/analysis/systematics/objectsuncertain.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Object Uncertainty

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/analysis/systematics/pileupuncertain.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Pileup Uncertainty

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/cmssw/cmsswanalyzers.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Analyzers

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/cmssw/cmsswconfigure.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Configure

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/cmssw/cmsswdataanalysis.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Data Analysis

!!! Warning
    This page is under construction
3 changes: 3 additions & 0 deletions docs/cmssw/cmsswdatamodel.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Data Model
!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/cmssw/cmsswfrontier.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Frontier

!!! Warning
    This page is under construction
4 changes: 4 additions & 0 deletions docs/cmssw/cmsswoverview.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Overview

!!! Warning
    This page is under construction
33 changes: 33 additions & 0 deletions docs/faq.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# FAQ

Frequently Asked Questions and other problems and issues
that have come up.

***Possible subsections below***

## High-level questions

#### **Why would I choose VirtualBox over Docker? Why would I choose Docker over VirtualBox?**

Great question! Anyone?

## Docker

#### **Docker downloads container but never launches environment**

This issue arises when a newer operating system on your local laptop/desktop runs an older operating system in the container.

For example, suppose you are following the [Running CMS analysis code using Docker](http://opendata.cern.ch/docs/cms-guide-docker)
tutorial. If you run

```bash
docker run --name opendata -it cmsopendata/cmssw_5_3_32 /bin/bash
```
and the container downloads but you don't find yourself in the ```CMSSW_5_3_32``` environment, then...


## Data

## CMSSW


10 changes: 10 additions & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# CMS Open Data Guide

All you want to know!

[CERN Open Data Portal](http://opendata.cern.ch/)


## How to use this site


5 changes: 5 additions & 0 deletions docs/tools/cernportal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# The CERN Open Data Portal

!!! Warning
    This page is under construction

5 changes: 5 additions & 0 deletions docs/tools/cmsopendata.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# CMS Open Data

!!! Warning
    This page is under construction

5 changes: 5 additions & 0 deletions docs/tools/cmstwiki.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# The CMS Twiki

!!! Warning
    This page is under construction

5 changes: 5 additions & 0 deletions docs/tools/cppandpython.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# C++ and Python

!!! Warning
    This page is under construction

4 changes: 4 additions & 0 deletions docs/tools/docker.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Docker

!!! Warning
    This page is under construction
6 changes: 6 additions & 0 deletions docs/tools/git.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# Git

!!! Warning
    This page is under construction

Here are some helpful links to learn how to use git.
32 changes: 32 additions & 0 deletions docs/tools/root.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# ROOT

!!! Warning
    This page is under construction

From [ROOT's webpage](https://root.cern.ch)

*A modular scientific software toolkit.
It provides all the functionalities needed to deal with big data processing,
statistical analysis, visualisation and storage.
It is mainly written in C++ but integrated with other languages such as Python and R.*

It is the primary toolkit for many experimental analyses, and while you are
free to analyze these datasets however you like, some familiarity with
ROOT will serve you well when accessing the data.

* Many ROOT examples can be found [here](https://root.cern/doc/master/group__Tutorials.html).
    If you don't know where to start, we would recommend
    * Example 1
    * Example 2
    * Example 3


* Python has become the language of choice for many analysts, and most of the examples
    you'll see make use of the PyROOT module, callable from Python. You can go through
    a number of the examples [here](https://root.cern.ch/doc/master/group__tutorial__pyroot.html).
    If you don't know where to start, we would recommend
    * Example 1
    * Example 2
    * Example 3


5 changes: 5 additions & 0 deletions docs/tools/virtualmachines.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# Virtual machines

!!! Warning
    This page is under construction
