
Pit Operation - “Luminosity Production” - is in good hands with many devoted and
competent people, from experts to shifters
• But as the conclusion will state, we need more people to guarantee quality physics
 Experts would also like to be able to devote a bit of time to physics analysis
• Also, luckily we left behind a very good team at CERN meeting the challenges of this
week with nominal bunch intensities!
R. Jacobsson
1

Concentrate on global topics that are of concern and interest to the entire collaboration
• Will not discuss the status of the individual sub-detectors unless it affects global
operation
• In the past, presented a lot on how we followed and participated in the beam
commissioning

Main topics
• Operation up to now
• Operational status and efficiency
• Luminosity
• Data Quality
• First experience with nominal bunches
• Trigger
• Organization
• Tools to follow operation
• Shifter situation, the working model, and the needs for the future
R. Jacobsson
2
Machine and Experiment Availability

• Extremely low average failure rates × extremely high number of vital systems = 0.50
• Thunderstorms daily now!
 Tripped LHCb magnet already twice
Wednesday:
1. AFS problem
2. SPS down
3. Thunderstorm
4. VELO motion
5. ….
A lot for one day…..
Still we took 1h of physics!
• Wrong fill number for 30 min!
R. Jacobsson
3
Where the plan stopped at the RC report in March:
[Timeline, January – March (weeks w1–w12): Chamonix workshop, PostMortem workshop, LHCb meeting on calibration-run needs; day shifts & day piquet, then 24h shifts from 11/2; power-up & online upgrades; sub-detector stand-alone work and tests; continuous global 'Heat Run' with rate ramp (first detector calibrations, MD mode = off, no beam = tests); detector calibrations & dataflow?; cosmics 13/2-14/2; technical fills?; determining colliding beams?; LHC + experiments dry run; TED run 18/2; first beam in LHC 28/2 at 450 GeV; LHC beam commissioning; LHCb magnet (V-) with beam; LHCb magnet (V+) with beam?]
R. Jacobsson
4
[Plot (Hz/mb) vs. time: running phases annotated – Minimum Bias with HLT pass-through (MB < 100 Hz, then MB < 1 kHz), HLT1 rejection, HLT1 rejection + HLT2 pass-through, nominal bunches; B up ~5 nb-1, B down ~7.6 nb-1]
R. Jacobsson
5

Cumulative (in-)efficiency logging implemented since fill 1089
• Breakdown into HV, VELO, DAQ, DAQ livetime (trigger throttling)
• Entered into the Run Database
Operational luminosity (in-)efficiencies, May 10 – June 5:
  HV                 4.0%
  VELO               2.5%
  DAQ                5.4%
  Deadtime           1.6%
  Trigger deadtime  <0.01%
  Running           86.5%
   - All detectors in data taking
   - VELO fully closed
   - No single HV trips
   - Triggers accepted
R. Jacobsson
6

LHCb dependence on LHC:
• Short-hand page for LHC Operators and EICs
• Completely automated for LHCb shifters, requiring ‘only’ confirmations
 Also voice assistance
 VELO still to be fully integrated
 Very advanced compared to ATLAS and CMS…
R. Jacobsson
7

LHCb State Control
R. Jacobsson
8

Shifter Voice Assistance
• Draw attention to new information or changes
 LHC Page 1, injection, optimization scans, etc
• Instructions for LHCb State Control handling
 HV/LV handling, BCM rearm etc
• Undesired events…
 Beams lost, run stopped, magnet trip, clock loss
• DSS Alarms, Histogram Alarms to be added and voice quality to be improved
• Related work in progress: clean up the shifter instructions on the consoles and add a help button
to all displays

Collapse of separation bumps simultaneous for all experiments
• Golden orbit established with improved reproducibility
 Good luminosity already during Adjust
 Optimization scan right at the start of Stable Beams, starting with the experiment with the lowest
luminosity
 Full VELO powering during ADJUST (TCTs at physics settings and separation bumps
collapsed)
 Powering of the VELO by central shifters is the next step
• Future of VELO closure by shifter/expert being discussed
 Closing Manager now very user friendly
 Aim to have an “on-call” shifter for closing, preferably the same person as the piquet

End-of-fill calibrations, automation?
R. Jacobsson
9

Work on automatic recovery from DAQ problems in progress
• Added one after the other
• Start testing Autopilot

Majority mechanism when configuring the farm to start a run being looked into
• Farm SW and storage SW crashes: still room for improvement

Exclusion/recovery of problematic (sub)farms on the fly while taking data
• Routine procedure for shifters

Recovery of monitoring/reconstruction/calibration farms while taking data

Faster recovery of sub-detectors without stopping the run (only the trigger) is becoming
a routine manoeuvre for most shifters
R. Jacobsson
10
Two numbers for Trigger Deadtime counting
• TriggerLivetime(L0) @ bb-crossings
• TriggerLivetime(Lumi) @ bb-crossings
Also major improvements made on monitoring of the HLT (histograms and trends)
• Both technical parameters and physics retention
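Purely as an illustration of how two such livetime counters translate into dead-time fractions (the function and counter names are simplified assumptions, not the ODIN implementation):

```python
def deadtime_fractions(n_bb_crossings, livetime_l0, livetime_lumi):
    """Toy calculation of trigger dead-time fractions from livetime
    counters sampled at bb-crossings (hypothetical names and numbers).

    n_bb_crossings : total bb-crossings delivered in the period
    livetime_l0    : bb-crossings during which L0 triggers were accepted
    livetime_lumi  : bb-crossings during which lumi triggers were accepted
    """
    dead_l0 = 1.0 - livetime_l0 / n_bb_crossings
    dead_lumi = 1.0 - livetime_lumi / n_bb_crossings
    return dead_l0, dead_lumi

# Example: 1e9 bb-crossings, 98.4% L0 livetime, ~100% lumi livetime
print(deadtime_fractions(1_000_000_000, 984_000_000, 999_900_000))
```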
R. Jacobsson
11

Problem with DAQ and control switch seems solved

Storage problem earlier this year also solved

Purchase of farm during 2010Q3, install in November during ion run

Some outstanding subdetector problems:
• Dying VCSELs in the sub-detectors are a worry
• SPECS connections in the OT
• Control of the ISEG HV for the VELO seems solved by changing from Systec to Peak
• L0 Derandomizer emulation of Beetle – about to be addressed
• …

System Diagnostics Tools have been like an AOB on every agenda since forever…
• Alarm screen and Log viewer
• Well, better to solve the problem than to add the alarm, if that works!
R. Jacobsson
12
Data Quality is of the highest importance now (together with the trigger)
• Main problem: we need more interest/participation from people doing physics analysis

Discover and document which problems are tolerable and which are not
• Impact on data quality, in order to know the urgency of solving a problem → operational
efficiency
 Aiming at perfect data obviously matters more than 100% operational efficiency
 But recoveries should be well thought through, well planned and swift, and to the extent possible
coordinated with other pending recoveries!
• How to classify data quality problems for different physics analyses

Establish routine for use of Problem Database
• Checking in and checking out entries, fast feedback

Procedure outlined for decision on detector interventions which may have an
impact on data quality

Working group set up to address Online Data Quality tools and follow-up
• Improvements of histogram presenter, histogram analysis, alarms, etc.
• Need for trend plots, a trend presenter and a trend database being looked into
• Documenting quality problems and their impacts/recoveries
• Reconstruction Farm and associated histograms
• More interest from the sub-detectors would be welcome
R. Jacobsson
13

Shifter catalogue
• Most important/significant histograms with descriptions and references
• Several iterations, still need improvements and links to severity/actions

Alarm panel from automatic histogram analysis
• Associate sound/voice to alarms
R. Jacobsson
14

The tool for registering data quality problems – Problem Database
• Shared between Online – Offline
• http://lbproblems.cern.ch/ (“Problem DB” from LHCb Welcome page)
R. Jacobsson
15

Three sources of luminosity online
• Counted by ODIN using non-prescaled L0Calo or L0Muon trigger from L0DU
 Getting average number of interactions per crossing and pileup from fraction of null crossings
 Correcting luminosity real-time
 Recorded luminosity
• Beam Loss Scintillators acceptance determined relative to L0 Calo
 Luminosity corrected for pileup
• LHC Collision rate monitors (BRANs)
 Not yet calibrated but in principle only used for cross-checking
• Combination gives delivered luminosity
• Recorded in Online archive, Run Database, LHC displays and logging, and LHC
Program Coordinator plots (delivered) for overall machine performance
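As a sketch of the null-crossing method mentioned above (Poisson statistics; the function names, the 60 mb visible cross-section and the example numbers are illustrative assumptions, not the online code):

```python
import math

def mu_from_empty_fraction(n_empty, n_crossings):
    """Mean visible interactions per crossing from the fraction of
    'null' (empty) bb-crossings, assuming Poisson statistics:
    P(0) = exp(-mu)  =>  mu = -ln(P(0))."""
    return -math.log(n_empty / n_crossings)

def instantaneous_lumi(mu, f_crossing_hz, sigma_visible_mb):
    """L = mu * f_crossing / sigma_vis   (1 mb = 1e-27 cm^2)."""
    return mu * f_crossing_hz / (sigma_visible_mb * 1e-27)  # cm^-2 s^-1

mu = mu_from_empty_fraction(n_empty=700_000, n_crossings=1_000_000)
print(mu)                                        # ~0.36 interactions/crossing
print(instantaneous_lumi(mu, 11_245 * 3, 60.0))  # 3 colliding pairs, 60 mb
```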

Optimization scans are based on this combined luminosity

For offline lumi triggers containing luminosity counters – “nanofied”
• Tool being finalized to obtain integrated luminosity on analyzed files
• Constantly at 1 kHz
• Be careful when changing thresholds/prescales on the sources of the lumi counters
R. Jacobsson
16

http://lbrundb.cern.ch/ (“RunDB” on LHCb Welcome page)
• Tool for anybody in the collaboration to get a rough idea of the data collected
• Help/documentation should be linked
R. Jacobsson
17

Van der Meer scans
• To a large extent automatic with ODIN connected directly to the scan data received
from LHC real-time and flagging the steps in the data
 Allows easy offline analysis

Has allowed a first determination of the length scales (LHC/VELO) and of the absolute luminosity:
• Visible L0Calo cross-section of 60 ± 6 mb (prel.)
• From MC: σ(L0Calo) = σ(L0) × 0.937 = 63.7 mb × 0.937 = 59.7 mb
• Many things still to be verified; another vdM scan is in our planning

Also allows another method to extract beam shapes and VELO resolution
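For reference, the standard van der Meer relations behind such a scan, together with the MC cross-check quoted above (generic notation, not an LHCb-specific derivation):

```latex
% Luminosity per colliding bunch pair from the vdM scan widths:
L = \frac{N_1 N_2 f_{\mathrm{rev}}}{2\pi\,\Sigma_x\,\Sigma_y},
\qquad
\sigma_{\mathrm{vis}} = \frac{R_{\mathrm{vis}}}{L}
% Cross-check from MC quoted on the slide:
% \sigma(\mathrm{L0Calo}) = \sigma(\mathrm{L0}) \times 0.937
%                         = 63.7\,\mathrm{mb} \times 0.937 = 59.7\,\mathrm{mb}
```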
R. Jacobsson
18

Access to experiment condition archive in the online system
• Machine settings
• Beam parameters measured by the machine and LHCb
• Backgrounds measured by the machine and LHCb
• Trigger rates, luminosities, VELO luminous region, bunch profiles
• Run performance numbers, etc.
 The tool also produces LPC files for luminosity, luminous-region and bunch-profile data
R. Jacobsson
19
1. Arrived at a dead-end with Qbunch ~ 2E10 (max 4-5E10)
2. More to understand with increasing Qbunch than Nbunch
3. Summer months with not all experts present
4. Keep up the luminosity ladder for this year
 June 9 - June 25 (16 days!)
7x7@5E10
[email protected]
[email protected]
[email protected]
13x13@2E10
R. Jacobsson
20

Increasing number of nominal bunches through July-August
• 170 kJ → 1.5 MJ
• Gain experience
• Understand already-observed strange bunch/beam behaviour
• LHC Operation does not feel ready for 0.5 – 1 MJ yet; work in progress
Scheme        Bunches/beam   Colliding in LHCb   Stored energy   Peak lumi (cm-2 s-1)   Int. lumi   Fills
2x2, 1e11           2                1              112 kJ          2.5E29                0.005      (1 fill)
3x3, 1e11           3                2              168 kJ          5.0E29                0.03       (3 fills)
6x6, 1e11           6                4              336 kJ          1.0E30                0.7        (10 fills)
12x12, 1e11        12                8              672 kJ          2.0E30                2.1        (10 fills)
24x24, 1e11        24               16             1344 kJ          4.0E30                4.9        (10 fills)
Trains needed…
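A quick cross-check of the stored-energy column (my arithmetic, assuming 3.5 TeV per proton):

```latex
E_{\mathrm{stored}} = n_b \, N_p \, E_{\mathrm{beam}}
 = 24 \times 10^{11} \times 3.5\,\mathrm{TeV} \times 1.602\times10^{-19}\,\mathrm{J/eV}
 \approx 1.35\,\mathrm{MJ\ per\ beam}
```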
R. Jacobsson
21

Complete two-day internal review of the Machine and Experiment Protection
• >1.5 (3) MJ
• Long list of actions
• Will be followed by a complete external review

Dump following lightning strike and power blackout!
R. Jacobsson
22

Four fills with 3x3

Fill     Qbunch     L0Calo   Pileup   Peak lumi   Efficiency
1179     0.8E11      7500     1.2      0.15       78% (VELO lumi-monitoring/BPM/new conf)
1182     0.9E11     16000     1.7      0.46       68% (deadtime, HLT blocked)
1185     1.15E11    19300     2.3      0.73       85% (RICH, VELO, …)
1186        –       10000     1.3      0.22       To be patched (wrong fill number but stable)
1188        –       16000     1.7      0.46       65% (Storage, HLT, VELO, Trigger OK)

Rocky start!...
• Old L0 settings + HLT1 + HLT2Express (stable but 15% deadtime)
• Reconfiguring: new L0 settings + HLT1 + HLT2Full (30 min)
• Memory and combinatorics – run died and tasks stuck…
• 2 hours to recover/reconfigure new L0 + HLT1 + HLT2Express
• Completely stable through the entire night
R. Jacobsson
23

We’ve been sailing in light breeze up to now

Not only interaction pileup but also problem pileup
• Pileup 2.3!
• Occupancies
 E.g. Problem with MTU size for UKL1
• Event size 85 kB (used to be 35 kB)
• Storage backpressure
 Running with 10% - 20% deadtime at 1500 – 2000 Hz at 85 kB (peak!)
 Suspicion is that MD5 checksum calculation limits output (again) to 1 Gb/s
• Lurking instabilities in weak individual electronics boards?
 Desynchronizations, data corruption, strange errors at the beginning of fills…
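A rough cross-check of the storage back-pressure (my arithmetic from the peak figures above):

```latex
2000\,\mathrm{Hz} \times 85\,\mathrm{kB} \approx 170\,\mathrm{MB/s} \approx 1.4\,\mathrm{Gb/s} > 1\,\mathrm{Gb/s}
```

so a 1 Gb/s bottleneck, e.g. from the MD5 checksum calculation, would indeed force deadtime at these rates.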
R. Jacobsson
24

Peak occupancies 22%! Average >7.5% as compared to 5% in the past
R. Jacobsson
25

(0x2710 → 0x1F)
• L0-MB (CALO, MUON, minbias, SPD, SPD40, PU, PU20) → prescale by 100
• Physics:
   Electron    700 MeV  →  1400 MeV
   Hadron     1220 MeV  →  2260 MeV
   Muon        320 MeV  →  1000 MeV
   Dimuon   320/80 MeV  →   400 MeV
   Photon     2400 MeV  →  2400 MeV
Yet another configuration prepared
• L0×HLT1 retention 2%; including HLT2 it would allow going to 200 kHz
• Would prefer not to use it, even if we have to run with a bit of deadtime

Changed to solve 10% - 20% deadtime problem
• System completely stable with deadtime but long to stop in case of problems….

10 kHz of random bb-, be-, eb- and ee-crossings according to
• Weighting {bb: 0.7, eb: 0.15, be: 0.1, ee: 0.05}
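A minimal sketch of how such a weighted random crossing mix could be emulated; the weights are the ones quoted above, everything else (function names, rate handling) is illustrative only and not the ODIN implementation:

```python
import random

# Crossing-type weights quoted on the slide (bb = both beams filled, etc.)
CROSSING_WEIGHTS = {"bb": 0.70, "eb": 0.15, "be": 0.10, "ee": 0.05}

def draw_random_crossings(n, weights=CROSSING_WEIGHTS, seed=None):
    """Draw n crossing types with the given weights (toy emulation of the
    10 kHz random trigger mix)."""
    rng = random.Random(seed)
    types = list(weights)
    probs = [weights[t] for t in types]
    return rng.choices(types, weights=probs, k=n)

sample = draw_random_crossings(10_000, seed=42)
print({t: sample.count(t) / len(sample) for t in CROSSING_WEIGHTS})
```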
R. Jacobsson
26

Technical problems in HLT
• HLT1 (3D) OK with 7.5 % retention
• HLT2Express stable but contains only J/ψ, Λ, KS, Ds→φπ, D*→D0π, BeamHalo
• HLT2Full (150++ lines) serious problems and surely a lot of unnecessary overlap
• HLT2Core (81 lines) validated with FEST and data taken during the weekend
 Configured in pass-through now, to test it and check the output before we have to switch on rejection
above 6x6
 Best compromise we have for the moment, together with L0 TCK 0x1F
 First impression is that it was working stably during the fill this night
• Processing time for the HLT with HLT2Express observed to be 140 ms…
450 nodes × 8 tasks / 140 ms ≈ 26 kHz!
To be followed up
Should see how this develops with HLT2Core during tonight's fill
• Two measures to partly solve the bad memory behaviour and stuck tasks already in place
 Activating swap space on the local disk of the farm nodes improved the situation significantly
 Automatic script prepared which would kill the leader
 Requires careful tuning and testing since the memory spread is narrow
 Memory/disk in the Westmere machines?
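The "kill the leader" script itself is not shown here; purely as an illustration of the idea (the threshold, the process-name pattern and the selection policy are all assumptions), a watchdog on a farm node might look like this:

```python
import os, signal, subprocess

MIN_FREE_KB = 500_000  # assumed threshold: act when < ~500 MB free

def free_memory_kb():
    """Free + reclaimable memory from /proc/meminfo (Linux farm node)."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":")
            info[key] = int(value.split()[0])  # values are in kB
    return info.get("MemFree", 0) + info.get("Cached", 0)

def kill_largest_task(pattern="Moore"):
    """Kill the matching process with the largest resident memory
    (the 'leader'); the process-name pattern is an assumption."""
    out = subprocess.check_output(["ps", "-eo", "pid,rss,comm"], text=True)
    tasks = []
    for line in out.splitlines()[1:]:
        pid, rss, comm = line.split(None, 2)
        if pattern in comm:
            tasks.append((int(rss), int(pid)))
    if tasks:
        rss, pid = max(tasks)
        os.kill(pid, signal.SIGKILL)

if free_memory_kb() < MIN_FREE_KB:
    kill_largest_task()
```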
R. Jacobsson
27

We managed to take a lot of data containing full natural mixture of pileup
• Invaluable for testing, validating and debugging HLT
• Lucky we got nominal intensity now with few bunches!...

We aim hard to be flexible and should keep this spirit
• But converge quickly on a compromise between physics and the technical limitations
 Most of all, solve bugs and tune the system
• Avoid cornering ourselves in phase space now, in a panic, with severe cuts
• Exploring and understanding is now or never

Procedure for release of new TCKs now works well and is efficient
• But should not be abused!

FEST is an indispensable tool for testing/debugging/validating the HLT
• Make sure it satisfies needs for future
• More HLT real-time diagnostics tools to be developed

Effect of the L0 derandomizer and trains…
• No proper emulation for the Beetle, and we are forced to exploit only half of the buffer
• We currently accept all crossings…
 Filling scheme for autumn → 25% L0 deadtime
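To illustrate how a depth-limited derandomizer plus a fixed readout time turns into L0 deadtime, here is a toy single-server FIFO model; it is NOT the Beetle emulation, and the depth, readout time and accept probability are assumptions chosen only to make the mechanism visible:

```python
import random

def deadtime_fraction(n_crossings=2_000_000, spacing_ns=25.0,
                      accept_prob=0.025, depth=8, readout_ns=900.0, seed=1):
    """Toy model: crossings every `spacing_ns`, each accepted with
    `accept_prob` (~1 MHz at 40 MHz crossings), accepted events enter a
    FIFO of usable depth `depth` drained one event per `readout_ns`;
    an accept that finds the FIFO full is counted as deadtime."""
    rng = random.Random(seed)
    occupancy, service_end = 0, 0.0
    accepted = lost = 0
    for i in range(n_crossings):
        t = i * spacing_ns
        # finish any readouts completed by now
        while occupancy > 0 and service_end <= t:
            occupancy -= 1
            if occupancy > 0:
                service_end += readout_ns
        if rng.random() < accept_prob:            # L0 accept on this crossing
            if occupancy < depth:
                if occupancy == 0:
                    service_end = t + readout_ns  # reader was idle
                occupancy += 1
                accepted += 1
            else:
                lost += 1                         # FIFO full -> deadtime
    return lost / max(accepted + lost, 1)

# Deadtime grows quickly once the sustained accept rate approaches
# 1/readout_ns, and bursty bunch trains make it worse for a given average.
print(deadtime_fraction())
print(deadtime_fraction(accept_prob=0.05))
```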
R. Jacobsson
28

Two possibilities to reduce luminosity per bunch
• Back off on beta*
 Requires several days to a week of machine commissioning
• Collision offset in the vertical plane
 Beam-beam interaction with an offset between the beams can result in emittance growth
 Follow ongoing tests for ALICE to reduce luminosity by a factor of 30
• Hoped for detailed news from the ALICE beam-offset tests
 Attempted during an end-of-fill study this morning but not completed due to control software
 HOT NEWS while I was in the plane: Seems to work fine
R. Jacobsson
29

Daily Run Meeting ~30 minutes
• EVO everyday
• Chaired by Run Chief
• 24h summary with Run Summary attached to the agenda (Lumi, Efficiency, Beam, Background)
• LHC status and plan
• Round table where experts comment on problems
• Internal Plan of the Day

Minutes from the Run Meeting and other postings on Run News serve two purposes
• Expert follow-up on problems
• Informing the collaboration about daily operation – strive for public language in the 24h summary and plan
for the next 24h

Improve
• Systematic follow up on data quality
• Check lists
• Checkup on Piquet routines
• Invite more Run Chiefs – already discussed with several candidates
• Meetings three days a week when we are ready for this (Monday – Wednesday – Friday)
 Requires more discipline from piquets and efficient exchange of information directly with the people involved
• Synchronize piquet take-overs with overlaps
R. Jacobsson
30

http://lhcbproject.web.cern.ch/lhcbproject/online/comet/Online/
• (“Status” from LHCb welcome page)
R. Jacobsson
31
 Shifter Training
[Thumbnails of the shifter training modules and their tables of contents: 1. Shifter Intro, 2. SLIMOS, 3. Basic Concepts for LHCb Shifters, 4. Running LHCb, 5. Data Manager, 6. Shift Leader – covering the pit area and control room, shift organization, safety, SLIMOS and DSS duties, LHC basics (insertion region 8, injection, filling schemes, collimation, beam dump, fill procedure, crossing angle, timing, backgrounds), shifter interfaces (LHC Page 1, LHCb overview, intensity & luminosity, beam position monitors, logbooks, problem database), data-manager duties (quality checking, problem reporting, data monitoring, histogram and trend presenters, event display, run & file status) and shift-leader duties (golden rules, operational procedures, mode handshakes, cold start, LHCb State Control, clock switching, end of fill, run control, system allocation/configuration, run performance, farm node status, dead time, slow control, access)]
• Completely overhauled and updated training slides
• Refresher course now as well
 With EVO in future
 Invite piquets to go through Shifter Histograms with Data Managers
• Insist more on newcomers taking shifts together with already experienced shifters
R. Jacobsson
32

In my view, the experiment consists of three levels of activity:
1. Maintaining and developing all from electronics to the last bit of software in the common
interest of the experiment.
2. Producing the data we use for analysis, basically carried out by four types of shifters:
Shift Leader, Data Manager, Production Manager, Data Quality checker
3. Consuming the data and producing physics results
• Activities 1 and 2 should not be compared and counted in the same "sum"
• Activities 2 and 3 are instead coupled:
 "I contribute to producing the data that I analyze"
• Huge benefit in taking regular shifts: learn about data quality, and have the opportunity to
discuss and exchange information about problems met in your analysis of real data

Shifter situation “far from satisfactory” – what does it mean?
• It means that “the situation is vital to improve” by:
1. Maintaining current commitments
2. And making an additional effort which is relatively modest when spread across all of LHCb!
R. Jacobsson
33

Shifter model based on the idea of “volunteers”
• Not synonymous with “offering a favour” to people heavily involved in operating LHCb
• Based on the idea of feeling responsible, in particular for your own data
• We need people interested in learning about the detectors and data they are hopefully
going to use
 Each group would normally find the representatives themselves, also to a large extent meaning an
Experiment Link Person
• Why this model?
 Because we have neither the tools, nor the time and strength to be bureaucratic
• However, up to now we have not been sufficiently clear about the size of the required commitments

November 2009 – July 2010

                                  #/24h   #Shifters    #Shifts
• Active Shift Leaders:             3        30          660
• Active Data Managers:             3        61 (- Dec)  564
• Active Production Managers:       2        27          408
• Active Data Quality Checkers:     1        11           13
  Total                             9       129         1768
R. Jacobsson
34
November 2009 – July 2010
[Chart: number of authors and number of active shifters per institute (institutes 1–54), with the current normalized contribution per institute (0% – 400%)]
R. Jacobsson
35
November 2009 – July 2010
[Chart: number of shifts per institute (institutes 1–54), broken down into Shift Leaders, Data Managers, Production Managers and Data Quality]
R. Jacobsson
36
[Charts, Nov 09 – July 10: number of shifts per individual shifter, separately for Shift Leaders (3/24h), Data Managers (3/24h), Production Managers (2/24h) and Data Quality shifters (1/24h)]
R. Jacobsson
37

Assuming
• Perfect uniform availability (no exclusion of weekends, nights)
• Immediate replacement of people leaving and no lag in training new people
“Theoretical” shift contribution per month (0.53/author/month)
[Chart: normalized shifts per month per institute (institutes 1–54)]
R. Jacobsson
38

“One available shifter taking 4-6 shifts every 2 months per 3 authors”,

Recruited 2010-2011 from:
1. Shift Leaders: a pool of 50-100 people with experience in commissioning/operation of LHCb
2. Data Managers: all authors doing physics analysis
3. Production Managers: a pool of 50-100 people with experience with analysis on the Grid
4. Data Quality: all authors doing physics analysis
 Change in sub-detector piquets → coverage increasingly assured by non-experts instead of experts
 Should free the people with the ideal profile for shift-leader shifts this year
R. Jacobsson
39
R. Jacobsson
40

Experiment Conditions are good, machine is very clean

Data Quality
• Requires fast reaction time and feed-back/good communication with offline
• Establish the habit and routine
• No offline Data Quality checking now for two weeks!

Finding the appropriate compromise for the trigger is of the absolute highest priority, together with
solving the technical issues
• Dedicate time/luminosity intelligently now

System stability: individually good, but multiplied by the number of systems….
• Sensitize everybody to react to any anomaly and act quickly
• Big step from 10 years of MC to real data

Masochistic exercise to produce shifter statistics
• Need improvements and functions in ShiftDB tool

Great team work, spirit and perseverance
• Join us to produce Your data!

LHC bunch evolution until the end of August
• Up to 24 bunches with 16 colliding in LHCb = 1.55 MJ/beam
R. Jacobsson
41
 Regular opportunities for access up to now
 OT opened 3 times to change a FE box
• Impact on data quality

Procedure for filing and handling access requests works well
• Taken care of very well by shifters, Run Chiefs and Access Piquet/RPs

Issue:
• Still no instruments for measuring radioactivity in a magnetic field!
 Complicates accesses where in principle the magnet could be left on
R. Jacobsson
42