Atlas and the Large Hadron Collider



Experience and plans from running the (SCT) DAQ
• Setting things up
• Calibration mode operations
• Physics mode operation
• Transitions
• Recovery
• Modifying configurations
What to do better?
4 July 2006
Alan Barr - SCT DAQ
The good news!
Overview
• Almost all things were possible with
enough expert hands at the wheel
• Focus is on what could be improved to
make life easier, and to reduce downtime
Setting things up
• Various mapping scans now in place: streamline
checking for mapping errors
– TX ↔ RX (autoconfig)
– TX ↔ Module (Trim -> TV50 RMS)
– RX ↔ DCS (Hard reset scan)
• Perhaps a case for a 1-Pt QCAL scan test at
“physics threshold” – look at trim?
• Opto tuning – improvements to algorithms
– Hard reset between bins for TX current
– Rx Threshold algorithm improved
– Still some worries over marginal TX.
Timing-in
• Gross timing, good to ~few ns, done by looking for
coincidences between top and bottom of a module
– Combining statistics between modules
– Relatively few stats required
• (few hundred cosmics)
– Good agreement with “offline monitoring”
• Module ↔ module timing done by dead
reckoning (at SR1)
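A minimal sketch of the "combining statistics between modules" idea, not the SCT DAQ code: the coincidence counts per delay setting are summed over modules before picking the best setting, so a few hundred cosmics shared among many modules can still give a clear peak. The function name, the data layout and all numbers are hypothetical.

```python
from collections import Counter

def best_timing_bin(per_module_counts):
    """Sum top/bottom coincidence counts over modules and return the
    delay setting with the most coincidences.

    per_module_counts: {module_name: {delay_ns: coincidence_count}}
    """
    total = Counter()
    for counts in per_module_counts.values():
        total.update(counts)  # adds counts bin by bin
    if not total:
        raise ValueError("no coincidences recorded")
    # Ties resolve to the smallest delay so the result is deterministic.
    return min(total, key=lambda delay: (-total[delay], delay))

# Hypothetical counts: each module has only a handful of hits.
counts = {
    "module_A": {0: 3, 5: 5, 10: 4},
    "module_B": {0: 2, 5: 6, 10: 6},
    "module_C": {0: 1, 5: 7, 10: 3},
}
best = best_timing_bin(counts)
print("best delay setting:", best, "ns")
```

Summing before maximising is the simplest combination; it assumes the modules share a common gross timing offset.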
Mark/Space Ratio
(Clock jitter in TX)
• Test done for several rods in SR1
sector – enough stats to test algorithm
• Generally seems to work
– Slow
– Some funny channels
– Investigations into BOC plugins proceeding
• Some improvements in algorithm made
to attempt to flag outlying cases
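The slides do not say how outlying cases are flagged. As an illustration only, one common approach is to compare each channel's tuned setting with the median over channels and flag those further away than a few median absolute deviations (MAD). The channel names, values and the cut of 5 MADs are assumptions.

```python
from statistics import median

def flag_outliers(settings, n_mad=5.0):
    """Return the sorted names whose value lies more than n_mad median
    absolute deviations (MAD) from the median of all values.

    settings: {channel_name: tuned_value}
    """
    med = median(settings.values())
    mad = median(abs(v - med) for v in settings.values())
    if mad == 0:
        # All typical channels agree exactly: flag anything that differs.
        return sorted(name for name, v in settings.items() if v != med)
    return sorted(name for name, v in settings.items()
                  if abs(v - med) > n_mad * mad)

# Hypothetical tuned mark/space settings for five TX channels.
tuned = {"ch0": 14, "ch1": 15, "ch2": 13, "ch3": 14, "ch4": 29}
funny = flag_outliers(tuned)
print("channels to inspect:", funny)
```

The median and MAD are used rather than mean and RMS because a few "funny" channels would otherwise inflate the spread they are compared against.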
Calibration mode
• Can “do the job”
– New response curve taken and used for SR1
cosmics tests
• Less fault tolerant to errors from modules
– Still aborts after single module failure
– To be improved?
• Would benefit from tools to do statistics on
different modules
– Nasty fudge used to combine data for timing scan
– Need to be able to compare against reference set
rather than just labelling “good”/“bad” modules
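A sketch of what "compare against reference set" could look like, assuming one scalar result per module (for example a noise value) and a single tolerance. The function, the module names and the numbers are hypothetical and are not taken from SctRodDaq.

```python
def compare_to_reference(measured, reference, tolerance):
    """Return {module: measured - reference} for modules whose result
    differs from the reference by more than tolerance.

    Modules present in only one of the two sets are reported with None,
    so that missing data is not silently treated as "good".
    """
    report = {}
    for module in sorted(set(measured) | set(reference)):
        if module not in measured or module not in reference:
            report[module] = None
            continue
        delta = measured[module] - reference[module]
        if abs(delta) > tolerance:
            report[module] = delta
    return report

# Hypothetical per-module noise values (arbitrary units).
reference = {"m1": 1500.0, "m2": 1520.0, "m3": 1480.0}
measured = {"m1": 1510.0, "m2": 1700.0}
report = compare_to_reference(measured, reference, tolerance=100.0)
print(report)
```

Reporting the signed difference, rather than a pass/fail flag, keeps the information needed to tell a drifting module from a broken one.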
Things to speed up calibration?
• Diagnostics from DSP greatly improved
– Pick out which modules are causing problems
• Improved fault tolerance?
– Decrease wasted down-time.
• Has histogram read-out slowed down?
– If so where? ROD? Network? NFS daemon?
• Quick turn-around with updated configuration (see
later)
• Could pack trims better to reduce configuration
loading to ROD by factor of ~2.
– Simultaneous change needed to DSP code and to SctRodDaq
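One way a factor of ~2 could arise, assuming (the slide does not say this) that each trim value fits in 4 bits and is currently sent one value per byte: two trims can share a byte. The sketch below shows the packing and the matching unpacking; both ends must agree on the layout, which is why the DSP code and SctRodDaq would have to change together. The function names and nibble order are illustrative.

```python
def pack_trims(trims):
    """Pack 4-bit trim values two per byte, low nibble first."""
    if any(not 0 <= t <= 15 for t in trims):
        raise ValueError("trim values must fit in 4 bits")
    # Pad with a zero so an odd number of trims still fills whole bytes.
    padded = list(trims) + [0] * (len(trims) % 2)
    return bytes(lo | (hi << 4) for lo, hi in zip(padded[0::2], padded[1::2]))

def unpack_trims(packed, count):
    """Inverse of pack_trims; count drops the padding nibble if any."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)  # low nibble was stored first
        out.append(byte >> 4)
    return out[:count]

trims = [3, 15, 0, 7, 9]
packed = pack_trims(trims)
restored = unpack_trims(packed, len(trims))
print(len(trims), "trims ->", len(packed), "bytes")
```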
Physics mode
• Standard operation of SCT HW/SW eventually good
• Couldn’t read out histograms during physics mode
when running
– And slow when paused
• Runs able to continue for several hours.
• What stopped “good” runs?
– Crash of ROS/FILAR
• Formatter allowing “illegal” data through?
– LV trips from SCT modules
• Reduce frequency
• Recover
– Occasionally loss of synchronisation with TRT?
• Identify
• Add counter resets
Non-expert operation?
• Work to be done in:
– Detector safety (DCS)
– Module initialisation
– Module recovery
– Information in error messages
– Getting configurations to offline
– Feedback on synchronisation
– Other “error” states
• Recovery time after power failure was, I think,
~8 hours?
Transitions
• Physics ↔ Calibration transition ok for
SCT alone
• Cal. couldn’t be done in configuration
containing TRT
• Taking out subdetector involved lots of
different resources (EventBuilder,
ROS, RODs, …)
• I think TDAQ has a way to streamline
this?
Configuration tools
• SR1 highlighted the need for tools to
compare, update and merge
configurations.
• Chris and Bruce have produced / are working on
some tools for XML.
• ‘Final’ database configuration solution
now has same functionality as xml.
• GUI display of configuration
parameters?
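As an illustration of the "compare, update and merge" operations, here is a minimal sketch on a flat dictionary keyed by (module, parameter). This is not the XML or database tooling mentioned above; the key layout, parameter names and values are hypothetical.

```python
def diff_configs(old, new):
    """Return {key: (old_value, new_value)} for every key that differs.
    A key missing on one side is shown as None on that side."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys
            if old.get(k) != new.get(k)}

def merge_configs(base, update):
    """Return a new configuration: base with the entries in update
    applied on top. Neither input is modified."""
    merged = dict(base)
    merged.update(update)
    return merged

# Hypothetical configurations keyed by (module, parameter).
base = {("m1", "threshold"): 40, ("m1", "trim_range"): 0,
        ("m2", "threshold"): 42}
update = {("m1", "threshold"): 38, ("m3", "threshold"): 41}

merged = merge_configs(base, update)
changes = diff_configs(base, merged)
for key, (before, after) in changes.items():
    print(key, before, "->", after)
```

Running the comparison on the merged result before loading it gives a quick check that only the intended parameters changed.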
Headaches
• ROS machine failure
– Required reboot, remount, …
– Possibly linked to invalid data coming from ROD
– Some ideas as to what might be the cause…
• Occasional initialisation problems in SctRodDaq
start-up
– Not really investigated – replace delays in synch-init?
• “Full” ROD (inc. s-link) reset seems to require
crate reboot (slow!)
– What isn’t reset on “reset”?
• Module start-up needs Hard Reset and then
send-configs to be automated.
– Problem: DCS hand-shaking
• Automation of module power cycle required
(from DAQ)
Issues for the pit
• Configuration (inc. DCS) still a big area
• First large-scale multi-crate test
• Parallelism needed for analysis
• Much harder to reset off-detector hardware!
• Auto recovery required for
– Modules
– OPTO
– DAQ systems
• Can we still use NFS for data transfer?
• Auto-archival to CASTOR through firewall?
• Set up of trigger chain (resets etc) through LTPs.
• ROD monitoring not physics-ready
If things don’t work
• Don’t just grumble to yourself…
• Grumble to the experts!
– Wiki page:
– http://www.hep.phy.cam.ac.uk/daqbin/wiki.cgi/UserStories
– Email address:
– [email protected]
• We rarely bite and will endeavour to
help!
The good news!