Machine Availability and System Reliability at RHIC
Fulvia Pilat
WAO-07 Trieste, September 24-28 2007
RHIC performance
Delivered luminosity increased by more than two orders of magnitude in 6 years.
[Figure: luminosity delivered per run to PHENIX; figure of merit FOM = L·P^4]
Enhanced Design Parameters
Projected performance depends on the fraction of calendar time spent at store.
Parameter (unit): achieved / enhanced design (~2009)
Au-Au operations
  Energy (GeV/n): 100 / 100
  No. of bunches (…): 103 / 111
  Bunch intensity (10^9): 1.1 / 1.0
  Average L (10^26 cm^-2 s^-1): 12 / 8 (goal exceeded)
p-p operations
  Energy (GeV): 100 / 100 (250)
  No. of bunches (…): 111 / 111
  Bunch intensity (10^11): 1.4 / 2.0
  Average L (10^30 cm^-2 s^-1): 20 / 60 (150) (3x)
  Polarization P (%): 60 / 70 (+10%)
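As a rough illustration, the achieved and enhanced p-p columns can be combined using the spin figure of merit FOM = L·P^4 quoted on the RHIC performance slide. This is a minimal sketch. Using the table's average luminosity in place of delivered luminosity is an assumption, and the function name is hypothetical.

```python
def figure_of_merit(avg_lumi: float, polarization: float) -> float:
    """Spin figure of merit FOM = L * P**4 (L in any consistent unit, P as a fraction)."""
    return avg_lumi * polarization ** 4


# p-p values from the table: average L in units of 1e30 cm^-2 s^-1, P as a fraction.
achieved = figure_of_merit(20.0, 0.60)
enhanced = figure_of_merit(60.0, 0.70)

gain = enhanced / achieved
print(f"FOM gain, enhanced vs achieved: {gain:.2f}x")
```

The 3x luminosity increase accounts for a factor of 3. The polarization increase from 60% to 70% contributes a further factor of (0.70/0.60)^4 ≈ 1.85, so the combined gain is about 5.6x.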
Enhanced and RHIC-II luminosity (electron or stochastic cooling)
Time at store: trend and goal
[Figure: time-at-store trend]
Goal: back to the mid-50% range in Run-8, and 60% time at store in Run-9
Outline
Operation stats, performance
Factors determining time at store
Machine development (short-term investment)
APEX: Accelerator Physics EXperiments program (longer-term investment)
Scheduled maintenance (see the talk by Sampson today)
Machine set-up
Systems downtime and failure
Mode of operation: “pushing the envelope”
RHIC Retreat 2007, July 16-17
Session on Availability and Reliability
11:00 (15 min) Introduction (Pilat)
11:15 (25 min) Operations and uptime (Ingrassia)
11:40 (20 min) Turn-around time (Kling)
12:00 (20 min) Maintenance models, organization (Sampson)
12:20 (10 min) Discussion
2:00 (15 min) RHIC abort system (Ahrens)
2:15 (15 min) Pulsed power systems (Zhang, Wu)
2:30 (30 min) Power supplies (Bruno)
3:00 (30 min) Electrical systems (Sandberg)
3:30 (30 min) RF: RHIC and injectors (Zaltsman)
4:30 (15 min) Controls, hardware (Oerter)
4:45 (15 min) Controls, software (Morris)
5:00 (15 min) Access controls (Reich)
5:15 (15 min) BPM, IPM, BBQ in operations (Russo)
5:30 (15 min) Cryogenic system (Tuozzolo)
5:45 (15 min) Vacuum systems (Mapes)
[Figure: time at store compared with the 60% goal]
Failure Flavors
• Charged (logged when the failure lasts 6 minutes or more): failure hours that impact the program, charged to one OR MORE systems during a failure period. Simultaneous failures result in charged hours lower than the actual hours.
• Actual (severe): the duration of a failure that impacts the program, often LONGER than the hours charged.
• Actual (mild): a failure that does not impact the program, e.g. a trip of 1 of the 10 AGS RF stations. The hours are recorded but not “charged”.
• Resets: interruptions shorter than the 6-minute logging threshold.
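The failure categories above can be illustrated with a small sketch. It assumes a hypothetical event record containing the system, the start and end time in hours, and a flag for program impact. It reads the remark on simultaneous failures as follows: overlapping failures count once as program downtime, while the per-system actual durations add up. Only the 6-minute threshold comes from the slide. The record layout, function names and example events are illustrative.

```python
from dataclasses import dataclass

RESET_THRESHOLD_MIN = 6.0  # from the slide: events shorter than 6 minutes are resets


@dataclass
class FailureEvent:
    # Hypothetical record layout, for illustration only.
    system: str
    start_h: float
    end_h: float
    impacts_program: bool


def classify(ev: FailureEvent) -> str:
    """Return 'reset', 'severe' (impacts the program) or 'mild' (recorded, not charged)."""
    duration_min = (ev.end_h - ev.start_h) * 60.0
    if duration_min < RESET_THRESHOLD_MIN:
        return "reset"
    return "severe" if ev.impacts_program else "mild"


def summed_severe_hours(events: list[FailureEvent]) -> float:
    """Actual severe hours: per-system durations simply added together."""
    return sum(e.end_h - e.start_h for e in events if classify(e) == "severe")


def program_downtime_hours(events: list[FailureEvent]) -> float:
    """Length of the union of severe intervals: overlapping failures count once."""
    spans = sorted((e.start_h, e.end_h) for e in events if classify(e) == "severe")
    total = 0.0
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end:
            # Disjoint interval: close the previous one and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping interval: extend the current one.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


events = [
    FailureEvent("PS_RHIC", 0.0, 2.0, True),
    FailureEvent("Rf", 1.0, 3.0, True),                # overlaps the PS failure
    FailureEvent("AGS Rf station", 5.0, 6.0, False),   # mild: program not impacted
    FailureEvent("Controls", 10.0, 10.05, True),       # 3 minutes: a reset
]
print([classify(e) for e in events])
print(summed_severe_hours(events), program_downtime_hours(events))
```

In this toy case the two overlapping failures add up to 4 h of actual severe time but only 3 h of program downtime. This is the kind of gap between actual and charged hours the slide describes.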
“Top 10” Failures by Group & by Run
FY07 charged failure hours (rank. system: hours):
1. PS_RHIC: 186.8
2. Rf: 106.9
3. Cryogenic: 92.6
4. PulsedPower: 58.8
5. ElectricalService: 58.7
6. Controls: 39.1
7. ES&FD_AtR&Experiment: 38.1
8. AccessControls: 36.1
9. QuenchProtection: 31.6
10. Services Water: 23.1
11. HumanError: 23
Earlier runs (rank: hours):
FY06: 1: 94.6, 2: 67.2, 4: 49, 5: 43.9, 6: 39.9, 7: 34.2, 8: 33, 9: 29.9
FY05: 1: 78.15, 2: 69.2, 3: 67.8, 4: 46.5, 5: 43.3, 6: 42.6, 7: 41, 8: 32.5, 9: 32, 11: 22.8
FY04: 1: 134.9, 2: 85.5, 3: 79.6, 5: 66, 6: 34.8, 7: 34, 8: 32.1, 9: 30, 10: 21.7, 11: 20.4

FY07 actual vs. charged failure hours (resets counted at 3 min each):
PS_RHIC: charged 187 h; actual severe 236 h; actual mild 5 h; 15 resets (1 h); actual/charged 1.26
Rf: charged 107 h; actual severe 216 h; actual mild 272 h; 44 resets (2 h); actual/charged 2.02
PulsedPower: charged 59 h; actual severe 80 h; actual mild 15 h; 70 resets (4 h); actual/charged 1.36
Controls: charged 39 h; actual severe 70 h; actual mild 39 h; 303 resets (15 h); actual/charged 1.79
ES&FD_AtR&Experiment: charged 38 h; actual severe 53 h; actual mild 51 h; 33 resets (2 h); actual/charged 1.39
AccessControls: charged 36 h; actual severe 40 h; actual mild 25 h; 0 resets (~0 h); actual/charged 1.11
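The derived columns of the FY07 actual vs. charged table follow from the raw values. Resets are converted to hours at 3 minutes each, and the ratio is the actual severe hours divided by the charged hours. The sketch below recomputes both. The dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# FY07 values: (charged h, actual severe h, actual mild h, resets #)
fy07 = {
    "PS_RHIC": (187, 236, 5, 15),
    "Rf": (107, 216, 272, 44),
    "PulsedPower": (59, 80, 15, 70),
    "Controls": (39, 70, 39, 303),
    "ES&FD_AtR&Experiment": (38, 53, 51, 33),
    "AccessControls": (36, 40, 25, 0),
}

MINUTES_PER_RESET = 3  # conversion used on the slide


def derived_columns(charged, severe, mild, resets):
    """Return (reset hours, actual/charged ratio); mild hours are not needed here."""
    reset_hours = resets * MINUTES_PER_RESET / 60
    ratio = severe / charged
    return reset_hours, ratio


for system, row in fy07.items():
    reset_h, ratio = derived_columns(*row)
    print(f"{system:22s} resets {reset_h:5.2f} h  actual/charged {ratio:.2f}")
```

Once rounded, these values reproduce the table. Rf stands out, with actual severe hours about twice its charged hours. Controls has by far the most resets.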
Operations Planned Improvements
Multiple, often simultaneous, failures: CAS (technical support on shift, currently 2) needs help
Train the Siemens watch in LOTO (lockout/tagout)
  • Together with MCR operators, they can perform LOTO when CAS is busy
Get operators into the field
  • Train operators to reset (only) “accelerator” power supplies
The OC is instructed to call in help for CAS when CAS is making a repair AND another system goes down
The OC is instructed to call in help from two groups with knowledge of the equipment when the cause of a problem is not clear
Turn-around time
Input from systems
Maintenance, set-up and turn-around time, and modes of operation all affect availability, but the main factor is system failure. In Retreat presentations, please focus on the reliability of your system and think critically about ways to improve it. I would ask each of you to discuss a plan, including timelines and necessary funding, to increase your system reliability. This is an important input towards an integrated plan to improve time at store, to be discussed at the Retreat and implemented thereafter.
After the Retreat reliability
Review Retreat information on
operations, maintenance and systems
Prioritize actions – especially systems
improvements for reliability
Analyze aging infrastructure, systems
Use the recently revisited “Trouble
Report Committee” as input and advice
on system reliability
RHIC PS Performance Stats
[Figures: average RHIC PS failure hours per week; MTBF of RHIC due to any PS failure; MTBF of an individual PS]
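The machine-level and individual-PS MTBF plots are related. As a simplifying assumption, suppose each power supply fails independently at a constant rate. The failure rates then add, so the MTBF of RHIC due to any PS failure is the reciprocal of the summed rates. The supply count and MTBF values below are placeholders, not RHIC data.

```python
def system_mtbf(individual_mtbfs_h: list[float]) -> float:
    """MTBF of a series system where any single failure stops operation.

    Assumes independent, exponentially distributed failures, so the failure
    rates (1/MTBF) of the individual units add.
    """
    total_rate = sum(1.0 / m for m in individual_mtbfs_h)
    return 1.0 / total_rate


# Placeholder example: 500 identical supplies, each with a 100,000 h MTBF.
print(system_mtbf([100_000.0] * 500))
```

This arithmetic shows why a machine with many supplies can have a short MTBF even when each supply is individually reliable.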
Leading Causes of PS Downtime in Hours
IR - Dynapowers: 42.4
Main p.s.’s: 36
IR p.s.’s – SCE 150’s: 26.7
6000A Quench Switches: 20.6
IR p.s.’s – SCE 300’s: 16.2
Quench Detectors: 14.6
Node Cards: 6
Correctors: 5.8 (success story)
Ground Fault: 5.1
QPA’s: 4.55
New Sextupole p.s.’s: 4.5
Bypass chassis: 0.3
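A Pareto view of the PS downtime causes shows how concentrated the downtime is. The sketch below sorts the causes and reports how many are needed to cover a given share of the listed hours. The 50% and 80% thresholds and the function name are arbitrary choices for illustration.

```python
causes_h = {
    "IR - Dynapowers": 42.4,
    "Main p.s.'s": 36.0,
    "IR p.s.'s - SCE 150's": 26.7,
    "6000A Quench Switches": 20.6,
    "IR p.s.'s - SCE 300's": 16.2,
    "Quench Detectors": 14.6,
    "Node Cards": 6.0,
    "Correctors": 5.8,
    "Ground Fault": 5.1,
    "QPA's": 4.55,
    "New Sextupole p.s.'s": 4.5,
    "Bypass chassis": 0.3,
}


def causes_to_reach(share: float, hours: dict[str, float]) -> list[str]:
    """Smallest set of top causes whose hours reach `share` of the total."""
    total = sum(hours.values())
    picked, running = [], 0.0
    for name, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
        picked.append(name)
        running += h
        if running >= share * total:
            break
    return picked


print(round(sum(causes_h.values()), 2))
print(len(causes_to_reach(0.5, causes_h)), len(causes_to_reach(0.8, causes_h)))
```

The listed causes total 182.75 h. The top three cover more than half of that, and six are needed to pass 80%.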
Power Supply System Priorities
• Bipolar 150A, 300A p.s.’s Phase 1
• QPA’s (quench protection assemblies)
• Main dipole and quadrupole PS
• Investigate yellow quad bus ground fault
• Improving Dynapower PS cooling
• Quench detector cleaning and fan replacements
• Air conditioning (for air quality and temperature)
Expected MTBF in Run-8?
Run-5: 30.79 hours
Run-7: 14.75 hours
Run-7 with the 3 major problems removed: 40 hours
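MTBF is operating time divided by the number of failures, so the Run-7 figures indicate how many failures the fixes must remove. Assume the same operating time T in both cases. Then MTBF_after / MTBF_before = N_before / N_after, so going from 14.75 h to 40 h means cutting the failure count to 14.75/40 of its Run-7 value. The helper below is a hypothetical illustration of that arithmetic.

```python
def fraction_of_failures_removed(mtbf_before_h: float, mtbf_after_h: float) -> float:
    """Fraction of failures that must disappear to raise MTBF, at fixed operating time.

    MTBF = T / N, so N_after / N_before = MTBF_before / MTBF_after.
    """
    return 1.0 - mtbf_before_h / mtbf_after_h


print(f"{fraction_of_failures_removed(14.75, 40.0):.1%}")  # Run-7 -> projected Run-8
```

Removing the three major problems would therefore have to eliminate roughly 63% of the failures counted in the Run-7 MTBF.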
Electrical Systems
Run-5: power system failure 15 h; total failure 694 h
Run-6: power system failure 26* h; total failure 700 h
Run-7: power system failure 45 h; total failure 881 h
* excluding the arc flash event
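The electrical systems table can also be read as the share of all failure hours caused by power-system events. The minimal sketch below computes that share for each run. As in the table, Run-6 excludes the arc flash event.

```python
# Run: (power system failure hours, total failure hours)
runs = {5: (15, 694), 6: (26, 700), 7: (45, 881)}  # Run-6 excludes the arc flash event

for run, (power_h, total_h) in runs.items():
    print(f"Run-{run}: {power_h / total_h:.1%} of failure hours from power systems")
```

The power-system share rises from about 2.2% in Run-5 to about 5.1% in Run-7, while the absolute hours triple.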
Most Significant Causes of ES Downtime, Run-7 (location: hours, events, equipment)
1004 A: 18.3 h, 2 events, switch & 208 V CB
1000 P: 15.3 h, multiple events, switch & circuit breaker
914: 13.8 h, 1 event, switch
929: 9.5 h, 1 event, cooling tower fan motor
4 areas responsible for 90% of downtime in Run-7
Electrical systems: Steps being taken
• 18 electricians assigned to C-AD this summer vs. 6 last year
• Ongoing thermal inspection of switches
• Use of torque wrenches instituted
• Better understanding of thermal effects
• Replace 1000 P 13.8 kV switches
• Replace trip units, 1000 P substation
• Replace switchgear in 914
• Maintenance of BMMPS CBs
Electrical systems: Steps being taken (cont.)
• Continuation of arc flash calculations
• Connecting RHIC Bard A/C units through isolation transformers
• 21 new alcove UPSs
• 8-year program to improve electrical infrastructure ($9 million)
• Open slot for a new power engineer
ES: Top Concerns from Last Year’s Retreat
1. Power dips: 8 in Run-6, 6 in Run-7
2. Response to the 1006 arc flash: almost done
3. 1004 B CB problem
4. AMMPS transformer replacement
Additional Steps to Improve Availability
• Increase the number of assigned electricians
• Centralize the spare parts location
• Increase the spares inventory
→ this shutdown
RF system: Performance
Number of systems: Booster 2, AGS 11, RHIC 16
Charged failure hours: Booster 7, AGS 39, RHIC 65
Actual failure hours: severe 216, mild 272
The factor affecting system performance in RHIC RF was beam loading, with more than double the total intensity of Run-4. It caused, for example, large debunching at rebucketing time, losses and beam dumps. It took time to understand and mitigate the beam-loading effects.
[Figure: Run-7 gold bunch merge]
RF Improvements
Complete system upgrade of the low-level RF in AGS and RHIC (unified hardware and software, modern system, better ring-to-ring synchronization)
Window comparators to provide fast shutdown for the storage systems
New beam permit chassis to speed up the response
Low-power circulators
New tubes
Ongoing work on the window for the storage system
Continued development of the ferrite tuner for the acceleration system
Abort kickers: Failure Modes
Prefires
  • One module discharges unilaterally
  • The other four fire in response as soon as possible
  • Not synchronized with the abort gap
Unconditioned triggers
  • All five modules discharge together
  • Not synchronized with the abort gap
Spontaneous capacitor discharges
  • As if a “stop charge” occurred with no associated trigger (a stop charge turns off the charging mechanism)
  • Damaging if not noticed
RHIC abort kicker pre-fires in Run-7, broken out by ring and by module
[Figure: Run-7 pre-fires vs. calendar time (4-Mar to 22-Jun), broken down by PFN module involved (1-5); 12 in Yellow, 18 in Blue]
Abort kickers: observations, improvements
• B2 and B4 use thyratron CX1575C. They will be replaced by CX3575C.
• Y5 had 7 pre-fires at the beginning, but stayed clean after 4/4.
• Y1 stayed clean during the entire run.
• Y5, B2, and B4 had 7 pre-fires each and contributed 70% of the total pre-fires.
What may help?
• Condition the high-voltage system at a higher voltage than the operating level (engineering control? routine procedure?)
• Keep modulators on
• Pre-conditioning before beam operation
• Keep the operating voltage as low as possible
RHIC abort kickers: R&D
• Charge up the high-voltage modulators on command 4 ms before a beam abort, to avoid pre-fires during long DC hold-up
• A preliminary study was performed in 2003
• Project cost over $2 million, based on a 2003 budget estimate
Cryo system: Phase III Upgrade
New gas bearing turbine for energy removal at the cold end
of the refrigerator (Run-7).
New high efficiency vertical heat exchanger system at the
cold end of refrigerator (Run-7).
Reconfigured the cold helium supply to the accelerator rings to eliminate the use of the cold circulators (Run-6).
Modified Cold Box 5 to reduce helium inventory, improve insulation, and reduce flow restrictions (Run-6).
Results:
Saved an additional 1.0 MW of compressor power in Run-6.
Reduced the liquid inventory in the refrigerator.
Additional 1.0 MW achieved during Run-7.
Reduced number of running compressors by 4 FS and 1 SS.
RHIC POWER HISTORY
Cryo Stumbling at the Start of Run-7: HX OBSTRUCTION
Oil contamination in HX-20 from Rotoflow oil-bearing expanders
• Oil crossover happens during start-up (warm)
+ LN2 contamination on HX-20
• Extended 80 K operations contaminated the GHe in RHIC
• During cool-down, 80 K GHe returned to the refrigerator
• A poorly seated crossover valve (H409M) between the CR line and the Expander 6 outlet allowed LN2 to collect on HX-20
= High recooler return pressure, resulting in (too) high magnet temperatures
[Figure: He flow rate and HX-20 pressure drop during cool-down, annotated with the Blue and Yellow cool-down wave starts, “Blue ready”, and the warm-up attempts to clear the blockage]
Running for high availability
Example: low-energy copper run (Run-5)
2 weeks of physics: a choice made to limit set-up time and downtime
Machine parameters (HE = high energy, LE = low energy; almost the same number of bunches, 37-41; transmission HE ~95%, LE ~85-92%; same transition set-up):
bunch intensity: HE 41 x 4.5e9, LE 37 x 3.8e9
beta*: HE 0.85 m, LE 3 m
energy: HE 100 GeV/u, LE 31.2 GeV/u
Reproducibility: minimized tuning time
Minimized time between stores
Longer luminosity lifetime
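For a rough comparison of the two set-ups, the standard scaling for head-on collisions of round Gaussian beams, L ∝ n_b·N^2·γ / (ε_n·β*), can be applied to the listed parameters. This sketch assumes equal normalized emittance ε_n at both energies. It ignores the hourglass effect and any difference in revolution frequency, and it estimates a ratio only, not an absolute luminosity. The function name is hypothetical.

```python
def relative_luminosity(n_bunches: int, ions_per_bunch: float,
                        energy_gev_per_u: float, beta_star_m: float) -> float:
    """Quantity proportional to L for round Gaussian beams at fixed normalized emittance.

    L ~ n_b * N^2 * gamma / beta*; gamma is proportional to the energy per
    nucleon, so the energy can stand in for gamma when only ratios are needed.
    """
    return n_bunches * ions_per_bunch ** 2 * energy_gev_per_u / beta_star_m


high = relative_luminosity(41, 4.5e9, 100.0, 0.85)
low = relative_luminosity(37, 3.8e9, 31.2, 3.0)
print(f"LE/HE peak luminosity ratio: {low / high:.3f}")
```

Under these assumptions, the low-energy peak luminosity is roughly 6% of the high-energy value. Most of the difference comes from the larger beta* and the lower gamma.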
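Cu Run-5 high-energy run
beta* = 0.85 m (0.89 m); time at store: 52%
[Figure: stores vs. time, with interruptions labeled “power dip + access” and “access + snowstorm”]
Cu Run-5 low-energy run
beta* = 2.6 m and 3.0 m; time at store: 74%
[Figure: stores vs. time, with interruptions labeled “access + equipment failures”]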
Cu Run-5 LE (week 2, stores)
[Figure: week-2 stores, with interruptions labeled beam experiments, Phobos 0 & polarity, access, and injection]
Optimization of performance and availability
Projected performance and run plans must include optimization of the time at store if we want to achieve the 60% goal:
Limit the number of new developments during run preparation
Stop or reduce machine development during physics running once the potential for returns is low
Choose the lattice, beta*, bunch intensity and number of bunches optimally, letting the parameters evolve during the run (more conservatively or more aggressively) based on optimization of delivered luminosity and time at store
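The trade-off in the last point can be made explicit. Integrated luminosity over a run scales as average store luminosity × calendar time × fraction of time at store. The sketch below compares a hypothetical conservative parameter choice with a hypothetical aggressive one. The luminosity values, time-at-store fractions and 10-week run length are placeholders, not RHIC projections.

```python
from dataclasses import dataclass

SECONDS_PER_WEEK = 7 * 24 * 3600


@dataclass
class RunOption:
    name: str
    avg_store_lumi: float  # cm^-2 s^-1, average over a store (placeholder)
    time_at_store: float   # fraction of calendar time (placeholder)


def integrated_lumi_pb(opt: RunOption, weeks: float) -> float:
    """Integrated luminosity in pb^-1 (1 pb^-1 = 1e36 cm^-2)."""
    return opt.avg_store_lumi * weeks * SECONDS_PER_WEEK * opt.time_at_store / 1e36


options = [
    RunOption("conservative", 50e30, 0.60),
    RunOption("aggressive", 60e30, 0.45),
]
best = max(options, key=lambda o: integrated_lumi_pb(o, weeks=10))
for o in options:
    print(f"{o.name}: {integrated_lumi_pb(o, weeks=10):.0f} pb^-1")
print("preferred:", best.name)
```

With these placeholder numbers, a 20% higher store luminosity does not make up for losing 15 points of time at store. This is the kind of balance run plans have to strike.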
Conclusions
Analyzed machine availability at RHIC
Identified the main factors determining the time at store
Have a plan towards increasing availability to 60% in ~2 RHIC runs
...will report at the next WAO!