
COOL deployment in ATLAS
Richard Hawkings (CERN)
LCG COOL meeting, 03/7/06
 A brief overview to give a flavour of COOL activities in ATLAS
 COOL usage online and offline
 Current COOL database instances
 Conditions database deployment model
 Testing - from online to Tier-n
 Some ATLAS feedback
COOL usage in ATLAS
 COOL is now widely deployed as ATLAS conditions database solution
 Only some legacy 2004 combined testbeam data in Lisbon MySQL - migrating…
 COOL usage in online software
 CDI - interface between online distributed information system and COOL
 Archiving IS ‘datapoints’ into COOL - track history - run parameters, status, monitoring
 PVSS2COOL - application for copying selected DCS data from PVSS Oracle
archive into COOL, for use in offline/external analysis
 Interfaces between TDAQ ‘OKS’ configuration database and COOL (oks2cool)
 Direct use of COOL and CORAL APIs in subdetector configuration code
 COOL in offline software (Athena)
 Fully integrated for reading (DCS, calibration, …) and calibration data writing
 Using inline data payloads (including CLOBs), and COOL references to POOL files (see the sketch after this list)
 Supporting tools developed:
 COOL_IO - interface to text and ROOT files, new AtlCoolCopy tool
 Use of PyCoolConsole and PyCoolCopy
 Use of Torre’s web browser, new Lisbon and Orsay JAVA/Athena-plugin browsers
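To make the inline-payload bullet above concrete, here is a minimal PyCool sketch of filling a COOL folder with an inline payload plus a string field that could hold a POOL file reference. The connection string, folder path and field names are hypothetical, and the calls follow later PyCool releases, so COOL 1.3 signatures may differ:

  # Minimal sketch: inline payload plus a string field for a POOL reference.
  # Connection string, folder path and field names are illustrative only;
  # PyCool API as in later COOL releases (COOL 1.3 differed in places).
  from PyCool import cool

  dbSvc = cool.DatabaseSvcFactory.databaseService()
  db = dbSvc.createDatabase('sqlite://;schema=example.db;dbname=EXAMPLE')

  spec = cool.RecordSpecification()
  spec.extend('temperature', cool.StorageType.Float)   # inline payload field
  spec.extend('poolRef', cool.StorageType.String4k)    # could hold a POOL token

  folder = db.createFolder('/EXAMPLE/DCS/Temperatures',
      cool.FolderSpecification(cool.FolderVersioning.SINGLE_VERSION, spec),
      'example DCS-like folder', True)

  payload = cool.Record(spec)
  payload['temperature'] = 21.5
  payload['poolRef'] = 'example POOL token string'
  folder.storeObject(0, cool.ValidityKeyMax, payload, 0)   # one IOV for channel 0

  db.closeDatabase()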
COOL database instances in ATLAS
 Now have around 25 GB of COOL data on production RAC
 Most ATLAS subdetectors have something, largest volume from DCS (PVSS)
 Current effort to understand how best to use PVSS smoothing/filtering techniques to
reduce data volume without reducing information content
 Data split between several database instances for offline production (data
challenges / physics analysis, …), hardware commissioning and 2004 combined
testbeam
 Mostly using COOL 1.3, but some COOL 1.2 data from ID cosmic tests
 Gymnastics using replication to SQLite files to allow this data to be read offline with COOL 1.3
 Some of this data is replicated nightly out of Oracle to SQLite files (see the copy sketch below)
 6 MB of COOL SQLite file data used in offline software simulation/reconstruction
 These files are included in release ‘kits’ shipped to outside locations - for this ‘statically’
replicated data, no need to access CERN central Oracle servers from outside world
 ATLAS COOL replica from ID cosmics is 350 MB SQLite file - still works fine
 (but takes 10-15 minutes to produce replica using C++ version of PyCoolCopy)
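The nightly Oracle-to-SQLite replication mentioned above amounts to a folder-by-folder copy of IOVs, roughly as sketched below. Connection strings and the folder path are placeholders, and the iterator calls follow later PyCool releases rather than COOL 1.3:

  # Rough sketch of the copy idea behind PyCoolCopy/AtlCoolCopy: read every IOV
  # of one folder from a source database and store it into an SQLite replica.
  # Connection strings and folder path are placeholders; PyCool API as in
  # later COOL releases.
  from PyCool import cool

  dbSvc = cool.DatabaseSvcFactory.databaseService()
  src = dbSvc.openDatabase('oracle://SRCSERVER;schema=SRCSCHEMA;dbname=CONDDB', True)
  dst = dbSvc.createDatabase('sqlite://;schema=replica.db;dbname=CONDDB')

  path = '/EXAMPLE/SomeFolder'
  sf = src.getFolder(path)
  df = dst.createFolder(path,
      cool.FolderSpecification(sf.versioningMode(), sf.payloadSpecification()),
      sf.description(), True)

  # Copy all IOVs in all channels (HEAD tag for multi-version folders)
  it = sf.browseObjects(cool.ValidityKeyMin, cool.ValidityKeyMax,
                        cool.ChannelSelection.all())
  while it.goToNext():
      o = it.currentRef()
      df.storeObject(o.since(), o.until(), o.payload(), o.channelId())
  it.close()

  src.closeDatabase()
  dst.closeDatabase()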
Conditions data deployment model
[Diagram: conditions data deployment model. The ATLAS pit (Online / PVSS / HLT farm) connects over a dedicated 10 Gbit link to the Online Oracle DB; Oracle Streams replication carries the data through the ATCN/CERN GPN gateway to the Offline Oracle master CondDB in the computer centre; from there, Streams replication feeds the Tier-1 replicas in the outside world, while SQLite replication feeds the Tier-0 farm; a 'Calib. updates' flow is also shown.]
At present, all data is on the ATLAS RAC; a separate online server will be introduced soon, once tests are complete.
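One consequence of this model is that the same COOL client code can be pointed at any of these tiers simply by switching the connection string. The values below are illustrative only; server, schema and dbname names are placeholders, not the real ATLAS production ones:

  # Illustrative COOL connection strings for the tiers in the diagram
  # (placeholders, not the real ATLAS production names)
  ONLINE_ORACLE  = 'oracle://ONLINE_SERVER;schema=ONLINE_SCHEMA;dbname=CONDDB'
  OFFLINE_ORACLE = 'oracle://OFFLINE_SERVER;schema=OFFLINE_SCHEMA;dbname=CONDDB'
  TIER1_REPLICA  = 'oracle://TIER1_SERVER;schema=REPLICA_SCHEMA;dbname=CONDDB'
  TIER0_SQLITE   = 'sqlite://;schema=conddb_replica.db;dbname=CONDDB'

  from PyCool import cool
  dbSvc = cool.DatabaseSvcFactory.databaseService()
  # Open whichever replica is appropriate for the job (read-only),
  # assuming the corresponding database/file actually exists
  db = dbSvc.openDatabase(TIER0_SQLITE, True)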
Conditions database testing
 Tests of online database server
 Using COOL verification client from David Front, started writing data from pit to
online Oracle server - using COOL as an ‘example application’ (other online apps)
 Scale: 500 COOL folders, 200 channels/folder, 100 bytes/channel, every 5 minutes
 3 GB/day DCS-type load (would come from PVSS Oracle archive via PVSS2COOL)
 Working - will add Oracle Streams to ‘offline’ server soon
 Working towards correct network config to bring online Oracle server into
production (will be on private ATCN network, not visible on CERN GPN)
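As a quick sanity check, the quoted DCS-type rate follows directly from those test parameters (payload bytes only, ignoring COOL and Oracle storage overhead):

  # Quick check of the quoted ~3 GB/day DCS-type load
  folders, channels, bytes_per_channel = 500, 200, 100
  writes_per_day = 24 * 3600 // (5 * 60)                  # every 5 minutes -> 288/day
  per_write = folders * channels * bytes_per_channel      # 10 MB of payload per cycle
  print(per_write * writes_per_day / 1e9)                 # ~2.9 GB/day, i.e. the quoted ~3 GB/day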
 Replication for HLT - a challenge
 Online HLT farm (level 2 and event filter) needs to read 10-100 MB of COOL
conditions data at start of fill to each of O(10000) processes, as fast as possible
 Possible solutions under consideration (work getting underway):
 Replication of the data for the required run to an SQLite file which is distributed to all hosts (see the read-side sketch below)
 Replication into MySQL slave database servers on each HLT rack fileserver
 Running squid proxies on each fileserver and using Frontier (same data for each)
 Using a more specialised DBProxy that understands e.g. the MySQL protocol and can even do multicast to a set of nodes (worries about local network bandwidth to the HLT nodes)
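For the first option in the list above, each HLT process would only need a simple local read at configure time, roughly as below. The snapshot file name, folder path and channel are hypothetical, with PyCool calls as in later COOL releases:

  # Sketch of the read side of the 'SQLite snapshot on every host' option.
  # File name, folder path and channel are hypothetical placeholders.
  from PyCool import cool

  dbSvc = cool.DatabaseSvcFactory.databaseService()
  db = dbSvc.openDatabase('sqlite://;schema=/scratch/run_snapshot.db;dbname=CONDDB',
                          True)                      # read-only
  folder = db.getFolder('/EXAMPLE/Calib/SomeConstants')
  runStart = 1000000                                 # placeholder validity key
  obj = folder.findObject(runStart, 0)               # IOV valid at run start, channel 0
  constants = obj.payload()                          # record holding the payload fields
  db.closeDatabase()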
Conditions database testing, continued
 Replication for Tier-0
 Tier-0 does prompt reconstruction, run-by-run, jobs start incoherently
 Several hours per job, data spanning O(1 minute), 1000s jobs in parallel
 Easiest solution is to extract all COOL data needed for each run (O(10-100 MB?)) once into an SQLite file and distribute that to the worker nodes (see the extraction sketch below)
 Solution with SQLite files (and POOL payload data) on replicated AFS being tested now
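The per-run extraction could reuse the copy sketch shown earlier, simply restricting the browse to the run's validity interval and to the folders the reconstruction needs. Again, all names, connection strings and validity keys below are placeholders:

  # Sketch: extract only the IOVs overlapping one run into an SQLite snapshot
  # (connection strings, folder list and validity keys are placeholders)
  from PyCool import cool

  dbSvc = cool.DatabaseSvcFactory.databaseService()
  src = dbSvc.openDatabase('oracle://OFFLINE_SERVER;schema=SCHEMA;dbname=CONDDB', True)
  dst = dbSvc.createDatabase('sqlite://;schema=run_snapshot.db;dbname=CONDDB')

  runSince, runUntil = 1000000, 2000000              # validity keys covering the run

  for path in ['/EXAMPLE/FolderA', '/EXAMPLE/FolderB']:
      sf = src.getFolder(path)
      df = dst.createFolder(path,
          cool.FolderSpecification(sf.versioningMode(), sf.payloadSpecification()),
          sf.description(), True)
      it = sf.browseObjects(runSince, runUntil, cool.ChannelSelection.all())
      while it.goToNext():
          o = it.currentRef()
          df.storeObject(o.since(), o.until(), o.payload(), o.channelId())
      it.close()

  src.closeDatabase()
  dst.closeDatabase()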
 Replication outside CERN
 Reprocessing of RAW data will be done at Tier-1s (Tier-0 busy with new data)
 Need replication of all COOL data needed for offline reconstruction
 Use Oracle Streams replication - being tested by 3D project ‘throughput phase’
 Once done, do some dedicated ATLAS tests (as in online -> offline), then production
 Tier-2s and beyond need subsets (some folders) of conditions data
 Analysis, Monte Carlo simulation, calibration tasks, …
 Either use COOL API-based dynamic replication to MySQL servers in Tier-2s, or
Frontier web-cache-based replication from Tier-1 Oracle
 With squids at Tier-1s and Tier-2s, need to solve stale-cache problems (by policy?)
 First plans for testing this being made - David Front, Argonne
Some ATLAS feedback on COOL
 COOL is (at last) being heavily used, both online and offline
 Seems to work well, so far so good…
 Online applications are stressing the performance, offline less so
 The ability to switch between Oracle, (MySQL) and SQLite is very useful
 Commonly-heard comments from subdetector users
 Why can’t we have payload queries?
 This causes people to think about reinventing COOL, or accessing COOL data tables
directly, e.g. via CORAL
 We would like to have COOL tables holding foreign keys to other tables
 Want to do COOL queries that include the ‘join’ to the payload data
 Can emulate with a 2-step COOL+CORAL lookup (see the sketch below), but not efficient for bulk access
 A headache for replication …
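The 2-step emulation referred to above might look roughly like this: the COOL payload carries a foreign key, and the referenced row is then fetched with a separate CORAL query. All names (folder, table, columns, connection strings) are hypothetical, and the PyCool/PyCoral calls follow later releases of those APIs:

  # Sketch of the 2-step 'payload join' emulation: COOL holds the foreign key,
  # CORAL fetches the referenced row. All names are hypothetical placeholders.
  from PyCool import cool
  import coral

  # Step 1: read the foreign key from the COOL payload
  dbSvc = cool.DatabaseSvcFactory.databaseService()
  db = dbSvc.openDatabase('sqlite://;schema=example.db;dbname=CONDDB', True)
  obj = db.getFolder('/EXAMPLE/CalibIndex').findObject(1000000, 0)
  calibId = int(obj.payload()['calibId'])            # the foreign key
  db.closeDatabase()

  # Step 2: fetch the referenced row with CORAL
  connSvc = coral.ConnectionService()
  session = connSvc.connect('sqlite_file:example_payload.db', coral.access_ReadOnly)
  session.transaction().start(True)                  # read-only transaction
  query = session.nominalSchema().tableHandle('CALIB_PAYLOAD').newQuery()
  query.addToOutputList('VALUE')
  cond = coral.AttributeList()
  cond.extend('id', 'int')
  cond[0].setData(calibId)
  query.setCondition('CALIB_ID = :id', cond)
  cursor = query.execute()
  while cursor.next():
      value = cursor.currentRow()['VALUE'].data()
  del query
  session.transaction().commit()

Doing this object-by-object is what makes the approach inefficient for bulk access, as noted above.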
 COOL is too slow - e.g. in multichannel bulk insert
 We need a better browser (one that can handle large amounts of data)
 Why can’t COOL 1.3 read COOL 1.2 schema?
 I know the ‘COOL-team’ answers to these questions, but still useful to give
them here - feedback from the end-users!