Automated software packaging and installation for the ATLAS experiment
Simon George, Royal Holloway, University of London
Christian Arnault, LAL Orsay; Michael Gardner, RHUL; Roger Jones, University of Lancaster; Saul Youssef, Boston University
[email protected]
e-Science All Hands Meeting, Nottingham, 2-4 September 2003
ATLASexperiment.org

Introduction
  • This talk is about packaging, distribution and installation for a large software project
  • It is essential because
      • The project computing resources are widely distributed around 140 institutes, who all want to use the software
      • We want to be able to use Grid resources that do not have locally managed installations of the software
      • Our working model also requires the ability to deploy user code that is not part of an official distribution
  • I’ll describe the process developed and the tools used.
Wed 03Sep03 Simon George RHUL 2

Contents
  • ATLAS and its software
  • Requirements
  • Tools and formats
  • Meta data
  • Naming conventions
  • Creating and installing the kits
  • Conclusions and outlook

The ATLAS Experiment
  • A Particle Physics experiment at the Large Hadron Collider, CERN
  • 1600 physicists, 140 institutes, 6 continents
  • Studies include
      • search for the origin of mass
      • excess of matter over antimatter in the universe
      • evidence for Supersymmetry
      • other new physics

ATLAS software suite
  • Simulation, data processing and analysis
  • 500 “packages”, 50 external, inter-dependent.
  • 100s of developers and 1000s of users in 140 institutes
  • One release build is 2.5 GB of files
  • It takes 10 hours to build
  • Build types and frequencies
      • Production release 3-4 times per year
      • Developer release every 2-3 weeks
      • Nightly build of snapshot
  • Build configuration permutations
      • Optimised, debug and sometimes also profile builds.
      • Two platforms (RedHat 7.3 on Intel x86, Solaris 8 on SPARC)
      • One or more compilers (gcc 3.2)
  • Configuration management, build and install handled by CMT
  • So not a trivial task to package, distribute and install

CMT
www.cmtsite.org
  • Configuration management tool
      • Concerned with setting up the user’s environment to build and run software
      • Needs the help of tools for a large project
  • CMT
      • CMT helps to define and impose conventions
          • For naming packages, files, directories
          • For describing their relationships
      • In other words, package metadata
      • This is the key feature exploited for this project.
      • Useful features to manage sub-projects, dependencies
      • A broad user base, especially in Particle Physics and Astronomy experiments.

Packaging Requirements
  • Three types of kit required
  • Binary kit
      • Pre-built executables, libraries and configuration files needed to run the software
      • Used for data challenges, production, basic users
  • Developer’s kit
      • Binary kit plus
      • Headers, libraries and configuration needed to build against it
      • For developers and most users
  • Full source kit
      • To rebuild from scratch on binary-incompatible platforms
      • When local source code browsing is required
  • For each permutation of platform, config, compiler

Installation requirements
  • For large facilities: unattended, push-button deployment
  • For normal user: relocatable, no root access
  • Automatic configuration
  • Updates, multiple versions
  • Avoid duplication and unnecessary downloads
  • Possibility to take a subset of the software
  • Self contained, apart from …
      • Prerequisite software: modest list and automatic check
  • Set up user’s environment (e.g. LD_LIBRARY_PATH)
  • Reversible: uninstall
  • Install and work disconnected from network, e.g. install onto a laptop from CDs

Constraints
  • ATLAS software is divided into sub-projects
      • Currently ATLAS and Gaudi
      • Could be more in the future, e.g. split ATLAS into simulation and reconstruction
      • Each sub-project consists of many packages
  • External/Internal package distinction
      • Internal packages are developed and managed within the ATLAS software project
      • External packages are the opposite, e.g. software from the Particle Physics community, public domain software or commercial products.
  • Interface packages for externals
      • Pure metadata package
      • Actual external software can be installed anywhere, any way.
      • Gives it the outside appearance of an internal package

Constraints, continued
  • Existing use of CMT
      • Package structure already in place
      • Meta data provided by packages or implied by default policies is already enough for automated packaging.
  • Problems
      • ATLAS software is written by large communities with a mixed level of experience
      • All such software projects will have small flaws introduced in each release
      • These must be worked around when they impact on the packaging.
      • For example, one problem of particular relevance to packaging & installation is cyclic dependencies

Packaging: starting point
  • One kit per package
      • Follow existing granularity
  • Separate metadata and payload
      • Two parts to each kit
  • Performed by librarian as integral part of release procedure
  • Distribution by web or distributed filesystem (e.g. AFS)

Tools used
  • CMT
      • Define and impose conventions on packages
      • Query the metadata needed for packaging
  • Pacman
      • Metadata format
      • Tool used to manage kit installation
  • Tar and RPM
      • Payload format: the package itself
  • “Deployment tools” shell scripts
      • Construct the kits using CMT
      • Control location of Pacman cache and distribution
      • Post-installation configuration

Overview of process and tools
[Diagram. Labels: Librarian, CMT, Deployment Tools, Create kits, Web server or AFS, Pacman, Local s/w manager, Developer, Local computers, CMT, Run software]

Pacman
http://physics.bu.edu/~youssef/pacman
  • A package manager
  • Packager defines how the software should be fetched, installed, configured, updated, in a “Pacman” file.
  • The package itself can be in any format as that file is separate.
  • A directory of these files is known as a cache, usually available on the web.
  • Pacman tool is used to install the software
  • Pacman’s feature list is a good match to the requirements for installation.
  • Already used by several Particle Physics and GRID projects.

Package distribution format
  • Tar vs. RPM
      • Both can be made relocatable
  • Feature set
      • Tar has a simple feature set but is complementary to CMT and Pacman
      • RPM overlaps with CMT and Pacman
          • e.g. RPM also handles dependencies and prerequisites
  • Platforms
      • RPM is only widely used on Linux, while tar is standard on pretty much any Unix
  • Annoyances
      • Default RPM database needs root access to write to it
          • There are workarounds for this, but they are not pretty
  • Conclusion
      • Decided to use tar but retained RPM as an option

Meta data
  • For each package
      • Other packages it uses (dependencies)
      • Location of constituents
          • Applications and libraries
          • Header files
          • Run time/config files
      • CMT requirements file or implied by default conventions of ATLAS
  • External packages
      • Pure meta data “glue” packages
      • Just define paths to export
      • All defined in CMT requirements files
  • Can be queried through cmt
        cmt show uses
        cmt show macro <package>_export_paths

Naming and structure
  • Package naming convention
      • Packages in a sub-project
          • <package name>-<sub-project release id>
      • External packages
          • <package name>-<version id>
      • These names are used when expressing the inter-package dependencies
  • Directory structure within each kit
      • <sub-project>/<release-id>/InstallArea/
          • Contains the sub-directories bin, lib, include, share.
      • <sub-project>/<release-id>/<package>/<version>/cmt/
          • Contains the configuration management files
      • <external-package>/
          • Assumed to have their own internal structure for versions & builds
  • This is designed to support coexistence of:
      • Different versions of every piece of software
      • Different binary versions (platform and build config)

Examples
CMT requirements file:

    # Package name and author
    package ExamplePkgA
    author A. Person <[email protected]>
    # Inter-package dependencies
    use ExamplePkgB
    use ExampleExtPkg
    # Instruction to build a library from source files
    library ExamplePkgA *.cxx
    # Type of library to build, implies library file names
    apply_pattern component_library
    # Default location implied
    apply_pattern declare_runtime

Pacman file:

    description='Package ExamplePkgA-01-07-02 in release 6.5.0'
    url='http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO'
    source='../dist'
    download = { '*':'ExamplePkgA-6.5.0.tar.gz' }
    depends = [ 'ExamplePkgB-6.5.0', 'ExampleExtPkg-v1' ]

Creating the kits
  • First, build a release
  • Discover cycles in the dependencies
      • Use a feature of CMT to discover cycles in the dependencies, as these must not be propagated to the kits. Record the output in a file.
  • Then, use a feature of CMT to visit every package in a dependency tree and apply a command there
        cmt broadcast <command>
  • Usage of the script to create a kit:
        create_kit.sh -release <release-id> -cycles <file> <target distribution directory> [-rpm]
      • Creates a Pacman file and a tar file, and optionally an RPM file
  • Finally, there are often a few things to fix by hand specific to each release.
  • Note that CMT itself is included as a kit

Installation
  • Performed by site software manager or end user on desktop or laptop
  • Straightforward procedure:
      • Install Pacman, if not already done
      • Install prerequisite software
          • Currently just RedHat 7.3 o/s, gcc-3.2 and Java SDK 1.4.1
      • Choose directory for the installation
          • Probably the same as before
      • Choose which release to install
          • Available releases are listed on a web page
      • Use Pacman to download, install and configure it, e.g.
            pacman -get ATLAS:AtlasRelease-6.5.0
          • Dependencies followed automatically to get everything you need
      • Optionally, run script to set up a user environment and run a test
  • User configures software in the usual way
      • Just choose release and private working area as normal
      • Run a setup script provided by CMT

Conclusions
  • Procedures and tools have been developed for the packaging, distribution and installation of ATLAS software
      • Based on Pacman, CMT, tar/rpm and some shell scripts
  • The basic principles could be applied more generally
      • Using some or all of the same tools
  • It satisfies most of the requirements for run-time and developers’ kits and for installation.
      • Full source kit still to be done.
  • Early adopters have given useful feedback and it is now being imported into Grid production systems
  • Must now move to its use as part of the standard release procedure in ATLAS by December 2003, for our global 'Data Challenge 2'

Future developments
  • Better handling of prerequisite software and platform compatibility checks
  • EDG WP4 configuration management task
      • Potential to work with an installation on demand mechanism for GRID farms
  • LCG/EDG/iVDGL GLUE
      • Meta packaging proposal for Grid middleware and applications, O. Barring et al.
  • Pacman version 3
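
Illustration: following dependencies and rejecting cycles

The slides on creating the kits and on installation rely on two properties of the dependency metadata: cycles must not be propagated to the kits, and installing one kit pulls in everything it depends on. The following is a minimal Python sketch of that idea, not the actual Pacman or CMT implementation. The function install_order, the CACHE dictionary and the dependency edges in it are hypothetical; only the kit names are borrowed from the Pacman file example.

```python
"""Sketch: derive an install order from 'depends' lists and report cycles."""

# Hypothetical cache contents: kit name -> kits it depends on.
# Names follow the <package>-<release id> convention from the slides;
# the dependency edges themselves are invented for illustration.
CACHE = {
    "AtlasRelease-6.5.0": ["ExamplePkgA-6.5.0"],
    "ExamplePkgA-6.5.0": ["ExamplePkgB-6.5.0", "ExampleExtPkg-v1"],
    "ExamplePkgB-6.5.0": ["ExampleExtPkg-v1"],
    "ExampleExtPkg-v1": [],
}


def install_order(cache, target):
    """Return kits in an order where every kit follows its dependencies.

    Raises ValueError naming the cycle if the dependencies are cyclic.
    """
    order = []    # finished kits, dependencies first
    done = set()  # kits already placed in `order`
    path = []     # kits on the current depth-first path

    def visit(kit):
        if kit in done:
            return
        if kit in path:
            # The kit is already being visited, so we have looped back.
            cycle = path[path.index(kit):] + [kit]
            raise ValueError("cyclic dependency: " + " -> ".join(cycle))
        path.append(kit)
        for dep in cache[kit]:
            visit(dep)
        path.pop()
        done.add(kit)
        order.append(kit)

    visit(target)
    return order


if __name__ == "__main__":
    for kit in install_order(CACHE, "AtlasRelease-6.5.0"):
        print("install", kit)
```

A depth-first walk is used here because it yields both results in one pass: the order in which kits finish is a valid install order, and meeting a kit that is still on the current path identifies a cycle that would have to be broken before the kits are published.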