Beyond the Network Lies the Grid


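Chapter 4 Resource Management
Gong Bin
School of Computer Science and Technology, Shandong University
Shandong Provincial High Performance Computing Center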
Globus and the Resource Specification Language (RSL)
Resource Management in Globus
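Globus RMS
[Figure: Globus resource management architecture. An Application passes an RSL request to a Broker, which performs RSL specialization using queries and information from the Information Service and produces ground RSL; a Co-allocator splits the ground RSL into simple ground RSL requests, each sent to a GRAM in front of a local resource manager (LSF, Condor, SGEEE).]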
Globus Components In Action
[Figure: Globus components in action. On the local machine, grid-proxy-init creates a user proxy certificate from the X509 user certificate, and globusrun passes an RSL multi-request through the RSL parser and DUROC, which issues RSL single requests through GRAM clients. On each remote machine (AIX and Solaris hosts, with PBS and Unix fork as local schedulers), a GSI-authenticated GRAM gatekeeper starts a GRAM job manager, which launches the application; the applications communicate via MPI over Nexus and access files through GASS clients and a GASS server.]
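GRAM (Globus Resource Allocation Manager) Overview
• Position: the lowest layer of resource management
• Function: runs jobs remotely; through the API it provides, jobs can be submitted, monitored, and terminated
• Specific responsibilities of GRAM
– Process job requests expressed in the Resource Specification Language (RSL)
– Remotely monitor and manage the jobs it creates
– Update the information in MDS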
Globus Pre-WS Component Interaction Diagram
[Figure: interaction of the pre-WS Globus components, with GSI securing each connection]
GRAM: Grid Resource Allocation Manager
GASS: Global Access to Secondary Storage
MDS: Monitoring and Discovery Service
GRIS: Grid Resource Information Service
GIIS: Grid Index Information Service
From IBM Redbook SG24-6895-01, 2003: Intro to Grid Computing
GRAM
• Service that provides remote execution and
status management of the request
• When a job is submitted by a client, the request
is sent to the remote host and handled by the
gatekeeper daemon located in the remote host.
• Then the gatekeeper creates a job manager to
start and monitor the job.
• When the job is finished, the job manager sends
the status information back to the client and
terminates.
GRAM Architecture
[Figure: GRAM architecture. From IBM Redbook SG24-6895-01, 2003: Intro to Grid Computing]
GRAM Elements
• Clients
• Gatekeeper daemon
• Job Manager
• Global Access to Secondary Storage (GASS)
• Dynamically-Updated Request Online Co-allocator (DUROC)
• Resource Specification Language (RSL)
GRAM Clients
• Three clients:
– globusrun
– globus-job-run
– globus-job-submit
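GRAM Management Flow
[Figure: GRAM management flow. Through the client API, the client sends a job request to the gatekeeper, which starts a job manager (fork/su/exec). The client can cancel the job and receives state-change callbacks. The job manager creates the job processes through a scheduler-specific plugin: fork/exec/wait for direct execution, or submit/query commands (spsubmit/spq) for schedulers such as Condor and LSF.]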
Role of the Gatekeeper
• gatekeeper: a process, running as root, that begins the handling of allocation requests by
– performing mutual authentication of user and resource,
– determining a local user name for the remote user,
– starting a job manager which executes as that local user and actually handles the request.
• In order to start the job manager, the gatekeeper must run as a privileged program.
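The second step above, mapping a remote user to a local account, is commonly driven by a grid-mapfile that pairs a certificate subject name (DN) with a local user name, one `"<DN>" <localname>` entry per line. The sketch below looks up a DN in such text held in memory. The function name `map_grid_user` and the sample DNs are hypothetical; a real gatekeeper performs GSI mutual authentication before any lookup, and this sketch ignores escaping and takes only the first local name of an entry.

```c
#include <string.h>

/* Look up the local account for a certificate subject (DN) in grid-mapfile
 * text with one entry per line:  "<DN>" <localname>[,<othername>...]
 * Returns 1 and copies the first local name into `out` on success,
 * 0 if the DN is not listed. Lines not starting with '"' (comments,
 * blank lines) are skipped. Simplified sketch: no escape handling. */
static int map_grid_user(const char *mapfile, const char *dn,
                         char *out, size_t outlen)
{
    const char *line = mapfile;
    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        if (len > 0 && line[0] == '"') {
            const char *q = memchr(line + 1, '"', len - 1);   /* closing quote */
            if (q) {
                size_t dnlen = (size_t)(q - (line + 1));
                if (dnlen == strlen(dn) && strncmp(line + 1, dn, dnlen) == 0) {
                    /* Skip blanks after the DN, then copy up to ',' or blank. */
                    const char *p = q + 1;
                    const char *stop = line + len;
                    while (p < stop && (*p == ' ' || *p == '\t')) p++;
                    size_t n = 0;
                    while (p + n < stop && p[n] != ' ' && p[n] != '\t' && p[n] != ',')
                        n++;
                    if (n == 0 || n >= outlen) return 0;
                    memcpy(out, p, n);
                    out[n] = '\0';
                    return 1;
                }
            }
        }
        line = end ? end + 1 : line + len;
    }
    return 0;
}
```

On many Globus installations this file is kept at /etc/grid-security/grid-mapfile; treat that path as a convention to check against your own deployment.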
Terminology
• Resource
– An entity capable of running one or more processes on behalf of a user
• Client
– The process that is using the resource allocation client-side API
• Job
– A process or set of processes resulting from a job request
• Job Request
– A request to the gatekeeper to create one or more job processes, expressed in the supplied Resource Specification Language
• Job Manager
– The gatekeeper creates one job manager to fulfill each request submitted to it
GRAM Scheduling and State-Transition Model
Explanation of Each State
• Unsubmitted: The job has not yet been submitted to the scheduler
• StageIn: The job manager is staging executable, input, or data files to the job
• Pending: The job has been submitted to the scheduler, but resources have not yet been allocated for the job
• Active: The job has received all of its resources, and the application is executing
• Suspended: The job has been stopped temporarily by the scheduler
• StageOut: The job manager is staging output files from the job manager host to remote storage
• Done: The job completed successfully
• Failed: The job terminated before completion, as a result of an error or a user or system cancel
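The states above form a small state machine. The following C sketch encodes one plausible set of transitions. Because the state-transition diagram itself is not reproduced here, the table in `can_transition` is an assumption and not GRAM's exact rules: it lets a job skip StageIn or StageOut when there is nothing to stage, and lets any live state fall into Failed on an error or cancel. All names are hypothetical.

```c
#include <stdbool.h>

typedef enum {
    ST_UNSUBMITTED, ST_STAGE_IN, ST_PENDING, ST_ACTIVE,
    ST_SUSPENDED, ST_STAGE_OUT, ST_DONE, ST_FAILED
} job_state;

/* Done and Failed are terminal: no further transitions. */
static bool is_terminal(job_state s) { return s == ST_DONE || s == ST_FAILED; }

/* Simplified transition rule (an assumption, not GRAM's exact table). */
static bool can_transition(job_state from, job_state to)
{
    if (is_terminal(from)) return false;
    if (to == ST_FAILED) return true;      /* error or cancel from any live state */
    switch (from) {
    case ST_UNSUBMITTED: return to == ST_STAGE_IN || to == ST_PENDING;
    case ST_STAGE_IN:    return to == ST_PENDING;
    case ST_PENDING:     return to == ST_ACTIVE;
    case ST_ACTIVE:      return to == ST_SUSPENDED || to == ST_STAGE_OUT || to == ST_DONE;
    case ST_SUSPENDED:   return to == ST_ACTIVE;
    case ST_STAGE_OUT:   return to == ST_DONE;
    default:             return false;
    }
}

/* Check that every consecutive pair in a recorded state history is legal. */
static bool valid_history(const job_state *h, int n)
{
    for (int i = 1; i < n; i++)
        if (!can_transition(h[i - 1], h[i])) return false;
    return true;
}
```

A client receiving state-change callbacks could run each new state through `can_transition` to flag unexpected jumps.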
GRAM Components
[Figure: GRAM components. The client makes MDS client API calls to locate resources (via the MDS Grid Index Info Server) and to get resource information (from the Grid Resource Info Server, which queries the current status of the resource). Across the site boundary, secured by Globus Security, the client makes GRAM client API calls to request resource allocation and process creation, and receives GRAM client API state-change callbacks. The gatekeeper creates a job manager, which parses the request with the RSL library and asks the local resource manager infrastructure to allocate and create processes, which the job manager then monitors and controls.]
DUROC (Dynamically-Updated Request Online Co-allocator)
• Simultaneous allocation of a resource set
– Handled via optimistic co-allocation based on free nodes
or queue prediction
– advance reservations will also be supported
• globusrun will co-allocate specific multi-requests
using DUROC
GRAM Examples
The globus-job-run client is a sample GRAM client that uses command-line arguments rather than RSL.
% globus-job-run pitcairn.mcs.anl.gov /bin/ls
% globus-job-run pitcairn.mcs.anl.gov -s myprog
% globus-job-run pitcairn.mcs.anl.gov \
-s myprog -stdin -s in.txt -stdout -s out.txt
GRAM Examples
The globusrun client is a more involved prototype that accepts complex RSL expressions.
% globusrun -r pitcairn.mcs.anl.gov -f myjob.rsl
% globusrun -r pitcairn.mcs.anl.gov \
'&(executable=myprog)'
Resource Management APIs
• Globus Toolkit has APIs for RSL, GRAM, and DUROC:
– globus_rsl
– globus_gram_client
– globus_gram_myjob
– globus_duroc_control
– globus_duroc_runtime
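Resource Specification Language
• A general-purpose language for describing job requirements
• RSL is the core of GRAM: it gives the different components a way to exchange information, for example between applications and resource brokers, and between resource co-allocators and resource managers
• Form
– (attribute=value)
– GRAM must understand these attributes
• Globus provides an API for working with RSL
• RSL can also be used in settings beyond those above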
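To make the `(attribute=value)` form concrete, here is a minimal C sketch that splits a flat request such as `&(executable=myprog)(count=4)` into attribute/value pairs. It is a toy, not the globus_rsl module listed earlier: it handles only a single `&` conjunction and the `=` operator, and it treats each value as either one double-quoted string or the raw text up to `)`. The names `rsl_parse` and `rsl_relation` are hypothetical.

```c
#include <ctype.h>
#include <string.h>

#define RSL_MAX_REL 16
#define RSL_MAX_LEN 128

typedef struct {
    char attr[RSL_MAX_LEN];
    char value[RSL_MAX_LEN];
} rsl_relation;

/* Parse "&(a=v)(b=\"quoted v\")..." into `out`. Returns the number of
 * relations, or -1 on a syntax error or overflow. No nesting, no '+'
 * multi-requests, no operators other than '='. */
static int rsl_parse(const char *s, rsl_relation *out, int max)
{
    int n = 0;
    while (isspace((unsigned char)*s)) s++;
    if (*s++ != '&') return -1;
    for (;;) {
        while (isspace((unsigned char)*s)) s++;
        if (*s == '\0') return n;
        if (*s++ != '(' || n == max) return -1;
        while (isspace((unsigned char)*s)) s++;
        size_t a = 0;                                /* attribute name */
        while (*s && *s != '=' && *s != ')' && !isspace((unsigned char)*s)) {
            if (a + 1 >= RSL_MAX_LEN) return -1;
            out[n].attr[a++] = *s++;
        }
        out[n].attr[a] = '\0';
        while (isspace((unsigned char)*s)) s++;
        if (a == 0 || *s++ != '=') return -1;
        while (isspace((unsigned char)*s)) s++;
        size_t v = 0;
        if (*s == '"') {                             /* quoted value */
            s++;
            while (*s && *s != '"') {
                if (v + 1 >= RSL_MAX_LEN) return -1;
                out[n].value[v++] = *s++;
            }
            if (*s++ != '"') return -1;
            while (isspace((unsigned char)*s)) s++;
        } else {                                     /* raw value up to ')' */
            while (*s && *s != ')') {
                if (v + 1 >= RSL_MAX_LEN) return -1;
                out[n].value[v++] = *s++;
            }
            while (v > 0 && isspace((unsigned char)out[n].value[v - 1])) v--;
        }
        out[n].value[v] = '\0';
        if (*s++ != ')') return -1;
        n++;
    }
}
```

A component receiving a request would look up the attributes it understands (executable, count, and so on) in the returned array and reject the rest.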
Some RSL Attributes
• (executable=string)
– Program to run
– A file path (absolute or relative) or URL
• (directory=string)
– Directory in which to run (default is $HOME)
• (arguments=arg1 arg2 arg3...)
– List of string arguments to program
• (environment=(E1 v1)(E2 v2))
– List of environment variable name/value pairs
Some RSL Attributes (continued)
• (stdin=string)
– Stdin for program
– A file path (absolute or relative) or URL
• (stdout=string)
– Stdout for program
– A file path (absolute or relative) or URL
• (stderr=string)
– Stderr for program
– A file path (absolute or relative) or URL
• (count=integer)
– Number of processes to run (default is 1)
• (hostCount=integer)
– On SMP multi-computers, number of nodes to distribute the "count" processes across
• (project=string)
– Project (account) against which to charge
• (queue=string)
– Queue into which to submit job
Some RSL Attributes (continued)
• (maxTime=integer)
– Maximum wall clock or CPU runtime (scheduler's choice) in minutes
• (maxWallTime=integer)
– Maximum wall clock runtime in minutes
• (maxCpuTime=integer)
– Maximum CPU runtime in minutes
• (maxMemory=integer)
– Maximum amount of memory for each process in megabytes
• (minMemory=integer)
– Minimum amount of memory for each process in megabytes
RSL Attributes For GRAM
• (jobType=value)
– Value is one of “mpi”, “single”, “multiple”, or “condor”
• mpi: Run the program using “mpirun -np <count>”
• single: Only run a single instance of the program, and let the
program start the other count-1 processes.
• multiple: Start <count> instances of the program using the
appropriate scheduler mechanism
• condor: Start <count> Condor processes running in the "standard universe"
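The four jobType values differ mainly in who starts the <count> processes. The sketch below turns a jobType and count into a one-line description of the launch. Only the mpi line mirrors a real command (`mpirun -np <count>`, as stated above); the wording of the other plans and the helper name `launch_plan` are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Describe how <count> processes would be started for a given jobType.
 * Writes the description into buf; returns 0, or -1 for an unknown
 * jobType or a too-small buffer. */
static int launch_plan(const char *job_type, const char *exe, int count,
                       char *buf, size_t len)
{
    int w;
    if (strcmp(job_type, "mpi") == 0)
        w = snprintf(buf, len, "mpirun -np %d %s", count, exe);
    else if (strcmp(job_type, "single") == 0)        /* program starts the rest */
        w = snprintf(buf, len, "start 1 x %s (program starts the other %d)",
                     exe, count - 1);
    else if (strcmp(job_type, "multiple") == 0)      /* scheduler starts all */
        w = snprintf(buf, len, "start %d x %s via the local scheduler", count, exe);
    else if (strcmp(job_type, "condor") == 0)
        w = snprintf(buf, len, "submit %d x %s to Condor (standard universe)",
                     count, exe);
    else
        return -1;
    return (w < 0 || (size_t)w >= len) ? -1 : 0;
}
```

For example, `launch_plan("mpi", "test.exe", 4, buf, sizeof buf)` produces `mpirun -np 4 test.exe`.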
RSL Attributes for GRAM
• (gramMyjob=value)
– Value is one of “collective”, “independent”
– Defines how the globus_gram_myjob library will
operate on the <count> processes
• collective: Treat all <count> processes as part of a single job
• independent: Treat each of the <count> processes as an
independent uniprocessor job
• (dryRun=true)
– Do not actually run job
RSL Substitutions
• RSL supports simple variable substitutions
• Substitutions are declared using a list of pairs
– (rslSubstitution=(SUB1 val1)(SUB2 val2))
• A substitution is invoked with $(SUB)
• Processing order:
– Within scope, processed left-to-right,
– Outer scope processed before inner scope
– Variable definition can reference previously defined variables
Substitution Example
• This
&(rslSubstitution=(URLBASE "ftp://host:1234"))
(rslSubstitution=(URLDIR $(URLBASE)/dir))
(executable=$(URLDIR)/myfile)
• is equivalent to this
&(executable=ftp://host:1234/dir/myfile)
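The processing rules (left to right, with each definition able to use earlier ones) can be sketched as follows. This minimal C version keeps all definitions in one ordered list, so a later or inner definition shadows an earlier one. It expands each value when it is defined, then expands the target string. The names `rsl_sub`, `define_sub`, and `expand` are hypothetical, and values are passed without RSL quoting.

```c
#include <string.h>

#define SUB_MAX 8
#define SUB_LEN 256

typedef struct { char name[64]; char value[SUB_LEN]; } rsl_sub;

/* Replace every $(NAME) in `in` with the value of a defined substitution.
 * The most recent definition of a name wins. Unknown names or overflow
 * give -1; success gives 0. */
static int expand(const char *in, const rsl_sub *subs, int nsubs,
                  char *out, size_t outlen)
{
    size_t o = 0;
    while (*in) {
        if (in[0] == '$' && in[1] == '(') {
            const char *close = strchr(in + 2, ')');
            if (!close) return -1;
            size_t nlen = (size_t)(close - (in + 2));
            int found = -1;
            for (int i = 0; i < nsubs; i++)
                if (strlen(subs[i].name) == nlen &&
                    strncmp(subs[i].name, in + 2, nlen) == 0)
                    found = i;                  /* keep the latest definition */
            if (found < 0) return -1;
            size_t vlen = strlen(subs[found].value);
            if (o + vlen >= outlen) return -1;
            memcpy(out + o, subs[found].value, vlen);
            o += vlen;
            in = close + 1;
        } else {
            if (o + 1 >= outlen) return -1;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}

/* Define NAME=value, expanding the value against earlier definitions first
 * (left-to-right processing). Returns the new count, or -1 on error. */
static int define_sub(rsl_sub *subs, int nsubs, const char *name, const char *value)
{
    if (nsubs < 0 || nsubs >= SUB_MAX || strlen(name) >= sizeof subs[0].name)
        return -1;
    if (expand(value, subs, nsubs, subs[nsubs].value, SUB_LEN) != 0) return -1;
    strcpy(subs[nsubs].name, name);
    return nsubs + 1;
}
```

Run on the example above, defining URLBASE and then URLDIR and expanding `$(URLDIR)/myfile` yields `ftp://host:1234/dir/myfile`, the same value as the equivalent request.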
GRAM Defined RSL Substitutions
• GRAM defines a set of RSL substitutions before processing the job request
• Machine Information
– GLOBUS_HOST_MANUFACTURER
– GLOBUS_HOST_CPUTYPE
– GLOBUS_HOST_OSNAME
– GLOBUS_HOST_OSVERSION
GRAM Defined RSL Substitutions
• Paths to Globus
– GLOBUS_INSTALL_PATH
– GLOBUS_TOOLS_PATH
– GLOBUS_SERVICES_PATH
– GLOBUS_DEPLOY_PATH
• Miscellaneous
– HOME
– LOGNAME
– GLOBUS_ID
RSL Attributes for DUROC
• (subjobStartType=value)
– Alters the startup barrier mechanism
– values are “strict-barrier”, “loose-barrier”, “no-barrier”
• (subjobCommsType=value)
– values are “blocking-join” and “independent”
– if value is set to “independent”, the subjob won’t be seen from
the other subjobs when doing inter-subjob communication.
• (label=string)
– Identifier for this subjob
• (resourceManagerContact=string)
(resourceManagerName=string)
– Resource manager to which to submit a subjob
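When globusrun hands a multi-request to DUROC, each subjob is its own conjunction carrying the attributes above. The sketch below assembles such a request, assuming the GT2 form in which a leading `+` introduces a list of `&(...)` subrequests. The contact strings, labels, and the helper `build_multirequest` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* One subjob of a co-allocated request (all fields illustrative). */
typedef struct {
    const char *contact;   /* value for resourceManagerContact */
    const char *label;     /* value for label */
    const char *exe;       /* value for executable */
    int count;             /* value for count */
} subjob;

/* Build "+(&(...))(&(...))...": a leading '+' followed by one
 * conjunction per subjob. Returns 0, or -1 if buf is too small. */
static int build_multirequest(const subjob *jobs, int n, const char *start_type,
                              char *buf, size_t len)
{
    int w = snprintf(buf, len, "+");
    if (w < 0 || (size_t)w >= len) return -1;
    size_t used = (size_t)w;
    for (int i = 0; i < n; i++) {
        w = snprintf(buf + used, len - used,
                     "(&(resourceManagerContact=\"%s\")(label=\"%s\")"
                     "(count=%d)(executable=\"%s\")(subjobStartType=%s))",
                     jobs[i].contact, jobs[i].label, jobs[i].count,
                     jobs[i].exe, start_type);
        if (w < 0 || (size_t)w >= len - used) return -1;
        used += (size_t)w;
    }
    return 0;
}
```

With subjobStartType=strict-barrier, the intent is that no subjob proceeds past startup until all of them have started, which is what makes the allocation a co-allocation.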
Example: (single resource for now…)
$ globusrun -r chi/jobmanager-pbs \
'& (executable="/home/abose/test.exe")
(host_count=2) (count=4)
(arguments="-t 100 -f out.dat")
(email_address="[email protected]")
(queue="cac")
(pbs_stagein="morpheus:/home/abose/test.exe")
(pbs_stageout="morpheus:/home/abose/out.dat")
(pbs_stdout="/tmp/stdout")
(pbs_stderr="/tmp/stderr")
(maxwalltime=10)(jobtype="mpi")'
In words: "get test.exe from morpheus and run it on hypnos", submitted through the Globus gatekeeper on chi, which uses the PBS job manager.
RSL Example – Resulting PBS Submission Script on Hypnos:
#! /bin/sh
# PBS batch job script built by Globus job manager
#
#PBS -S /bin/sh
#PBS -M [email protected]
#PBS -m n
#PBS -q cac
#PBS -W stagein=/home/abose/[email protected]:/home/abose/test.exe
#PBS -W stageout=/home/abose/[email protected]:/home/abose/out.dat
#PBS -l walltime=10:00
#PBS -o hypnos:/tmp/stdout
#PBS -e hypnos:/tmp/stderr
#PBS -l nodes=2
#PBS -v X509_USER_PROXY=/home/abose/.globus/.gass_cache/local/md5/1c/fd/d3/753b9028dfec2ddd6df84cd06c/md5/0a/4b/1d/599dac54863d650c2531cb92fc/data,GLOBUS_LOCATION=/usr/grid,GLOBUS_GRAM_JOB_CONTACT=https://chi.grid.umich.edu:58963/575/1047861360/,GLOBUS_GRAM_MYJOB_CONTACT=URLx-nexus://chi.grid.umich.edu:58964/,HOME=/home/abose,LOGNAME=abose,LD_LIBRARY_PATH=
#Change to directory requested by user
cd /home/abose
/usr/gmpi.pgi/bin/mpirun -np 4 /home/abose/test.exe -t 100 -f out.dat
Slides taken from NPACI Training, 2003
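Comparing the RSL request with the generated script shows how the PBS job manager translated attributes: (queue="cac") became `#PBS -q cac`, (maxwalltime=10) became a 10-minute walltime, and (host_count=2) became `#PBS -l nodes=2`. The C sketch below reproduces only those three mappings. The function `rsl_to_pbs` is hypothetical and is not the job manager's code. It writes the walltime as HH:MM:SS (00:10:00), which we take to express the same 10 minutes as the 10:00 in the script.

```c
#include <stdio.h>
#include <string.h>

/* Append "line\n" to buf at *used; returns 0, or -1 on overflow. */
static int append(char *buf, size_t len, size_t *used, const char *line)
{
    int w = snprintf(buf + *used, len - *used, "%s\n", line);
    if (w < 0 || (size_t)w >= len - *used) return -1;
    *used += (size_t)w;
    return 0;
}

/* Translate queue, maxWallTime (minutes) and hostCount into PBS directives.
 * Pass NULL / 0 to omit a directive. Returns 0, or -1 on overflow. */
static int rsl_to_pbs(const char *queue, int max_wall_minutes, int host_count,
                      char *buf, size_t len)
{
    char line[64];
    size_t used = 0;
    if (len == 0) return -1;
    buf[0] = '\0';
    if (append(buf, len, &used, "#PBS -S /bin/sh")) return -1;
    if (queue) {
        snprintf(line, sizeof line, "#PBS -q %s", queue);
        if (append(buf, len, &used, line)) return -1;
    }
    if (max_wall_minutes > 0) {                     /* minutes -> HH:MM:00 */
        snprintf(line, sizeof line, "#PBS -l walltime=%02d:%02d:00",
                 max_wall_minutes / 60, max_wall_minutes % 60);
        if (append(buf, len, &used, line)) return -1;
    }
    if (host_count > 0) {
        snprintf(line, sizeof line, "#PBS -l nodes=%d", host_count);
        if (append(buf, len, &used, line)) return -1;
    }
    return 0;
}
```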
Programming with Globus API
• Command-line programs are named grid-* or globus-*
• Function calls/APIs start with globus_*
• Library binaries start with libglobus_*.a
• Includes:
#include <globus_common.h> /* defines most common data structures */
plus other headers, depending on which modules/functions the program calls.
• Module activation/deactivation:
– Functions are arranged in several modules. The corresponding module must be activated before any of its functions are called:
– globus_module_activate(MODULE_NAME)
– globus_module_deactivate(MODULE_NAME)
– globus_module_deactivate_all()
– GLOBUS_SUCCESS (0) is returned on success.
• Example module names:
– GLOBUS_GRAM_CLIENT_MODULE
– GLOBUS_IO_MODULE
– GLOBUS_GASS_COPY_MODULE
• Dependencies exist among module activations; read the API documentation.
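The activation rule (activate a module before using it, with modules depending on one another) can be modeled in a few lines. In real code one calls globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE) and checks for GLOBUS_SUCCESS. The toy below uses hypothetical names (`toy_module`, `toy_activate`, `toy_deactivate`) and simply assumes reference counting: activating a module first activates its dependencies, and deactivating it releases them again.

```c
#define TOY_SUCCESS 0   /* stands in for GLOBUS_SUCCESS in this toy */

/* A toy module with a reference count and up to four dependencies. */
typedef struct toy_module {
    const char *name;
    struct toy_module *deps[4];
    int ndeps;
    int refcount;
} toy_module;

/* Activate dependencies first, then count this activation. */
static int toy_activate(toy_module *m)
{
    for (int i = 0; i < m->ndeps; i++)
        if (toy_activate(m->deps[i]) != TOY_SUCCESS) return -1;
    m->refcount++;
    return TOY_SUCCESS;
}

/* Undo one activation, releasing dependencies in reverse order. */
static int toy_deactivate(toy_module *m)
{
    if (m->refcount == 0) return -1;          /* not active: caller error */
    m->refcount--;
    for (int i = m->ndeps - 1; i >= 0; i--)
        toy_deactivate(m->deps[i]);
    return TOY_SUCCESS;
}
```

Because a shared dependency (here a "common" module) is counted once per activation path, it stays active until every module that needs it has been deactivated.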
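Assessment
• Strengths:
– Adds a description of job resources
– Defines many attributes, supporting several resource management approaches such as GRAM and DUROC
• Weaknesses:
– Like other approaches, it focuses mainly on computing resources and resource requests, so its coverage is not broad enough
– Poor extensibility
– Currently used only by Globus and not supported by other Grid projects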
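Web Services Description Language (WSDL)
WSDL
• Web Services Description Language
• A language for describing the technical invocation syntax of Web services.
• WSDL defines an XML-based grammar that describes a Web service as a collection of service access points (endpoints) capable of exchanging messages, which meets the need for such a description.
• A WSDL service definition gives distributed systems machine-readable SDK documentation and can describe the details involved in automating communication between applications.
• The version covered here is WSDL 1.1; its specification is available at http://www.w3.org/TR/wsdl (WSDL 2.0 later became a W3C Recommendation, in 2007).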
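WSDL
• WSDL was proposed by vendors including Ariba, Intel, IBM, and Microsoft.
• It defines the operations and messages that a given Web service sends and receives in an abstract, language-independent way.
• WSDL remains protocol-neutral, but it has built-in support for binding to SOAP, which ties it closely to SOAP.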
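The WSDL Information Model
• The WSDL information model relies on separating an abstract specification from its concrete implementation, that is, separating the service interface definition (abstract interface) from the service implementation definition (concrete endpoint).
• The abstract interface specification describes what an endpoint can do; in WSDL it is represented as a portType. The binding mechanism, represented in WSDL by the binding element, maps the abstract definition of a Web service onto a specific implementation using a particular communication protocol, data encoding model, and underlying transport. When a binding is combined with the access address of an implementation, the abstract endpoint becomes a concrete endpoint that service requesters can invoke; the WSDL port element represents this combination.
• An abstract interface can support any number of operations. An operation is defined by a set of messages, which define the interaction pattern of the operation. The concrete implementations corresponding to the abstract message and operation concepts are specified by the binding element. As with other XML applications, the WSDL schema defines several top-level, or principal, elements.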
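Basic Properties of a WSDL Description
• What the service does: the operations (methods) the service provides.
• How to access the service: details of the data formats and the protocols needed to access the service's operations.
• Where the service is located: a protocol-specific network address, such as a URL.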
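Meaning of the Basic WSDL Elements
• types: the set of all data types used by the Web service; the parts of each message can refer to them.
• message: an abstract, typed definition of the data structure of a communication message; it uses the types defined in types to describe the structure of the whole message.
• operation: an abstract description of an operation supported by the service.
• portType: an abstract set of operations supported by one type of access point (endpoint).
• binding: the details of how the elements of the abstract interface (portType) are turned into a concrete representation, that is, a combination of a specific data format and protocol.
• port: a single service access point, defined as the combination of a protocol/data-format binding and a concrete Web address.
• service: defines the service as a collection of related ports.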
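The split between the abstract side (message, operation, portType) and the concrete side (binding, port, service) can be mirrored in plain C structs. The sketch below, built around a hypothetical JobStatus service at an example.org address, answers the question a client asks of a WSDL document: at which address, and through which binding, can a given operation be invoked? All type and function names are illustrative, not part of any WSDL toolkit.

```c
#include <stddef.h>
#include <string.h>

/* Abstract side */
typedef struct { const char *name; } wsdl_message;
typedef struct { const char *name; const wsdl_message *input, *output; } wsdl_operation;
typedef struct { const char *name; const wsdl_operation *ops; size_t nops; } wsdl_port_type;

/* Concrete side */
typedef struct { const char *name; const wsdl_port_type *type; const char *protocol; } wsdl_binding;
typedef struct { const char *name; const wsdl_binding *binding; const char *address; } wsdl_port;
typedef struct { const char *name; const wsdl_port *ports; size_t nports; } wsdl_service;

/* Return the first port of `svc` whose binding's portType offers `op_name`,
 * or NULL if no port supports that operation. */
static const wsdl_port *find_endpoint(const wsdl_service *svc, const char *op_name)
{
    for (size_t p = 0; p < svc->nports; p++) {
        const wsdl_port_type *pt = svc->ports[p].binding->type;
        for (size_t o = 0; o < pt->nops; o++)
            if (strcmp(pt->ops[o].name, op_name) == 0)
                return &svc->ports[p];
    }
    return NULL;
}
```

Because several bindings (for example SOAP over HTTP and another protocol) can point at the same portType, one abstract interface can be offered through several ports without being redefined.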
WSDL Information Model
[Figure: WSDL object structure diagram]
[Figure: WSDL document types, built up from the types, message, operation, portType, binding, port, and service elements]
WSDL Document Structure
[Figure: WSDL document structure]
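WSDL Tools
• Omniopera: a graphical editor for WSDL, XML, and XSD.
• Microsoft SOAP Toolkit: a toolkit that includes a wizard for creating COM interfaces from WSDL definitions, and a wizard for creating WSDL from COM interfaces.
• IBM Web Services Toolkit: a toolkit that includes wizards for generating WSDL and SOAP deployment descriptors.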
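Resource Description Framework (RDF)
RDF
• Resource Description Framework (RDF)
• The W3C Resource Description Framework (RDF) aims to provide a standard for accessing metadata about network resources, and thereby also a standard way to describe the content of particular resources.
• The W3C recommendation for applying metadata
• A model, plus one or more syntaxes
• When used on the Web, RDF is usually encoded in XML
• The foundation and underpinning of the Semantic Web
W3C - Resource Description Framework (RDF)
http://www.w3.org/RDF/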
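RDF
• A language for representing information about resources on the World Wide Web.
• Designed specifically for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information for a Web document, or the availability schedule of a shared resource.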
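Why Use RDF?
• RDF provides a model for sharing metadata…
• …and for sharing meaning (semantics)
• Metadata can be shared between applications that know little or nothing about each other
• For example, an RDF-based bibliographic application can take in metadata from an RDF-based geospatial application and understand something of its meaning.
…with (X)HTML and XML markup, software applications must be able to understand complex encodings…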
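The Basic Idea of RDF
• Identify things with Web identifiers (Uniform Resource Identifiers, or URIs), and describe resources with simple properties and property values. This lets RDF represent one or more simple statements about a resource as a graph of nodes and arcs, where the nodes and arcs represent resources, properties, or property values.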
Example
• A person identified by http://www.w3.org/People/EM/contact#me has the name Eric Miller, the email address [email protected], and the title Dr.
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:[email protected]"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>
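The XML above encodes a small graph. As a minimal sketch, the same statements can be held as (subject, predicate, object) triples, with each property URI formed from its namespace plus the local name (for example contact:fullName expands to http://www.w3.org/2000/10/swap/pim/contact#fullName), and the typed contact:Person element contributing an rdf:type statement. The mailbox statement is left out, this toy does not distinguish literal objects from URI objects, and the names `triple`, `graph`, and `lookup` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One RDF statement: subject --predicate--> object. */
typedef struct { const char *subject, *predicate, *object; } triple;

#define RDF     "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
#define CONTACT "http://www.w3.org/2000/10/swap/pim/contact#"
#define EM      "http://www.w3.org/People/EM/contact#me"

/* Statements from the example (adjacent string literals concatenate). */
static const triple graph[] = {
    { EM, RDF "type",              CONTACT "Person" },
    { EM, CONTACT "fullName",      "Eric Miller" },
    { EM, CONTACT "personalTitle", "Dr." },
};

/* Return the object of the first statement matching subject and predicate,
 * or NULL if the graph has no such arc. */
static const char *lookup(const char *subject, const char *predicate)
{
    for (size_t i = 0; i < sizeof graph / sizeof graph[0]; i++)
        if (strcmp(graph[i].subject, subject) == 0 &&
            strcmp(graph[i].predicate, predicate) == 0)
            return graph[i].object;
    return NULL;
}
```

Each triple is one arc of the node-and-arc graph described above: the subject and object are nodes and the predicate labels the arc.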
Uniform Resource Identifier (URI)
URI
• Uniform Resource Identifiers (URIs)
• A simple, extensible means of identifying resources
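Uniformity of URIs
• Although different resources may be accessed through different mechanisms, URIs allow different types of resource identifiers to be used in the same context
• URIs allow a common syntax to be interpreted with uniform semantics across different types of resource identifiers
• URIs allow new types of identifiers to be introduced without affecting existing identifier schemes
• URIs allow the same identifier to be reused in many different contexts
• URIs allow new applications or protocols to make use of existing, widely used resource identifiers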
URI Examples
• ftp://ftp.is.co.za/rfc/rfc1808.txt
• gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
• http://www.cs.sdu.edu.cn/index.html
• mailto:[email protected]
• news:comp.infosystems.www.servers.unix
• telnet://159.226.39.252
URI = URL + URN
• URL (Uniform Resource Locator)
• URN (Uniform Resource Name)
• The two identify a resource from different angles: a URL by its location, a URN by its name
URL
• General form:
<scheme>:<scheme-specific-part>
• scheme:
– ftp, http, gopher, mailto, news, nntp, telnet, wais, file, prospero
• Common form of the scheme-specific part for Internet schemes:
//<user>:<password>@<host>:<port>/<url-path>
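The sketch below splits a URL of the `//<user>:<password>@<host>:<port>/<url-path>` form into its parts, treating user, password, port, and path as optional. It is simplified: it accepts only schemes written with `://` (so mailto: and news: URLs are rejected), does no percent-decoding, and uses fixed-size buffers. The names `parse_url` and `url_parts` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char scheme[16], user[32], password[32], host[64], path[128];
    int port;                       /* -1 when no port is given */
} url_parts;

/* Copy n bytes into a fixed-size buffer; -1 if they do not fit. */
static int copy_n(char *dst, size_t cap, const char *src, size_t n)
{
    if (n >= cap) return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Split "<scheme>://[<user>[:<password>]@]<host>[:<port>][/<url-path>]".
 * Returns 0 on success, -1 if the string does not have this shape. */
static int parse_url(const char *s, url_parts *u)
{
    memset(u, 0, sizeof *u);
    u->port = -1;
    const char *sep = strstr(s, "://");
    if (!sep || sep == s || copy_n(u->scheme, sizeof u->scheme, s, (size_t)(sep - s)))
        return -1;
    const char *auth = sep + 3;
    const char *slash = strchr(auth, '/');
    const char *end = slash ? slash : auth + strlen(auth);

    /* Optional "<user>[:<password>]@" */
    const char *at = memchr(auth, '@', (size_t)(end - auth));
    if (at) {
        const char *pc = memchr(auth, ':', (size_t)(at - auth));
        const char *uend = pc ? pc : at;
        if (copy_n(u->user, sizeof u->user, auth, (size_t)(uend - auth)))
            return -1;
        if (pc && copy_n(u->password, sizeof u->password, pc + 1, (size_t)(at - pc - 1)))
            return -1;
        auth = at + 1;
    }

    /* "<host>[:<port>]" */
    const char *pcolon = memchr(auth, ':', (size_t)(end - auth));
    const char *hend = pcolon ? pcolon : end;
    if (hend == auth || copy_n(u->host, sizeof u->host, auth, (size_t)(hend - auth)))
        return -1;
    if (pcolon) {
        char *stop;
        long p = strtol(pcolon + 1, &stop, 10);
        if (stop == pcolon + 1 || stop != end || p < 0 || p > 65535)
            return -1;
        u->port = (int)p;
    }

    /* "<url-path>" is everything after the first '/', without the slash. */
    if (slash && copy_n(u->path, sizeof u->path, slash + 1, strlen(slash + 1)))
        return -1;
    return 0;
}
```

Applied to ftp://ftp.is.co.za/rfc/rfc1808.txt from the examples, it yields scheme "ftp", host "ftp.is.co.za", no port, and url-path "rfc/rfc1808.txt".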
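URN
• <URN> ::= "urn:" <NID> ":" <NSS>
– <NID>: the namespace identifier
– <NSS>: the namespace-specific string, which must conform to the rules of namespace <NID>
– NID
• <NID> ::= <let-num> [ 1,31<let-num-hyp> ]
• <let-num-hyp> ::= <upper> | <lower> | <number> | "-"
• <let-num> ::= <upper> | <lower> | <number>
• upper: uppercase letter; lower: lowercase letter; number: digit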
URN
– <NSS>
• <NSS> ::= 1*<URN chars>
• <URN chars> ::= <trans> | "%" <hex> <hex>
• <hex> ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
• <other> ::= "(" | ")" | "+" | "," | "-" | "." | ":" | "=" | "@" | ";" | "$" | "_" | "!" | "*" | "'"
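A minimal C validator for the URN grammar above. As a simplification, it treats `<trans>` as letters, digits, and the `<other>` characters listed above; RFC 2141 defines `<trans>` to admit a few more (reserved) characters, which this sketch rejects. The function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* <let-num> ::= <upper> | <lower> | <number> */
static bool is_let_num(char c) { return isalnum((unsigned char)c) != 0; }

/* One of the <other> characters listed above. */
static bool is_other(char c) { return c != '\0' && strchr("()+,-.:=@;$_!*'", c) != NULL; }

/* Check "urn:" <NID> ":" <NSS>. */
static bool valid_urn(const char *s)
{
    /* The "urn:" prefix is matched case-insensitively. */
    const char *prefix = "urn:";
    for (int i = 0; i < 4; i++)
        if (tolower((unsigned char)s[i]) != prefix[i]) return false;
    s += 4;

    /* NID: one <let-num>, then up to 31 <let-num-hyp>, so 1..32 chars. */
    size_t n = 0;
    if (!is_let_num(s[0])) return false;
    while (is_let_num(s[n]) || s[n] == '-') n++;
    if (n > 32 || s[n] != ':') return false;
    s += n + 1;

    /* NSS: 1*<URN chars>, where "%" must be followed by two hex digits. */
    if (*s == '\0') return false;
    while (*s) {
        if (*s == '%') {
            if (!isxdigit((unsigned char)s[1]) || !isxdigit((unsigned char)s[2]))
                return false;
            s += 3;
        } else if (is_let_num(*s) || is_other(*s)) {
            s++;
        } else {
            return false;
        }
    }
    return true;
}
```

For example, urn:ietf:rfc:2141 passes because ":" is one of the <other> characters, while urn:-bad:x fails because an NID must begin with a letter or digit.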
Resource Description in LDAP
LDAP
• Stores entries as a series of attribute pairs; each attribute in an entry consists of a type and attribute values.
Example
• dn: cn=My Computer, ou=devices, dc=sdu, dc=edu.cn
• cn: FB Computer
• usage: computing
• resource: 866MHz
• resource: 512M memory
• resource: 60GB storage
• resource: Linux OS
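LDAP attributes can be multi-valued, as resource is in the entry above. The sketch below reads an entry written as LDIF-like `type: value` lines and counts the values of one attribute type, comparing type names case-insensitively as LDAP does. It ignores LDIF features such as line folding and base64-encoded values, and `attr_values` is a hypothetical helper.

```c
#include <ctype.h>
#include <string.h>

/* Count the values of `attr` in an entry made of "type: value" lines and
 * copy the first value into `first` (if it fits). Returns the count. */
static int attr_values(const char *entry, const char *attr, char *first, size_t cap)
{
    int count = 0;
    size_t alen = strlen(attr);
    const char *line = entry;
    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *colon = memchr(line, ':', len);
        if (colon && (size_t)(colon - line) == alen) {
            size_t i = 0;                        /* case-insensitive type match */
            while (i < alen &&
                   tolower((unsigned char)line[i]) == tolower((unsigned char)attr[i]))
                i++;
            if (i == alen) {
                const char *v = colon + 1;
                while (v < line + len && *v == ' ') v++;   /* skip ": " spacing */
                size_t vlen = (size_t)(line + len - v);
                if (count == 0 && first && vlen < cap) {
                    memcpy(first, v, vlen);
                    first[vlen] = '\0';
                }
                count++;
            }
        }
        line = end ? end + 1 : line + len;
    }
    return count;
}
```

A resource broker could use such a count to check, for example, that a machine advertises all the resource values a job requires before selecting it.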
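Resource Naming
Purpose and Role of Resource Naming
• Resource names abstract resources further, separating a resource's identity from its location
• A naming mechanism can build a virtual name space, enlarging or shrinking the user's space
• Resources can be accessed by name, which is convenient for users
• Resource names take different forms
– Logical names: convenient for users and easy to remember
– Physical names: the actual names
– Internal names: used inside the system
– External names: provided to external users
• Naming rules prevent conflicts
• Consistent style
• Globally unique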
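The separation of logical and physical names can be sketched as a small lookup table. Users keep using the logical name while the physical location changes underneath, and rejecting duplicate registrations is the simplest form of the no-conflict rule. The table, function names, and ftp URLs below are hypothetical; a grid-wide naming service would also need distribution and a scheme for global uniqueness.

```c
#include <string.h>

#define MAX_NAMES 16

/* Maps user-friendly logical names to physical locations. */
typedef struct {
    const char *logical[MAX_NAMES];
    const char *physical[MAX_NAMES];
    int n;
} name_table;

/* Register a logical name; duplicates are rejected so each logical name
 * stays unique within the table. Returns 0 or -1. */
static int bind_name(name_table *t, const char *logical, const char *physical)
{
    if (t->n == MAX_NAMES) return -1;
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->logical[i], logical) == 0) return -1;
    t->logical[t->n] = logical;
    t->physical[t->n] = physical;
    t->n++;
    return 0;
}

/* Resolve a logical name to its current physical name, or NULL. */
static const char *resolve(const name_table *t, const char *logical)
{
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->logical[i], logical) == 0) return t->physical[i];
    return NULL;
}

/* Move a resource: only the physical name changes; users keep the logical one. */
static int rebind(name_table *t, const char *logical, const char *new_physical)
{
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->logical[i], logical) == 0) {
            t->physical[i] = new_physical;
            return 0;
        }
    return -1;
}
```

After `rebind`, every later `resolve` of the same logical name returns the new location, which is exactly the location independence described in the first bullet.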