Nova Scheduler - OpenStack China Community

Nova Scheduler
Shane Wang(王庆), Intel Open Source Technology Center
WeChat ID: qq559382
Agenda
What is the current situation?
How scheduler works in Juno and Kilo
Resource Tracking
Filters and Weight
Utilization Based Scheduling (UBS)
What is the next plan?
Gantt
Dynamic Resource Scheduling (DRS)
How scheduler works in Juno and Kilo
Components: API, Conductor, Scheduler, Compute
1. User request (API), with scheduler hints to include scheduling policy
2. Submit new task (API to Conductor)
3. Request hosts that match the request_spec and filter_properties (Conductor to Scheduler)
4. Return the selected hosts (Scheduler to Conductor)
5. Call the selected compute (Conductor to Compute)
6. Reschedule after the resource claim fails or another failure occurs (Compute to Conductor)
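The request flow on this slide can be sketched as a small retry loop. This is an illustrative toy, not Nova's code: `ClaimFailed`, `select_hosts`, `build_instance`, `build_on_host` and `max_attempts` are hypothetical names, and hosts are reduced to a free-RAM figure.

```python
class ClaimFailed(Exception):
    """Raised by a (hypothetical) compute node when its resource claim fails."""


def select_hosts(hosts, request_spec, filter_properties):
    """Steps 3-4: return hosts with enough free RAM that were not tried yet."""
    tried = filter_properties["retry"]["hosts"]
    return [
        name for name, free_ram_mb in hosts.items()
        if name not in tried and free_ram_mb >= request_spec["memory_mb"]
    ]


def build_instance(hosts, request_spec, build_on_host, max_attempts=3):
    """Steps 2-6: ask for a host, call it, and reschedule on failure."""
    filter_properties = {"retry": {"num_attempts": 0, "hosts": []}}
    retry = filter_properties["retry"]
    while retry["num_attempts"] < max_attempts:
        retry["num_attempts"] += 1
        candidates = select_hosts(hosts, request_spec, filter_properties)
        if not candidates:
            break
        host = candidates[0]
        retry["hosts"].append(host)  # remember the host so it is not retried
        try:
            build_on_host(host, request_spec)  # step 5
            return host
        except ClaimFailed:
            continue  # step 6: reschedule on another host
    raise RuntimeError("No valid host was found")
```

The `retry` entry in `filter_properties` plays the role of the "assist parameter" that lets a later scheduling pass skip hosts that already failed.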
Resource usage Tracking
Scheduler (step 3, request hosts that match the request_spec and filter_properties):
1) Fetch the newest compute node stats for each call
2) Filter and weigh the hosts
3) Consume the resource on the selected host
Compute, Resource Claiming (step 5, call the selected compute):
1) Validate the resource usage
2) Update the resource usage
3) Update to DB
Compute, periodic update of the node resources from the hypervisor at a 60-second interval:
1) Get hypervisor resource
2) Consume the resource
3) Update to DB
Steps 2, 4 and 6 (submit new task, return the selected hosts, reschedule after a failed resource claim or other failure) are unchanged.
[Diagram components: Conductor, Scheduler, Compute, Hypervisor (x3), DB]
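A minimal sketch of the claim step on the compute side, assuming a RAM-only resource model. The class `ComputeNode`, its fields and the `db` dictionary are hypothetical stand-ins for the real resource tracker and database.

```python
class ComputeNode:
    """Toy resource tracker for one compute node (RAM only)."""

    def __init__(self, name, total_ram_mb, db):
        self.name = name
        self.total_ram_mb = total_ram_mb
        self.used_ram_mb = 0
        self.db = db  # hypothetical stand-in for the Nova database

    def claim(self, requested_ram_mb):
        # 1) Validate the resource usage against what is really free here.
        if self.used_ram_mb + requested_ram_mb > self.total_ram_mb:
            return False
        # 2) Update the resource usage.
        self.used_ram_mb += requested_ram_mb
        # 3) Update to DB so the scheduler sees the new usage on its next call.
        self.db[self.name] = {"free_ram_mb": self.total_ram_mb - self.used_ram_mb}
        return True
```

Because the scheduler works from the stats it fetched earlier, two requests can both be sent to the same node; the second claim then fails on the node itself, which is what triggers rescheduling.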
Filters and weight hosts
scheduler_available_filters='nova.scheduler.filters.all_filters'
scheduler_default_filters= [……]
scheduler_weight_classes=nova.scheduler.weights.all_weighers
scheduler_host_subset_size=1
Request Spec:
• Image
• Instance_properties
• Instance_type
Filter_properties:
• Scheduler-hints
• Assist parameter: retry
nova boot --flavor 1 --image …… --hint group='sg1'
--hint <key=value>
Send arbitrary key/value pairs to the scheduler for custom use.
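The filter-then-weigh pipeline configured by these options can be sketched as follows. This is a simplified illustration, not Nova's implementation: `schedule`, the dictionary-based hosts and the callables passed as filters and weighers are hypothetical, and `host_subset_size` only mirrors the idea behind `scheduler_host_subset_size`.

```python
import random


def schedule(hosts, filters, weighers, host_subset_size=1, rng=random):
    """Filter hosts, rank the survivors, and pick one of the best N."""
    # Keep only the hosts that every filter accepts.
    passing = [h for h in hosts if all(f(h) for f in filters)]
    if not passing:
        return None
    # Rank by the summed weight, best first.
    ranked = sorted(passing, key=lambda h: sum(w(h) for w in weighers),
                    reverse=True)
    # Choose randomly among the top N hosts (N=1 always takes the best).
    subset = ranked[:max(1, host_subset_size)]
    return rng.choice(subset)
```

With a subset size larger than 1, concurrent requests are less likely to pick the same host, at the cost of not always choosing the top-ranked one.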
Filters
Resource:
CoreFilter, AggregateCoreFilter: cpu_allocation_ratio=16.0
RamFilter, AggregateRamFilter: ram_allocation_ratio=1.5
DiskFilter, AggregateDiskFilter: disk_allocation_ratio=1.0
IoOpsFilter, AggregateIoOpsFilter: max_io_ops_per_host=8. I/O ops means instances that are building, resizing, snapshotting an image, migrating, rescuing, unshelving, or being backed up
PciPassthroughFilter: generic PCI device or SR-IOV assignment
NUMATopologyFilter: NUMA in Juno; CPU pinning and huge pages in Kilo
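The allocation ratios above are overcommit factors. A rough sketch of the arithmetic, assuming simplified host figures (the function names and parameters are hypothetical; the defaults are the values shown on the slide):

```python
def core_filter_passes(host_vcpus, vcpus_used, requested_vcpus,
                       cpu_allocation_ratio=16.0):
    """Pass if the request fits under the overcommitted vCPU limit."""
    limit = host_vcpus * cpu_allocation_ratio
    return vcpus_used + requested_vcpus <= limit


def ram_filter_passes(total_ram_mb, free_ram_mb, requested_ram_mb,
                      ram_allocation_ratio=1.5):
    """Pass if the request fits under the overcommitted RAM limit."""
    limit = total_ram_mb * ram_allocation_ratio
    used = total_ram_mb - free_ram_mb
    return limit - used >= requested_ram_mb
```

For example, an 8-core host with a ratio of 16.0 can accept up to 128 vCPUs in total, and a host with 1000 MB of RAM and a ratio of 1.5 can have up to 1500 MB allocated.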
Filters
Affinity:
DifferentHostFilter, SameHostFilter: scheduler_hints: different_host / same_host = ['instance uuid', …]
ServerGroupAffinityFilter, ServerGroupAntiAffinityFilter:
• nova server-group-create: Create a new server group with the specified details.
• nova server-group-delete: Delete specific server group(s).
• nova server-group-get: Get a specific server group.
• nova server-group-list: Print a list of all server groups.
• boot with scheduler hints (group=uuid): Boot a new instance into the server group.
SimpleCIDRAffinityFilter: scheduler_hints: cidr, build_near_host_ip
TypeAffinityFilter, AggregateTypeAffinityFilter: instance_type
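The server-group policies reduce to a simple set-membership check per host. The sketch below is an illustration under that assumption; `server_group_filter_passes` and its arguments are hypothetical names, not the filters' real interface.

```python
def server_group_filter_passes(host, policy, group_hosts):
    """Return True if `host` satisfies the server group's policy.

    group_hosts: set of hosts already running members of the group.
    """
    if policy == "anti-affinity":
        # Every member must land on a host no other member uses.
        return host not in group_hosts
    if policy == "affinity":
        # The first member may land anywhere; later members must follow it.
        return not group_hosts or host in group_hosts
    raise ValueError("unknown policy: %s" % policy)
```

The check is only as good as `group_hosts` at scheduling time: two members scheduled concurrently can both see the same host as unused.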
Filters
Topology:
AggregateImagePropertiesIsolation: image properties match aggregate metadata
IsolatedHostsFilter: isolated_hosts, isolated_images, restrict_isolated_hosts_to_isolated_images
AggregateInstanceExtraSpecsFilter: flavor's extra specs match aggregate metadata
AggregateMultiTenancyIsolation: filter_tenant_id
AvailabilityZoneFilter
Filters
Others:
ComputeCapabilitiesFilter: works with instance type extra_specs prefixed with 'capabilities:'
ComputeFilter: whether the compute node is up and enabled
ImagePropertiesFilter: architecture, hypervisor type, vm_mode, hypervisor_version_requires
JsonFilter: scheduler_hints: query
NumInstancesFilter, AggregateNumInstancesFilter: max_instances_per_host
RetryFilter
TrustedFilter
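As an illustration of the JsonFilter `query` hint, the sketch below evaluates a JSON expression such as `[">=", "$free_ram_mb", 1024]` against a host's state. It is a simplified evaluator written for this example (the names `evaluate` and `json_filter_passes` and the dictionary-based host state are assumptions), covering only two-argument comparisons plus `and`, `or` and `not`.

```python
import json
import operator

# Two-argument comparison operators supported by this simplified evaluator.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}


def evaluate(query, host_state):
    """Recursively evaluate a parsed query list against host_state."""
    op, *args = query
    if op == "and":
        return all(evaluate(a, host_state) for a in args)
    if op == "or":
        return any(evaluate(a, host_state) for a in args)
    if op == "not":
        return not evaluate(args[0], host_state)

    def resolve(arg):
        # "$name" refers to a host attribute; anything else is a literal.
        if isinstance(arg, str) and arg.startswith("$"):
            return host_state[arg[1:]]
        return arg

    left, right = (resolve(a) for a in args)
    return OPS[op](left, right)


def json_filter_passes(hint, host_state):
    return evaluate(json.loads(hint), host_state)
```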
Weight
IoOpsWeigher
MetricsWeigher
RAMWeigher
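A sketch of how several weighers might be combined, assuming each weigher's raw values are min-max normalized to [0, 1] and then scaled by a multiplier before summing. The helper names and the `(multiplier, function)` pairs are hypothetical.

```python
def normalize(values):
    """Scale values to the range [0, 1]; all-equal values map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_hosts(hosts, weighers):
    """Return host names ordered from best to worst total weight.

    weighers: list of (multiplier, raw_weight_function) pairs.
    """
    totals = [0.0] * len(hosts)
    for multiplier, raw_weight in weighers:
        normalized = normalize([raw_weight(h) for h in hosts])
        totals = [t + multiplier * n for t, n in zip(totals, normalized)]
    order = sorted(range(len(hosts)), key=lambda i: totals[i], reverse=True)
    return [hosts[i]["name"] for i in order]
```

A positive RAM multiplier spreads instances toward hosts with the most free memory; a negative one stacks them onto the fullest hosts.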
Utilization Based Scheduling
• CPU utilization data
• Memory utilization data
• Network bandwidth data
• etc.
Utilization Based Scheduling
Scheduler (step 3, request hosts that match the request_spec and filter_properties):
1) Fetch the newest compute node stats for each call
2) Filter and weigh the hosts
3) Consume the resource on the selected host
Monitors on Compute: CPU, NetworkBandWidth, MemoryCache
Update at a 60-second interval
Steps 2, 4, 5 and 6 (submit new task, return the selected hosts, call the selected compute, reschedule after a failed resource claim or other failure) are unchanged.
[Diagram components: Conductor, Scheduler, Compute, Hypervisor (x3), DB, Notification Bus (AMQP)]
Utilization Based Scheduling
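MetricsWeigher:
weight_multiplier: multiplier used for weighing metrics.
weight_setting: how the metrics are going to be weighed.
required: if true, an unavailable metric is an error, so use the MetricsFilter to exclude hosts that lack it.
weight_of_unavailable: the weight returned when required is false and a metric is unavailable.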
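A minimal sketch of how these options could interact, assuming `weight_setting` is a comma-separated list of `metric=ratio` pairs. The function names, the metric names in the usage note and the default used for `weight_of_unavailable` are assumptions made for illustration.

```python
def parse_weight_setting(setting):
    """Turn 'name1=1.0, name2=-1.0' into {'name1': 1.0, 'name2': -1.0}."""
    ratios = {}
    for item in setting.split(","):
        name, ratio = item.split("=")
        ratios[name.strip()] = float(ratio)
    return ratios


def metrics_weight(host_metrics, ratios, required=True,
                   weight_of_unavailable=-10000.0):
    """Weighted sum of a host's metrics, honoring `required`."""
    try:
        return sum(host_metrics[name] * ratio for name, ratio in ratios.items())
    except KeyError:
        if required:
            raise  # the host should have been removed by a filter earlier
        return weight_of_unavailable
```

For instance, a setting such as `cpu.percent=-1.0, mem.free=0.5` (illustrative metric names) would penalize busy CPUs and reward free memory.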
How does scheduler strategy affect performance?
Benchmark Accuracy
Smart Scheduling
Efficiency
QoS meets the SLA contract
What is monitored now?
OpenStack Service / Type / Metrics (e.g.)
Nova / Static capabilities:
• CPU features
• hypervisor version
Nova / Dynamic resources:
• free memory/disk
• vCPU #
• PCI devices
• # of NIC virtual functions
Ceilometer / Resource creation/deletion:
• VM
• network/subnet/port
• image
• ……
Ceilometer / Resource usage data:
• CPU usage in VM
• memory usage in VM
• network usage in VM
• storage usage stats
• ……
Not enough:
• CPU usage stats of host
• Network usage stats of host
• Intel Node Manager power data
• Cache QoS Monitoring (CQM) data
• ……
Nova: not easy to add; how to use?
Ceilometer: no hardware pollsters
What is missing?
Policy management
Break policy into QoS parameters
Map QoS parameters to metrics
Actions
Live migration
Resource reallocation
Enforcement
… …
Knowledge model to evaluate complex policy situations (e.g. predict future VM workload)
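The chain from policy to QoS parameter to metric to action can be pictured with a toy evaluator. Everything here is hypothetical (the `Rule` class, the metric name, the threshold and the action label); it only illustrates the mapping the slide says is missing.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One QoS parameter expressed as a threshold on a metric."""
    metric: str
    threshold: float
    action: str


def evaluate_policy(rules, host_metrics):
    """Return (host, action) pairs for every rule a host violates."""
    actions = []
    for host, metrics in sorted(host_metrics.items()):
        for rule in rules:
            value = metrics.get(rule.metric)
            if value is not None and value > rule.threshold:
                actions.append((host, rule.action))
    return actions
```

A real system would also need the enforcement side (for example triggering a live migration) and a way to reason about future load, which is what the knowledge model is meant to cover.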
Dynamic Resource Scheduling
[Architecture diagram; legend: existing components, to be implemented]
Diagram labels:
• admins, Policy API
• Knowledge model
• Evaluator: Parser, Analyzer, Evaluating
• Historic metrics data
• Pluggable Collectors: Ceilometer collector, Nova collector, other collectors
• set alarm, alarm trigger
• Logging, Alarming
• Pluggable Executors
• Enforcement: live migration, resource reallocation, de-virtualizing, other actions
• Ceilometer API, Nova API, Benchmarking API, other agents
Next: Gantt
• Scheduler-as-a-Service project
• Split from Nova first, then serve other projects
• The split is planned to begin in the L release
Gantt in Kilo: Refactor, Refactor, Refactor….
The Scheduler before Juno
[Diagram components: API, Compute, Scheduler]
The scheduler in Kilo
Scheduler API:
• select_destinations
• update_resource_stats
Components: API, Conductor, Scheduler, Compute
1. User request (API), with scheduler hints to include scheduling policy
2. Submit new task (API to Conductor)
3. Request hosts that match the request_spec and filter_properties (Conductor to Scheduler)
4. Return the selected hosts (Scheduler to Conductor)
5. Call the selected compute (Conductor to Compute)
6. Reschedule after the resource claim fails or another failure occurs (Compute to Conductor)
Refactor
https://blueprints.launchpad.net/nova/+spec/make-resource-tracker-use-objects
https://blueprints.launchpad.net/nova/+spec/detach-service-from-computenode
https://blueprints.launchpad.net/nova/+spec/resource-objects
https://blueprints.launchpad.net/nova/+spec/request-spec-object
https://blueprints.launchpad.net/nova/+spec/sched-select-destinations-use-request-spec-object
https://blueprints.launchpad.net/nova/+spec/isolate-scheduler-db
Thanks
Backup
The problem of current Nova scheduler
Server Group
Can't add/remove an active server to/from a server group
• https://review.openstack.org/136487
• https://review.openstack.org/139272
An affinity policy means you can't evacuate
• Ignore down hosts when populating the instance: https://review.openstack.org/#/c/135607/
• Remove the instance from the server group: https://review.openstack.org/136487, but it won't land in K, maybe L. It also won't work for automatic HA
• https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/soft-affinity-for-server-group,n,z
Anti-affinity policy race problem, which may trigger extra rescheduling
Race for migration
• Support for unshelve, rebuild, live-migration, migration and resize in K, but this does not resolve the anti-affinity policy problem.
• Unshelve: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bug/1400015,n,z
• Rebuild: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:rebuild_schedule,n,z
• Migration/live-migration ongoing…
The problem of current Nova scheduler
Missing resource claiming and retry for migration
• Unshelve: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bug/1400015,n,z
• Rebuild: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:rebuild_schedule,n,z
• Migration/live-migration ongoing…
Scheduling hints can't persist
• You can only specify your scheduling policy at the beginning
• The policy may be violated after migration
• https://review.openstack.org/88983 is blocked in K, maybe L
Race Problem
• Bug link: https://bugs.launchpad.net/nova/+bug/1341420
• scheduler_host_subset_size=N
Ironic integration
• https://bugs.launchpad.net/nova/+bug/1402658
Any more problems with the scheduler?
It only does initial placement!
Each project has its own scheduler
DRS in OpenStack
Gantt
Tetris https://docs.google.com/document/d/1DMsnGxQ3POwZCF3uxaUeEFaKX8LqUqmmgQ_7EVK7Y8/edit
• Purview (Tetris) will provide a framework to quickly implement and enforce different kinds of policies. Policies can be of different types. Here are a few examples of policies in clouds: availability policies, performance policies, load-balancing policies, user-defined policies.
Congress https://wiki.openstack.org/wiki/Congress
• Congress is a policy-based management framework for the cloud. It is designed to work with any cloud software that reasonably fits within the relational data model. It automatically prevents policy violations when possible and corrects them when not, and it enables administrators to control the extent to which enforcement is automatic.
• Tetris is a domain-specific policy system
• Congress is a domain-independent policy system
Domain-independent and domain-specific policy systems are highly complementary