Virtualisation on Linux and storage high-availability


Virtualisation on Linux and storage high-availability (C) 2011 - CS / MI Adfinis SyGroup AG

Introduction

Adfinis SyGroup AG Mario Iseli & Christian Schläppi

Table of contents

• Virtualisation overview
• Linux KVM
• Requirements for dedicated storage
• Storage concepts
• Costs
• HA concepts
• Pimp your system
• Filesystems
• Our flagship implementation ;-)

Virtualisation overview

• Full virtualisation (hypervisor)
• Paravirtualisation (aka “pimped chroot”)

Linux KVM

• Hypervisor, “out of the box”
• libvirt
  • Frontend API
  • Backend API
  • nw-filter definition
  • Storage pool definition
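A storage pool definition in libvirt is an XML document. The following is a minimal sketch of how such a definition could be generated for a pool of type `iscsi`; the pool name, host name and IQN are hypothetical placeholders and do not come from the slides.

```python
import xml.etree.ElementTree as ET


def iscsi_pool_xml(name: str, host: str, target_iqn: str) -> str:
    """Build the XML for a libvirt storage pool of type 'iscsi'."""
    pool = ET.Element("pool", type="iscsi")
    ET.SubElement(pool, "name").text = name
    source = ET.SubElement(pool, "source")
    ET.SubElement(source, "host", name=host)          # iSCSI portal
    ET.SubElement(source, "device", path=target_iqn)  # target IQN
    target = ET.SubElement(pool, "target")
    # Stable device names for the LUNs exported by the target.
    ET.SubElement(target, "path").text = "/dev/disk/by-path"
    return ET.tostring(pool, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(iscsi_pool_xml("vmstore", "storage.example.org",
                         "iqn.2011-01.org.example:vmstore"))
```

The output can be saved to a file and registered with `virsh pool-define`; every LUN of the target then shows up as a volume in the pool.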

Requirements for dedicated “Storage”

• Problems with images
• Complexity of migrations (large downtime)
• Load-balancing
• High-availability
• Performance bottlenecks

Storage concepts

• NFS (file-sharing, common HW)
• iSCSI (block-device sharing, common HW)
• Fibre Channel (block-device sharing, custom HW)

Cost overview

• “real” SANs (appliance solutions)
  • licensing
  • feature licensing
  • high CapEx and high OpEx
• “custom” storage

High-availability concepts

• Master-Master clusters
• Master-Slave clusters
• Multinode clusters
• Load distribution in clusters
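One practical difference between two-node and multinode clusters is quorum: a partition may normally keep running resources only if it holds a strict majority of the votes. The sketch below assumes one vote per node, which is a simplification and not something stated on the slides; the function name is hypothetical.

```python
def has_quorum(votes_present: int, votes_total: int) -> bool:
    """Return True if the partition holds a strict majority of all votes."""
    if not 0 <= votes_present <= votes_total:
        raise ValueError("votes_present must be between 0 and votes_total")
    return 2 * votes_present > votes_total


if __name__ == "__main__":
    # (surviving nodes, cluster size)
    for present, total in [(1, 2), (2, 3), (1, 3)]:
        print(f"{present} of {total} nodes: quorum={has_quorum(present, total)}")
```

In a two-node cluster the surviving node never has a majority, which is why the configuration in the implementation part sets `no-quorum-policy="ignore"` and relies on STONITH instead.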

General pimping

• Memory sharing (KSM)
• Network bonding (EtherChannel, Trunk, Aggregation, however you’d like to call it; IEEE 802.3ad aka LACP)
• VLANs (Trunking... again; IEEE 802.1Q)
• Logical volumes (LVM)
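To see whether KSM pays off on a host, the kernel exposes counters under `/sys/kernel/mm/ksm`. The following is a rough sketch that estimates the saved memory from the `pages_sharing` counter; the helper names are hypothetical, and the result is only an approximation.

```python
import os
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")


def ksm_saved_bytes(pages_sharing: int, page_size: int) -> int:
    """Approximate memory saved by KSM: deduplicated pages times page size."""
    return pages_sharing * page_size


def read_ksm_counter(name: str, base: Path = KSM_DIR) -> int:
    """Read one integer counter (e.g. 'pages_sharing') from the KSM sysfs directory."""
    return int((base / name).read_text().strip())


if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")
    saved = ksm_saved_bytes(read_ksm_counter("pages_sharing"), page_size)
    print(f"KSM currently saves about {saved / 2**20:.1f} MiB")
```

The script only reports meaningful numbers on a Linux host where KSM is enabled (`/sys/kernel/mm/ksm/run` set to 1).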

Linux HA solutions

• Pacemaker / Corosync
• DRBD
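DRBD offers three replication protocols (A, B and C) that differ in when a write is reported as complete to the application. The sketch below models only that acknowledgement rule; the class and function names are hypothetical and this is not DRBD code.

```python
from dataclasses import dataclass


@dataclass
class WriteState:
    local_disk_done: bool   # write has reached the local disk
    in_send_buffer: bool    # replication packet sits in the local TCP send buffer
    reached_peer: bool      # replication packet has arrived at the peer node
    remote_disk_done: bool  # peer has written the data to its own disk


def write_is_complete(protocol: str, s: WriteState) -> bool:
    """Return True once the given DRBD protocol would acknowledge the write."""
    if protocol == "A":  # asynchronous
        return s.local_disk_done and s.in_send_buffer
    if protocol == "B":  # memory synchronous
        return s.local_disk_done and s.reached_peer
    if protocol == "C":  # synchronous
        return s.local_disk_done and s.remote_disk_done
    raise ValueError(f"unknown protocol: {protocol!r}")


if __name__ == "__main__":
    # The peer has received the data but not yet written it to disk.
    in_flight = WriteState(True, True, True, False)
    for proto in "ABC":
        print(proto, write_is_complete(proto, in_flight))
```

Only protocol C waits for the remote disk, which is why it is the one referred to for the dual-primary setup in the implementation part.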

Filesystems

• Clustered filesystems
  • OCFS2
  • GFS
  • GlusterFS
• Locking (DLM)
• cLVM

Implementation

• Architecture overview
• Multipathing (and “protocol C” in DRBD)
• KVM storage-management (attach a whole target from libvirt)
• Config snippets

node node1
node node2
primitive resFSvarwww ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/var/www" fstype="xfs" options="noatime"
primitive resHTTPD lsb:lighttpd \
    op monitor interval="10s" timeout="5s" \
    meta target-role="Started"
primitive resIntIP ocf:heartbeat:IPaddr2 \
    params ip="10.10.10.30" cidr_netmask="24"
primitive resNFS-Kernel-Server lsb:nfs-kernel-server
primitive resPing ocf:pacemaker:ping \
    params host_list="router1 router2" multiplier="100" \
    op monitor interval="10s" timeout="20s" \
    op start interval="0" timeout="90s" \
    op stop interval="0" timeout="100s"

group groupService resPubIP resIntIP resNFS-Common resNFS-Kernel-Server
ms msDRBD resDRBD \
    meta master-max="2" notify="true" target-role="Master" is-managed="true"
clone clonePing resPing \
    meta globally-unique="false"
location locServiceonConnected groupService \
    rule $id="locServiceonConnected-rule" -inf: not_defined pingd or pingd lte 0
colocation colServiceDRBD inf: groupService msDRBD:Master
order ordDRBDbeforeService 0: msDRBD:promote groupService
property $id="cib-bootstrap-options" \
    stonith-enabled="true" \
    no-quorum-policy="ignore" \
    maintenance-mode="false"
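The interplay between `resPing` and the location constraint can be sketched as follows: the ping agent sets the node attribute `pingd` to the number of reachable hosts times `multiplier`, and the rule bans `groupService` from any node where that attribute is missing or not positive. This is a simplified model with hypothetical function names, not Pacemaker's actual scoring code.

```python
from typing import Optional

NEG_INFINITY = float("-inf")


def pingd_value(reachable_hosts: int, multiplier: int = 100) -> int:
    """Node attribute set by the ping agent: reachable hosts times multiplier."""
    return reachable_hosts * multiplier


def location_score(pingd: Optional[int]) -> float:
    """Score contributed by the rule '-inf: not_defined pingd or pingd lte 0'."""
    if pingd is None or pingd <= 0:
        return NEG_INFINITY  # node must not run groupService
    return 0.0               # rule does not match, no penalty


if __name__ == "__main__":
    # host_list="router1 router2" gives 0, 1 or 2 reachable hosts.
    for reachable in (0, 1, 2):
        value = pingd_value(reachable)
        print(f"reachable={reachable} pingd={value} score={location_score(value)}")
```

A node that loses both routers therefore gets a score of minus infinity, and the service group moves to the node that still has connectivity.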

Performance comparison

• Values with iSCSI
• Downtime with storage-node failure

Questions?

• (cu @ social event)