Enterprise Storage Management with Red Hat Enterprise Linux for Reduced TCO
Dave Wysochanski ([email protected])
Agenda
Enterprise Storage Problems
Red Hat Enterprise Linux Solutions
Device Mapper Architecture
LVM2
Multipath
Low Level Drivers / Transports (FC, iSCSI, etc)
Management Tools
Summary
Not Covered
Filesystems (tiny bit)
Clustering, GFS
Surfing (web, surfboard)
Enterprise Storage Problems
Primary
● Data Sizes Unknown / Unpredictable
● Availability (component servicing, failures)
● Performance
● Backup
Secondary
● Vendor incompatibility
● Problem resolution ping/pong
● Technology evolution / incompatibility
Enterprise Storage Support out of the box
Ext3 – the standard open source journaling filesystem
● Online resizing, perf improvements
SAN support
● F/C – for popular Emulex and Qlogic F/C HBAs
● iSCSI – Software initiator, iSCSI HBA
Native (DM) Multipathing
● Availability in face of path / component failures
LVM2 – kernel-level storage virtualization, now includes:
● Dynamic volume resizing (unknown data size)
● Snapshot (backup, availability)
● Striping / RAID0 (performance)
● Mirroring / RAID1 (availability)
Filesystems
Ext3 / Ext4
Today (ext3)
● Max filesystem size 8TB
● Max file size 8TB (x86/AMD64/EM64T) & 16TB (Itanium2/POWER)
● Online growing (system-config-lvm or command line, e.g. 4.x: ext2online; see the sketch after this list)
● Directory indexing (hashed-tree directories improve performance)
● Block reservations (improved read/write performance)
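A minimal sketch of online growth (volume names illustrative; ext2online is the RHEL 4 tool, while resize2fs handles online growth on RHEL 5):
  # Grow the logical volume by 10G, then grow the mounted ext3
  # filesystem to fill it, without unmounting.
  lvextend -L +10G /dev/vg0/lvol0
  ext2online /dev/vg0/lvol0     # RHEL 4.x
  # resize2fs /dev/vg0/lvol0    # RHEL 5+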
Futures
● RHEL5 has 16T ext3 support (tech preview today)
● (previous limit was 8T)
● ext4 into RHEL5 as a tech preview
● Extents, prealloc, delayed alloc, large fs, etc.
● Reference: Ted Ts'o's ext4 talk
Device Mapper (DM) Architecture
DM Architecture Overview
[Diagram: userspace tools (dmsetup, lvm2, dmraid, kpartx, multipath, ...) all sit on top of libdevmapper; across the user/kernel boundary, the DM ioctl interface feeds the DM core, which dispatches I/O to the loaded targets (e.g. dm-raid1).]
Device Mapper
General-purpose kernel block I/O redirection subsystem
Logical devices are maps of:
● specified sectors on underlying devices
● according to the rules implemented in a “target”
Contains a DM “core” and DM “targets”
● core
● only about 2000 lines of code
● Generic infrastructure for registering targets
● “targets”
● Multipath, linear, striped, snapshot, mirror, etc
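A minimal sketch of the map concept (device name, sizes, and the name "demo" are illustrative): a one-line table of the form <start> <length> <target> <args...> defines a logical device.
  # Map the first 1000000 sectors of a new logical device onto
  # /dev/sdb starting at sector 0, using the 'linear' target.
  echo "0 1000000 linear /dev/sdb 0" | dmsetup create demo
  ls -l /dev/mapper/demo   # the new logical device node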
DM Kernel Architecture
[Diagram: the device-mapper core exposes an ioctl (control) interface and a block (filesystem) interface upward, and a mapping/target interface downward to the targets: linear; mirror, with its log and kcopyd helpers; snapshot; and multipath, with path selectors (e.g. round-robin) and hardware handlers (e.g. emc).]
Device Mapper
No concept of volume groups, logical volumes, or physical volumes
● simply maps from one block device to another
Does not know about on-disk formats
● e.g. LVM2 metadata, filesystems
User<-->kernel interface is via ioctl()
● Encapsulated into libdevmapper
Kernel component of dm-multipath, LVM2, dmraid, etc
DM devices can be stacked, for example:
● a snapshot of a mirror whose components are multipath devices
Device Mapper Targets
[Diagram: a target maps a logical device such as /dev/dm0 or /dev/dm1 onto underlying block devices such as /dev/sda, /dev/sdb, or /dev/hda.]
Device Mapper Targets and Corresponding Userspace Subsystems
Surfing Dos / Don'ts
Do
● First learn to swim
● Get a lesson from a professional
● Start in a location with smaller waves (<2m)
● Try to ride broken waves first
● Use a long board
● Learn to dive under waves
● Have fun
Don't
● Think you know it all
● Start in a location like Hawaii with very large (>>3m) waves
● Get in the way of experts
DM Multipath: Today
Modular design
● Storage vendor plugins
● path checker, path priorities (userspace)
● error code handling / path initialization (kernel)
● Path selection policies
● Only round robin currently
● UUID calculation
● Path grouping
● Fail-over / fail-back policies
Broad hardware support
● HP, Hitachi, SUN, EMC, NetApp, IBM, etc
Active / Active, Active / Passive
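A minimal sketch of day-to-day inspection with the multipath tool (output and device names vary by array):
  multipath -ll   # show maps, path groups, and path states
  multipath -v2   # re-scan and (re)create multipath maps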
DM Multipath: Future
Multipath root
Alternative load-balancing policies
Request-based multipath (NEC)
Support newest storage arrays
Active / Passive arrays?
Requests?
Logical Volume Management
Volume Management creates a layer of abstraction over the physical storage.
Physical Volumes (disks) are combined into Volume Groups.
Volume Groups are divided into Logical Volumes, like the way in which disks are
divided into partitions.
Logical Volumes are used by Filesystems and Applications (e.g. databases).
Logical Volume Management
[Diagram: disks are initialized with pvcreate into Physical Volumes, combined with vgcreate into a Volume Group, and carved with lvcreate into Logical Volumes; see the sketch below.]
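A minimal sketch of that bottom-up flow (device and volume names illustrative):
  pvcreate /dev/sdb /dev/sdc       # initialize disks as Physical Volumes
  vgcreate vg0 /dev/sdb /dev/sdc   # combine them into a Volume Group
  lvcreate -L 20G -n lvol0 vg0     # carve out a 20G Logical Volume
  mkfs.ext3 /dev/vg0/lvol0         # use it like any block device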
Advantages of LVM
Filesystems can extend across multiple disks.
Hardware storage configuration is hidden from software and can change without needing to stop applications or unmount filesystems.
Data can be rearranged on disks
● e.g. emptying a hot-swappable disk before removing it
Device snapshots can be taken
● consistent backups
● test the effect of changes without affecting the real data
LVM2 Features
Concatenation, Striping (RAID 0), Mirroring (RAID 1)
Supports additional RAID levels (3, 5, 6, 10, 0+1) (future)
Snapshots (writeable)
Provides underpinnings for cluster-wide logical volume
management (CLVM)
● Same on-disk metadata format
Integrated into the Anaconda installer, allowing configuration at installation time
Replaces LVM1, which was provided in RHEL 3
Easier to use and more configurable (/etc/lvm/lvm.conf)
Clean separation of application and kernel runtime mapping
LVM2 Features
Single tool binary designed to contain all tool functions
Column-based display tools with tailored output (see the sketch below)
LVM2 Metadata
● Concurrent use of more than one on-disk format
● Human-readable text-based format
● Changes happen atomically
● Redundant copies of metadata
● Upwardly compatible with LVM1, enabling easy upgrades
● Transactional (journaled) changes
Pvmove based on temporary mirrors (Core Dirty Log)
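A minimal sketch of the column-based reporting (these are standard lvs/pvs field names):
  lvs -o lv_name,vg_name,lv_size,segtype   # tailor the LV columns
  pvs -o +pv_used --units g                # append a column, report in gigabytes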
LVM2: DM Striped / Linear
linear target
– device name, start sector, length
– <start> <length> linear <device> <offset>
striped target
– # stripes, striping chunk size, pairs of device name and sector
– <start> <length> striped <#stripes> <chunk size> <device> <offset> [<device> <offset> ...]
error target
– causes any I/O to the mapped sectors to fail
– useful for defining gaps in a new logical device (e.g. to fake a huge device; see the sketch below)
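A minimal sketch of such tables fed to dmsetup (device names, sizes, and map names illustrative):
  # Two-disk stripe, 128-sector (64KB) chunks across /dev/sdb and /dev/sdc.
  echo "0 2048000 striped 2 128 /dev/sdb 0 /dev/sdc 0" | dmsetup create stripe0
  # Fake a huge device: a small linear head followed by an error-target gap.
  printf '0 1000000 linear /dev/sdb 0\n1000000 999000000 error\n' | dmsetup create huge0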
LVM2: DM Mirror (raid1)
Maintains identical copies of data on devices.
Divides the device being copied into regions, typically 512KB in size.
Maintains a (small) log with one bit per region indicating whether or not each
region is in sync.
Two logs are available – core or disk.
Parameters are:
mirror <log type> <#log parameters> [<log parameters>] <#mirrors> <device> <offset> <device> <offset> ...
The disk log parameters are:
<log device> <region size> [<sync>]
<sync> can be sync or nosync to indicate whether or not an initial sync from the first device is required.
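A minimal sketch with a core (in-memory) log (device names and sizes illustrative; a region size of 1024 sectors is 512KB):
  # Two-way mirror of /dev/sdb and /dev/sdc.
  echo "0 2000000 mirror core 1 1024 2 /dev/sdb 0 /dev/sdc 0" | dmsetup create mirror0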
LVM2: DM Mirror (today)
Single node mirror
Clustered mirroring (4.5)
HA LVM
LVM2: DM Mirror (future)
Extend an active mirror
Snapshots of mirrors
Install/boot from mirror
RAID 10 and RAID 01
>2 legged mirror
Clustered Mirror (5.x)
Read-balancing, handling failing devices automatically
Robustness and performance
LVM2: DM Snapshot
An implementation of writable snapshots.
Makes a snapshot of the state of a device at a particular instant
The first time each block is changed after that, it makes a copy of the data prior to the change, so that it can reconstruct the state of the device.
Run fsck on a snapshot of a mounted filesystem to test its integrity and find out whether the real device needs fsck.
Test applications against production data by taking a snapshot and running tests
against the snapshot, leaving the real data untouched.
Take backups from a snapshot for consistency.
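A minimal backup sketch using the LVM2 tools (volume and mount names illustrative):
  lvcreate -s -L 1G -n lvol0snap /dev/vg0/lvol0   # snapshot the origin
  mount -o ro /dev/vg0/lvol0snap /mnt/snap
  tar czf /backup/lvol0.tar.gz -C /mnt/snap .     # consistent backup
  umount /mnt/snap
  lvremove -f /dev/vg0/lvol0snap                  # drop the snapshot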
LVM2: DM Snapshot
Copy-on-write
Not a backup substitute
● Requires minimal storage (typically ~5% of the origin's size)
Preserves origin
Allows experimentation
Dropped if full
Resizeable
LVM2: Snapshot Example Uses
Backup on live system
fsck snapshot of live filesystem
Test applications against real data
Xen domUs
Others?
LVM2: Snapshot Futures
Merge changes to a writeable snapshot back into its read-only origin
Robustness and performance
Memory efficiency
Clustered Snapshots
LiveCD + USB
Snapshots of mirrors
Device Mapper Futures
LVM2: DM Raid4/5 (dm-raid45)
Features
● Failed device replacement
● Selectable parity device with RAID 4
● Selectable allocation algorithms (data and parity)
● left/right, symmetric/asymmetric
● Stripe cache (data and parity)
● XOR algorithm for parity
written by Heinz Mauelshagen
http://people.redhat.com/heinzm/sw/dm/dm-raid45/
DM Block Caching Target: dm-cache
write-back or write-through local disk cache
intended use is remote block devices (iSCSI, ATAoE)
written by Ming Zhao
technical report on IBM's CyberDigest
● http://tinyurl.com/35qzcg
DM Block Caching Target: dm-hstore
dm-hstore (“Hierarchical store”)
● similar to dm-cache
● building block for HSM System
● single host caching
● remote replication
● written by Heinz Mauelshagen
Features
● caches reads and writes to an origin device
● writes data back to origin device
● keeps state of extents (e.g. uptodate, dirty, ...) on disk
● background initialization (instantaneous creation)
● supports read-only origins
DM Misc Futures
Add a netlink-based mechanism for communication with userspace
● Mike Anderson (IBM)
Reduce kernel stack usage for some targets/paths
Automatically detect and handle changes to physical capacity
A lot of other stuff
Requests?
Transports / Low Level Drivers
iSCSI Initiator
Low-cost Enterprise SAN connectivity
RHEL 3 U4+: linux-iscsi
● Open source Cisco implementation
RHEL4 U2+: linux-iscsi
● Rewrite for 2.6 kernel (based on 2.4 driver)
RHEL5+: open-iscsi.org
Qualified with major storage vendors
● NetApp
● EMC
● EqualLogic
[Diagram: a Red Hat Enterprise Linux host reaches iSCSI storage in two ways: via the Cisco software iSCSI initiator over a standard NIC and TCP/IP, or via a Qlogic/Adaptec driver and iSCSI HBA. On the storage side, targets include a native iSCSI storage controller (e.g. NetApp) and an iSCSI bridge onto a Fibre Channel switch/SAN serving FC hosts.]
iSCSI Initiator: Today
open-iscsi.org
RFC 3720 compliant
Flexible transport design
● software iSCSI, hardware iSCSI, iSER
Command-line management (iscsiadm)
● building block for GUIs
Basic iSNS support
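A minimal iscsiadm sketch (portal address and target IQN are illustrative):
  iscsiadm -m discovery -t sendtargets -p 192.168.0.10   # find targets
  iscsiadm -m node -T iqn.1992-08.com.netapp:sn.12345 \
           -p 192.168.0.10 --login                       # log in
  iscsiadm -m session                                    # list active sessions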
iSCSI Initiator: Future
Fully integrated hardware iSCSI (full offload, partial offload)
Improved / more flexible management tools
Install to software iSCSI
Performance improvements
Improved iSNS
BIOS / OF Boot
Requests?
Fibre Channel Device Drivers
RHEL4/5
● Driver versions tracking upstream submissions very closely
● Goal is to keep them current as much as possible (e.g. 4Gb F/C, SATA 2)
● Greatly increased support with over 4,000 SCSI devices/paths
(was 256 in RHEL 3)
Each update contains current drivers
● Actively coordinate with Qlogic, Emulex and system vendors
● Integrate key bug fixes
● Aid partners in keeping their open source drivers current upstream
● Driver update model (Jon Masters)
Management Tools
Management Tools: system-config-lvm
Transport Protocol agnostic
Simplifies resizing
Future: iSCSI management plugin (currently in 4.5)
Management Tools: Conga
Web browser front-end (luci)
Agent (ricci) serializes requests
Single node or cluster management
Future: Storage Server
● Clustered NFS
● Clustered SAMBA
Management Tools: Conga (screenshots)
Summary
Out of the box support for multipathing, LVM, etc
DM provides very flexible, extensible architecture
New DM targets being developed
Active communities around DM, LVM2, etc
Good management tools (CLI, GUI)
More Information
LVM2
● http://sourceware.org/lvm2/
Device Mapper
● http://sourceware.org/dm
Multipathing
● [email protected]
● http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=Home
iSCSI
● http://www.open-iscsi.org
Conga
● http://sourceware.org/cluster/conga/
More Information
Red Hat Enterprise Linux LVM Administrator's Guide
● http://www.redhat.com/docs/manuals/enterprise/RHEL-5-manual/Cluster_Logical_Volum
Presenter
● Dave Wysochanski ([email protected])
This presentation
● http://people.redhat.com/dwysocha/talks
Backup Slides
Tagging
LVM2 supports two sorts of tags.
Tags can be attached to objects such as PVs, VGs, LVs and
segments.
Tags can be attached to hosts, for example in a cluster
configuration.
Tags are strings using [A-Za-z0-9_+.-] of up to 128
characters and they cannot start with a hyphen. On the
command line they are normally prefixed by @.
LVM1 objects cannot be tagged as the metadata does not
support it.
Tagging – Object Tags
Use --addtag or --deltag with lvchange, vgchange, pvchange, lvcreate or vgcreate.
Only objects in a Volume Group can be tagged. PVs lose their tags when
removed from a VG. This is because tags are stored as part of the VG
metadata. Snapshots cannot be tagged.
Wherever a list of objects is accepted on the command line, a tag can be used.
e.g. lvs @database lists all the LVs with the 'database' tag.
Display tags with lvs -o +tags or pvs -o +tags etc.
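A minimal object-tag sketch (volume names illustrative):
  lvchange --addtag @database vg1/lvol0   # tag an LV
  lvs @database                           # list only LVs carrying the tag
  lvs -o +tags                            # show tags as an extra column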
Tagging – Host Tags
You can define host tags in the configuration files.
If you set tags { hosttags = 1 }, a hosttag is automatically defined using the
machine's hostname. This lets you use a common config file between all your
machines.
For each host tag, an extra config file is read if it exists: lvm_<hosttag>.conf. If that file defines new tags, further config files are appended to the list of files to read in.
tags { tag1 { } tag2 { host_list = ["host1"] } }
This always defines tag1, and defines tag2 if the hostname is host1.
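A minimal /etc/lvm/lvm.conf sketch combining both mechanisms (hostnames illustrative):
  tags {
      hosttags = 1                         # auto-define a tag from this host's name
      db { host_list = ["db1", "db2"] }    # 'db' defined only on db1 and db2
  }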
Tagging – Controlling Activation
You can specify in the config file that only certain LVs should be activated on that
host.
e.g. activation { volume_list = ["vg1/lvol0", "@database"] }
This acts as a filter for activation requests (like 'vgchange -ay') and only activates vg1/lvol0 and any LVs or VGs with the 'database' tag in the metadata on that host.
There is a special match “@*” which causes a match only if any metadata tag
matches any host tag on that machine.
Tagging – Simple Example
Every machine in the cluster has tags { hosttags = 1 }
You want to activate vg1/lvol2 only on host db2.
Run lvchange --addtag @db2 vg1/lvol2 from any host in the cluster.
Run lvchange -ay vg1/lvol2.
This solution involves storing hostnames inside the VG metadata.
dmsetup
A command line wrapper for communication with the Device Mapper.
Provides complete access to the ioctl commands via libdevmapper.
Examples:
● dmsetup version
● dmsetup create vol1 table1
● dmsetup ls
● dmsetup info vol1
● dmsetup table vol1
● dmsetup info -c
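A minimal end-to-end sketch (device name, size, and file names illustrative):
  echo "0 409600 linear /dev/sdb 0" > table1   # one-line linear table
  dmsetup create vol1 table1                   # create a device from the table file
  dmsetup table vol1                           # print the active mapping
  dmsetup info vol1                            # state, open count, UUID
  dmsetup remove vol1                          # tear it down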