
Enterprise Storage Management with Red Hat Enterprise Linux for Reduced TCO
Dave Wysochanski ([email protected])
Agenda

Enterprise Storage Problems

Red Hat Enterprise Linux Solutions

Device Mapper Architecture

LVM2

Multipath

Low Level Drivers / Transports (FC, iSCSI, etc)

Management Tools

Summary
Not Covered

Filesystems (tiny bit)

Clustering, GFS

Surfing (web, surfboard)
Enterprise Storage Problems

Primary
● Data Sizes Unknown / Unpredictable
● Availability (component servicing, failures)
● Performance
● Backup

Secondary
● Vendor incompatibility
● Problem resolution ping/pong
● Technology evolution / incompatibility
Enterprise Storage Support out of the box

Ext3 – the standard open source journaling filesystem
● Online resizing, perf improvements
SAN support
● F/C – for popular Emulex and Qlogic F/C HBAs
● iSCSI – Software initiator, iSCSI HBA
Native (DM) Multipathing
● Availability in face of path / component failures
LVM2 – kernel-level storage virtualization, now includes:
● Dynamic volume resizing (unknown data size)
● Snapshot (backup, availability)
● Striping / RAID0 (performance)
● Mirroring / RAID1 (availability)
Filesystems
Ext3 / Ext4

Today (ext3)
● Max filesystem size 8TB
● Max file size 8TB (x86/AMD64/EM64T) & 16TB (Itanium2/POWER)
● Online growing (system-config-lvm or cmdline, e.g. 4.x: ext2online; see the sketch below)
● Directory indexing (hash tree directories improves perf)
● Block reservations (improved read/write perf)

Futures
● RHEL5 has 16TB ext3 support (tech preview today; previous limit was 8TB)
● ext4 in RHEL5 as a tech preview
● Extents, preallocation, delayed allocation, large filesystems, etc.
● Reference: Ted Ts'o's ext4 talk
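As a minimal sketch of online growing (the volume group and LV names here are hypothetical), growing an ext3 filesystem on LVM without unmounting it looks roughly like:

lvextend -L +10G /dev/vg0/data    # grow the logical volume first
ext2online /dev/vg0/data          # RHEL 4.x: grow the mounted ext3 filesystem
resize2fs /dev/vg0/data           # RHEL 5: online growth folded into resize2fs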
Device Mapper (DM) Architecture
DM Architecture Overview
[Diagram: userspace tools (dmsetup, lvm2, dmraid, kpartx, multipath, ...) sit on top of libdevmapper, which crosses the userspace/kernelspace boundary through the dm ioctl interface into the dm core, which dispatches to targets (e.g. dm-raid1)]
Device Mapper

General-purpose kernel block I/O redirection subsystem
Logical devices are maps of:
● specified sectors on underlying devices
● according to the rules implemented in a “target”
Contains DM “core”, and DM “targets”
● core
● only about 2000 lines of code
● Generic infrastructure for registering targets
● “targets”
● Multipath, linear, striped, snapshot, mirror, etc
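For illustration, a minimal linear map (device name and size are made up) shows how a logical device is just a map of sectors on an underlying device, interpreted by a target:

echo "0 2097152 linear /dev/sdb 0" | dmsetup create demo-linear   # 1 GiB mapped onto /dev/sdb
dmsetup table demo-linear                                         # print the map back
dmsetup remove demo-linear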
DM Kernel Architecture
[Diagram: the core device-mapper exposes a control (ioctl) interface and a block interface to filesystems, and a mapping/target interface downward to the targets: linear, mirror (with its log and kcopyd), snapshot, and multipath (with path selectors such as round-robin and hardware handlers such as emc)]
Device Mapper

No concept of volume groups, logical or physical volume
● simply maps from one block device to another
Does not know about on-disk formats
● e.g. LVM2 metadata, filesystems
User<-->kernel interface is via ioctl()
● Encapsulated into libdevmapper
Kernel component of dm-multipath, LVM2, dmraid, etc
DM devices can be stacked, for example:
● snapshot of a mirror whose components are multipath
devices
Device Mapper Targets
[Diagram: a target maps a DM device such as /dev/dm0 or /dev/dm1 onto one or more underlying block devices such as /dev/sda, /dev/sdb, and /dev/hda]

Device Mapper Targets and Corresponding Userspace Subsystems
[Table: each DM kernel target alongside the userspace subsystem that drives it]
Surfing Dos / Don'ts

Do
● First learn to swim
● Get a lesson from a professional
● Start in a location with smaller waves (<2m)
● Try to ride broken waves first
● Use a long board
● Learn to dive under waves
● Have fun
Don't
● Think you know it all
● Start in a location like Hawaii with very large (>>3m) waves
● Get in the way of experts
DM Multipath: Today

Modular design
● Storage vendor plugins
● path checker, path priorities (userspace)
● error code handling / path initialization (kernel)
● Path selection policies
● Only round robin currently
● UUID calculation
● Path grouping
● Fail-over / fail-back policies

Broad hardware support
● HP, Hitachi, SUN, EMC, NetApp, IBM, etc

Active / Active, Active / Passive
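A minimal, illustrative /etc/multipath.conf sketch (the blacklisted device is hypothetical; real settings depend on the array and its plugin):

defaults {
    user_friendly_names yes    # mpath0, mpath1, ... instead of WWID-based names
}
blacklist {
    devnode "^sda$"            # keep the local boot disk out of multipath
}

multipath -ll then lists the resulting maps, path groups and path states.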
DM Multipath: Future

Multipath root

Alternative load-balancing policies

Request-based multipath (NEC)

Support newest storage arrays

Active / Passive arrays?

Requests?
Logical Volume Management

Volume Management creates a layer of abstraction over the physical storage.

Physical Volumes (disks) are combined into Volume Groups.

Volume Groups are divided into Logical Volumes, like the way in which disks are
divided into partitions.

Logical Volumes are used by Filesystems and Applications (e.g. databases).
Logical Volume Management
Logical Volumes
lvcreate
Volume Group
vgcreate
Physical Volumes
pvcreate
Disks
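The layering maps directly onto the command line; a minimal sketch with hypothetical device and volume names:

pvcreate /dev/sdb /dev/sdc        # initialize disks as Physical Volumes
vgcreate vg0 /dev/sdb /dev/sdc    # combine them into a Volume Group
lvcreate -L 20G -n data vg0       # carve a Logical Volume out of the VG
mkfs.ext3 /dev/vg0/data           # hand the LV to a filesystem or application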
Advantages of LVM

Filesystems can extend across multiple disks.

Hardware storage configuration is hidden from software
● change it without needing to stop applications or unmount filesystems

Data can be rearranged on disks
● e.g. emptying a hot-swappable disk before removing it (see the sketch below)

Device snapshots can be taken
● consistent backups
● test the effect of changes without affecting the real data
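For example (hypothetical device and volume names), growing across a new disk and emptying an old one before pulling it:

vgextend vg0 /dev/sdd             # add a new disk to the Volume Group
lvextend -L +50G /dev/vg0/data    # let the Logical Volume span it
pvmove /dev/sdb                   # migrate all data off a disk while it stays online
vgreduce vg0 /dev/sdb             # then remove the emptied disk from the VG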
LVM2 Features

Concatenation, Striping (RAID 0), Mirroring (RAID 1)
Supports additional RAID levels (3, 5, 6, 10, 0+1) (future)
Snapshots (writeable)
Provides underpinnings for cluster-wide logical volume management (CLVM)
● Same on-disk metadata format
Integrated into the Anaconda installer to allow configuration at installation time
Replaces LVM1, which was provided in RHEL 3
Easier to use and more configurable (/etc/lvm/lvm.conf)
Clean separation of application and kernel runtime mapping
LVM2 Features

Single tool binary designed to contain all tool functions
Column-based display tools with tailored output (see the sketch below)
LVM2 Metadata
● Concurrent use of more than one on-disk format
● Human-readable text-based format
● Changes happen atomically
● Redundant copies of metadata
● Upwardly compatible with LVM1, enabling easy upgrades
● Transactional (journaled) changes
pvmove based on temporary mirrors (core dirty log)
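A small sketch of the column-based reporting tools (the fields are chosen here purely for illustration):

lvs -o lv_name,vg_name,lv_size,segtype,devices    # pick exactly the LV columns you want
pvs -o +pv_used --units g                         # add a column to the default PV report
vgs --noheadings -o vg_free                       # script-friendly: one value, no header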
LVM2: DM Striped / Linear

linear target
– maps onto a specified device: device name, start sector, length
– table line format: <start> <length> linear <device> <offset> (see the example lines below)

striped target
– parameters: number of stripes, stripe chunk size, and pairs of device name and start sector

error target
– causes any I/O to the mapped sectors to fail
– useful for defining gaps in a new logical device (e.g. to fake a huge device)
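Illustrative table lines, fed straight to dmsetup (devices, sizes and chunk sizes are made up for the example):

echo "0 4194304 striped 2 128 /dev/sdb 0 /dev/sdc 0" | dmsetup create demo-striped   # 2 GiB, 2 stripes, 64 KiB chunks
echo "0 2048 error" | dmsetup create demo-gap                                        # 1 MiB region that fails all I/O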
LVM2: DM Mirror (raid1)

Maintains identical copies of data on devices.

Divides the device being copied into regions typically 512KB in size.

Maintains a (small) log with one bit per region indicating whether or not each
region is in sync.

Two logs are available – core or disk.

Parameters are:
mirror <log type> <#log parameters> [<log parameters>] <#mirrors> <device> <offset> <device> <offset> ...

The disk log parameters are:
<log device> <region size> [<sync>]

<sync> can be sync or nosync to indicate whether or not an initial sync from the
first device is required.
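Through LVM2 this is normally driven with lvcreate rather than raw tables; a sketch with hypothetical names:

lvcreate -m 1 -L 10G -n mirrored vg0    # 2-way mirror with an on-disk log (default)
# or keep the log in memory (core log, resynced on every activation):
# lvcreate -m 1 --mirrorlog core -L 10G -n mirrored vg0
lvs -a -o +devices vg0                  # show the mirror legs and the log LV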
LVM2: DM Mirror (today)

Single node mirror

Clustered mirroring (4.5)

HA LVM
LVM2: DM Mirror (future)

Extend an active mirror

Snapshots of mirrors

Install/boot from mirror

RAID 10 and RAID 01

>2 legged mirror

Clustered Mirror (5.x)

Read-balancing, handling failing devices automatically

Robustness and performance
LVM2: DM Snapshot

An implementation of writable snapshots.

Makes a snapshot of the state of a device at a particular instant.

The first time each block is changed after that, it copies the data prior to the change, so that the state of the device at that instant can be reconstructed.

Run fsck on a snapshot of a mounted filesystem to test its integrity and find out whether the real device needs fsck or not.

Test applications against production data by taking a snapshot and running the tests against the snapshot, leaving the real data untouched.

Take backups from a snapshot for consistency.
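A minimal sketch (names and sizes are hypothetical; the snapshot only needs room for blocks that change while it exists):

lvcreate -s -L 1G -n data-snap /dev/vg0/data    # point-in-time snapshot of the origin
fsck -n /dev/vg0/data-snap                      # check integrity without touching the live filesystem
mount -o ro /dev/vg0/data-snap /mnt/snap        # take a consistent backup from here
umount /mnt/snap && lvremove -f vg0/data-snap   # drop the snapshot when done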
LVM2: DM Snapshot

Copy-on-write

Not a backup substitute

Requires minimal storage (5% of origin)

Preserves origin

Allows experimentation

Dropped if full

Resizeable
LVM2: Snapshot Example Uses

Backup on live system

fsck snapshot of live filesystem

Test applications against real data

Xen domU's

Others?
LVM2: Snapshot Futures

Merge changes to a writeable snapshot back into its read-only origin

Robustness and performance

Memory efficiency

Clustered Snapshots

LiveCD + USB

Snapshots of mirrors
Device Mapper Futures
LVM2: DM Raid4/5 (dm-raid45)

Features
● Failed device replacement
● Selectable parity device with RAID 4
● Selectable allocation algorithms (data and parity)
● left/right, symmetric/asymmetric
● Stripe cache (data and parity)
● XOR algorithm for parity

written by Heinz Mauelshagen

http://people.redhat.com/heinzm/sw/dm/dm-raid45/
DM Block Caching Target: dm-cache

write-back or write-through local disk cache

intended use is remote block devices (iSCSI, ATAoE)

written by Ming Zhao

technical report on IBM's CyberDigest
● http://tinyurl.com/35qzcg
DM Block Caching Target: dm-hstore

dm-hstore (“Hierarchical store”)
● similar to dm-cache
● building block for HSM System
● single host caching
● remote replication
● written by Heinz Mauelshagen

Features
● caches reads and writes to an origin device
● writes data back to origin device
● keeps state of extents (e.g. uptodate, dirty, ...) on disk
● background initialization (instantaneous creation)
● supports read-only origins
DM Misc Futures

Add a netlink-based mechanism for communication with userspace
● Mike Anderson (IBM)

Reduce kernel stack usage for some targets/paths

Automatically detect and handle changes to physical capacity

A lot of other stuff

Requests?
Transports / Low Level Drivers
iSCSI Initiator

Low-cost Enterprise SAN connectivity

RHEL 3 U4+: linux-iscsi
● Open source Cisco implementation

RHEL4 U2+: linux-iscsi
● Rewrite for 2.6 kernel (based on 2.4 driver)

RHEL5+: open-iscsi.org

Qualified with major storage vendors
● NetApp
● EMC
● EqualLogic
[Diagram: a Red Hat Enterprise Linux host reaches the SAN either through the software initiator (Cisco iSCSI initiator driving a NIC over TCP/IP) or through an iSCSI adapter (Qlogic/Adaptec driver); on the target side sits a native iSCSI storage controller (e.g. NetApp) or an iSCSI bridge in front of a Fibre Channel switch connecting FC hosts]
iSCSI Initiator: Today

open-iscsi.org

RFC 3720 compliant

Flexible transport design
● software iSCSI, hardware iSCSI, iSER

Command-line management (iscsiadm)
● building block for GUIs

Basic iSNS support
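A quick sketch of the open-iscsi command line (the portal address and target IQN are placeholders):

iscsiadm -m discovery -t sendtargets -p 192.168.1.20                          # ask the portal what targets it offers
iscsiadm -m node -T iqn.2007-01.com.example:target0 -p 192.168.1.20 --login   # log in to one of them
iscsiadm -m session                                                           # list active sessions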
iSCSI Initiator: Future

Fully integrated hardware iSCSI (full offload, partial offload)

Improved / more flexible management tools

Install to software iSCSI

Performance improvements

Improved iSNS

BIOS / OF Boot

Requests?
Fibre Channel Device Drivers

RHEL4/5
● Driver versions tracking upstream submissions very closely
● Goal is to keep them as current as possible (e.g. 4Gb F/C, SATA 2)
● Greatly increased support with over 4,000 SCSI devices/paths
(was 256 in RHEL 3)

Each update contains current drivers
● Actively coordinate with Qlogic, Emulex and system vendors
● Integrate key bug fixes
● Aid partners to keep their open source drivers current upstream
● Driver update model (Jon Masters)
Management Tools
Management Tools: system-config-lvm

Transport Protocol agnostic

Simplifies resizing

Future: iSCSI management plugin (currently in 4.5)
Management Tools: Conga

Web browser front-end (luci)

Agent (ricci) serializes requests

Single node or cluster management

Future: Storage Server
● Clustered NFS
● Clustered SAMBA
Management Tools: Conga (screenshots)
Summary

Out of the box support for multipathing, LVM, etc

DM provides very flexible, extensible architecture

New DM targets being developed

Active communities around DM, LVM2, etc

Good management tools (CLI, GUI)
More Information

LVM2
● http://sourceware.org/lvm2/

Device Mapper
● http://sourceware.org/dm

Multipathing
● [email protected]
● http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=Home

iSCSI
● http://www.open-iscsi.org

Conga
● http://sourceware.org/cluster/conga/
More Information

Red Hat Enterprise Linux LVM Administrator's Guide
● http://www.redhat.com/docs/manuals/enterprise/RHEL-5-manual/Cluster_Logical_Volum

Presenter
● Dave Wysochanski ([email protected])

This presentation
● http://people.redhat.com/dwysocha/talks
Backup Slides
Tagging

LVM2 supports two sorts of tags.

Tags can be attached to objects such as PVs, VGs, LVs and segments.

Tags can be attached to hosts, for example in a cluster configuration.

Tags are strings using [A-Za-z0-9_+.-] of up to 128 characters and cannot start with a hyphen. On the command line they are normally prefixed by @.

LVM1 objects cannot be tagged as the metadata does not support it.
Tagging – Object Tags

Use --addtag or --deltag with lvchange, vgchange, pvchange, lvcreate or vgcreate.

Only objects in a Volume Group can be tagged. PVs lose their tags when
removed from a VG. This is because tags are stored as part of the VG
metadata. Snapshots cannot be tagged.

Wherever a list of objects is accepted on the command line, a tag can be used:
e.g. lvs @database lists all the LVs with the 'database' tag.

Display tags with lvs -o +tags or pvs -o +tags etc.
Tagging – Host Tags

You can define host tags in the configuration files.

If you set tags { hosttags = 1 }, a hosttag is automatically defined using the
machine's hostname. This lets you use a common config file between all your
machines.

For each host tag, an extra config file is read if it exists: lvm_<hosttag>.conf. If that file defines new tags, further config files will be appended to the list of files to read.

tags { tag1 { } tag2 { host_list = ["host1"] } }

This always defines tag1, and defines tag2 if the hostname is host1.
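Putting that together, an illustrative /etc/lvm/lvm.conf fragment (hostnames are placeholders):

tags {
    hosttags = 1                      # also define a tag named after this machine's hostname
    tag1 { }                          # defined on every host
    tag2 { host_list = ["host1"] }    # defined only on host1
}
# for each host tag that ends up defined, lvm_<hosttag>.conf is also read if present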
Tagging – Controlling Activation

You can specify in the config file that only certain LVs should be activated on that
host.

e.g. activation { volume_list = ["vg1/lvol0", "@database" ] }

This acts as a filter for activation requests (like 'vgchange -ay') and, on that host, only activates vg1/lvol0 and any LVs or VGs with the 'database' tag in the metadata.

There is a special match “@*” which causes a match only if any metadata tag
matches any host tag on that machine.
Tagging – Simple Example

Every machine in the cluster has tags { hosttags = 1 }

You want to activate vg1/lvol2 only on host db2.

Run lvchange --addtag @db2 vg1/lvol2 from any host in the cluster.

Run lvchange -ay vg1/lvol2.

This solution involves storing hostnames inside the VG metadata.
dmsetup

A command line wrapper for communication with the Device Mapper.

Provides complete access to the ioctl commands via libdevmapper.

Examples:
● dmsetup version
● dmsetup create vol1 table1
● dmsetup ls
● dmsetup info vol1
● dmsetup table vol1
● dmsetup info -c