Softpanorama

Device mapper

The device-mapper is a Linux kernel component introduced in kernel 2.6 that permits mapping one block device onto another. It is used by subsystems such as LVM2, EVMS2 and the multipath tools.

Device-mapper works by passing data from a virtual block device, which it provides itself, to another block device. The resulting devices can be addressed through several naming views, including udev-managed names and UUIDs.

LVM2 and other applications that use device-mapper talk to it through the libdevmapper.so shared library, which in turn issues ioctls to the /dev/mapper/control device node. Developers can also access device-mapper from shell scripts via the dmsetup tool.

Device-mapper has a plugin-based design. Among the available plugins (targets) are the linear and multipath targets used in the examples on this page. This subsystem is the core component of the multipath tool chain.

How it works

The Device Mapper is configured one map at a time. A device map, also referred to as a table, is a list of segments in the form of :

0 35258368 linear 8:48 65920 
35258368 35258368 linear 8:32 65920 
70516736 17694720 linear 8:16 17694976 
88211456 17694720 linear 8:16 256
The first 2 parameters of each line are the segment starting block in the virtual device and the length of the segment. The next keyword is the target policy (linear). The rest of the line is the target parameters.
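
The segment layout described above can be checked mechanically. The following is a minimal Python sketch, not part of any device-mapper tool: the names Segment, parse_table and is_contiguous are hypothetical, and the table text is simply the example shown above. It splits each line into start, length, target and parameters, then verifies that every segment begins where the previous one ends.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: int    # first sector of the segment in the virtual device
    length: int   # segment length in sectors
    target: str   # target policy, e.g. "linear"
    params: list  # remaining, target-specific parameters


def parse_table(text):
    """Parse device-mapper table text into a list of Segment tuples."""
    segments = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        segments.append(
            Segment(int(fields[0]), int(fields[1]), fields[2], fields[3:])
        )
    return segments


def is_contiguous(segments):
    """Return True if each segment starts exactly where the previous one ends."""
    expected = 0
    for seg in segments:
        if seg.start != expected:
            return False
        expected = seg.start + seg.length
    return True


# The example map from the text above.
TABLE = """
0 35258368 linear 8:48 65920
35258368 35258368 linear 8:32 65920
70516736 17694720 linear 8:16 17694976
88211456 17694720 linear 8:16 256
"""

segments = parse_table(TABLE)
total_sectors = sum(seg.length for seg in segments)
```

For a linear target the two remaining parameters are the backing device (major:minor) and the starting offset on that device, so segments[2].params holds the device "8:16" and the offset "17694976".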

The Device Mapper can be fed its tables through the use of a library : libdevmapper. EVMS2, dmsetup, LVM2, the multipath configuration tool and kpartx all link this lib. A table setup boils down to sprintf'ing the right segment definitions in a char *. Should the DM user-kernel interface change from being ioctl based to a pseudo filesystem, the libdevmapper API should remain stable. Here is an example of a multipath target :
                             [----------- 1st path group -----------] [--------- 2nd path group -----------]
0 71014400 multipath 0 0 2 1 round-robin 0 2 1 66:128 1000 65:64 1000 round-robin 0 2 1 8:0 1000 67:192 1000
^     ^       ^      ^ ^ ^ ^      ^      ^ ^ ^   ^      ^
|     |       |      | | | |      |      | | |   |      nb of io to send to this path before switching
|     |       |      | | | |      |      | | |   path major:minor numbers 
|     |       |      | | | |      |      | | number of path arguments 
|     |       |      | | | |      |      | number of paths in this path group
|     |       |      | | | |      |      number of selector arguments
|     |       |      | | | |      path selector
|     |       |      | | | next path group to try
|     |       |      | | number of path groups
|     |       |      | number of hwhandler
|     |       |      number of features
|     |       target name
|     target length in 512-byte blocks
starting offset of the target
For completeness, here is an example of a pure failover target definition for the same LU :


0 71014400 multipath 0 0 4 1 round-robin 0 1 1 66:112 1000 round-robin 0 1 1 67:176 1000 round-robin 0 1 1 68:240 1000 round-robin 0 1 1 65:48 1000

And a full spread (multibus) target one :


0 71014400 multipath 0 0 1 1 round-robin 0 4 1 66:112 1000 67:176 1000 68:240 1000 65:48 1000
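
The three multipath tables above differ only in how the paths are distributed over path groups. As a rough illustration, here is a Python sketch that walks a table line in the field order given by the annotated diagram. The function name parse_multipath and the dictionary layout are assumptions made for this example; this is not the parser used by libdevmapper or the kernel.

```python
def parse_multipath(line):
    """Parse one multipath table line, following the annotated field order."""
    tok = line.split()
    start, length, target = int(tok[0]), int(tok[1]), tok[2]
    if target != "multipath":
        raise ValueError("not a multipath target: " + target)
    pos = 3

    def take(n):
        # Consume and return the next n tokens.
        nonlocal pos
        out = tok[pos:pos + n]
        if len(out) != n:
            raise ValueError("table line is too short")
        pos += n
        return out

    features = take(int(take(1)[0]))     # count, then that many feature words
    hwhandler = take(int(take(1)[0]))    # count, then that many handler words
    ngroups = int(take(1)[0])            # number of path groups
    next_group = int(take(1)[0])         # next path group to try
    groups = []
    for _ in range(ngroups):
        selector = take(1)[0]                  # e.g. round-robin
        selector_args = take(int(take(1)[0]))  # selector arguments
        npaths = int(take(1)[0])               # paths in this group
        nargs = int(take(1)[0])                # arguments per path
        paths = []
        for _ in range(npaths):
            dev = take(1)[0]                   # major:minor
            paths.append((dev, take(nargs)))
        groups.append({"selector": selector,
                       "selector_args": selector_args,
                       "paths": paths})
    if pos != len(tok):
        raise ValueError("unexpected trailing fields")
    return {"start": start, "length": length, "features": features,
            "hwhandler": hwhandler, "next_group": next_group, "groups": groups}


FAILOVER = ("0 71014400 multipath 0 0 4 1 round-robin 0 1 1 66:112 1000 "
            "round-robin 0 1 1 67:176 1000 round-robin 0 1 1 68:240 1000 "
            "round-robin 0 1 1 65:48 1000")
MULTIBUS = ("0 71014400 multipath 0 0 1 1 round-robin 0 4 1 66:112 1000 "
            "67:176 1000 68:240 1000 65:48 1000")

failover = parse_multipath(FAILOVER)
multibus = parse_multipath(MULTIBUS)
```

Parsed this way, the failover table yields four groups of one path each, and the multibus table yields a single group holding all four paths.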

Upon device map creation, a new block kernel object named dm-[0-9]* is instantiated, and a hotplug call is triggered. Each device map can be assigned a symbolic name when created through libdevmapper, but this name won't be available anywhere but through a libdevmapper request.

udev

In Linux 2.6, a new feature was introduced to simplify device management and hot plug capabilities. This feature is called udev and is a standard package in RHEL4 or Oracle Enterprise Linux 4 (OEL4) as well as Novell’s SLES9 and SLES10. Udev is a user space utility for dynamic device node creation. A device node is an entry in the /dev directory; e.g., sda1 or sdh1. One main benefit of udev for Oracle environments is that it provides persistent disk naming; i.e., it prevents device renames upon SCSI reconfigurations, node reboots or even storage recabling.

By default, udev is used only to create the device nodes for hot plug devices. Many devices on the system are considered “cold-plug” and will not be handled by default. In order to have udev handle the device node creation for all devices, the udevstart script should be enabled. This will cause all devices on the system to be named by udev. It can also be started manually by calling /sbin/udevstart.

Udev uses hot plug events sent by the kernel whenever a device is added or removed from the system. The details about the newly added devices are exported by the kernel to the sysfs filesystem. The sysfs filesystem, which is a new filesystem in the 2.6 kernel, is managed by the kernel and exports basic information about the devices currently plugged into your system. Udev manages the /dev directory by monitoring the /sys directory.

By leveraging the information in sysfs, udev can determine which devices need to have a /dev entry created and the corresponding device names to be used. This infrastructure of udev and sysfs allows predictable behavior when devices are added or removed from the system.

Udev Configuration

Once udev is notified that a device is added, it uses the scsi_id call to retrieve and generate a unique SCSI identifier. The scsi_id call queries a SCSI device via the SCSI INQUIRY command and leverages the vital product data (VPD) page 0x80 or 0x83. The output from this query is used to generate a value that is unique across all SCSI devices that properly support pages 0x80 or 0x83.

udev uses three basic configuration files that control its functionality: the main udev.conf file, the permissions files and the rules files.

See the udev man page for details on modifying these files.

udev.conf file

The main udev configuration file, /etc/udev/udev.conf, controls the directory locations for the udev permission and rules files, the udev database, and the default location where udev device nodes are created. The udev man page shows a sample udev.conf file.

udev.permissions

Once udev is started and the device nodes are created, udev sets the ownership and permissions of each device node to what is defined in the *-udev.permissions files in the udev.permissions directory. This is done every time udev is reloaded. These files are essentially used to keep file attributes persistent across reboots. Thus, they should contain an entry for the disks that will be used for ASM, OCR, and Voting disks. Either specify individual disks or use wildcards.

This file was made obsolete in later 2.6-based distributions such as SLES10, RHEL5 and Oracle Enterprise Linux 5 (OEL5); its functionality was merged into the rules files.

Starting with Linux kernel 2.6, the hotplug callbacks are provided by the sysfs pseudo filesystem events. This filesystem presents to userspace kernel objects like bus, driver instances or block devices in a hierarchical and homogeneous manner. /sbin/hotplug is called upon file creation and deletion in the sysfs filesystem.
Udev acts as a proxy for sysfs events. The multipath tools collect events from a udev event-relaying unix socket.

For our needs this facility provides :

Here is how we use these callbacks for the multipath implementation :

Udev is a reimplementation in userspace of the devfs kernel facility. It provides a dynamic /dev space, with an agnostic naming policy. Greg Kroah-Hartman is the original developer of this package, and it is now maintained by Kay Sievers. It can be found at http://ftp.kernel.org/pub/linux/utils/kernel/hotplug/

To summarize what implementation details these subsystems fill :

Device mapper multipathing (DM-MP)

DM-MP provides a consistent user interface for storage devices provided by multiple vendors. There is only one block device (/dev/mapper/XXX) for a LUN. This is the device created by device mapper.

Paths are grouped into priority groups. One of the priority groups is used for I/O and is called active. A path selector chooses a path within the priority group for each I/O, based on some load-balancing algorithm (for example round-robin).

When an I/O fails on a path, that path gets disabled and the I/O is retried on a different path in the same priority group. If all paths in a priority group fail, a different priority group which is enabled will be selected to send I/O.
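
The failover behaviour described above can be modelled in a few lines. The following Python sketch is a toy simulation under simplifying assumptions, not the kernel's logic: the class names PriorityGroup and MultipathDevice and the io_ok callback are hypothetical, every group is treated as enabled, and the round-robin selector switches path on every I/O rather than after a configured number of I/Os.

```python
class PriorityGroup:
    """A group of paths with a simple round-robin selector."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.failed = set()
        self._next = 0

    def select(self):
        """Return the next usable path, or None if every path has failed."""
        usable = [p for p in self.paths if p not in self.failed]
        if not usable:
            return None
        path = usable[self._next % len(usable)]
        self._next += 1
        return path


class MultipathDevice:
    """One virtual device backed by an ordered list of priority groups."""

    def __init__(self, groups):
        self.groups = groups

    def submit(self, io_ok):
        """Send one I/O; io_ok(path) is a stand-in that reports success."""
        for group in self.groups:
            while True:
                path = group.select()
                if path is None:
                    break                # whole group failed: try next group
                if io_ok(path):
                    return path          # I/O completed on this path
                group.failed.add(path)   # disable the path and retry
        raise OSError("all paths in all priority groups have failed")


# Paths taken from the two-group example table earlier on this page.
device = MultipathDevice([
    PriorityGroup(["66:128", "65:64"]),
    PriorityGroup(["8:0", "67:192"]),
])
broken = {"66:128"}
first = device.submit(lambda path: path not in broken)
broken.add("65:64")
second = device.submit(lambda path: path not in broken)
```

With path 66:128 broken, the first I/O is retried on 65:64 in the same group. Once 65:64 breaks as well, the second I/O falls through to the second priority group.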

DM-MP consists of 4 components:

  1. DM MP kernel module - Kernel module that is responsible for making the multipathing decisions in normal and failure situations.

  2. multipath command - User space tool that handles the initial configuration, listing and deletion of multipathed devices.

  3. multipathd daemon - User space daemon that constantly monitors the paths. It marks a path as failed when it finds the path faulty, and if all the paths in a priority group are faulty it switches to the next enabled priority group. It keeps checking the failed path; once the failed path comes alive again, it can reactivate the path, depending on the failback policy. It provides a CLI to monitor and manage individual paths. It automatically creates device mapper entries when new devices come into existence.

  4. kpartx - User space command that creates device mapper entries for all the partitions in a multipathed disk/LUN. When the multipath command is invoked, this command automatically gets invoked. For DOS based partitions this command needs to be run manually.


Old News ;-)

[Dec 21, 2011] /dev/dm-0 - LinuxQuestions.org

fdisk -l output in case you are using LVM contains many messages like

Disk /dev/dm-0 doesn't contain a valid partition table

This has been very helpful to me. I found this thread by Googling dm-0 because I also got the no partition table error message.

Here is what I think:

When the programs fdisk and sfdisk are run with the option -l and no argument, e.g. # /sbin/fdisk -l

they look for all devices that can have cylinders, heads, sectors, etc.

If they find such a device, they output that information to standard output and they output the partition table to standard output. If there is no partition table, they have an error message (also standard output).
One can see this by piping to 'less', e.g.
# /sbin/fdisk -l | less

/dev/dm-0 ... /dev/dm-3 on my Fedora Core 5 system seem to be device-mapper devices associated with LVM.

RAID might also require device mappers.

[Nov 23, 2011] Linux Multipath Focusing on Linux Device Mapper

The HP Blog Hub

Enterprise computing requires consistency, especially when it comes to mapping storage LUNs to unique devices.

In the past several months, I have encountered multiple situations where customers have lost data due to catastrophic corruption within environments utilizing device mapper (multipath) on Linux. Read further to determine if you are at risk of suffering a similar fate.

Background:

Does it matter if a SCSI disk device file, or the physical disk that the device file references, changes? While the system is up, it would be problematic for a device file to change; however, the kernel will not change an open file descriptor. Therefore, as long as the device is open (i.e. mounted, activated by LVM, etc.) the device structure in the kernel will not change. The issue is what happens when the device is closed.

Using Qlogic or Emulex, every boot can result in a SCSI disk re-enumeration due to the fact that Linux enumerates devices based on scan order. Device Mapper "solves" this condition by providing persistent devices based on a device's inquiry string (GUID).

Do persistent devices matter? Examples:

Linux LVM – NO

Veritas Volume Manager – NO

LABEL – NO

Oracle OCR – YES
RAW devices – YES

/dev/sd# in /etc/fstab -- YES

I recently experienced an Oracle RAC issue where the OCR disks changed on a system while it was booted but before the cluster services were started.

First RAW devices are mapped using a configuration file

# /etc/sysconfig/rawdevices

/dev/raw/raw1 /dev/mpath/mpath1

/dev/raw/raw2 /dev/mpath/mpath2

/dev/raw/raw3 /dev/mpath/mpath3

/dev/raw/raw4 /dev/mpath/mpath4

The problem occurred when device mapper's mapping to the SCSI LUN for mpath1 changed to a different LUN after reboot. OCR started and wrote its header to ASM disks. Needless to say, this is BAD.

Device mapper configuration

Device mapper Persistence is configured via a bindings file defined in multipath.conf

Default location of bindings

either

/etc/multipath/bindings

or

/var/lib/multipath/bindings

The problem arises when the bindings file is not available at boot time to dm_mod. In the above scenario, the bindings file was located in a filesystem which had not been mounted yet; therefore, device mapper had no reference to use when creating the mappings to the SCSI LUNs presented. It was pure luck that the bindings had not changed earlier.

The multipath.conf file placed the bindings file in /var/lib/multipath/bindings. The system booted and eventually mounted /var, covering up the bindings file. The multipath command was run at some later date, when more storage was added to the production system, to pick up the new LUNs, and the bindings file had to be rebuilt. Since it had no entries, all LUNs were re-enumerated. The cluster services were down on this node, but once the OCR disks had been remapped to other SCSI LUNs, Oracle services were started and this node attempted to join the cluster, the damage was done.

What to look for:

LOGS

Feb 23 13:00:05 protect_the_innocent multipathd: remove mpath15 devmap

Feb 23 13:00:05 protect_the_innocent multipathd: remove mpath28 devmap

Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath226 removed

Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath227 removed

Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath228 removed

Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath229 removed

Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath230 removed

Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath231 removed

Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath232 removed

Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath233 removed

Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath234 removed

Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath235 removed

Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath236 removed
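
Log lines like these are easy to watch for automatically. The following Python sketch collects the names of maps that multipathd reports as removed. It assumes only the two message forms shown in the excerpt above ("remove NAME devmap" and "dm map NAME removed"); other multipathd versions may word these messages differently, and the function name removed_maps is made up for this example.

```python
import re

# Matches the two multipathd message forms shown in the log excerpt above.
REMOVED = re.compile(
    r"multipathd: (?:remove (\S+) devmap|dm map (\S+) removed)"
)


def removed_maps(log_lines):
    """Return the names of multipath maps reported as removed, in log order."""
    names = []
    for line in log_lines:
        match = REMOVED.search(line)
        if match:
            # Exactly one of the two capture groups is set per match.
            names.append(match.group(1) or match.group(2))
    return names


sample = [
    "Feb 23 13:00:05 protect_the_innocent multipathd: remove mpath15 devmap",
    "Feb 23 13:00:07 protect_the_innocent multipathd: dm map mpath226 removed",
    "Feb 23 13:00:08 protect_the_innocent kernel: unrelated message",
]
removed = removed_maps(sample)
```

A burst of removals for many maps within the same few seconds, as in the excerpt, is the pattern to look for.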

Multipath.conf

Confirm that the multipath.conf is in the root filesystem

There is a RHEL article on this:

https://bugzilla.redhat.com/show_bug.cgi?id=409741

Example:

Boot time

mpath236 (360060e8005709a000000709a000000b3)

The multipath command was run, followed by multipath -l

mpath212 (360060e8005709a000000709a000000b3)

Checking the bindings file:

/var/lib/multipath/bindings:

mpath212 360060e8005709a000000709a000000b3
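
The comparison in this example (same WWID, different alias) can be scripted. Below is a minimal Python sketch that assumes the bindings file holds one "alias WWID" pair per line, as in the entry above, with "#" starting a comment. The function names parse_bindings and changed_aliases are hypothetical, and the boot-time mapping is written out by hand here rather than read from a live system.

```python
def parse_bindings(text):
    """Return {alias: wwid} from bindings-file text ('alias wwid' per line)."""
    bindings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        alias, wwid = line.split()[:2]
        bindings[alias] = wwid
    return bindings


def changed_aliases(before, after):
    """Return {wwid: (old_alias, new_alias)} for WWIDs whose alias changed."""
    old_by_wwid = {wwid: alias for alias, wwid in before.items()}
    changes = {}
    for alias, wwid in after.items():
        old = old_by_wwid.get(wwid)
        if old is not None and old != alias:
            changes[wwid] = (old, alias)
    return changes


# Values taken from the example above.
WWID = "360060e8005709a000000709a000000b3"
at_boot = {"mpath236": WWID}
from_file = parse_bindings("# multipath bindings\nmpath212 " + WWID + "\n")
changes = changed_aliases(at_boot, from_file)
```

Any entry in the result means a LUN is now reachable under a different mpath name than before, which is exactly the condition that corrupted the OCR disks in the scenario above.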

Solution:

  1. Confirm the bindings file is available at boot time. If /var is a separate filesystem, configure the location of the bindings file so that it is in /etc/multipath/bindings via the multipath.conf file

  2. Create a startup script which creates a historical map of the devices at boot time so that your team will have the ability to see the device map over time

Example command:

# multipath -l | grep -e mpath -e sd | sed -e :a -e '$!N; s/\n/ /; ta'| sed 's/mpath/\nmpath/g' | sed 's/\\_//g'

The above command makes it easy to map mpath device files to their unique SCSI LUNs

Recommended Links

Sites

    Device-mapper Resource Page



Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Last modified: March, 12, 2019