Recovery of LVM partitions


Introduction

The price of excessive complexity is the loss of reliability. We recently saw this with the Boeing 737 MAX: the two crashes were connected not only with overcomplexity, but with excessive cost cutting (the chronic disease of neoliberal enterprises, where management is recruited from bean counters who are mostly concerned with their bonuses and pleasing Wall Street analysts, while engineers are pushed to the sidelines). The result is bad and unsafe engineering decisions, which in Boeing's case were covered up by regulatory capture.

In case of trouble LVM can behave exactly like the Boeing 737 MAX, so beware of adding this complexity layer to mission-critical systems unless absolutely necessary, and get all the training you can: deep knowledge of LVM is a survival skill in this environment.

When configuring your system with LVM you need to understand the gains vs. losses equation. With modern RAID controllers capable of RAID 6 with a spare, the need for LVM to provide better data security (the ability to discard the second PV after a disk crash in a two-or-more-PV configuration and save the data on the first PV) is not obvious if you have a single PV per partition. If a partition is static, the question arises: what benefits do you get from converting it to an LVM partition?

In this case RAID is the most probable point of failure. If you have two PVs then LVM does provide additional reliability: if the second PV dies you can exclude it from the LVM partition and still recover the data stored on the first PV (the data on PV2 will be lost).

In any case, installing monitoring of your server via DRAC/ILO does more to ensure recovery from hard drive failures than LVM does.

If your RAID 5 array is unattended and is not monitored by a script, it does not matter what configuration is installed -- at some point you will accumulate enough crashed disks to render the partition unreadable.

So there are cases when LVM makes sense, and there are many cases when it is installed but does not make any sense. LVM does provide additional flexibility if one or several of your partitions are dynamic and need to be expanded periodically. It can also provide "poor man's" protection against the bad properties of RAID 5 on controllers that do not support RAID 6. In this case you can create several virtual drives on the RAID controller and thus localize a failure to only one of those drives. The virtual drives can then be combined via LVM into a single huge partition, which definitely increases resiliency. If your RAID controller supports a spare drive and you are forced to use RAID 5, you need to use this option. Always! There is no other way to increase the reliability of this configuration.

But the main usage case for LVM remains the presence of a highly dynamic filesystem with unpredictable size. Actually, adding space to an existing partition in Unix is easy, as you can always move data and use symlinks to the new (overflow) partition, so this flexibility is often overrated. Benefits vs. pitfalls need to be evaluated on a case-by-case basis.

This statement is never more true than in attempts to recover lost data from LVM volumes. One side effect of the additional layer of complexity that LVM introduces is that recovery of damaged volumes becomes much more complex. You really need to understand the "nuts and bolts" of LVM, which is not a strong point of a typical Linux sysadmin. Moreover, LVM is badly documented for such a critical subsystem. Most of the documentation that exists is either outdated, irrelevant for the purposes of recovering damaged volumes, or outright junk (like most docs provided by Red Hat).

For example, try to find good documentation on how to deal with the case when you have an LVM partition created from two PVs (say, both RAID 5) and the second PV crashed badly (say it has two failed hard drives in the RAID 5 virtual disk, which makes RAID 5 recovery impossible), while the first PV is intact. The best that exists is from SUSE, and the document has not been updated since 2007:

For example, the command

  1. Create the LVM meta data on the new disk using the old disk's UUID that pvscan displays.
    ls-lvm:~ # pvcreate --uuid 56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu /dev/sdc
      Physical volume "/dev/sdc" successfully created

Should be

pvcreate --uuid 56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu --restorefile /etc/lvm/backup/sales /dev/sdc

Also, after recovery you need to restore /etc/fstab and then reboot the system. Red Hat for some reason refuses to mount a partition that was commented out in /etc/fstab.
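Collected into one place, the corrected sequence for this scenario looks like the sketch below. Everything in it is a placeholder taken from the SUSE example (the UUID, the volume group sales, the backup file path, /dev/sdc), and the script only prints the commands, because each one is destructive if pointed at the wrong device:

```shell
#!/bin/sh
# Sketch of the two-PV recovery sequence when the second PV is gone.
# All names are placeholders from the SUSE example -- check each one
# against your own pvscan output and /etc/lvm/backup before use.
set -u

VG=sales                                      # volume group to recover
DEV=/dev/sdc                                  # empty replacement disk
UUID=56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu   # old PV UUID from pvscan errors
BACKUP=/etc/lvm/backup/$VG                    # LVM metadata backup file

run() { echo "WOULD RUN: $*"; }               # dry run: print, don't execute

run pvcreate --uuid "$UUID" --restorefile "$BACKUP" "$DEV"
run vgcfgrestore "$VG"
run vgchange -ay "$VG"
run e2fsck -y "/dev/$VG/reports"              # then mount and salvage data
```

To actually execute the steps, replace the run wrapper after verifying every value.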

That means that the No. 1 solution to LVM problems is avoidance. For example, you should never put the root filesystem, the /boot filesystem, or other small OS-related partitions on an LVM volume. In the case of the root partition you get additional complexity with no added benefits (on a modern disk you can allocate enough space for the root partition to never worry about its size), despite the fact that this is the configuration recommended by Red Hat. But Red Hat does a lot of questionable things -- just look at systemd. So any Red Hat recommendation should be treated with a grain of salt.

With the current size of hard drives, using LVM for the OS (which now often resides in a single partition, with only /tmp and /home separate) just does not make any sense. Without /var, /opt and /tmp, the root partition is usually less than 16GB, and if you allocate, say, 32GB to the root partition, the chances that you will ever run out of space during the lifetime of the system are very slim. Even for /var you can move the largest set of logs to other partitions, making the size of this partition more or less static even for the most log-intensive services, such as a corporate proxy server or a major webserver.

And such a layout does decrease the reliability of the system and your ability to troubleshoot it in case of trouble. If your company specification demands this solution, the second viable option is to always have recovery flash drives (FIT flash drives can now be as large as 512GB) with your current configuration, for example created by Relax-and-Recover.

It is just stupid to spend hours learning LVM intricacies, attacking the problem head-on while trying to restore your OS installation, if you can restore the server in less than an hour using an ISO or a USB drive. Dell DRAC allows you to keep such an image on an SD card. That's a big plus for Dell servers.

As with everything, the more you do before the problem occurs, the less you need to do during the emergency. Having a current baseline of the server with all the necessary configuration files helps greatly. It takes only a couple of minutes to create one (at least sosreport in RHEL, supportconfig in SUSE) and store it on some NFS volume or remote server.

Absence of a baseline, or non-current data about the crashed server's configuration, is the most serious blunder a sysadmin can commit in his professional career. Never delete your SOS reports, and run SOS at least monthly (so you need to store only 12 reports). Such a report contains vital recovery information.
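The "run monthly, keep 12" rotation can be automated with a few lines of shell. This is a sketch: the directory and the report naming are assumptions, and you would substitute sosreport or supportconfig for your distribution in the cron job that feeds it.

```shell
#!/bin/sh
# Sketch: keep only the newest $KEEP baseline reports (sosreport or
# supportconfig tarballs) in $BASELINE_DIR.  Both variables are
# assumptions; run this monthly from cron after generating the report.
set -u

BASELINE_DIR=${BASELINE_DIR:-/srv/baselines}   # e.g. an NFS mount
KEEP=${KEEP:-12}                               # ~one report per month

prune_old_reports() {
    # Newest first; skip the first $KEEP entries; delete the rest.
    ls -1t "$BASELINE_DIR" | tail -n +$((KEEP + 1)) | while read -r f; do
        rm -f "$BASELINE_DIR/$f"
    done
}
```

A cron line such as running the report tool and then prune_old_reports on the first of each month keeps the archive bounded at a year of history.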

Having an overnight backup of data typically moves the situation from the SNAFU to the nuisance category. You should also create a copy of the /etc directory (or, better, the whole root partition) on each successful reboot and store it on a USB drive or on an external filesystem/remote server. This trick, which gives you access to either a cpio copy of the root partition or a tarball of the /etc directory in /boot, provides you with the set of the most important parameters for server restoration and might greatly help during recovery.
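The reboot-time /etc copy can be as simple as the sketch below. The source and destination paths are assumptions; the dedupe step relies on an uncompressed tar, so that identical content produces identical archives and a plain cmp detects "nothing changed".

```shell
#!/bin/sh
# Sketch: archive /etc to a mounted flash drive on each successful boot,
# dropping the new archive when nothing changed since the last one.
# $SRC and $DEST are assumptions -- adjust and call from rc.local or
# a systemd unit running late in boot.
set -u

SRC=${SRC:-/etc}
DEST=${DEST:-/media/recovery}     # e.g. a FIT flash drive mount point

backup_etc() {
    stamp=$(date +%Y%m%d)
    new="$DEST/etc-$stamp.tar"
    tar -cf "$new" -C "$(dirname "$SRC")" "$(basename "$SRC")"
    # Second-newest archive, if any, is the previous day's copy.
    prev=$(ls -1t "$DEST"/etc-*.tar 2>/dev/null | sed -n '2p')
    if [ -n "$prev" ] && cmp -s "$new" "$prev"; then
        rm -f "$new"              # no changes since the previous archive
    fi
}
```

Swap the date stamp for %Y%m%d-%H%M%S if you want more than one archive per day.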

Another important tip for using LVM is to always have at least 2GB of free space in each volume group. That allows using snapshots during patching and similar operations in which you can damage or destroy the LVM setup. Learning how to use snapshots is a must for any sysadmin who uses LVM in a business-critical environment. (With a typical root partition of less than 1GB a backup is as good as a snapshot, and you should never start patching without a backup anyway.)
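The snapshot-before-patching workflow looks roughly like the sketch below. The VG and LV names (vg00, root) are assumptions, and the commands are printed rather than executed here, since they need root and a real volume group with the 2GB of free space mentioned above:

```shell
#!/bin/sh
# Sketch: snapshot an LV before patching, then drop the snapshot on
# success or roll back on failure.  VG/LV names are assumptions; the
# commands are echoed because they require root and a real VG.
set -u

VG=vg00; LV=root; SNAP=${LV}_prepatch
run() { echo "WOULD RUN: $*"; }

run lvcreate -s -n "$SNAP" -L 2G "/dev/$VG/$LV"   # needs >=2G free in the VG
# ... apply patches, reboot, verify ...
# On success, discard the snapshot:
run lvremove -f "/dev/$VG/$SNAP"
# On failure, roll the LV back to its pre-patch state instead:
run lvconvert --merge "/dev/$VG/$SNAP"
```

Note that a snapshot that fills up is invalidated, which is exactly why the free space reserve matters.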

Having the root filesystem on LVM can additionally complicate recovery of damaged file systems, so this is one thing that it is probably prudent to avoid. There is no justification for putting operating system partitions on LVM with modern disks.

If your root partition is outside LVM, you can at least edit files in the /etc directory without jumping through hoops.

The complexity of recovery also depends on the Linux distribution you are using. For example, the SLES DVD in rescue mode automatically recognizes the LVM group, which makes recovery of LVM somewhat simpler, unless this is a major screw-up.

Generally you need to have a copy of /etc/lvm/backup/volume_name to access the LVM partitions. That's why it is prudent to back up the root partition at least weekly and back up the /etc directory on each login. You can write such data to a flash card of the server or blade (vFlash in Dell), or to a FIT form factor flash drive permanently installed in one of the USB ports. The same FIT flash drive can contain a tarball of major filesystems as provided by Relax-and-Recover. With current prices there is no excuse not to use a FIT drive as a recovery medium. As of November 2016 a 128GB SanDisk FIT drive cost around $22, 64GB -- $15, 32GB -- $8. Samsung FIT form factor USB flash drives are even cheaper.

In a large enterprise environment you can use a dedicated server for such "partial" backups and bare metal recovery ISO files. You should do it yourself, as central backup in a large, typically highly bureaucratized corporation is often unreliable. In other words, you do need to practice "shadow IT".

Centralized backup is often performed by another department (operators on the night shift), and restores are delegated to operators too, who can screw up an already damaged system further instead of helping to recover it, simply due to lack of knowledge and understanding of the environment. Then you will need to deal with two problems :-(

Creating private backups

Other typical examples of "shadow IT" are private backups and the general "abuse" of USB drives, as a reaction to switching to backups over WAN, or to outsourcing this operation. They also might be the result of some high-level brass attempting to hide nefarious activities.

Now we know  for sure that  emails on her  private, recklessly created email server were intercepted by a foreign power.  And it was not Russia.

Typical enterprise backup tools such as HP Data Protector can create a horrible mess: they tend to stop making backups on their own, and personnel often overlook this fact until it is too late. Often users stop trusting central backup after one or two accidents in which they lost data. With the capacity of modern USB drives (256GB for flash drives and 3TB for portable USB drives) any sysadmin can make "private" backups, at least of the OS and critical files, to avoid downtime. But the road to hell is paved with good intentions. Even if this is done on site, such a move represents some security risk. If the backup is stored offsite in the cloud -- a huge security risk.

Some examples of private backups are OS images made using a FIT (small form factor) flash drive to keep a local copy of the OS and key files. These can be viewed as a kind of "self-preservation" strategy that helps avoid frantic efforts to restore damaged servers after a power loss (typically such "external" events provoke additional failures, and if at this point your RAID 5 array has one bad drive and a second one fails, your data is in danger). Linux LVM is another frequent point of failure.

Generally you need to make a copy of /etc on the first login during a particular day. Especially important is a copy of /etc/lvm/backup/volume_name, which contains vital information for recovery of mangled LVM partitions.

That's why it is prudent to back up the root partition at least weekly and back up the /etc directory on the first login during a particular day. You can write such data to a flash card of the server or blade (vFlash in Dell), which gives this operation a patina of legitimacy, or to a FIT form factor flash drive permanently installed in one of the USB ports.

The same FIT flash drive can contain a tarball of major filesystems as provided by Relax-and-Recover. As of October 2019 a 256GB SanDisk FIT drive cost around $36. Samsung also sells FIT form factor USB flash drives.

Strategies

There are two scenarios for recovery of an LVM volume:

  1. When the hardware is fully operational and the data is "mostly" intact.

  2. When a hard drive crashed, or the content is mangled beyond recognition (for example, you have a RAID 5 PV with two crashed disks).

    Here you typically face serious data loss. You can try to recover the last crashed disk in the RAID 5 configuration via OnTrack or similar recovery services, but there is no guarantee that the recovered disk will be accepted by the RAID controller.

    If your partition consists of two PVs and only the second PV crashed, you can replace the failed drive, recreate the same RAID 5 partition and follow the Novell recommendations for such a case. Usually you will be able to recover the data on the first PV. See Disk Permanently Removed below.

Never use LVM with Linux software RAID for important data

Software RAID in Linux is generally an invitation to trouble. It is a badly written and badly integrated subsystem. Unfortunately, Red Hat popularized this horrible mess by including it in their certification.

The combination of Linux software RAID and LVM is especially toxic. As Richard Bullington-McGuire noted (Recovery of RAID and LVM2 Volumes, April 28, 2006):

The combination of Linux software RAID (Redundant Array of Inexpensive Disks) and LVM2 (Logical Volume Manager, version 2) offered in modern Linux operating systems offers both robustness and flexibility, but at the cost of complexity should you ever need to recover data from a drive formatted with software RAID and LVM2 partitions. I found this out the hard way when I recently tried to mount a system disk created with RAID and LVM2 on a different computer. The first attempts to read the filesystems on the disk failed in a frustrating manner.

I had attempted to put two hard disks into a small-form-factor computer that was really only designed to hold only one hard disk, running the disks as a mirrored RAID 1 volume. (I refer to that system as raidbox for the remainder of this article.) This attempt did not work, alas. After running for a few hours, it would power-off with an automatic thermal shutdown failure. I already had taken the system apart and started re-installing with only one disk when I realized there were some files on the old RAID volume that I wanted to retrieve.

Recovering the data would have been easy if the system did not use RAID or LVM2. The steps would have been to connect the old drive to another computer, mount the filesystem and copy the files from the failed volume. I first attempted to do so, using a computer I refer to as recoverybox, but this attempt met with frustration.

Importance of having fresh backup and baseline

As always, the most critical thing that distinguishes a minor inconvenience from a major SNAFU is the availability of up-to-date backups. There is no replacement for up-to-date backups and a baseline (for example, creating a baseline of the /etc directory on each login can save you from a lot of trouble), and spending enough time and effort on this issue is really critical for recovery from major LVM screw-ups.

It is much better to spend a couple of hours organizing some additional private backup and automatic baselining of at least the /etc directory on each of your logins, than to spend 10 hours in a cold sweat trying to recover a horribly messed-up LVM partition. So, on a fundamental level, this is a question of your priorities ;-)

The second option is to have a support contract and insist on a kernel engineer performing the recovery ;-). That might well not be a same-day recovery, but a good kernel engineer can do amazing things with a messed-up system.

Some information about the recovery process

There are very few good articles on the Net that describe the nuts and bolts of the recovery process. I have found just two; both are reproduced below.

You need to study both first, before jumping into action. And don't forget to make dd copies of the disks before attempting recovery. Instead of the hard drives you can actually work with the images via the loopback interface, as described in LVM partitions recovery - Skytechwiki.
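The "always work on a dd image" step can be sketched as follows. A small file stands in for the damaged device here so the sketch is safe to run anywhere; on a real system the input would be the device itself, e.g. /dev/sdc (an assumption -- substitute your own device, and a destination with enough space):

```shell
#!/bin/sh
# Sketch: recover from a dd image, never from the original disk.
# A 1 MiB file stands in for the damaged device; on a real system
# SRC would be something like /dev/sdc.
set -u

SRC=demo-disk            # stand-in for the damaged device
IMG=demo-disk.img

# Make the stand-in "disk" (skip this step with a real device).
dd if=/dev/urandom of="$SRC" bs=1024 count=1024 status=none

# The canonical rescue copy: keep going past read errors, pad bad
# blocks with zeros so offsets stay aligned.
dd if="$SRC" of="$IMG" bs=64k conv=noerror,sync status=none

# Verify before touching the image; on a disk with real read errors
# the sums will differ -- then prefer ddrescue over plain dd.
md5sum "$SRC" "$IMG"

# Attach the image read-only via loopback and work on that (needs root):
#   losetup -r -f --show "$IMG"
#   vgscan && vgchange -ay      # LVM then sees the PV inside the image
```

All destructive experiments then happen against the image, and the original disk stays available for a second attempt or for a professional recovery service.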

One of the most tragic blunders in recovery is the loss of the initial configuration. If you do not have enough disks, buy them ASAP. Your data is much more valuable.

Novell recommendations

The information below is from the Cool Solutions article Recovering a Lost LVM Volume Disk:

Logical Volume Management (LVM) provides a high level, flexible view of a server's disk storage. Though LVM is robust, problems can occur. The purpose of this document is to review the recovery process when a disk is missing or damaged, and then apply that process to plausible examples. When a disk is accidentally removed or damaged in some way that adversely affects the logical volume, the general recovery process is:
  1. Replace the failed or missing disk
  2. Restore the missing disk's UUID
  3. Restore the LVM meta data
  4. Repair the file system on the LVM device

The recovery process will be demonstrated in three specific cases:

  1. A disk belonging to a logical volume group is removed from the server
  2. The LVM meta data is damaged or corrupted
  3. One disk in a multi-disk volume group has been permanently removed

This article discusses how to restore the LVM meta data. This is a risky proposition. If you restore invalid information, you can lose all the data on the LVM device. An important part of LVM recovery is having backups of the meta data to begin with, and knowing how it's supposed to look when everything is running smoothly. LVM keeps backup and archive copies of its meta data in /etc/lvm/backup and /etc/lvm/archive. Back up these directories regularly, and be familiar with their contents. You should also manually back up the LVM meta data with vgcfgbackup before starting any maintenance projects on your LVM volumes.
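Backing up those two directories can be scripted in a few lines. This sketch assumes a destination directory of your own choosing (a flash drive or remote mount); it refreshes the metadata via vgcfgbackup when the tool is present, then archives the whole /etc/lvm tree:

```shell
#!/bin/sh
# Sketch: archive the LVM metadata directories to a dated tarball.
# $LVM_DIR and $DEST are assumptions; point DEST at a flash drive or
# remote mount, run from cron, and always before LVM maintenance.
set -u

LVM_DIR=${LVM_DIR:-/etc/lvm}      # holds backup/ and archive/
DEST=${DEST:-/media/recovery}

save_lvm_metadata() {
    # Refresh /etc/lvm/backup first when the LVM tools are available
    # (needs root; failure is ignored so the tarball is still made).
    if command -v vgcfgbackup >/dev/null 2>&1; then
        vgcfgbackup >/dev/null 2>&1 || true
    fi
    tar -cf "$DEST/lvm-meta-$(date +%Y%m%d).tar" -C "$LVM_DIR" .
}
```

Having yesterday's tarball of /etc/lvm is exactly what makes the vgcfgrestore procedures below possible when the on-disk metadata is gone.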

If you are planning on removing a disk from the server that belongs to a volume group, you should refer to the LVM HOWTO before doing so.

Server Configuration

In all three examples, a server with SUSE Linux Enterprise Server 10 with Service Pack 1 (SLES10 SP1) will be used with LVM version 2. The examples will use a volume group called "sales" with a linear logical volume called "reports". The logical volume and its mount point are shown below. You will need to substitute your mount points and volume names as needed to match your specific environment.

ls-lvm:~ # cat /proc/partitions
major minor  #blocks  name

   8     0    4194304 sda
   8     1     514048 sda1
   8     2    1052257 sda2
   8     3          1 sda3
   8     5     248976 sda5
   8    16     524288 sdb
   8    32     524288 sdc
   8    48     524288 sdd

ls-lvm:~ # pvcreate /dev/sda5 /dev/sd[b-d]
  Physical volume "/dev/sda5" successfully created
  Physical volume "/dev/sdb" successfully created
  Physical volume "/dev/sdc" successfully created
  Physical volume "/dev/sdd" successfully created

ls-lvm:~ # vgcreate sales /dev/sda5 /dev/sd[b-d]
  Volume group "sales" successfully created

ls-lvm:~ # lvcreate -n reports -L +1G sales
  Logical volume "reports" created

ls-lvm:~ # pvscan
  PV /dev/sda5   VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb    VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdc    VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdd    VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]

ls-lvm:~ # vgs
  VG    #PV #LV #SN Attr   VSize VFree
  sales   4   1   0 wz--n- 1.72G 740.00M

ls-lvm:~ # lvs
  LV      VG    Attr   LSize Origin Snap%  Move Log Copy%
  reports sales -wi-ao 1.00G

ls-lvm:~ # mount | grep sales
/dev/mapper/sales-reports on /sales/reports type ext3 (rw)

ls-lvm:~ # df -h /sales/reports
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/sales-reports
                     1008M   33M  925M   4% /sales/reports

Disk Belonging to a Volume Group Removed

Removing a disk, belonging to a logical volume group, from the server may sound a bit strange, but with Storage Area Networks (SAN) or fast paced schedules, it happens.

Symptom:

The first thing you may notice when the server boots are messages like:

"Couldn't find all physical volumes for volume group sales."
"Couldn't find device with uuid '56pgEk-0zLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'."
'Volume group "sales" not found'

If you are automatically mounting /dev/sales/reports, then the server will fail to boot and prompt you to login as root to fix the problem.

Boot failed due to invalid fstab entry
  1. Type root's password.
  2. Edit the /etc/fstab file.
  3. Comment out the line with /dev/sales/report
  4. Reboot

The LVM symptom is a missing sales volume group. Typing cat /proc/partitions confirms the server is missing one of its disks.

ls-lvm:~ # cat /proc/partitions
major minor  #blocks  name

   8     0    4194304 sda
   8     1     514048 sda1
   8     2    1052257 sda2
   8     3          1 sda3
   8     5     248976 sda5
   8    16     524288 sdb
   8    32     524288 sdc

ls-lvm:~ # pvscan
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  PV /dev/sda5        VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb         VG sales   lvm2 [508.00 MB / 0    free]
  PV unknown device   VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdc         VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]

Solution:

  1. Fortunately, the meta data and file system on the disk that was /dev/sdc are intact.
  2. So the recovery is to just put the disk back.
  3. Reboot the server.
  4. The /etc/init.d/boot.lvm start script will scan and activate the volume group at boot time.
  5. Don't forget to uncomment the /dev/sales/reports device in the /etc/fstab file.

 

If this procedure does not work, then you may have corrupt LVM meta data.

Corrupted LVM Meta Data

The LVM meta data does not get corrupted very often; but when it does, the file system on the LVM logical volume should also be considered unstable. The goal is to recover the LVM volume, and then check file system integrity.

Symptom 1:

Attempting to activate the volume group gives the following:

ls-lvm:~ # vgchange -ay sales
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  Couldn't read volume group metadata.
  Volume group sales metadata is inconsistent
  Volume group for uuid not found: m4Cg2vkBVSGe1qSMNDf63v3fDHqN4uEkmWoTq5TpHpRQwmnAGD18r44OshLdHj05
  0 logical volume(s) in volume group "sales" now active

This symptom is the result of a minor change in the meta data. In fact, only three bytes were overwritten. Since only a portion of the meta data was damaged, LVM can compare its internal checksum against the meta data on the device and know it's wrong. There is enough meta data for LVM to know that the "sales" volume group and devices exist, but are unreadable.

ls-lvm:~ # pvscan
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  PV /dev/sda5   VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb    VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdc    VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdd    VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]

Notice pvscan shows all devices present and associated with the sales volume group. It's not the device UUID that is not found, but the volume group UUID.

Solution 1:

  1. Since the disk was never removed, leave it as is.
  2. There were no device UUID errors, so don't attempt to restore the UUIDs.
  3. This is a good candidate to just try restoring the LVM meta data.

     

    ls-lvm:~ # vgcfgrestore sales
      /dev/sdc: Checksum error
      /dev/sdc: Checksum error
      Restored volume group sales
    
    ls-lvm:~ # vgchange -ay sales
      1 logical volume(s) in volume group "sales" now active
    
    ls-lvm:~ # pvscan
      PV /dev/sda5   VG sales   lvm2 [240.00 MB / 240.00 MB free]
      PV /dev/sdb    VG sales   lvm2 [508.00 MB / 0    free]
      PV /dev/sdc    VG sales   lvm2 [508.00 MB / 0    free]
      PV /dev/sdd    VG sales   lvm2 [508.00 MB / 500.00 MB free]
      Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]
    
  4. Run a file system check on /dev/sales/reports.
    ls-lvm:~ # e2fsck /dev/sales/reports
    e2fsck 1.38 (30-Jun-2005)
    /dev/sales/reports: clean, 961/131072 files, 257431/262144 blocks
    
    ls-lvm:~ # mount /dev/sales/reports /sales/reports/
    
    ls-lvm:~ # df -h /sales/reports/
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/mapper/sales-reports
                         1008M  990M     0 100% /sales/reports
    

Symptom 2:

Minor damage to the LVM meta data is easily fixed with vgcfgrestore. If the meta data is gone, or severely damaged, then LVM will consider that disk as an "unknown device." If the volume group contains only one disk, then the volume group and its logical volumes will simply be gone. In this case the symptom is the same as if the disk was accidentally removed, with the exception of the device name. Since /dev/sdc was not actually removed from the server, the devices are still labeled a through d.

ls-lvm:~ # pvscan
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  PV /dev/sda5        VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb         VG sales   lvm2 [508.00 MB / 0    free]
  PV unknown device   VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdd         VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]

Solution 2:

  1. First, replace the disk. Most likely the disk is already there, just damaged.
  2. Since the UUID on /dev/sdc is not there, a vgcfgrestore will not work.
    ls-lvm:~ # vgcfgrestore sales
      Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
      Couldn't find all physical volumes for volume group sales.
      Restore failed.
    
  3. Comparing the output of cat /proc/partitions and pvscan shows the missing device is /dev/sdc, and pvscan shows which UUID it needs for that device. So, copy and paste the UUID that pvscan shows for /dev/sdc.
    ls-lvm:~ # pvcreate --uuid 56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu /dev/sdc
      Physical volume "/dev/sdc" successfully created
    
  4. Restore the LVM meta data
    ls-lvm:~ # vgcfgrestore sales
      Restored volume group sales
    
    ls-lvm:~ # vgscan
      Reading all physical volumes.  This may take a while...
      Found volume group "sales" using metadata type lvm2
    
    ls-lvm:~ # vgchange -ay sales
      1 logical volume(s) in volume group "sales" now active
    
  5. Run a file system check on /dev/sales/reports.
    ls-lvm:~ # e2fsck /dev/sales/reports
    e2fsck 1.38 (30-Jun-2005)
    /dev/sales/reports: clean, 961/131072 files, 257431/262144 blocks
    
    ls-lvm:~ # mount /dev/sales/reports /sales/reports/
    
    ls-lvm:~ # df -h /sales/reports
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/mapper/sales-reports
                         1008M  990M     0 100% /sales/reports
    
    

Disk Permanently Removed

This is the most severe case. Obviously if the disk is gone and unrecoverable, the data on that disk is likewise unrecoverable. This is a great time to feel good knowing you have a solid backup to rely on. However, if the good feelings are gone, and there is no backup, how do you recover as much data as possible from the remaining disks in the volume group? No attempt will be made to address the data on the unrecoverable disk; this topic will be left to the data recovery experts.

Symptom:

The symptom will be the same as Symptom 2 in the Corrupted LVM Meta Data section above. You will see errors about an "unknown device" and missing device with UUID.

Solution:

  1. Add a replacement disk to the server. Make sure the disk is empty.
  2. Create the LVM meta data on the new disk using the old disk's UUID that pvscan displays.
    ls-lvm:~ # pvcreate --uuid 56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu /dev/sdc
      Physical volume "/dev/sdc" successfully created
    
  3. NOTE: the command should be pvcreate --uuid 56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu --restorefile /etc/lvm/backup/sales /dev/sdc
  4. Restore the backup copy of the LVM meta data for the sales volume group.
    ls-lvm:~ # vgcfgrestore sales
      Restored volume group sales
    
    ls-lvm:~ # vgscan
      Reading all physical volumes.  This may take a while...
      Found volume group "sales" using metadata type lvm2
    
    ls-lvm:~ # vgchange -ay sales
      1 logical volume(s) in volume group "sales" now active
    
  5. Run a file system check to rebuild the file system.
    ls-lvm:~ # e2fsck -y /dev/sales/reports
    e2fsck 1.38 (30-Jun-2005)
    --snip--
    Free inodes count wrong for group #5 (16258, counted=16384).
    Fix? yes
    
    Free inodes count wrong (130111, counted=130237).
    Fix? yes
    
    /dev/sales/reports: ***** FILE SYSTEM WAS MODIFIED *****
    /dev/sales/reports: 835/131072 files (5.7% non-contiguous), 137213/262144 blocks
    
  6. Mount the file system and recover as much data as possible.
  7. NOTE: If the missing disk contained the beginning of the file system, then the file system's superblock will be missing. You will need to rebuild it or use an alternate superblock. Restoring a file system superblock is outside the scope of this article; please refer to your file system's documentation.
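The steps above can be sketched as one script. This is a dry-run sketch only, reusing the article's example UUID, device (/dev/sdc) and volume group (sales); RUN=echo prints each command instead of executing it, so nothing is touched unless you deliberately clear RUN on a real recovery.

```shell
#!/bin/sh
# Dry-run sketch of replacing a permanently lost PV (steps 1-4 above).
# With RUN=echo (the default here) the commands are only printed.
RUN=${RUN:-echo}

replace_pv() {
    uuid=$1 newdisk=$2 vg=$3
    # Recreate the PV label with the old UUID, seeded from the metadata backup
    $RUN pvcreate --uuid "$uuid" --restorefile "/etc/lvm/backup/$vg" "$newdisk"
    $RUN vgcfgrestore "$vg"      # restore the VG metadata
    $RUN vgscan                  # re-read all physical volumes
    $RUN vgchange -ay "$vg"      # activate the logical volumes
}

replace_pv 56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu /dev/sdc sales
```

Only after these steps would you run e2fsck and mount, as shown in steps 5 and 6.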

Conclusion

LVM by default keeps backup copies of its metadata for all LVM devices. These backup files are stored in /etc/lvm/backup and /etc/lvm/archive. If a disk is removed or the metadata gets damaged in some way, it can be easily restored, provided you have backups of the metadata. This is why it is highly recommended never to turn off LVM's auto backup feature. Even if a disk is permanently removed from the volume group, the volume group can be reconstructed, and often the remaining data on the file system recovered.
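Since everything above hinges on those backup files existing, a small sanity check can be worth scripting. This is an assumption of mine, not part of the article; the backup directory is made overridable so the check can be exercised anywhere.

```shell
#!/bin/sh
# Warn if a volume group has no metadata backup under /etc/lvm/backup.
backup_dir=${BACKUP_DIR:-/etc/lvm/backup}

check_vg_backup() {
    vg=$1
    if [ -f "$backup_dir/$vg" ]; then
        echo "$vg: metadata backup present"
    else
        echo "$vg: NO metadata backup -- check the backup settings in lvm.conf"
    fi
}

# On a live system the VG names would come from: vgs --noheadings -o vg_name
check_vg_backup sales
```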

Additional References

Till Brehm recommendations

The second valuable source of information about the recovery process is Recover Data From RAID1 LVM Partitions With Knoppix Linux LiveCD:

Version 1.0
Author: Till Brehm <t.brehm [at] projektfarm [dot] com>
Last edited: 04/11/2007

This tutorial describes how to rescue data from a single hard disk that was part of an LVM2 RAID1 setup, such as the one created by the Fedora Core installer. Why is it so problematic to recover the data? Every single hard disk that was formerly part of an LVM RAID1 setup contains all the data that was stored in the RAID, but the hard disk cannot simply be mounted. First, a RAID setup must be configured for the partition(s), and then LVM must be set up to use the RAID partition(s) before you will be able to mount it. I will use the Knoppix Linux LiveCD to do the data recovery.

Prerequisites

I used a Knoppix 5.1 LiveCD for this tutorial. Download the CD ISO image and burn it to a CD, then connect the hard disk which contains the RAID partition(s) to the IDE / ATA controller of your mainboard, put the Knoppix CD in your CD drive, and boot from the CD.

The hard disk I used is an IDE drive that is attached to the first IDE controller (hda). In my case, the hard disk contained only one partition.

Restoring The Raid

After Knoppix has booted, open a shell and execute the command:

sudo su

to become the root user.

As I don't have the mdadm.conf file from the original configuration, I create it with this command:

mdadm --examine --scan /dev/hda1 >> /etc/mdadm/mdadm.conf

The result should be similar to this one:

DEVICE partitions
CREATE owner=root group=disk mode=0660 auto=yes metadata=1
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=a28090aa:6893be8b:c4024dfc:29cdb07a

Edit the file and add devices=/dev/hda1,missing at the end of the line that describes the RAID array.

vi /etc/mdadm/mdadm.conf

Finally the file looks like this:

DEVICE partitions
CREATE owner=root group=disk mode=0660 auto=yes metadata=1
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=a28090aa:6893be8b:c4024dfc:29cdb07a devices=/dev/hda1,missing

The string /dev/hda1 is the hardware device and missing means that the second disk in this RAID array is not present at the moment.

Edit the file /etc/default/mdadm:

and change the line:

AUTOSTART=false

to:

AUTOSTART=true

Now we can start our RAID setup:

/etc/init.d/mdadm start
/etc/init.d/mdadm-raid start	

To check if our RAID device is ok, run the command:

cat /proc/mdstat

The output should look like this:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 hda1[1]
      293049600 blocks [2/1] [_U]
unused devices: <none>
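The check can also be scripted: mdstat marks a missing mirror half with "_" inside the [..] status field, e.g. [_U]. The function below is my own convenience wrapper, not part of the tutorial; it takes a file argument so it can also be pointed at a saved copy of /proc/mdstat.

```shell
#!/bin/sh
# Return success if any md array in an mdstat file is running degraded.
degraded() {
    grep -q '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}"
}

if degraded; then
    echo "at least one md array is running degraded"
else
    echo "no degraded md arrays found"
fi
```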

Recovering The LVM Setup

The LVM configuration file cannot be recreated with a single command the way mdadm.conf can, but LVM stores one or more copies of the configuration file content at the beginning of the partition. I use the dd command to extract the first part of the partition and write it to a text file:

dd if=/dev/md0 bs=512 count=255 skip=1 of=/tmp/md0.txt

Open the file with a text editor:

vi /tmp/md0.txt

You will find some binary data first and then a configuration file part like this:

VolGroup00 {
	id = "evRkPK-aCjV-HiHY-oaaD-SwUO-zN7A-LyRhoj"
	seqno = 2
	status = ["RESIZEABLE", "READ", "WRITE"]
	extent_size = 65536		# 32 Megabytes
	max_lv = 0
	max_pv = 0

	physical_volumes {

		pv0 {
			id = "uMJ8uM-sfTJ-La9j-oIuy-W3NX-ObiT-n464Rv"
			device = "/dev/md0"	# Hint only

			status = ["ALLOCATABLE"]
			pe_start = 384
			pe_count = 8943	# 279,469 Gigabytes
		}
	}

	logical_volumes {

		LogVol00 {
			id = "ohesOX-VRSi-CsnK-PUoI-GjUE-0nT7-ltxWoy"
			status = ["READ", "WRITE", "VISIBLE"]
			segment_count = 1

			segment1 {
				start_extent = 0
				extent_count = 8942	# 279,438 Gigabytes

				type = "striped"
				stripe_count = 1	# linear

				stripes = [
					"pv0", 0
				]
			}
		}
	}
}
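Instead of scrolling past the binary header by hand, something like strings(1) can pull out the text portion. This is a sketch of my own, not part of the tutorial; the dump file name is the one used above, and note that strings drops very short lines (such as a lone closing brace), so the output may still need touch-up in an editor.

```shell
#!/bin/sh
# Print the LVM metadata text from a raw dd dump: strings keeps only
# printable runs, sed prints from the first "<vgname> {" line onward.
dump=${1:-/tmp/md0.txt}

extract_lvm_meta() {
    strings "$1" | sed -n '/^[A-Za-z0-9_][A-Za-z0-9_]* {$/,$p'
}

extract_lvm_meta "$dump"
```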

Create the file /etc/lvm/backup/VolGroup00:

vi /etc/lvm/backup/VolGroup00

and insert the configuration data so the file looks similar to the above example.

Now we can start LVM:

/etc/init.d/lvm start

Read in the volume:

vgscan

Reading all physical volumes. This may take a while...
Found volume group "VolGroup00" using metadata type lvm2
pvscan
PV /dev/md0 VG VolGroup00 lvm2 [279,47 GB / 32,00 MB free]
Total: 1 [279,47 GB] / in use: 1 [279,47 GB] / in no VG: 0 [0 ]

Now activate the volume:

vgchange VolGroup00 -a y

1 logical volume(s) in volume group "VolGroup00" now active

Now we are able to mount the partition to /mnt/data:

mkdir /mnt/data
mount /dev/VolGroup00/LogVol00 /mnt/data/

If you recover data from a hard disk with filenames in UTF-8 format, it might be necessary to convert them to your current non-UTF-8 locale. In my case, the RAID hard disk is from a Fedora Core system with UTF-8 encoded filenames. My target locale is ISO-8859-1. In this case, the Perl script convmv helps to convert the filenames to the target locale.

Installation Of convmv

cd /tmp
wget http://j3e.de/linux/convmv/convmv-1.10.tar.gz
tar xvfz convmv-1.10.tar.gz
cd convmv-1.10
cp convmv /usr/bin/convmv

To convert all filenames in /mnt/data to the ISO-8859-1 locale, run this command:

convmv -f UTF-8 -t ISO-8859-1 -r --notest /mnt/data/*

If you want to test the conversion first, use:

convmv -f UTF-8 -t ISO-8859-1 -r /mnt/data/*
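If convmv is not at hand, a dependency-free preview of what such a re-encoding would do can be improvised with iconv. This is my own sketch, not part of the tutorial; it only prints mv commands and renames nothing, and unconvertible names are silently skipped.

```shell
#!/bin/sh
# Preview UTF-8 -> ISO-8859-1 filename conversion: print an mv command
# for every path whose byte representation would change.
preview_renames() {
    find "$1" -depth -print | while IFS= read -r f; do
        new=$(printf '%s' "$f" | iconv -f UTF-8 -t ISO-8859-1 2>/dev/null) || continue
        if [ "$f" != "$new" ]; then
            printf "mv -- '%s' '%s'\n" "$f" "$new"
        fi
    done
}

dir=${1:-$(mktemp -d)}
preview_renames "$dir"
```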



NEWS CONTENTS

Old News ;-)

[Nov 09, 2019] Mirroring a running system into a ramdisk Oracle Linux Blog

Nov 09, 2019 | blogs.oracle.com


Mirroring a running system into a ramdisk Greg Marsden

In this blog post, Oracle Linux kernel developer William Roche presents a method to mirror a running system into a ramdisk.

A RAM mirrored System ?

There are cases where a system can boot correctly but after some time, can lose its system disk access - for example an iSCSI system disk configuration that has network issues, or any other disk driver problem. Once the system disk is no longer accessible, we rapidly face a hang situation followed by I/O failures, without the possibility of local investigation on this machine. I/O errors can be reported on the console:

 XFS (dm-0): Log I/O Error Detected....

Or losing access to basic commands like:

# ls
-bash: /bin/ls: Input/output error

The approach presented here allows a small system disk space to be mirrored in memory to avoid the above I/O failures situation, which provides the ability to investigate the reasons for the disk loss. The system disk loss will be noticed as an I/O hang, at which point there will be a transition to use only the ram-disk.

To enable this, the Oracle Linux developer Philip "Bryce" Copeland created the following method (more details will follow):

Disk and memory sizes:

As we are going to mirror the entire system installation to the memory, this system installation image has to fit in a fraction of the memory - giving enough memory room to hold the mirror image and necessary running space.

Of course this is a trade-off between the memory available to the server and the minimal disk size needed to run the system. For example a 12GB disk space can be used for a minimal system installation on a 16GB memory machine.

A standard Oracle Linux installation uses XFS as root fs, which (currently) can't be shrunk. In order to generate a usable "small enough" system, it is recommended to proceed to the OS installation on a correctly sized disk space. Of course, a correctly sized installation location can be created using partitions of large physical disk. Then, the needed application filesystems can be mounted from their current installation disk(s). Some system adjustments may also be required (services added, configuration changes, etc...).

This configuration phase should not be underestimated as it can be difficult to separate the system from the needed applications, and keeping both on the same space could be too large for a RAM disk mirroring.

The idea is not to keep an entire system load active when losing disks access, but to be able to have enough system to avoid system commands access failure and analyze the situation.

We are also going to avoid the use of swap. When the system disk access is lost, we don't want to require it for swap data. Also, we don't want to use more memory space to hold a swap space mirror. The memory is better used directly by the system itself.

The system installation can have a swap space (for example a 1.2GB space on our 12GB disk example) but we are neither going to mirror it nor use it.

Our 12GB disk example could be used with: 1GB /boot space, 11GB LVM Space (1.2GB swap volume, 9.8 GB root volume).

Ramdisk memory footprint:

The ramdisk size has to be a little larger (8M) than the root volume size that we are going to mirror, making room for the mirror metadata. We can use two types of ramdisk: brd (a plain block ramdisk) or zram (a compressed one).

We can expect roughly 30% to 50% memory space gain from zram compared to brd, but zram must use 4k I/O blocks only. This means that the filesystem used for root has to only deal with a multiple of 4k I/Os.

Basic commands:

Here is a simple list of commands to manually create and use a ramdisk and mirror the root filesystem space. We create a temporary configuration that needs to be undone or the subsequent reboot will not work. But we also provide below a way of automating at startup and shutdown.

Note the root volume size (considered to be ol/root in this example):

# lvs --units k -o lv_size ol/root
  LSize
  10268672.00k

Create a ramdisk a little larger than that (at least 8M larger):

# modprobe brd rd_nr=1 rd_size=$((10268672 + 8*1024))

Verify the created disk:

# lsblk /dev/ram0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
ram0   1:0   0 9.8G  0 disk

Put the disk under lvm control

# pvcreate /dev/ram0
  Physical volume "/dev/ram0" successfully created.
# vgextend ol /dev/ram0
  Volume group "ol" successfully extended
# vgscan --cache
  Reading volume groups from cache.
  Found volume group "ol" using metadata type lvm2
# lvconvert -y -m 1 ol/root /dev/ram0
  Logical volume ol/root successfully converted.

We now have ol/root mirror to our /dev/ram0 disk.

# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor---  9.79g                                    40.70           root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor---  9.79g                                                    /dev/sda2(307)
  [root_rimage_1] ol Iwi-aor---  9.79g                                                    /dev/ram0(1)
  [root_rmeta_0]  ol ewi-aor---  4.00m                                                    /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor---  4.00m                                                    /dev/ram0(0)
  swap            ol -wi-ao---- <1.20g                                                    /dev/sda2(0)

A few minutes (or seconds) later, the synchronization is completed:

# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor---  9.79g                                    100.00          root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor---  9.79g                                                    /dev/sda2(307)
  [root_rimage_1] ol iwi-aor---  9.79g                                                    /dev/ram0(1)
  [root_rmeta_0]  ol ewi-aor---  4.00m                                                    /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor---  4.00m                                                    /dev/ram0(0)
  swap            ol -wi-ao---- <1.20g                                                    /dev/sda2(0)

We have our mirrored configuration running !

For security, we can also remove the swap and /boot, /boot/efi(if it exists) mount points:

# swapoff -a
# umount /boot/efi
# umount /boot

Stopping the system also requires some actions as you need to cleanup the configuration so that it will not be looking for a gone ramdisk on reboot.

# lvconvert -y -m 0 ol/root /dev/ram0
  Logical volume ol/root successfully converted.
# vgreduce ol /dev/ram0
  Removed "/dev/ram0" from volume group "ol"
# mount /boot
# mount /boot/efi
# swapon -a
What about in-memory compression ?

As indicated above, zRAM devices can compress data in-memory, but two main problems need to be fixed first: LVM must be told to accept zram devices, and the root file system must use only 4k I/Os.

Make lvm work with zram:

The lvm configuration file has to be changed to take into account the "zram" type of devices. Including the following "types" entry to the /etc/lvm/lvm.conf file in its "devices" section:

devices {
    types = [ "zram", 16 ]
}
Root file system I/Os:

A standard Oracle Linux installation uses XFS, and we can check the sector size used (depending on the disk type used) with

# xfs_info /
meta-data=/dev/mapper/ol-root isize=256    agcount=4, agsize=641792 blks
         =                    sectsz=512   attr=2, projid32bit=1
         =                    crc=0        finobt=0 spinodes=0
data     =                    bsize=4096   blocks=2567168, imaxpct=25
         =                    sunit=0      swidth=0 blks
naming   =version 2           bsize=4096   ascii-ci=0 ftype=1
log      =internal            bsize=4096   blocks=2560, version=2
         =                    sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                extsz=4096   blocks=0, rtextents=0

We can notice here that the sector size (sectsz) used on this root fs is a standard 512 bytes. This fs type cannot be mirrored with a zRAM device, and needs to be recreated with 4k sector sizes.

Transforming the root file system to 4k sector size:

This is simply a backup (to a zram disk) and restore procedure after recreating the root FS. To do so, the system has to be booted from another system image. Booting from an installation DVD image can be a good possibility.

sh-4.2# vgchange -a y ol
  2 logical volume(s) in volume group "ol" now active
sh-4.2# mount /dev/mapper/ol-root /mnt
sh-4.2# modprobe zram
sh-4.2# echo 10G > /sys/block/zram0/disksize
sh-4.2# mkfs.xfs /dev/zram0
meta-data=/dev/zram0          isize=256    agcount=4, agsize=655360 blks
         =                    sectsz=4096  attr=2, projid32bit=1
         =                    crc=0        finobt=0, sparse=0
data     =                    bsize=4096   blocks=2621440, imaxpct=25
         =                    sunit=0      swidth=0 blks
naming   =version 2           bsize=4096   ascii-ci=0 ftype=1
log      =internal log        bsize=4096   blocks=2560, version=2
         =                    sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                extsz=4096   blocks=0, rtextents=0
sh-4.2# mkdir /mnt2
sh-4.2# mount /dev/zram0 /mnt2
sh-4.2# xfsdump -L BckUp -M dump -f /mnt2/ROOT /mnt
xfsdump: using file dump (drive_simple) strategy
xfsdump: version 3.1.7 (dump format 3.0) - type ^C for status and control
xfsdump: level 0 dump of localhost:/mnt
...
xfsdump: dump complete: 130 seconds elapsed
xfsdump: Dump Summary:
xfsdump:   stream 0 /mnt2/ROOT OK (success)
xfsdump: Dump Status: SUCCESS
sh-4.2# umount /mnt
sh-4.2# mkfs.xfs -f -s size=4096 /dev/mapper/ol-root
meta-data=/dev/mapper/ol-root isize=256    agcount=4, agsize=641792 blks
         =                    sectsz=4096  attr=2, projid32bit=1
         =                    crc=0        finobt=0, sparse=0
data     =                    bsize=4096   blocks=2567168, imaxpct=25
         =                    sunit=0      swidth=0 blks
naming   =version 2           bsize=4096   ascii-ci=0 ftype=1
log      =internal log        bsize=4096   blocks=2560, version=2
         =                    sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                extsz=4096   blocks=0, rtextents=0
sh-4.2# mount /dev/mapper/ol-root /mnt
sh-4.2# xfsrestore -f /mnt2/ROOT /mnt
xfsrestore: using file dump (drive_simple) strategy
xfsrestore: version 3.1.7 (dump format 3.0) - type ^C for status and control
xfsrestore: searching media for dump
...
xfsrestore: restore complete: 337 seconds elapsed
xfsrestore: Restore Summary:
xfsrestore:   stream 0 /mnt2/ROOT OK (success)
xfsrestore: Restore Status: SUCCESS
sh-4.2# umount /mnt
sh-4.2# umount /mnt2
sh-4.2# reboot
$ xfs_info /
meta-data=/dev/mapper/ol-root isize=256    agcount=4, agsize=641792 blks
         =                    sectsz=4096  attr=2, projid32bit=1
         =                    crc=0        finobt=0 spinodes=0
data     =                    bsize=4096   blocks=2567168, imaxpct=25
         =                    sunit=0      swidth=0 blks
naming   =version 2           bsize=4096   ascii-ci=0 ftype=1
log      =internal            bsize=4096   blocks=2560, version=2
         =                    sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                extsz=4096   blocks=0, rtextents=0

With sectsz=4096, our system is now ready for zRAM mirroring.

Basic commands with a zRAM device:

# modprobe zram
# zramctl --find --size 10G
/dev/zram0
# pvcreate /dev/zram0
  Physical volume "/dev/zram0" successfully created.
# vgextend ol /dev/zram0
  Volume group "ol" successfully extended
# vgscan --cache
  Reading volume groups from cache.
  Found volume group "ol" using metadata type lvm2
# lvconvert -y -m 1 ol/root /dev/zram0
  Logical volume ol/root successfully converted.
# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor---  9.79g                                    12.38           root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor---  9.79g                                                    /dev/sda2(307)
  [root_rimage_1] ol Iwi-aor---  9.79g                                                    /dev/zram0(1)
  [root_rmeta_0]  ol ewi-aor---  4.00m                                                    /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor---  4.00m                                                    /dev/zram0(0)
  swap            ol -wi-ao---- <1.20g                                                    /dev/sda2(0)
# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor---  9.79g                                    100.00          root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor---  9.79g                                                    /dev/sda2(307)
  [root_rimage_1] ol iwi-aor---  9.79g                                                    /dev/zram0(1)
  [root_rmeta_0]  ol ewi-aor---  4.00m                                                    /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor---  4.00m                                                    /dev/zram0(0)
  swap            ol -wi-ao---- <1.20g                                                    /dev/sda2(0)
# zramctl
NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 lzo            10G 9.8G  5.3G  5.5G       1

The compressed disk uses a total of 5.5GB of memory to mirror a 9.8G volume size (using in this case 8.5G).

Removal is performed the same way as brd, except that the device is /dev/zram0 instead of /dev/ram0.

Automating the process:

Fortunately, the procedure can be automated on system boot and shutdown with the following scripts (given as examples).

The start method: /usr/sbin/start-raid1-ramdisk: [ https://github.com/oracle/linux-blog-sample-code/blob/ramdisk-system-image/start-raid1-ramdisk ]

After a chmod 555 /usr/sbin/start-raid1-ramdisk, running this script on a 4k xfs root file system should show something like:

# /usr/sbin/start-raid1-ramdisk
  Volume group "ol" is already consistent.
RAID1 ramdisk: intending to use 10276864 K of memory for facilitation of [ / ]
  Physical volume "/dev/zram0" successfully created.
  Volume group "ol" successfully extended
  Logical volume ol/root successfully converted.
Waiting for mirror to synchronize...
LVM RAID1 sync of [ / ] took 00:01:53 sec
  Logical volume ol/root changed.
NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 lz4           9.8G 9.8G  5.5G  5.8G       1

The stop method: /usr/sbin/stop-raid1-ramdisk: [ https://github.com/oracle/linux-blog-sample-code/blob/ramdisk-system-image/stop-raid1-ramdisk ]

After a chmod 555 /usr/sbin/stop-raid1-ramdisk, running this script should show something like:

# /usr/sbin/stop-raid1-ramdisk
  Volume group "ol" is already consistent.
  Logical volume ol/root changed.
  Logical volume ol/root successfully converted.
  Removed "/dev/zram0" from volume group "ol"
  Labels on physical volume "/dev/zram0" successfully wiped.

A service Unit file can also be created: /etc/systemd/system/raid1-ramdisk.service [https://github.com/oracle/linux-blog-sample-code/blob/ramdisk-system-image/raid1-ramdisk.service]

[Unit]
Description=Enable RAMdisk RAID 1 on LVM
After=local-fs.target
Before=shutdown.target reboot.target halt.target

[Service]
ExecStart=/usr/sbin/start-raid1-ramdisk
ExecStop=/usr/sbin/stop-raid1-ramdisk
Type=oneshot
RemainAfterExit=yes
TimeoutSec=0

[Install]
WantedBy=multi-user.target
Conclusion:

When the system disk access problem manifests itself, the ramdisk mirror branch will provide the possibility to investigate the situation. The goal of this procedure is not to keep the system running on this memory mirror configuration, but to help investigate a bad situation.

When the problem is identified and fixed, I really recommend coming back to a standard configuration -- enjoying the entire memory of the system, a standard system disk, a possible swap space, etc.

Hoping the method described here can help. I also want to thank Philip "Bryce" Copeland, who created the first prototype of the above scripts, and Mark Kanda, who helped test many aspects of this work, for their reviews.

[Nov 08, 2019] A Linux user's guide to Logical Volume Management Opensource.com

Nov 08, 2019 | opensource.com

In Figure 1, two complete physical hard drives and one partition from a third hard drive have been combined into a single volume group. Two logical volumes have been created from the space in the volume group, and a filesystem, such as an EXT3 or EXT4 filesystem, has been created on each of the two logical volumes.

Figure 1: LVM allows combining partitions and entire hard drives into Volume Groups.

Adding disk space to a host is fairly straightforward but, in my experience, is done relatively infrequently. The basic steps needed are listed below. You can either create an entirely new volume group or you can add the new space to an existing volume group and either expand an existing logical volume or create a new one.

Adding a new logical volume

There are times when it is necessary to add a new logical volume to a host. For example, after noticing that the directory containing virtual disks for my VirtualBox virtual machines was filling up the /home filesystem, I decided to create a new logical volume in which to store the virtual machine data, including the virtual disks. This would free up a great deal of space in my /home filesystem and also allow me to manage the disk space for the VMs independently.

The basic steps for adding a new logical volume are as follows.

  1. If necessary, install a new hard drive.
  2. Optional: Create a partition on the hard drive.
  3. Create a physical volume (PV) of the complete hard drive or a partition on the hard drive.
  4. Assign the new physical volume to an existing volume group (VG) or create a new volume group.
  5. Create a new logical volume (LV) from the space in the volume group.
  6. Create a filesystem on the new logical volume.
  7. Add appropriate entries to /etc/fstab for mounting the filesystem.
  8. Mount the filesystem.

Now for the details. The following sequence is taken from an example I used as a lab project when teaching about Linux filesystems.

Example

This example shows how to use the CLI to extend an existing volume group to add more space to it, create a new logical volume in that space, and create a filesystem on the logical volume. This procedure can be performed on a running, mounted filesystem.

WARNING: Only the EXT3 and EXT4 filesystems can be resized on the fly with resize2fs on a running, mounted filesystem. Other filesystems have their own tools and restrictions; for example, XFS can be grown but (currently) not shrunk.

Install hard drive

If there is not enough space in the volume group on the existing hard drive(s) in the system to add the desired amount of space it may be necessary to add a new hard drive and create the space to add to the Logical Volume. First, install the physical hard drive, and then perform the following steps.

Create Physical Volume from hard drive

It is first necessary to create a new Physical Volume (PV). Use the command below, which assumes that the new hard drive is assigned as /dev/hdd.

pvcreate /dev/hdd

It is not necessary to create a partition of any kind on the new hard drive. This creation of the Physical Volume which will be recognized by the Logical Volume Manager can be performed on a newly installed raw disk or on a Linux partition of type 83. If you are going to use the entire hard drive, creating a partition first does not offer any particular advantages and uses disk space for metadata that could otherwise be used as part of the PV.

Extend the existing Volume Group

In this example we will extend an existing volume group rather than creating a new one; you can choose to do it either way. After the Physical Volume has been created, extend the existing Volume Group (VG) to include the space on the new PV. In this example the existing Volume Group is named MyVG01.

vgextend /dev/MyVG01 /dev/hdd
Create the Logical Volume

First create the Logical Volume (LV) from existing free space within the Volume Group. The command below creates a LV with a size of 50GB. The Volume Group name is MyVG01 and the Logical Volume Name is Stuff.

lvcreate -L +50G --name Stuff MyVG01
Create the filesystem

Creating the Logical Volume does not create the filesystem. That task must be performed separately. The command below creates an EXT4 filesystem that fits the newly created Logical Volume.

mkfs -t ext4 /dev/MyVG01/Stuff
Add a filesystem label

Adding a filesystem label makes it easy to identify the filesystem later in case of a crash or other disk related problems.

e2label /dev/MyVG01/Stuff Stuff
Mount the filesystem

At this point you can create a mount point, add an appropriate entry to the /etc/fstab file, and mount the filesystem.

You should also check to verify the volume has been created correctly. You can use the df, lvs, and vgs commands to do this.
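Taken together, the steps in this example can be sketched as one script. This is a dry-run sketch using the article's example names (/dev/hdd, MyVG01, Stuff); RUN=echo prints every command instead of executing it, so it is safe to run as-is and easy to adapt.

```shell
#!/bin/sh
# Dry-run sketch of "add a new logical volume" (steps 3-8 above).
RUN=${RUN:-echo}

add_new_lv() {
    disk=$1 vg=$2 lv=$3 size=$4 mnt=$5
    $RUN pvcreate "$disk"                        # step 3: make the disk a PV
    $RUN vgextend "$vg" "$disk"                  # step 4: add the PV to the VG
    $RUN lvcreate -L "$size" --name "$lv" "$vg"  # step 5: carve out the LV
    $RUN mkfs -t ext4 "/dev/$vg/$lv"             # step 6: create the filesystem
    $RUN e2label "/dev/$vg/$lv" "$lv"            # label for easier recovery
    echo "fstab: /dev/$vg/$lv $mnt ext4 defaults 1 2"  # step 7: line to add
    $RUN mkdir -p "$mnt"                         # step 8: mount point + mount
    $RUN mount "$mnt"
}

add_new_lv /dev/hdd MyVG01 Stuff 50G /Stuff
```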

Resizing a logical volume in an LVM filesystem

The need to resize a filesystem has been around since the beginning of the first versions of Unix and has not gone away with Linux. It has gotten easier, however, with Logical Volume Management.

  1. If necessary, install a new hard drive.
  2. Optional: Create a partition on the hard drive.
  3. Create a physical volume (PV) of the complete hard drive or a partition on the hard drive.
  4. Assign the new physical volume to an existing volume group (VG) or create a new volume group.
  5. Create one or more logical volumes (LV) from the space in the volume group, or expand an existing logical volume with some or all of the new space in the volume group.
  6. If you created a new logical volume, create a filesystem on it. If adding space to an existing logical volume, use the resize2fs command to enlarge the filesystem to fill the space in the logical volume.
  7. Add appropriate entries to /etc/fstab for mounting the filesystem.
  8. Mount the filesystem.
Example

This example describes how to resize an existing Logical Volume in an LVM environment using the CLI. It adds about 50GB of space to the /Stuff filesystem. This procedure can be used on a mounted, live filesystem only with the Linux 2.6 Kernel (and higher) and EXT3 and EXT4 filesystems. I do not recommend that you do so on any critical system, but it can be done and I have done so many times; even on the root (/) filesystem. Use your judgment.

WARNING: Only the EXT3 and EXT4 filesystems can be resized on the fly with resize2fs on a running, mounted filesystem. Other filesystems have their own tools and restrictions; for example, XFS can be grown but (currently) not shrunk.

Install the hard drive

If there is not enough space on the existing hard drive(s) in the system to add the desired amount of space it may be necessary to add a new hard drive and create the space to add to the Logical Volume. First, install the physical hard drive and then perform the following steps.

Create a Physical Volume from the hard drive

It is first necessary to create a new Physical Volume (PV). Use the command below, which assumes that the new hard drive is assigned as /dev/hdd.

pvcreate /dev/hdd

It is not necessary to create a partition of any kind on the new hard drive. This creation of the Physical Volume which will be recognized by the Logical Volume Manager can be performed on a newly installed raw disk or on a Linux partition of type 83. If you are going to use the entire hard drive, creating a partition first does not offer any particular advantages and uses disk space for metadata that could otherwise be used as part of the PV.

Add PV to existing Volume Group

For this example, we will use the new PV to extend an existing Volume Group. After the Physical Volume has been created, extend the existing Volume Group (VG) to include the space on the new PV. In this example, the existing Volume Group is named MyVG01.

vgextend /dev/MyVG01 /dev/hdd
Extend the Logical Volume

Extend the Logical Volume (LV) from existing free space within the Volume Group. The command below expands the LV by 50GB. The Volume Group name is MyVG01 and the Logical Volume Name is Stuff.

lvextend -L +50G /dev/MyVG01/Stuff
Expand the filesystem

Extending the Logical Volume will also expand the filesystem if you use the -r option. If you do not use the -r option, that task must be performed separately. The command below resizes the filesystem to fit the newly resized Logical Volume.

resize2fs /dev/MyVG01/Stuff

You should check to verify the resizing has been performed correctly. You can use the df, lvs, and vgs commands to do this.
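The resize walk-through condenses to three commands plus the filesystem grow. As before, this is a dry-run sketch with the article's example names (/dev/hdd, MyVG01, Stuff); RUN=echo prints the commands rather than executing them.

```shell
#!/bin/sh
# Dry-run sketch of growing an existing LV and its filesystem.
RUN=${RUN:-echo}

grow_lv() {
    disk=$1 vg=$2 lv=$3 size=$4
    $RUN pvcreate "$disk"                     # new PV
    $RUN vgextend "/dev/$vg" "$disk"          # add it to the VG
    $RUN lvextend -L "+$size" "/dev/$vg/$lv"  # grow the LV
    $RUN resize2fs "/dev/$vg/$lv"             # grow the fs (or use lvextend -r)
}

grow_lv /dev/hdd MyVG01 Stuff 50G
```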

Tips

Over the years I have learned a few things that can make logical volume management even easier than it already is. Hopefully these tips can prove of some value to you.

I know that, like me, many sysadmins have resisted the change to Logical Volume Management. I hope that this article will encourage you to at least try LVM. I am really glad that I did; my disk management tasks are much easier since I made the switch.

About the author: David Both is an Open Source Software and GNU/Linux advocate, trainer, writer, and speaker who lives in Raleigh, North Carolina. He is a strong proponent of and evangelist for the "Linux Philosophy." David has been in the IT industry for nearly 50 years. He has taught RHCE classes for Red Hat and has worked at MCI Worldcom, Cisco, and the State of North Carolina. He has been working with Linux and Open Source Software for over 20 years. David prefers to purchase the components and build his...

[Nov 02, 2019] LVM spanning over multiple disks What disk is a file on? Can I lose a drive without total loss

Notable quotes:
"... If you lose a drive in a volume group, you can force the volume group online with the missing physical volume, but you will be unable to open the LV's that were contained on the dead PV, whether they be in whole or in part. ..."
"... So, if you had for instance 10 LV's, 3 total on the first drive, #4 partially on first drive and second drive, then 5-7 on drive #2 wholly, then 8-10 on drive 3, you would be potentially able to force the VG online and recover LV's 1,2,3,8,9,10.. #4,5,6,7 would be completely lost. ..."
"... LVM doesn't really have the concept of a partition it uses PVs (Physical Volumes), which can be a partition. These PVs are broken up into extents and then these are mapped to the LVs (Logical Volumes). When you create the LVs you can specify if the data is striped or mirrored but the default is linear allocation. So it would use the extents in the first PV then the 2nd then the 3rd. ..."
"... As Peter has said the blocks appear as 0's if a PV goes missing. So you can potentially do data recovery on files that are on the other PVs. But I wouldn't rely on it. You normally see LVM used in conjunction with RAIDs for this reason. ..."
"... it's effectively as if a huge chunk of your disk suddenly turned to badblocks. You can patch things back together with a new, empty drive to which you give the same UUID, and then run an fsck on any filesystems on logical volumes that went across the bad drive to hope you can salvage something. ..."
Mar 16, 2015 | serverfault.com

I have three 990GB partitions over three drives in my server. Using LVM, I can create one ~3TB partition for file storage.

1) How does the system determine what partition to use first?
2) Can I find what disk a file or folder is physically on?
3) If I lose a drive in the LVM, do I lose all data, or just data physically on that disk?

asked Dec 2 '10 at 2:28 by Luke has no name; edited Mar 16 '15 at 12:53 by HopelessN00b
  1. The system fills from the first disk in the volume group to the last, unless you configure striping with extents.
  2. I don't think this is possible, but where I'd start to look is in the man pages for the lvs/vgs commands.
  3. If you lose a drive in a volume group, you can force the volume group online with the missing physical volume, but you will be unable to open the LVs that were contained on the dead PV, whether in whole or in part. So, if you had, for instance, 10 LVs, with 3 wholly on the first drive, #4 partially on the first and second drives, 5-7 wholly on drive 2, and 8-10 on drive 3, you could potentially force the VG online and recover LVs 1, 2, 3, 8, 9, and 10; LVs 4-7 would be completely lost.
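A hedged sketch of that "force the VG online" step (vg00 is a placeholder name, and the exact flag depends on your LVM2 release):

```
# Activate a volume group despite a missing PV so surviving LVs can be salvaged
vgchange -ay --activationmode partial vg00    # newer LVM2 releases
# vgchange -ay -P vg00                        # older releases used -P ("partial")
# then mount the intact LVs read-only and copy the data off
```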
answered by Peter Grace

1) How does the system determine what partition to use first?

LVM doesn't really have the concept of a partition; it uses PVs (Physical Volumes), which can be partitions. These PVs are broken up into extents, and the extents are then mapped to the LVs (Logical Volumes). When you create the LVs you can specify whether the data is striped or mirrored, but the default is linear allocation. So it would use the extents in the first PV, then the 2nd, then the 3rd.

2) Can I find what disk a file or folder is physically on?

You can determine what PVs a LV has allocation extents on. But I don't know of a way to get that information for an individual file.
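For the first part (which PVs back an LV), stock LVM reporting commands will show the mapping; the VG, LV, and device names below are examples:

```
lvs -o +devices vg00              # one line per LV segment, with the device(s) backing it
lvdisplay -m /dev/vg00/lvhome     # detailed segment-to-PV map for one LV
pvdisplay -m /dev/sdb1            # the reverse view: which LV owns each extent on a PV
```

For an individual file you would still have to combine this with the file's block layout (e.g. from filefrag -v) by hand, which is rarely worth the effort.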

3) If I lose a drive in the LVM, do I lose all data, or just data physically on that disk?

As Peter has said the blocks appear as 0's if a PV goes missing. So you can potentially do data recovery on files that are on the other PVs. But I wouldn't rely on it. You normally see LVM used in conjunction with RAIDs for this reason.

answered by 3dinfluence

I don't know the answer to #2, so I'll leave that to someone else. I suspect "no", but I'm willing to be happily surprised.

1 is: you tell it, when you combine the physical volumes into a volume group.

3 is: it's effectively as if a huge chunk of your disk suddenly turned to badblocks. You can patch things back together with a new, empty drive to which you give the same UUID, and then run an fsck on any filesystems on logical volumes that went across the bad drive to hope you can salvage something.

And to the overall, unasked question: yeah, you probably don't really want to do that.

A simple introduction to working with LVM

I got badly bitten by this. Had a disk die on me. Very luckily only one logical volume was using the space on that physical volume.

This is not an in-depth LVM review, but just to make it more visible (I spent a _long_ time googling): if you get a dead PV in your volume group, you can run

vgreduce --removemissing <volgrpname>

to get rid of it. True enough, you will lose any logical volumes wholly or partly on that PV. But you should be able to rescue the rest :)


Re: A simple introduction to working with LVM

Posted by marki (89.173.xx.xx) on Thu 29 Jun 2006 at 23:19

I had this problem: one of the disks in the LVM died (I wasn't using RAID on this server, but now I do :)

The failed disk contained part of /home. I had a backup, but it was a few days old, so I wanted to try to read the newer files from /home.

I put all the good disks into another machine and booted from a Live CD (INSERT or RIPLinUX, I don't remember which one worked). The problem was that the VG refused to activate because of the missing PV. I found that the "-P" switch to vgchange allows it to activate in partial mode. That was OK, but it activates only in read-only mode. The problem was the ext3 filesystem on /home, which hadn't been cleanly unmounted and required recovery, which is not possible on a read-only "disk" :(

I had to use mdadm to create a bogus PV (which returns all nulls on read) in place of the missing one (it's mentioned in man vgchange). But I had to google how to create it.

Finally I created a "replacement" PV in RAM. I just created a big enough file on a ramdisk, used losetup to make a loopback device of it, then used pvcreate --uuid with the UUID of the missing PV. pvscan recognized it, but didn't show that it was part of the VG. Running vgcfgrestore solved this too. That allowed vgchange to activate the VG in read-write mode, and I could mount the ext3 fs. I was able to read all data on the good disk.

So using LVM does not make your data unavailable when one of the disks dies (I mean it is possible to get data out of the good ones).
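The commenter's ramdisk trick can be sketched roughly as follows. Every name here is an example, and newer LVM2 releases require --restorefile (or --norestorefile) together with --uuid:

```
# Build a throwaway "replacement" PV in RAM for the missing disk
dd if=/dev/zero of=/dev/shm/fake-pv.img bs=1M count=256
loopdev=$(losetup -f --show /dev/shm/fake-pv.img)
# Recreate the missing PV with its old UUID from the saved metadata backup
pvcreate --uuid "<uuid-of-missing-pv>" --restorefile /etc/lvm/backup/vg00 "$loopdev"
vgcfgrestore vg00          # reattach it to the volume group metadata
vgchange -ay vg00          # the VG should now activate read-write
```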

Missing bits found

July 18th, 2007 | linuxjournal.com

On July 18th, 2007 Richard Bullington-McGuire says:

> ...couldn't you have booted the recovery box with a Live CD and simply mounted only the drive partitions you needed?

That was what I was originally hoping to do, but that did not work automatically. RAID arrays on USB-connected drives are not available to the system when it does its first scan for RAID arrays. Also, if the recovery box has a volume group with the same name, it will not recognize the newly-attached volume group.

I have used USB RAID arrays in production, and you have to take some extra steps to activate them late in the boot process. I typically use a script similar to this to do the job:


#!/bin/sh
#
# Mount a USB raid array
#
# Call from /etc/rc.d/rc.local

DEVICE=/dev/ExampleVolGroup/ExampleVol00
MOUNTPOINT=/mnt/ExampleVol00

# Activate the array. This assumes that /etc/mdadm.conf has an entry for it already
/sbin/mdadm -A -s
# Look for LVM2 volume groups on all connected partitions, including the array
/sbin/vgscan --mknodes
# Activate all LVM partitions, including that on the array
/sbin/vgchange -a y
# Make sure to fsck the device so it stays healthy long-term
fsck -T -a $DEVICE
mount $DEVICE $MOUNTPOINT

> In other words, just don't mount the drive in the recovery box that had the equivalent volume group. That way there would have been no conflict, right?

That's mostly right. You'd still need to scan for the RAID arrays with 'mdadm --examine --scan $MYDEVICENAME' , then activate them after creating /etc/mdadm.conf.

If you had other md software RAID devices on the system, you might have to fix up the device numbering on the md devices.

> If the recovery box did NOT have any LVM partitions or LVM config native to it.. could i simply plug the raid drive in and the recovery box would automagically find the raid LVM partitions or would I still have to something else to make it work?

On a recovery box without any software RAID or LVM configuration, if you plugged the RAID drive directly into the IDE or SATA connector, it might automagically find the RAID array and LVM volume. I have not done that particular experiment, you might try it and let me know how it goes.

If the drive was attached to the recovery box using a USB enclosure, the RAID and LVM configurations probably won't be autodetected during the early boot stages, and you'll almost certainly have to do a scan / activate procedure on both the RAID and LVM layers.

You might have to scan for RAID partitions, build an /etc/mdadm.conf file, and then scan for volume groups and activate them in either case.

The most difficult part of the recovery outlined in the article was pulling the LVM configuration out of the on-disk ring buffer. You can avoid that by making sure you have a backup of the LVM configuration for that machine stored elsewhere.

Experienced this exact

On September 13th, 2006 Neekofab (not verified) says:

Experienced this exact problem. I moved an md0/md1 disk to a recovery workstation that already had md0/md1 devices. They could not coexist, and I could not find a way to move the additional md0/md1 devices to md2/md3. I ended up disconnecting the system's md0/md1 devices, booting up with sysresccd, and shoving the data over the network.

bleah

I ran into the same issue

On May 9th, 2007 Anonymous (not verified) says:

I ran into the same issue and solved it with a little reading about mdadm. All you have to do is create a new array from the old disks.

# MAKEDEV md1
# mdadm -C /dev/md1 -l 1 -n 2 missing /dev/sdb1

Voila. Your raid array has now been moved from md0 to md1.

Help restoring LVM partition (Redhat, RHEL 5, 5.1, LVM)

An example of a wrong move that destroys the LVM metadata. To add a new disk to an existing volume group, you need to use
vgextend VolGroup00 /dev/sdb1 
See Recovery of RAID and LVM2 Volumes for information about recovery.
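For reference, a sketch of the intended sequence (the device matches the post below; the new LV name is an example):

```
pvcreate /dev/sdb1                              # initialize the partition as a PV
vgextend VolGroup00 /dev/sdb1                   # grow the existing VG onto it
lvcreate -n LogVolHome -l 100%FREE VolGroup00   # carve out a new LV for /home
mkfs.ext3 /dev/VolGroup00/LogVolHome            # then add it to /etc/fstab as /home
```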

I was working on a Dell Precision Workstation system with 2 SAS drives.
The first drive /dev/sda was partitioned with the following table

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 36472 292856917+ 8e Linux LVM

[root@lrc200604665 tsm]# cat /etc/fstab
/dev/VolGroup00/LogVol00 / ext3 defaults 1 1
LABEL=/boot /boot ext3 defaults 1 2
devpts /dev/pts devpts gid=5,mode=620 0 0
tmpfs /dev/shm tmpfs defaults 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs defaults 0 0
/dev/VolGroup00/LogVol01 swap swap defaults 0 0

I wanted to add a second hard drive to the system and mount it as "home"

My idea was to add it to the existing Volume Group VolGroup00

After I formatted the drive using the standard Linux 8E LVM partition type, I ran the following command to prepare it for LVM:

[root@lrc200604665 home]# pvcreate /dev/sdb1
Can't initialize physical volume "/dev/sdb1" of volume group "VolGroup00" without -ff

Well, I ran the command again with -ff, and it overwrote my LVM metadata.

pvscan detects a UUID mismatch, since pvcreate overwrote the VolGroup00 metadata.
vgscan and lvscan are also useless.

The system will obviously not boot now.

any help would be greatly appreciated

[Sep 14, 2010] Learn Linux, 101: Maintain the integrity of filesystems, by Ian Shields

Aug 24, 2010 | developerWorks

Checking filesystems

In cases when your system crashes or loses power, Linux may not be able to cleanly unmount your filesystems. Thus, your filesystems may be left in an inconsistent state, with some changes completed and some not. Operating with a damaged filesystem is not a good idea as you are likely to further compound any existing errors.

The main tool for checking filesystems is fsck, which, like mkfs, is really a front end to filesystem-checking routines for the various filesystem types. Some of the underlying check routines are shown in Listing 1.

Listing 1. Some of the fsck programs
[ian@echidna ~]$ ls /sbin/*fsck*
/sbin/btrfsck  /sbin/fsck         /sbin/fsck.ext3     /sbin/fsck.msdos
/sbin/dosfsck  /sbin/fsck.cramfs  /sbin/fsck.ext4     /sbin/fsck.vfat
/sbin/e2fsck   /sbin/fsck.ext2    /sbin/fsck.ext4dev  /sbin/fsck.xfs

You may be surprised to learn that several of these files are hard links to just one file as shown in Listing 2. Remember that these programs may be used so early in the boot process that the filesystem may not be mounted and symbolic link support may not yet be available. See our article Learn Linux, 101: Create and change hard and symbolic links for more information about hard and symbolic links.

Listing 2. One fsck program with many faces
[ian@echidna ~]$ find /sbin -samefile /sbin/e2fsck
/sbin/fsck.ext4dev
/sbin/e2fsck
/sbin/fsck.ext3
/sbin/fsck.ext4
/sbin/fsck.ext2
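The hard-link arrangement shown in Listing 2 is easy to reproduce with scratch files; the directory and file names below are just examples:

```shell
# Reproduce the "many names, one file" hard-link layout from Listing 2
d=$(mktemp -d)
echo "checker" > "$d/e2fsck"
ln "$d/e2fsck" "$d/fsck.ext3"                    # a hard link, not a symlink
links=$(stat -c %h "$d/e2fsck")                  # link count of the underlying inode
same=$(find "$d" -samefile "$d/e2fsck" | wc -l)  # all names sharing that inode
echo "link count: $links, names found: $same"
```

Both counts come out as 2: one inode, two directory entries.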

The system boot process uses fsck with the -A option to check the root filesystem and any other filesystems that are specified for checking in the /etc/fstab control file. If the filesystem was not cleanly unmounted, a consistency check is performed and repairs are made, if they can be done safely. This is controlled by the pass (or passno) field (the sixth field) of the /etc/fstab entry. Filesystems with pass set to zero are not checked at boot time. The root filesystem has a pass value of 1 and is checked first. Other filesystems will usually have a pass value of 2 (or higher), indicating the order in which they should be checked.

Multiple fsck operations can run in parallel if the system determines it is advantageous, so different filesystems are allowed to have the same pass value, as is the case for the /grubfile and /mnt/ext3test filesystems shown in Listing 3. Note that fsck will avoid running multiple filesystem checks on the same physical disk. To learn more about the layout of /etc/fstab, check the man pages for fstab.

Listing 3. Boot checking of filesystems with /etc/fstab entries
                   

filesystem                           mount point  type   options    dump pass
UUID=a18492c0-7ee2-4339-9010-3a15ec0079bb /              ext3    defaults        1   1
UUID=488edd62-6614-4127-812d-cbf58eca85e9 /grubfile      ext3    defaults        1   2
UUID=2d4f10a6-be57-4e1d-92ef-424355bd4b39 swap           swap    defaults        0   0
UUID=ba38c08d-a9e7-46b2-8890-0acda004c510 swap           swap    defaults        0   0
LABEL=EXT3TEST                            /mnt/ext3test  ext3    defaults        0   2
/dev/sda8                                 /mnt/xfstest   xfs     defaults        0   0
LABEL=DOS                                 /dos           vfat    defaults        0   0
tmpfs                   /dev/shm                         tmpfs   defaults        0   0
devpts                  /dev/pts                         devpts  gid=5,mode=620  0   0
sysfs                   /sys                             sysfs   defaults        0   0
proc                    /proc                            proc    defaults        0   0
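The pass-field ordering described above can be extracted mechanically. Here is a sketch using a small sample fstab (the entries are examples, not Listing 3):

```shell
# Pull the pass (6th) field out of a sample fstab and show the boot-check order
fstab='/dev/sda1 / ext3 defaults 1 1
/dev/sda2 /home ext3 defaults 1 2
/dev/sda3 swap swap defaults 0 0'
# pass 0 means "never checked", so filter those out, then sort by pass value
order=$(printf '%s\n' "$fstab" | awk '$6 > 0 {print $6, $2}' | sort -n)
printf '%s\n' "$order"
```

The root filesystem (pass 1) sorts first; swap (pass 0) is excluded entirely.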

Some journaling filesystems, such as ReiserFS and XFS, might have a pass value of 0 because the journaling code, rather than fsck, does the filesystem consistency check and repair. On the other hand, some filesystems, such as /proc, are built at initialization time and therefore do not need to be checked.

You can check filesystems after the system is booted. You will need root authority, and the filesystem you want to check should be unmounted first. Listing 4 shows how to check two of our filesystems, using the device name, label, or UUID. You can use the blkid command to find the device given a label or UUID, and the label and UUID, given the device.


Listing 4. Using fsck to check filesystems
[root@echidna ~]# # find the device for LABEL=EXT3TEST
[root@echidna ~]# blkid -L EXT3TEST
/dev/sda7
[root@echidna ~]# # Find label and UUID for /dev/sda7
[root@echidna ~]# blkid /dev/sda7
/dev/sda7: LABEL="EXT3TEST" UUID="7803f979-ffde-4e7f-891c-b633eff981f0" SEC_TYPE="ext2" 
 TYPE="ext3" 
[root@echidna ~]# # Check /dev/sda7
[root@echidna ~]# fsck /dev/sda7
fsck from util-linux-ng 2.16.2
e2fsck 1.41.9 (22-Aug-2009)
EXT3TEST: clean, 11/7159808 files, 497418/28637862 blocks
[root@echidna ~]# # Check it by label using fsck.ext3
[root@echidna ~]# fsck.ext3 LABEL=EXT3TEST
e2fsck 1.41.9 (22-Aug-2009)
EXT3TEST: clean, 11/7159808 files, 497418/28637862 blocks
[root@echidna ~]# # Check it by UUID using e2fsck
[root@echidna ~]# e2fsck UUID=7803f979-ffde-4e7f-891c-b633eff981f0
e2fsck 1.41.9 (22-Aug-2009)
EXT3TEST: clean, 11/7159808 files, 497418/28637862 blocks
[root@echidna ~]# # Finally check the vfat partition
[root@echidna ~]# fsck LABEL=DOS
fsck from util-linux-ng 2.16.2
dosfsck 3.0.9, 31 Jan 2010, FAT32, LFN
/dev/sda9: 1 files, 1/513064 clusters

If you attempt to check a mounted filesystem, you will usually see a warning similar to the one in Listing 5 where we try to check our root filesystem. Heed the warning and do not do it!

Listing 5. Do not attempt to check a mounted filesystem

[root@echidna ~]# fsck UUID=a18492c0-7ee2-4339-9010-3a15ec0079bb 
fsck from util-linux-ng 2.16.2
e2fsck 1.41.9 (22-Aug-2009)
/dev/sdb9 is mounted.  

WARNING!!!  Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? no

check aborted.

It is also a good idea to let fsck figure out which check to run on a filesystem; running the wrong check can corrupt the filesystem. If you want to see what fsck would do for a given filesystem or set of filesystems, use the -N option as shown in Listing 6.


Listing 6. Finding what fsck would do to check /dev/sda7, /dev/sda8, and /dev/sda9

[root@echidna ~]# fsck -N /dev/sda7 /dev/sda[89]
fsck from util-linux-ng 2.16.2
[/sbin/fsck.ext3 (1) -- /mnt/ext3test] fsck.ext3 /dev/sda7 
[/sbin/fsck.xfs (2) -- /mnt/xfstest] fsck.xfs /dev/sda8 
[/sbin/fsck.vfat (3) -- /dos] fsck.vfat /dev/sda9 

... ... ...


Monitoring free space

On a storage device, a file or directory is contained in a collection of blocks. Information about a file is contained in an inode, which records information such as who the owner is, when the file was last accessed, how large it is, whether it is a directory, and who can read from or write to it. The inode number is also known as the file serial number and is unique within a particular filesystem. See our article Learn Linux, 101: File and directory management for more information on files and directories.
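The inode number mentioned above is easy to inspect for yourself; a small sketch (GNU coreutils assumed):

```shell
# A file's inode number ("file serial number"), reported two ways
f=$(mktemp)
ino_stat=$(stat -c %i "$f")              # stat prints the inode number directly
ino_ls=$(ls -i "$f" | awk '{print $1}')  # ls -i prefixes each name with its inode number
echo "inode of $f: $ino_stat"
```

Both commands report the same number, since they are describing the same inode.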

Data blocks and inodes each take space on a filesystem, so you need to monitor the space usage to ensure that your filesystems have space for growth.

The df command

The df command displays information about mounted filesystems. If you add the -T option, the filesystem type is included in the display; otherwise, it is not. The output from df for the Fedora 12 system that we used above is shown in Listing 8.


Listing 8. Displaying filesystem usage
[ian@echidna ~]$ df -T
Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/sdb9     ext3    45358500  24670140  18384240  58% /
tmpfs        tmpfs     1927044       808   1926236   1% /dev/shm
/dev/sda2     ext3      772976     17760    716260   3% /grubfile
/dev/sda8      xfs    41933232      4272  41928960   1% /mnt/xfstest
/dev/sda7     ext3   112754024    192248 106834204   1% /mnt/ext3test
/dev/sda9     vfat     2052256         4   2052252   1% /dos

Notice that the output includes the total number of blocks as well as the number used and free. Also notice the filesystem, such as /dev/sdb9, and its mount point, /. The tmpfs entry is for a virtual memory filesystem. These exist only in RAM or swap space and are created when mounted, without need for a mkfs command. You can read more about tmpfs in "Common threads: Advanced filesystem implementor's guide, Part 3".

For specific information on inode usage, use the -i option on the df command. You can exclude certain filesystem types using the -x option, or restrict information to just certain filesystem types using the -t option. Use these multiple times if necessary. See the examples in Listing 9.


Listing 9. Displaying inode usage
[ian@echidna ~]$ df -i -x tmpfs
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sdb9            2883584  308920 2574664   11% /
/dev/sda2              48768      41   48727    1% /grubfile
/dev/sda8            20976832       3 20976829    1% /mnt/xfstest
/dev/sda7            7159808      11 7159797    1% /mnt/ext3test
/dev/sda9                  0       0       0    -  /dos
[ian@echidna ~]$ df -iT -t vfat -t ext3
Filesystem    Type    Inodes   IUsed   IFree IUse% Mounted on
/dev/sdb9     ext3   2883584  308920 2574664   11% /
/dev/sda2     ext3     48768      41   48727    1% /grubfile
/dev/sda7     ext3   7159808      11 7159797    1% /mnt/ext3test
/dev/sda9     vfat         0       0       0    -  /dos

You may not be surprised to see that the FAT32 filesystem does not have inodes. If you had a ReiserFS filesystem, its information would also show no inodes. ReiserFS keeps metadata for files and directories in stat items. And since ReiserFS uses a balanced tree structure, there is no predetermined number of inodes as there are, for example, in ext2, ext3, or xfs filesystems.

There are several other options you may use with df to limit the display to local filesystems or control the format of output. For example, use the -h option to display human-readable sizes in powers of 2 (1K=1024), or use the -H (or --si) option to get sizes in powers of 10 (1k=1000).

If you aren't sure which filesystem a particular part of your directory tree lives on, you can give the df command a parameter of a directory name or even a filename as shown in Listing 10.


Listing 10. Human readable output for df
[ian@echidna ~]$ df --si ~ian/index.html
Filesystem             Size   Used  Avail Use% Mounted on
/dev/sdb9               47G    26G    19G  58% /

The tune2fs command

The ext family of filesystems also has a utility called tune2fs, which can be used to inspect information about the block count as well as information about whether the filesystem is journaled (ext3 or ext4) or not (ext2). The command can also be used to set many parameters or convert an ext2 filesystem to ext3 by adding a journal. Listing 11 shows the output for a near-empty ext3 filesystem using the -l option to simply display the existing information.


Listing 11. Using tune2fs to display ext3 filesystem information
[root@echidna ~]# tune2fs -l /dev/sda7
tune2fs 1.41.9 (22-Aug-2009)
Filesystem volume name:   EXT3TEST
Last mounted on:          <not available>
Filesystem UUID:          7803f979-ffde-4e7f-891c-b633eff981f0
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype 
 needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              7159808
Block count:              28637862
Reserved block count:     1431893
Free blocks:              28140444
Free inodes:              7159797
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1017
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Mon Aug  2 15:23:34 2010
Last mount time:          Tue Aug 10 14:17:53 2010
Last write time:          Tue Aug 10 14:17:53 2010
Mount count:              3
Maximum mount count:      30
Last checked:             Mon Aug  2 15:23:34 2010
Check interval:           15552000 (6 months)
Next check after:         Sat Jan 29 14:23:34 2011
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:	          256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      2438df0d-fa91-4a3a-ba88-c07b2012f86a
Journal backup:           inode blocks
... ... ...

Listing 13. Using du
[testuser1@echidna ~]$ du -hc *
4.0K	Desktop
4.0K	Documents
4.0K	Downloads
16K	index.html
4.0K	Music
4.0K	Pictures
4.0K	Public
4.0K	Templates
4.0K	Videos
48K	total
[testuser1@echidna ~]$ du -hs .
1.1M	.

The reason for the difference between the 48K total from du -c * and the 1.1M summary from du -s is that the latter includes the entries starting with a dot, such as .bashrc, while the former does not.
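The dot-file explanation above is easy to verify with a scratch directory (the file names and sizes here are arbitrary):

```shell
# du -s on "." counts dot files; du -cs on * does not, because the shell glob skips them
d=$(mktemp -d)
head -c 5000 /dev/zero > "$d/visible"
head -c 5000 /dev/zero > "$d/.hidden"
with_dots=$(du -ks "$d" | cut -f1)               # whole directory, dot files included
no_dots=$(du -cks "$d"/* | tail -n1 | cut -f1)   # glob expansion misses .hidden
echo "with dot files: ${with_dots}K, without: ${no_dots}K"
```

The directory total is larger because it includes the hidden file (and the directory's own blocks).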

One other thing to note about du is that you must be able to read the directories that you are running it against.

So now, let's use du to display the total space used by the /usr tree and each of its first-level subdirectories. The result is shown in Listing 14. Use root authority to make sure you have appropriate access permissions.


Listing 14. Using du on /usr
[root@echidna ~]# du -shc /usr/*
394M	/usr/bin
4.0K	/usr/etc
4.0K	/usr/games
156M	/usr/include
628K	/usr/kerberos
310M	/usr/lib
1.7G	/usr/lib64
110M	/usr/libexec
136K	/usr/local
30M	/usr/sbin
2.9G	/usr/share
135M	/usr/src
0	/usr/tmp
5.7G	total

Repairing filesystems

Occasionally, very occasionally we hope, the worst will happen and you will need to repair a filesystem because of a crash or other failure to unmount cleanly. The fsck command that you saw above can repair filesystems as well as check them. Usually the automatic boot-time check will fix the problems and you can proceed.

If the automatic boot-time check of filesystems is unable to restore consistency, you are usually dumped into a single user shell with some instructions to run fsck manually. For an ext2 filesystem, which is not journaled, you may be presented with a series of requests asking you to confirm proposed actions to fix particular blocks on the filesystem. You should generally allow fsck to attempt to fix problems, by responding y (for yes). When the system reboots, check for any missing data or files.

If you suspect corruption, or want to run a check manually, most of the checking programs require the filesystem to be unmounted, or at least mounted read-only. Because you can't unmount the root filesystem on a running system, the best you can do is drop to single user mode (using telinit 1) and then remount the root filesystem read-only, at which time you should be able to perform a consistency check. A better way to check a filesystem is to boot a recovery system, such as a live CD or a USB memory key, and perform the check of your unmounted filesystems from that.

If fsck cannot fix the problem, you do have some other tools available, although you will generally need advanced knowledge of the filesystem layout to successfully fix it.

Why journal?

An fsck scan of an ext2 disk can take quite a while to complete, because the internal data structure (or metadata) of the filesystem must be scanned completely. As filesystems get larger and larger, this takes longer and longer, even though disks also keep getting faster, so a full check may take one or more hours.

This problem was the impetus for journaled, or journaling, filesystems. Journaled filesystems keep a log of recent changes to the filesystem metadata. After a crash, the filesystem driver inspects the log in order to determine which recently changed parts of the filesystem may possibly have errors. With this design change, checking a journaled filesystem for consistency typically takes just a matter of seconds, regardless of filesystem size. Furthermore, the filesystem driver will usually check the filesystem on mounting, so an external fsck check is generally not required. In fact, for the xfs filesystem, fsck does nothing!

If you do run a manual check of a filesystem, check the man pages for the appropriate fsck command (fsck.ext3, e2fsck , reiserfsck, and so on) to determine the appropriate parameters. The -p option, when used with ext2, ext3, or ext4 filesystems will cause fsck to automatically fix all problems that can be safely fixed. This is, in fact, what happens at boot time.

We'll illustrate the use of e2fsck and xfs_repair by first running e2fsck on an empty XFS filesystem and then using xfs_repair to fix it. Remember we suggested that you use the fsck front end to be sure you are using the right checker, and we warned you that failure to do so may result in filesystem corruption.

In Listing 15, we start running e2fsck against /dev/sda8, which contains an XFS filesystem. After a few interactions we use ctrl-Break to break out, but it is too late. Warning: Do NOT do this unless you are willing to destroy your filesystem.


Listing 15. Deliberately running e2fsck manually on an XFS filesystem
[root@echidna ~]# e2fsck /dev/sda8
e2fsck 1.41.9 (22-Aug-2009)
/dev/sda8 was not cleanly unmounted, check forced.
Resize inode not valid.  Recreate<y>? yes

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (31223, counted=31224).
Fix<y>? ctrl-Break

/dev/sda8: e2fsck canceled.

/dev/sda8: ***** FILE SYSTEM WAS MODIFIED *****

Even if you broke out at the first prompt, your XFS filesystem would still have been corrupted. Repeat after me. Do NOT do this unless you are willing to destroy your filesystem.

... ... ...

Superblocks

You may be wondering how all these checking and repairing tools know where to start. Linux and UNIX filesystems usually have a superblock, which describes the filesystem metadata, or data describing the filesystem itself. This is usually stored at a known location, frequently at or near the beginning of the filesystem, and replicated at other well-known locations. You can use the -n option of mke2fs to display the superblock locations for an existing filesystem. If you specified parameters such as the bytes per inode ratio, you should invoke mke2fs with the same parameters when you use the -n option. Listing 17 shows the location of the superblocks on /dev/sda7.


Listing 17. Finding superblock locations
[root@echidna ~]# mke2fs -n /dev/sda7
mke2fs 1.41.9 (22-Aug-2009)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
7159808 inodes, 28637862 blocks
1431893 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
874 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872
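If the primary superblock itself is damaged, e2fsck can be pointed at one of those backup copies. A hedged sketch (the device and block number are the examples from Listing 17; back up the device first):

```
mke2fs -n /dev/sda7          # -n: report the layout, including backup superblocks, writing nothing
e2fsck -b 32768 /dev/sda7    # repair using the backup superblock at block 32768
```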

Advanced tools

There are several more advanced tools that you can use to examine or repair a filesystem. Check the man pages for the correct usage and the Linux Documentation Project (see Resources) for how-to information. Almost all of these commands require a filesystem to be unmounted, although some functions can be used on filesystems that are mounted read-only. A few of the commands are described below.

You should always back up your filesystem before attempting any repairs.

Tools for ext2 and ext3 filesystems

tune2fs
Adjusts parameters on ext2 and ext3 filesystems. Use this to add a journal to an ext2 system, making it an ext3 system, as well as display or set the maximum number of mounts before a check is forced. You can also assign a label and set or disable various optional features.
dumpe2fs
Prints the super block and block group descriptor information for an ext2 or ext3 filesystem.
debugfs
Is an interactive file system debugger. Use it to examine or change the state of an ext2 or ext3 file system.

Tools for Reiserfs filesystems

reiserfstune
Displays and adjusts parameters on ReiserFS filesystems.
debugreiserfs
Performs similar functions to dumpe2fs and debugfs for ReiserFS filesystems.

... ... ...

We will wrap up our tools review with an illustration of the debugfs command, which allows you to explore the inner workings of an ext family filesystem. By default, it opens the filesystem in read-only mode. It does have many commands that allow you to attempt undeletion of files or directories, as well as other operations that require write access, so you will specifically have to enable write access with the -w option. Use it with extreme care. Listing 18 shows how to open the root filesystem on my system; navigate to my home directory; display information, including the inode number, about a file called index.html; and finally, map that inode number back to the pathname of the file.

Listing 18. Using debugfs
[root@echidna ~]# debugfs /dev/sdb9
debugfs 1.41.9 (22-Aug-2009)
debugfs:  cd home/ian
debugfs:  pwd
[pwd]   INODE: 165127  PATH: /home/ian
[root]  INODE:      2  PATH: /
debugfs:  stat index.html
Inode: 164815   Type: regular    Mode:  0644   Flags: 0x0
Generation: 2621469650    Version: 0x00000000
User:  1000   Group:  1000   Size: 14713
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 32
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x4bf1a3e9 -- Mon May 17 16:15:37 2010
atime: 0x4c619cf0 -- Tue Aug 10 14:39:44 2010
mtime: 0x4bf1a3e9 -- Mon May 17 16:15:37 2010
Size of extra inode fields: 4
Extended attributes stored in inode body: 
  selinux = "unconfined_u:object_r:user_home_t:s0\000" (37)
BLOCKS:
(0-2):675945-675947, (3):1314836
TOTAL: 4

debugfs:  ncheck 164815
Inode	Pathname
164815	/home/ian/index.html
debugfs:  q

Conclusion

We've covered many tools you can use for checking, modifying, and repairing your filesystems. Remember to always use extreme care when using the tools discussed in this article or any other tools. Data loss may be only a keystroke away.

[Jul 08, 2010] Recovering a Lost LVM Volume Disk Novell User Communities

Contents:

Overview

Logical Volume Management (LVM) provides a high-level, flexible view of a server's disk storage. Though LVM is robust, problems can occur. The purpose of this document is to review the recovery process when a disk is missing or damaged, and then apply that process to plausible examples. When a disk is accidentally removed or damaged in some way that adversely affects the logical volume, the general recovery process is:

  1. Replace the failed or missing disk
  2. Restore the missing disk's UUID
  3. Restore the LVM meta data
  4. Repair the file system on the LVM device
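
The four steps above map onto a handful of LVM commands. The sketch below is a hypothetical template (/dev/sdb, vg01, lv01 and the UUID placeholder are all stand-ins for values taken from your own /etc/lvm/backup file); it deliberately writes the commands to a script for review rather than executing anything:

```shell
# Template for the four recovery steps.  Every value below is a
# placeholder; substitute the device, VG/LV names and PV UUID recorded
# in your own /etc/lvm/backup/<vgname> before even thinking of running it.
cat > /tmp/recover-lv.sh <<'EOF'
#!/bin/sh
set -e
# Step 2: re-create the PV on the replacement disk with the ORIGINAL
# UUID, using the metadata backup so LVM lays it out identically.
pvcreate --uuid "PV-UUID-from-your-backup-file" \
         --restorefile /etc/lvm/backup/vg01 /dev/sdb
# Step 3: restore the volume group metadata from the same backup.
vgcfgrestore -f /etc/lvm/backup/vg01 vg01
vgchange -ay vg01
# Step 4: repair the filesystem on the logical volume.
fsck -y /dev/vg01/lv01
EOF
sh -n /tmp/recover-lv.sh && echo "recover-lv.sh: syntax OK"
```

Reviewing the script against the metadata backup before running it is the whole point: a pvcreate with the wrong UUID or restorefile destroys the very metadata you are trying to recover.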

The recovery process will be demonstrated in three specific cases:

  1. A disk belonging to a logical volume group is removed from the server
  2. The LVM meta data is damaged or corrupted
  3. One disk in a multi-disk volume group has been permanently removed

This article discusses how to restore the LVM meta data. This is a risky proposition: if you restore invalid information, you can lose all the data on the LVM device. An important part of LVM recovery is having backups of the meta data to begin with, and knowing how it is supposed to look when everything is running smoothly. LVM keeps backup and archive copies of its meta data in /etc/lvm/backup and /etc/lvm/archive. Back up these directories regularly, and be familiar with their contents. You should also manually back up the LVM meta data with vgcfgbackup before starting any maintenance projects on your LVM volumes.
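
Being familiar with the backup contents mostly means knowing where the UUIDs live, since those are what you compare against pvdisplay during a recovery. A sketch, using a trimmed sample in the format LVM writes to /etc/lvm/backup/<vgname> (the UUIDs are taken from the Knoppix example later on this page):

```shell
# Extract the VG and PV UUIDs from an LVM metadata backup.  The
# here-document is a trimmed sample of the real backup format.
cat > /tmp/vg-sample.txt <<'EOF'
VolGroup00 {
	id = "evRkPK-aCjV-HiHY-oaaD-SwUO-zN7A-LyRhoj"
	physical_volumes {
		pv0 {
			id = "uMJ8uM-sfTJ-La9j-oIuy-W3NX-ObiT-n464Rv"
			device = "/dev/md0"	# Hint only
		}
	}
}
EOF
# The first id is the VG UUID; the ids inside pvN blocks are PV UUIDs.
awk -F'"' '/id =/ { print $2 }' /tmp/vg-sample.txt
```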

If you are planning on removing a disk from the server that belongs to a volume group, you should refer to the LVM HOWTO before doing so.

[Jun 09, 2010] SLES10 SP2 can't boot after Kernel Update - NOVELL FORUMS

Please note that the SLES10 DVD in rescue mode automatically recognizes the LVM group.

Hi everyone,

Last saturday I've updated my servers running SLES10 SP2 to the latest kernel (to fix the infamous NULL pointer issue on the net stack) and after a reboot the system could not start due to a LVM failure.

Grub works OK, but then a lot of messages like this appear: "Waiting for device /dev/system/sys_root to appear. Volume "system" not found. Fallback to sh". The messages repeat until a mini-bash appears, and I can't do anything with that shell.

At first I thought the new kernel (or initrd) didn't have lvm2 support. I booted with a rescue CD, checked every LVM partition with fsck, and was able to mount them, so I chrooted, ran "mkinitrd -f lvm2", and rebooted the system, but nothing changed.

I ran pvdisplay, vgdisplay and lvdisplay and everything looks fine. (See the bottom of the post.)

After the chroot, I ran "yast2 lvm_config" and it recognizes the volume; I even created a new volume to check, but when I restart the server the problem occurs again.

BTW: The SLES10 SP2 DVD in rescue mode automatically recognizes the LVM group.

So I downgraded the kernel (2.6.16-0.42.4) to the previous working one (2.6.16-60-0.39.3), but the problem still persists.

My situation is very similar to the ones below, but without any file system corruption:

http://forums.novell.com/novell-prod...st1536272.html

LVM and RAID - Waiting for device to appear - openSUSE Forums

The "/boot" partition is a linux native (0x83) with ReiserFS format, I have 17 LVM partitions inside the "system" group, my machine is a Dell PowerEdge 2950 with a PERC5/i controller and 5 SAS disk in a RAID5 setup, all the disk are OK.

I don't think the problem was the kernel update; the server had an uptime of 160 days at the moment I rebooted it, so the problem could have happened days before and gone unnoticed until this reboot.

For the moment, I can start the server with the rescue CD, chroot, and start every process manually so the users can work during the week; all the data is perfectly fine.

What can I do to resolve this problem? It is very important to have this server working flawlessly, because it is the main data server of the company.

Thanks in advance for any advice,

Raul

PS: These are the pvdisplay, lvdisplay and vgdisplay outputs (I had to shorten the lvdisplay output because it was too long for the post).

Server:~ # pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda6
  VG Name               system
  PV Size               542,23 GB / not usable 2,53 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              138811
  Free PE               32315
  Allocated PE          106496
  PV UUID               yhllOV-uPt2-XiAX-mMPf-94Hb-xow4-tOHIHN
Server:~ # lvdisplay
  --- Logical volume ---
  LV Name                /dev/system/sys_home
  VG Name                system
  LV UUID                5zR1Ze-ISx2-7NNj-HAGJ-vaUt-5UfJ-x1y4RM
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                30,00 GB
  Current LE             7680
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0

  --- Logical volume ---
  LV Name                /dev/system/sys_root
  VG Name                system
  LV UUID                7nue1u-Qci6-VHtX-U1Yv-cMfx-bWxG-UJ6R2a
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                12,00 GB
  Current LE             3072
  Segments               2
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:1

  --- Logical volume ---
  LV Name                /dev/system/sys_tmp
  VG Name                system
  LV UUID                aNgfsd-Bn7f-TcqP-swoq-HhLx-jtUw-L15gSC
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                2,00 GB
  Current LE             512
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:2

  --- Logical volume ---
  LV Name                /dev/system/sys_usr
  VG Name                system
  LV UUID                lkT9K7-csO8-QMEe-3R9J-BUar-7Oa2-FTUu0r
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                8,00 GB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:3

  --- Logical volume ---
  LV Name                /dev/system/sys_var
  VG Name                system
  LV UUID                kXcoKf-UeYc-s5t5-8gqR-I8r9-6aT9-vzZtj6
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                10,00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:4

  --- Logical volume ---
  LV Name                /dev/system/sys_compras
  VG Name                system
  LV UUID                xefE83-SlWD-S7Ax-GeHw-0W3T-kPyI-imEjL2
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                8,00 GB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:5

  --- Logical volume ---
  LV Name                /dev/system/sys_proyectos
  VG Name                system
  LV UUID                ulgdax-bPqI-Vi2f-ynYL-pNm4-V4i1-CBn2q9
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                120,00 GB
  Current LE             30720
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:6

  --- Logical volume ---
  LV Name                /dev/system/sys_restore
  VG Name                system
  LV UUID                0jAS4z-iN1V-bR2p-b0l5-bPYa-rKFY-rMftMV
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                60,00 GB
  Current LE             15360
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:7

  --- Logical volume ---
  LV Name                /dev/system/sys_administracion
  VG Name                system
  LV UUID                GkmdIM-Qa2c-6DHs-R6PB-jrNo-pg1o-Q0H6gt
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                80,00 GB
  Current LE             20480
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:8

  --- Logical volume ---
  LV Name                /dev/system/sys_direccion
  VG Name                system
  LV UUID                uc3wr4-qBqL-Mnco-4JPN-vpmi-XZE6-1KarGn
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                4,00 GB
  Current LE             1024
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:9

  --- Logical volume ---
  LV Name                /dev/system/sys_misc
  VG Name                system
  LV UUID                cl7dYM-c9eJ-FAFS-jz0e-EOQN-9saF-kDuJed
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                10,00 GB
  Current LE             2560
  Segments               2
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:10

  --- Logical volume ---
  LV Name                /dev/system/sys_lulo
  VG Name                system
  LV UUID                2mSZiq-mvZ4-iinE-DMxt-ndF2-GF7U-SwGC9o
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                8,00 GB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:11
Server:~ # vgdisplay
  --- Volume group ---
  VG Name               system
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  32
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                17
  Open LV               16
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               542,23 GB
  PE Size               4,00 MB
  Total PE              138811
  Alloc PE / Size       106496 / 416,00 GB
  Free  PE / Size       32315 / 126,23 GB
  VG UUID               7SoTGW-QyzO-Ns5d-UpDM-Wjef-JJSj-1KswVA

[May 17, 2010] How to mount a LVM partition on Ubuntu - Server Fault

#fdisk -l 
 
/dev/sdb1   *           1        9702    77931283+  8e  Linux LVM 

I tried the following command:

#mkdir /media/backup 
#mount /dev/sdb1 /media/backup 
 
mount: unknown filesystem type 'LVM2_member' 

How do I mount it?

3 Answers oldest newest votes

the solution

#pvs 
/dev/sdc5 intranet lvm2 a- 372,37G 0 
 
# lvdisplay /dev/intranet 
LV Name                /dev/intranet/root 
 
#mount /dev/intranet/root /media/backup 

[May 17, 2010] Managing Disk Space with LVM

LinuxDevCenter.com

Second, having the root filesystem on LVM can complicate recovery of damaged file systems. Because boot loaders don't support LVM yet, you must also have a non-LVM /boot partition (though it can be on a RAID 1 device).

Third, you need some spare unallocated disk space for the new LVM partition. If you don't have this, use parted to shrink your existing root partition, as described in the LVM HOWTO.

For this example, assume you have your swap space and /boot partitions already set up outside of LVM on their own partitions. You can focus on moving your root filesystem onto a new LVM partition in the partition /dev/hda4. Check that the partition type of hda4 is Linux LVM (0x8e).

Initialize LVM and create a new physical volume:

# vgscan
# pvcreate /dev/hda4
# vgcreate rootvg /dev/hda4 

Now create a 5G logical volume, formatted into an xfs file system:

# lvcreate --name rootlv --size 5G rootvg
# mkfs.xfs /dev/rootvg/rootlv 

Copy the files from the existing root file system to the new LVM one:

# mkdir /mnt/new_root
# mount /dev/rootvg/rootlv /mnt/new_root
# cp -ax /. /mnt/new_root/ 

Next, modify /etc/fstab to mount / on /dev/rootvg/rootlv instead of /dev/hda3.

The trickiest part is to rebuild your initrd to include LVM support. This tends to be distro-specific, but look for mkinitrd or yaird. Your initrd image must have the LVM modules loaded or the root filesystem will not be available. To be safe, leave your original initrd image alone and make a new one named, for example, /boot/initrd-lvm.img.

Finally, update your bootloader. Add a new section for your new root filesystem, duplicating your original boot stanza. In the new copy, change the root from /dev/hda3 to /dev/rootvg/rootlv, and change your initrd to the newly built one.

For example, with grub, if you have:

title=Linux
  root (hd0,0)
  kernel /vmlinuz root=/dev/hda3 ro single
  initrd /initrd.img 

add a new section such as:

title=LinuxLVM
  root (hd0,0)
  kernel /vmlinuz root=/dev/rootvg/rootlv ro single
  initrd /initrd-lvm.img 

Conclusion

LVM is only one of many enterprise technologies in the Linux kernel that have become available to regular users. LVM provides a great deal of flexibility with disk space, and combined with RAID 1, NFS, and a good backup strategy, you can build a bulletproof, easily managed way to store, share, and preserve any quantity of files.

Bryce Harrington is a Senior Performance Engineer at the Open Source Development Labs in Beaverton, Oregon.

Kees Cook is the senior network administrator at OSDL.

[May 15, 2010] Recover Data From RAID1 LVM Partitions With Knoppix Linux LiveCD HowtoForge - Linux Howtos and Tutorials

LVM stores one or more copies of the configuration file content at the beginning of the partition. I use the dd command to extract the first part of the partition and write it to a text file:

dd if=/dev/md0 bs=512 count=255 skip=1 of=/tmp/md0.txt

Open the file with a text editor:

vi /tmp/md0.txt

You will find some binary data first and then a configuration file part like this:

VolGroup00 {
	id = "evRkPK-aCjV-HiHY-oaaD-SwUO-zN7A-LyRhoj"
	seqno = 2
	status = ["RESIZEABLE", "READ", "WRITE"]
	extent_size = 65536		# 32 Megabytes
	max_lv = 0
	max_pv = 0

	physical_volumes {

		pv0 {
			id = "uMJ8uM-sfTJ-La9j-oIuy-W3NX-ObiT-n464Rv"
			device = "/dev/md0"	# Hint only

			status = ["ALLOCATABLE"]
			pe_start = 384
			pe_count = 8943	# 279,469 Gigabytes
		}
	}

	logical_volumes {

		LogVol00 {
			id = "ohesOX-VRSi-CsnK-PUoI-GjUE-0nT7-ltxWoy"
			status = ["READ", "WRITE", "VISIBLE"]
			segment_count = 1

			segment1 {
				start_extent = 0
				extent_count = 8942	# 279,438 Gigabytes

				type = "striped"
				stripe_count = 1	# linear

				stripes = [
					"pv0", 0
				]
			}
		}
	}
}

Create the file /etc/lvm/backup/VolGroup00:

vi /etc/lvm/backup/VolGroup00

and insert the configuration data so the file looks similar to the above example.

Now we can start LVM:

/etc/init.d/lvm start

[Nov 11, 2008] EXT3 filesystem recovery in LVM2

EXT3 filesystem recovery in LVM2 
--------------------------------------------------------------------------------
This is the bugzilla bug I started on the Fedora bugzilla:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=142737
--------------------------------------------------------------------------------
It is a very good idea to do something like the following, so that you have
a copy of the partition you're trying to recover, in case something
bad happens:
dd if=/dev/hda2 bs=1024k conv=noerror,sync,notrunc | reblock -t 65536 30 |
ssh remote.host.uci.edu 'cat > /recovery/damaged-lvm2-ext3'
--------------------------------------------------------------------------------
e2salvage died with "Terminated".  I assume it OOM'd.
--------------------------------------------------------------------------------
e2extract gave a huge list of 0 length files.  Doesn't seem right,
and it was taking forever, so I decided to move on to other methods.
But does anyone know if this is normal behavior for e2extract on an ext3?
--------------------------------------------------------------------------------
I wrote a small program that searches for ext3 magic numbers.  It's
finding many, EG 438, 30438, 63e438 and so on (hex).  The question is,
how do I convert from that to an fsck -b number?
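One answer to that question: the ext2/ext3 magic number (0xEF53) sits 0x38 bytes into the superblock, and every backup superblock starts on a block boundary, so a magic hit at byte offset X maps to fsck -b (X - 0x38) / blocksize whenever the division is exact. A sketch for the 0x63e438 hit above, trying the three common block sizes:

```shell
# Map a byte offset of the ext2/ext3 magic (0xEF53) to an "fsck -b"
# block number.  The magic is 0x38 bytes into a superblock, and backup
# superblocks start on block boundaries.
offset=0x63e438                 # a magic-number hit from the scan above
hits=$(for bs in 1024 2048 4096; do
    sb_start=$(( offset - 0x38 ))
    if [ $(( sb_start % bs )) -eq 0 ]; then
        echo "block size $bs: try fsck -n -b $(( sb_start / bs )) -B $bs <device>"
    fi
done)
printf '%s\n' "$hits"
```

For the primary superblock the same arithmetic gives block 1 at offset 0x438, which matches the first hit reported above.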
--------------------------------------------------------------------------------
Running the same program on a known-good ext3, the first offset was the
same, but others were different.  However, they all ended in hex 38...
--------------------------------------------------------------------------------
I'm now running an "fsck -vn -b" with the -b argument ranging from 0 to
999999.  I'm hoping this will locate a suitable -b for me via brute force.
--------------------------------------------------------------------------------
Sent a post to gmane.linux.kernel 2004-12-16 
--------------------------------------------------------------------------------
Robin Green  very helpfully provided the
following instructions, which appear to be getting somewhere:

1) Note down what the root= device is that appears on the kernel
command line (this can be found by going to boot from hard drive and
then examining
the kernel command line in grub, or by looking in /boot/grub/grub.conf )

2) Be booted from rescue disk

3) Sanity check: ensure that the nodes /dev/hda, /dev/hda2 etc. exist

4) Start up LVM2 (assuming it is not already started by the rescue disk!) by
typing:

  lvm vgchange --ignorelockingfailure -P -a y

Looking at my initrd script, it doesn't seem necessary to run any other
commands
to get LVM2 volumes activated - that's it.

5) Find out which major/minor number the root device is. This is the
slightly tricky
bit. You may have to use trial-and-error. In my case, I guessed right
first time:
(no comments about my odd hardware setup please ;)

[root@localhost t]# ls /sys/block
dm-0  dm-2  hdd    loop1  loop3  loop5  loop7  ram0  ram10  ram12  ram14
ram2  ram4  ram6  ram8
dm-1  hdc   loop0  loop2  loop4  loop6  md0    ram1  ram11  ram13  ram15
ram3  ram5  ram7  ram9
[root@localhost t]# cat /sys/block/dm-0/dev
253:0
[root@localhost t]# devmap_name 253 0
Volume01-LogVol02

In the first command, I listed the block devices known to the kernel. dm-*
are the LVM
devices (on my 2.6.9 kernel, anyway). In the second command, I found
out the major:minor
numbers of /dev/dm-0. In the third command, I used devmap_name to check
that the device
mapper name of node with major 253 and minor 0, is the same as the name
of the root device
from my kernel command line (cf. step 1). Apart from a slight punctuation
difference,
it is the same, therefore I have found the root device.

I'm not sure if FC3 includes the devmap_name command. According to
fr2.rpmfind.net, it doesn't.
But you don't really need it, you can just try all the LVM devices in
turn until you find
your root device. Or, I can email you a statically-linked binary of it
if you want.

6) Create the /dev node for the root filesystem if it doesn't already
exist, e.g.:

  mknod /dev/dm-0 b 253 0

using the major-minor numbers found in step 5.

Please note that for the purpose of _rescue_, the node doesn't actually
have to be under
/dev (so /dev doesn't have to be writeable) and its name does not
matter. It just needs
to exist somewhere on a filesystem, and you have to refer to it in the
next command.

7) Do what you want to the root filesystem, e.g.:

  fsck /dev/dm-0
  mount /dev/dm-0 /where/ever

As you probably know, the fsck might actually work, because a fsck
can sometimes
correct filesystem errors that the kernel filesystem modules cannot.

8) If the fsck doesn't work, look in the output of fsck and in dmesg
for signs of
physical drive errors. If you find them, (a) think about calling a
data recovery
specialist, (b) do NOT use the drive!
--------------------------------------------------------------------------------
On FC3's rescue disk, what I actually did was:

1) Do startup network interfaces
2) Don't try to automatically mount the filesystems - not even readonly
3) lvm vgchange --ignorelockingfailure -P -a y
4) fdisk -l, and guess which partition is which based on size: the small
one was /boot, and the large one was /
5) mkdir /mnt/boot
6) mount /dev/hda1 /mnt/boot
7) Look up the device node for the root filesystem in /mnt/boot/grub/grub.conf
8) A first tentative step, to see if things are working: fsck -n
/dev/VolGroup00/LogVol00
9) Dive in: fsck -f -y /dev/VolGroup00/LogVol00
10) Wait a while...  Be patient.  Don't interrupt it
11) Reboot
--------------------------------------------------------------------------------
Are these lvm1 or lvm2?

lvmdiskscan -v
vgchange -ay
vgscan -P
vgchange -ay -P
--------------------------------------------------------------------------------
jeeves:~# lvm version
  LVM version:     2.01.04 (2005-02-09)
  Library version: 1.01.00-ioctl (2005-01-17)
  Driver version:  4.1.0
--------------------------------------------------------------------------------
I think you are making a potentially very dangerous mistake!

Type 8e is a partition type. You don't want to use resize2fs on the PARTITION,
which is not an ext2 partition, but an lvm partition. You want
to resize the filesystem on the logical VOLUME.

And yes, resize2fs is appropriate for logical volumes. But resize the VOLUME
(e.g. /dev/VolGroup00/LogVol00), not the partition or volume group.

On Fri, Mar 04, 2005 at 06:35:31PM +0000, Robert Buick wrote:
> I'm using type 8e, does anyone happen to know if resize2fs is
> appropriate for this type; the man page only mentions type2.
--------------------------------------------------------------------------------
A method of hunting for two text strings in a raw disk, after files
have been deleted.  The data blocks of the disk are read once, but
grep'd twice.

seki-root> reblock -e 75216016 $(expr 1024 \* 1024) 300 <
/dev/mapper/VolGroup00-LogVol00 | mtee 'egrep --binary-files=text -i -B
1000 -A 1000 dptutil > dptutil-hits' 'egrep --binary-files=text -i
-B 1000 -A 1000 dptmgr > dptmgr-hits'
stdin seems seekable, but file length is 0 - no exact percentages
Estimated filetransfer size is 77021200384 bytes
Estimated percentages will only be as accurate as your size estimate
Creating 2 pipes
popening egrep --binary-files=text -i -B 1000 -A 1000 dptutil > dptutil-hits
popening egrep --binary-files=text -i -B 1000 -A 1000 dptmgr > dptmgr-hits
(estimate: 0.1%  0s 56m 11h) Kbytes: 106496.0  Mbits/s: 13.6  Gbytes/hr:
6.0  min: 1.0
(estimate: 0.2%  9s 12m 12h) Kbytes: 214016.0  Mbits/s: 13.3  Gbytes/hr:
5.8  min: 2.0
(estimate: 0.3%  58s 58m 11h) Kbytes: 257024.0  Mbits/s: 13.5  Gbytes/hr:
5.9  min: 2.4
...

references:
http://stromberg.dnsalias.org/~strombrg/reblock.html
http://stromberg.dnsalias.org/~strombrg/mtee.html
egrep --help
--------------------------------------------------------------------------------
Performing the above reblock | mtee, my fedora core 3 system got -very-
slow.  If I were to suspend the pipeline above, performance would be
great.  If I resumed it, very quickly, performance would be bad again.
This command seems to have left my system a little bit jerky, but it's
-far- more usable now, despite the pipeline above still pounding the
SATA drive my home directory is on.

seki-root> echo deadline > scheduler 
Wed Mar 09 17:56:58

seki-root> cat scheduler 
noop anticipatory [deadline] cfq 
Wed Mar 09 17:57:00

seki-root> pwd
/sys/block/sdb/queue
Wed Mar 09 17:58:31

BTW, I looked into tagged command queuing for this system as well,
but apparently VIA SATA doesn't support TCQ on linux 2.6.x.
--------------------------------------------------------------------------------
Eventually the reblock | mtee egrep egrep gave:
egrep: memory exhausted
...using GNU egrep 2.5.1.
...so now I'm trying something closer to my classical method:
seki-root> reblock -e 75216016 $(expr 1024 \* 1024) 300 <
/dev/mapper/VolGroup00-LogVol00 | mtee './bgrep dptutil | ./ranges >
dptutil-ranges' './bgrep dptmgr | ./ranges > dptmgr-ranges'
Creating 2 pipes
popening ./bgrep dptutil | ./ranges > dptutil-ranges
popening ./bgrep dptmgr | ./ranges > dptmgr-ranges
stdin seems seekable, but file length is 0 - no exact percentages
Estimated filetransfer size is 77021200384 bytes
Estimated percentages will only be as accurate as your size estimate
(estimate: 1.3%  16s 12m 1h) Kbytes: 1027072.0  Mbits/s: 133.6  Gbytes/hr:
58.7  min: 1.0
(estimate: 2.5%  36s 16m 1h) Kbytes: 1913856.0  Mbits/s: 124.5  Gbytes/hr:
54.7  min: 2.0
(estimate: 3.7%  10s 17m 1h) Kbytes: 2814976.0  Mbits/s: 122.1  Gbytes/hr:
53.6  min: 3.0
(estimate: 4.9%  10s 17m 1h) Kbytes: 3706880.0  Mbits/s: 120.6  Gbytes/hr:
53.0  min: 4.0
...
--------------------------------------------------------------------------------
I've added a -s option to reblock, which makes it sleep for an  arbitrary
number of (fractions of) seconds between blocks.  Between this and the
I/O scheduler change, seki has become very pleasant to work on again,
despite the hunt for my missing palm memo.  :)
--------------------------------------------------------------------------------
From Bryan Ragon  

Here is a detailed list of steps that worked:

;; first backed up the first 512 bytes of /dev/hdb
# dd if=/dev/hdb of=~/hdb.first512 count=1 bs=512
1+0 records in
1+0 records out
 

;; zero them out, per Alasdair
# dd if=/dev/zero of=/dev/hdb count=1 bs=512
1+0 records in
1+0 records out

;; verified
# blockdev --rereadpt /dev/hdb
BLKRRPART: Input/output error

;; find the volumes
# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "media_vg" using metadata type lvm2

# pvscan
  PV /dev/hdb   VG media_vg   lvm2 [111.79 GB / 0    free]
  Total: 1 [111.79 GB] / in use: 1 [111.79 GB] / in no VG: 0 [0   ]

# lvmdiskscan
  /dev/hda1 [      494.16 MB]
  /dev/hda2 [        1.92 GB]
  /dev/hda3 [       18.65 GB]
  /dev/hdb  [      111.79 GB] LVM physical volume
  /dev/hdd1 [       71.59 GB]
  0 disks
  4 partitions
  1 LVM physical volume whole disk
  0 LVM physical volumes

# vgchange -a y
  1 logical volume(s) in volume group "media_vg" now active

;; /media is a defined mount point in fstab, listed below for future archive
searches
# mount /media
# ls /media
graphics  lost+found  movies  music


Success!!  Thank you, Alasdair!!!!

/etc/fstab

/dev/media_vg/media_lv  /media          ext3            noatime
0 0

--------------------------------------------------------------------------------
home blee has:
hdc1 ext3 /big wdc
sda5 xfs /backups
00/00 ext3 hda ibm fc3: too hot?
00/01 swap hda ibm
01/00 ext3 hdd maxtor fc4
01/01 swap hdd maxtor
hdb that samsung dvd drive that overheats
--------------------------------------------------------------------------------

Recommended Links

Recovering a Lost LVM Volume Disk Novell User Communities

LVM by default keeps backup copies of its meta data for all LVM devices. These backup files are stored in /etc/lvm/backup and /etc/lvm/archive. If a disk is removed or the meta data gets damaged in some way, it can easily be restored, provided you have backups of the meta data. This is why it is highly recommended never to turn off LVM's auto backup feature. Even if a disk is permanently removed from the volume group, it can be reconstructed, and often the remaining data on the file system recovered.

Recovery of RAID and LVM2 Volumes

A simple introduction to working with LVM (look at comments)

Reference

[May 17, 2010] Appendix D. LVM Volume Group Metadata

The configuration details of a volume group are referred to as the metadata. By default, an identical copy of the metadata is maintained in every metadata area in every physical volume within the volume group. LVM volume group metadata is small and stored as ASCII.

If a volume group contains many physical volumes, having many redundant copies of the metadata is inefficient. It is possible to create a physical volume without any metadata copies by using the --metadatacopies 0 option of the pvcreate command. Once you have selected the number of metadata copies the physical volume will contain, you cannot change that at a later point. Selecting 0 copies can result in faster updates on configuration changes. Note, however, that at all times every volume group must contain at least one physical volume with a metadata area (unless you are using the advanced configuration settings that allow you to store volume group metadata in a file system). If you intend to split the volume group in the future, every volume group needs at least one metadata copy.

The core metadata is stored in ASCII. A metadata area is a circular buffer. New metadata is appended to the old metadata and then the pointer to the start of it is updated.

You can specify the size of the metadata area with the --metadatasize option of the pvcreate command. The default size is too small for volume groups that contain many logical volumes or physical volumes.

The Physical Volume Label

By default, the pvcreate command places the physical volume label in the 2nd 512-byte sector. This label can optionally be placed in any of the first four sectors, since the LVM tools that scan for a physical volume label check the first 4 sectors. The physical volume label begins with the string LABELONE.
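
That scan can be imitated with dd and grep. A sketch on a synthetic image (the /tmp path and 8-sector size are arbitrary):

```shell
# Imitate LVM's label scan: LABELONE may sit in any of the first four
# 512-byte sectors; pvcreate puts it in the 2nd (sector index 1) by default.
img=/tmp/fake-pv.img
dd if=/dev/zero of=$img bs=512 count=8 2>/dev/null
printf 'LABELONE' | dd of=$img bs=512 seek=1 conv=notrunc 2>/dev/null
found=""
for sector in 0 1 2 3; do
    if dd if=$img bs=512 skip=$sector count=1 2>/dev/null | grep -q LABELONE; then
        found=$sector
        echo "physical volume label found in sector $sector"
    fi
done
```

Run against a real PV (for example, dd if=/dev/sdb instead of the image), this is a quick way to confirm whether a disk still carries an LVM label at all.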

The physical volume label contains:

Metadata locations are stored as offset and size (in bytes). There is room in the label for about 15 locations, but the LVM tools currently use 3: a single data area plus up to two metadata areas.

Metadata Contents

The volume group metadata contains:

The volume group information contains:



Etc

The Last but not Least: Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand. ~ Archibald Putt, Ph.D.


Copyright 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belongs to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree...

You can use PayPal to make a contribution, supporting development of this site and speeding up access. In case softpanorama.org is down, you can use the mirror at softpanorama.info.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

The site uses AdSense, so you need to be aware of the Google privacy policy. If you do not want to be tracked by Google, please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: November 13, 2019