
Backup of SGE configuration


The default SGE installation provides a standard script for backing up the SGE configuration:

$SGE_ROOT/util/upgrade_modules/save_sge_config.sh

This is a fairly big script, approximately 600 lines of "Sun jargon"-style shell programming. Its main advantage is that it has a matching restore module, which gives you the ability to load the saved configuration into a new SGE instance or into an upgraded one.

You invoke it with the directory in which you want to store the backup as an argument. After a run, the backup directory contains one flat file or subdirectory per configuration object class, as the listing below shows.
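A minimal invocation sketch (the date-stamped target path below is just an illustration; the script should be run on the qmaster as an SGE admin user, with SGE_ROOT and the cell environment set, e.g. by sourcing the settings.sh of your cell):

# save the current SGE configuration into a date-stamped directory
BACKUP_DIR=/root/sge_backup_$(date +%Y%m%d)
$SGE_ROOT/util/upgrade_modules/save_sge_config.sh $BACKUP_DIR

The matching restore module (load_sge_config.sh in the same directory, in the versions I have seen) can later replay such a directory on a fresh qmaster.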

[0]root@lustwz99: # ll
total 116
drwxr-xr-x 15 root root 4096 Nov  4 10:23 ./
drwxr-xr-x  4 root root 4096 Nov  4 10:23 ../
-rw-r--r--  1 root root  160 Nov  4 10:23 admin_hosts
-rw-r--r--  1 root root    2 Nov  4 10:23 arseqnum
-rw-r--r--  1 root root   20 Nov  4 10:23 backup_date
drwxr-xr-x  2 root root 4096 Nov  4 10:23 calendars/
drwxr-xr-x  2 root root 4096 Nov  4 10:23 cell/
-rw-r--r--  1 root root 6314 Nov  4 10:23 centry
drwxr-xr-x  2 root root 4096 Nov  4 10:23 ckpt/
drwxr-xr-x  2 root root 4096 Nov  4 10:23 configurations/
drwxr-xr-x  2 root root 4096 Nov  4 10:23 cqueues/
drwxr-xr-x  2 root root 4096 Nov  4 10:23 execution/
drwxr-xr-x  2 root root 4096 Nov  4 10:23 hostgroups/
drwxr-xr-x  2 root root 4096 Nov  4 10:23 jobclasses/
-rw-r--r--  1 root root    5 Nov  4 10:23 jobseqnum
-rw-r--r--  1 root root    5 Nov  4 10:23 managers
-rw-r--r--  1 root root   31 Nov  4 10:23 operators
drwxr-xr-x  2 root root 4096 Nov  4 10:23 pe/
-rw-r--r--  1 root root   90 Nov  4 10:23 ports
drwxr-xr-x  2 root root 4096 Nov  4 10:23 projects/
drwxr-xr-x  2 root root 4096 Nov  4 10:23 resource_quotas/
-rw-r--r--  1 root root 1491 Nov  4 10:23 schedconf
-rw-r--r--  1 root root   34 Nov  4 10:23 sge_cell
-rw-r--r--  1 root root   36 Nov  4 10:23 sge_root
-rw-r--r--  1 root root    0 Nov  4 10:23 sharetree
-rw-r--r--  1 root root  160 Nov  4 10:23 submit_hosts
drwxr-xr-x  2 root root 4096 Nov  4 10:23 users/
drwxr-xr-x  2 root root 4096 Nov  4 10:23 usersets/
-rw-r--r--  1 root root   12 Nov  4 10:23 version

In addition, a simple but adequate script for saving the main configuration files was published by Jesse Becker on Jul 16, 2012 on the [gridengine users] mailing list, in the thread "Saving OGE Configuration". It does not cover a few object classes (for example per-host configurations and calendars), but these can easily be added (see the sketch after the script):

I use the attached script.  It's loosely based on something found
packaged with SGE a while back, but didn't fit my needs.

On Mon, Jul 16, 2012 at 04:37:03PM -0400, Joseph Farran wrote:
>Howdy.
>
>Is there a way to save the entire Open Grid Engine configuration to a file ( text hopefully )?
>
>In Torque one can save the entire Torque configuration with a: qmgr -c 'p s' > file
>
>Is there such a command in OGE?      If not, what is the recommended way to backup the configuration files?  Tar backups?
>
>Joseph
>_______________________________________________
>users mailing list
>users at gridengine.org
>https://gridengine.org/mailman/listinfo/users

-- 
Jesse Becker
NHGRI Linux support (Digicon Contractor)
-------------- next part --------------
#!/bin/bash

#Source the profile script, to set SGE_ROOT, etc.
# Otherwise, set it by hand.
. /etc/profile.d/sge.sh

# Alter as needed.
TMPDIR=./BACKUPS/SGE_BACKUP.$SGE_CELL.`date "+%Y%m%d"`
mkdir -p $TMPDIR
cd $TMPDIR

set -C

function log {
    echo "Backing up [$*]"
}

# Back up the queues
for queue in `qconf -sql`; do 
    qconf -sq $queue > queue_$queue
    log queue $queue
done

for hostgroup in `qconf -shgrpl`; do
    qconf -shgrp_tree $hostgroup > hostgroup_${hostgroup/@/}
    log hostgroup  $hostgroup
done

#And the default queue
log queue default
qconf -sq > queue_default

log cluster configuration
qconf -sconf > cluster_conf

log scheduler configuration
qconf -ssconf > sched_conf

log complexes
qconf -sc > complexes

log sharetree
qconf -sstree > share_tree

log usersets
qconf -sul | sort > userset_list

log users
qconf -suserl | sort > user_list

log managers
qconf -sm | sort > manager_list

log operators
qconf -so | sort > operator_list

log RQS 
qconf -srqs > resource_quotas

for pe in `qconf -spl`; do
    log PE $pe
    qconf -sp $pe > pe_$pe
done

for project in `qconf -sprjl`; do
    log project $project
    qconf -sprj $project > project_$project
done

for su in `qconf -sul`; do
    log user list $su
    qconf -su $su > ul_$su
done

for e in `qconf -sel`; do
    log exec host $e
    qconf -se $e > exec_$e
done

log exec host global
qconf -se global > exec_global

log job sequence numbers
cp $SGE_ROOT/$SGE_CELL/spool/qmaster/jobseqnum .
cp $SGE_ROOT/$SGE_CELL/spool/qmaster/arseqnum .

cd -

#gzip -9r $TMPDIR
echo "Making tarball..."
tar cjf ${TMPDIR}.tar.bz2 $TMPDIR -C `dirname $TMPDIR` && rm -rf $TMPDIR
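Two hedged additions to the script above. First, the object classes it does not export (per-host configurations, calendars and checkpointing environments) can be saved with the same pattern:

# possible additions (not in the original script)
for h in `qconf -sconfl`; do
    log host conf $h
    qconf -sconf $h > conf_$h
done
for cal in `qconf -scall`; do
    log calendar $cal
    qconf -scal $cal > calendar_$cal
done
for ck in `qconf -sckptl`; do
    log checkpoint $ck
    qconf -sckpt $ck > ckpt_$ck
done

Second, most of the flat files the script writes can be fed back into a fresh qmaster with the matching qconf "add/modify from file" options. A rough restore sketch, run from inside the backup directory (the hostgroup files were saved in tree format, so they need reformatting before they can be loaded):

qconf -Mc complexes          # complex definitions
qconf -Msconf sched_conf     # scheduler configuration
for f in pe_*; do qconf -Ap $f; done          # parallel environments
for f in project_*; do qconf -Aprj $f; done   # projects
for f in queue_*; do
    [ "$f" = "queue_default" ] && continue    # queue_default is only the template
    qconf -Aq $f                              # cluster queues
done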


Old News ;-)

[Mar 24, 2021] How To Setup Backup Server Using Rsnapshot by Senthil Kumar

Apr 13, 2017 | ostechnix.com

... ... ...

Now, edit rsnapshot config file using command:

$ sudo nano /etc/rsnapshot.conf

The default configuration should just work fine. All you need to do is define the backup directories and backup intervals.

First, let us set up the root backup directory, i.e. choose the directory where we want to store the filesystem backups. In this case, I will store the backups in the /rsnapbackup/ directory.


# All snapshots will be stored under this root directory.
#
snapshot_root   /rsnapbackup/

Again, you should use the TAB key between the snapshot_root element and your backup directory.

Scroll down a bit, and make sure the following lines are uncommented:

[...]
#################################
# EXTERNAL PROGRAM DEPENDENCIES #
#################################

# LINUX USERS: Be sure to uncomment "cmd_cp". This gives you extra features.
# EVERYONE ELSE: Leave "cmd_cp" commented out for compatibility.
#
# See the README file or the man page for more details.
#
cmd_cp /usr/bin/cp

# uncomment this to use the rm program instead of the built-in perl routine.
#
cmd_rm /usr/bin/rm

# rsync must be enabled for anything to work. This is the only command that
# must be enabled.
#
cmd_rsync /usr/bin/rsync

# Uncomment this to enable remote ssh backups over rsync.
#
cmd_ssh /usr/bin/ssh

# Comment this out to disable syslog support.
#
cmd_logger /usr/bin/logger

# Uncomment this to specify the path to "du" for disk usage checks.
# If you have an older version of "du", you may also want to check the
# "du_args" parameter below.
#
cmd_du /usr/bin/du

[...]

Next, we need to define the backup intervals:

#########################################
# BACKUP LEVELS / INTERVALS #
# Must be unique and in ascending order #
# e.g. alpha, beta, gamma, etc. #
#########################################

retain alpha 6
retain beta 7
retain gamma 4
#retain delta 3

Here, retain alpha 6 means that every time rsnapshot alpha is run, it will make a new snapshot, rotate the old ones, and retain the most recent six (alpha.0 - alpha.5). You can define your own intervals. For more details, refer to the rsnapshot man pages.


Next, we need to define the backup directories. Find the following directives in your rsnapshot config file and set the backup directory locations.

###############################
### BACKUP POINTS / SCRIPTS ###
###############################

# LOCALHOST
backup /root/ostechnix/ server/

Here, I am going to back up the contents of the /root/ostechnix/ directory and save them in the /rsnapbackup/server/ directory. Please note that I didn't specify the full path (/rsnapbackup/server/) in the above configuration, because we already defined the root backup directory earlier.

Likewise, define your remote client system's backup location.

# REMOTEHOST
backup [email protected]:/home/sk/test/ client/

Here, I am going to back up the contents of my remote client system's /home/sk/test/ directory and save them in the /rsnapbackup/client/ directory on my backup server. Again, please note that I didn't specify the full path (/rsnapbackup/client/) in the above configuration, because we already defined the root backup directory before.
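Putting the pieces together, the on-disk layout after a few runs will look roughly like this (a sketch based on the values used above; rsnapshot recreates the full source path under each backup-point destination):

/rsnapbackup/
    alpha.0/
        server/root/ostechnix/     <- local backup point
        client/home/sk/test/       <- remote backup point
    alpha.1/
    ...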

Save and close the /etc/rsnapshot.conf file.

Once you have made all your changes, run the following command to verify that the config file is syntactically valid.

rsnapshot configtest

If all is well, you will see the following output.

Syntax OK

Testing backups

Run the following command to test backups.

rsnapshot alpha

This takes a few minutes depending upon the size of the backups.

Verifying backups

Check whether the backups are really stored in the root backup directory on the backup server.

ls /rsnapbackup/

You will see the following output:

alpha.0

Check the alpha.0 directory:

ls /rsnapbackup/alpha.0/

You will see that two directories have been automatically created: one for the local backup (server), and another for remote systems (client).

client/ server/

Check the client system backups:

ls /rsnapbackup/alpha.0/client

Check the server system (local system) backups:

ls /rsnapbackup/alpha.0/server

Automate backups

You don't want to run the rsnapshot command manually every time you need a backup. Define a cron job to automate the backups.

sudo vi /etc/cron.d/rsnapshot

Add the following lines:

0 */4 * * *     /usr/bin/rsnapshot alpha
50 23 * * *     /usr/bin/rsnapshot beta
00 22 1 * *     /usr/bin/rsnapshot delta

The first line indicates that six alpha snapshots will be taken each day (at hours 0, 4, 8, 12, 16, and 20), the second that beta snapshots will be taken every night at 11:50pm, and the third that delta snapshots will be taken at 10pm on the first day of each month. You can adjust the timing as you wish. Save and close the file.

Done! Rsnapshot will automatically take backups at the times defined in the cron job. For more details, refer to the man pages.

man rsnapshot


[Mar 24, 2021] How To Backup Your Entire Linux System Using Rsync by Senthil Kumar

Apr 25, 2017 | ostechnix.com

... ... ..

To back up the entire system, all you have to do is open your terminal and run the following command as the root user:

$ sudo rsync -aAXv / --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} /mnt

This command will back up the entire root ( / ) directory, excluding the /dev, /proc, /sys, /tmp, /run, /mnt, /media, and /lost+found directories, and save the data to the /mnt directory.
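Restoring is essentially the same command in the opposite direction. A hedged sketch, assuming you have booted a live environment, the backup drive is mounted at /mnt/backup and the target root filesystem at /mnt/sysroot (both paths are illustrative, not from the article):

sudo rsync -aAXv /mnt/backup/ /mnt/sysroot/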

[Mar 24, 2021] CYA - System Snapshot And Restore Utility For Linux by Senthil Kumar

Jul 23, 2018 | ostechnix.com

CYA, which stands for Cover Your Assets, is a free, open source system snapshot and restore utility for any Unix-like operating system that uses the Bash shell. Cya is portable and supports many popular filesystems such as EXT2/3/4, XFS, UFS, GPFS, ReiserFS, JFS, BtrFS, and ZFS. Please note that Cya will not back up the actual user data; it only backs up and restores the operating system itself. Cya is essentially a system restore utility. By default, it will back up all key directories like /bin/, /lib/, /usr/, /var/ and several others. You can, however, define your own directories and file paths to include in the backup, and Cya will pick those up as well. It is also possible to define directories/files to skip; for example, you can skip /var/logs/ if you don't need the log files. Under the hood, Cya uses the rsync backup method, but it is a little bit easier than plain rsync for creating rolling backups.

When restoring your operating system, Cya will roll back the OS using the backup profile you created earlier. You can restore either the entire system or only specific directories. You can also easily access the backup files, even without a complete rollback, using your terminal or file manager. Another notable feature is that you can generate a custom recovery script to automate the mounting of your system partition(s) when you restore from a live CD, USB, or network image. In a nutshell, CYA can help you restore your system to a previous state when you end up with a broken system caused by a software update, configuration changes, or an intrusion/hack.

... ... ...

Conclusion

Unlike Systemback and other system restore utilities, Cya is not a distribution-specific restore utility. It supports any Linux operating system that uses Bash. It is one of the must-have applications in your arsenal: install it right away and create snapshots. You won't regret it when you accidentally crash your Linux system.

[Aug 21, 2020] ReaR- Backup and Recover your Linux server with confidence by Sreejith Anujan

The article discusses several scenarios of using ReaR. Not much new or interesting, but it looks like an OK overview of version 2.4.
Aug 19, 2020 | www.redhat.com
... ... ...

Deploy ReaR on the server to be backed up

On the production server, install the rear, genisoimage, and syslinux packages. In RHEL, these packages are part of the base repository.

... ... ...
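For context, the configuration step elided above typically amounts to a few lines in /etc/rear/local.conf plus one command. This is an illustrative sketch, not quoted from the article; the storage server name and export path are assumptions:

# /etc/rear/local.conf -- minimal NFS-based sketch
OUTPUT=ISO
BACKUP=NETFS
BACKUP_URL=nfs://backup-server/exports/rear

# create the rescue image and the backup.tar.gz in one run
rear -v mkbackup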

As suggested by the login banner, run rear recover to restore the system by connecting to the storage server. From there, retrieve the backup.tar.gz and restore it to the right destination with appropriate permissions.

RESCUE production: ~ # rear -v -d recover
... ... ...

...ReaR is an integral part of many Linux-based backup solutions. OpenStack and the Red Hat OpenStack Platform use ReaR for the undercloud and control plane backup and restore. Watch for future articles in this space addressing the patching and rollback options for applications and operating systems.

For more on using ReaR in a Red Hat Enterprise Linux production environment, be sure to consult this solution in the Red Hat Customer Portal.

Sreejith Anujan is a cloud technology professional with more than 14 years of experience in on-premise and public cloud providers. He enjoys working with customers on their enablement plans to upskill the technical team on container and automation tooling.

[Jul 14, 2020] Linux stories- When backups saved the day - Enable Sysadmin

Jul 14, 2020 | www.redhat.com

I set up a backup approach that software vendors refer to as instant restore, shadow restore, preemptive restore, or a similar term. We ran incremental backup jobs every hour and restored the backups in the background to a new virtual machine. Each full hour, we had a system ready that was four hours back in time and just needed to be finished. So if I chose to restore the incremental from one hour ago, it would take less time than a complete system restore because only the small increments had to be restored to the almost-ready virtual machine.

And the effort paid off

One day, I was on vacation, having a barbecue and some beer, when I got a call from my colleague telling me that the terminal server with the ERP application was broken due to a failed update and the guy who ran the update forgot to take a snapshot first.

The only thing I needed to tell my colleague was to shut down the broken machine, find the UI of our backup/restore system, and then identify the restore job. Finally, I told him how to choose the timestamp from the last four hours when the restore should finish. The restore finished 30 minutes later, and the system was ready to be used again. We were back in action after a total of 30 minutes, and only the work from the last two hours or so was lost! Awesome! Now, back to vacation.

[Mar 23, 2020] Relax-and-Recover - Backup and Recover a Linux System

A good overview
Mar 23, 2020 | www.tecmint.com

...

Relax-and-Recover Key Features:
  1. It has a modular design written in Bash and can be extended using custom functionality.
  2. Supports various boot media including ISO, PXE, OBDR tape, USB or eSATA storage.
  3. Supports a variety of network protocols including FTP, SFTP, HTTP, NFS, and CIFS for storage and backup .
  4. Supports disk layout implementation such as LVM, DRBD, iSCSI, HWRAID (HP SmartArray), SWRAID, multipathing, and LUKS (encrypted partitions and filesystems).
  5. Supports both third-party and internal backup tools including IBM TSM, HP DataProtector, Symantec NetBackup, Bacula; tar and rsync .
  6. Supports booting via PXE, DVD/CD, bootable tape or virtual provisioning.
  7. Supports a simulation model that shows what scripts are run without executing them.
  8. Supports consistent logging and advanced debugging options for troubleshooting purposes.
  9. It can be integrated with monitoring tools such as Nagios and Opsview.
  10. It can also be integrated with job schedulers such as cron .
  11. It also supports various virtualization technologies supported (KVM, Xen, VMware).

In this article, you will learn how to install and configure ReaR to create a rescue system and/or system backup using a USB stick and rescue or restore a bare-metal Linux system after a disaster.

... ... ...

[Mar 05, 2020] The 3-2-1 rule for backups says there should be at least three copies or versions of data stored on two different pieces of media, one of which is off-site

Mar 05, 2020 | www.networkworld.com

As the number of places where we store data increases, the basic concept of what is referred to as the 3-2-1 rule often gets forgotten. This is a problem, because the 3-2-1 rule is easily one of the most foundational concepts of backup design. It's important to understand why the rule was created, and how it's currently being interpreted in an increasingly tapeless world.

What is the 3-2-1 rule for backup?

The 3-2-1 rule says there should be at least three copies or versions of data stored on two different pieces of media, one of which is off-site. Let's take a look at each of the three elements and what it addresses.

Mind the air gap

An air gap is a way of securing a copy of data by placing it on a machine on a network that is physically separate from the data it is backing up. It literally means there is a gap of air between the primary and the backup. This air gap accomplishes more than simple disaster recovery; it is also very useful for protecting against hackers.

If all backups are accessible via the same computers that might be attacked, it is possible that a hacker could use a compromised server to attack your backup server. By separating the backup from the primary via an air gap, you make it harder for a hacker to pull that off. It's still not impossible, just harder.

Everyone wants an air gap. The discussion these days is how to accomplish an air gap without using tapes. Back in the days of tape backup, it was easy to provide an air gap. You made a backup copy of your data and put it in a box, then you handed it to an Iron Mountain driver. Instantly, there was a gap of air between your primary and your backup. It was close to impossible for a hacker to attack both the primary and the backup.

That is not to say it was impossible; it just made it harder. For hackers to attack your secondary copy, they needed to resort to a physical attack via social engineering. You might think that tapes stored in an off-site storage facility would be impervious to a physical attack via social engineering, but that is definitely not the case. (I have personally participated in white hat attacks of off-site storage facilities, successfully penetrated them and been left unattended with other people's backups.) Most hackers don't resort to physical attacks because they are just too risky, so air-gapping backups greatly reduces the risk that they will be compromised.

Faulty 3-2-1 implementations

Many things that pass for backup systems now do not pass even the most liberal interpretation of the 3-2-1 rule. A perfect example of this would be various cloud-based services that store the backups on the same servers and the same storage facility that they are protecting, ignoring the "2" and the "1" in this important rule.

[Nov 08, 2019] 13 open source backup solutions by Don Watkins

This is mostly just a list; you need to do your own research. Some important backup applications are not mentioned. It is unclear from the list what methods each tool uses, or why each of them would be preferable to tar. The stress in the list is on portability (Linux plus Mac and Windows, not just Linux).
Mar 07, 2019 | opensource.com

Recently, we published a poll that asked readers to vote on their favorite open source backup solution. We offered six solutions recommended by our moderator community -- Cronopete, Deja Dup, Rclone, Rdiff-backup, Restic, and Rsync -- and invited readers to share other options in the comments. And you came through, offering 13 other solutions (so far) that we either hadn't considered or hadn't even heard of.

By far the most popular suggestion was BorgBackup . It is a deduplicating backup solution that features compression and encryption. It is supported on Linux, MacOS, and BSD and has a BSD License.

Second was UrBackup , which does full and incremental image and file backups; you can save whole partitions or single directories. It has clients for Windows, Linux, and MacOS and has a GNU Affero Public License.

Third was LuckyBackup ; according to its website, "it is simple to use, fast (transfers over only changes made and not all data), safe (keeps your data safe by checking all declared directories before proceeding in any data manipulation), reliable, and fully customizable." It carries a GNU Public License.

Casync is content-addressable synchronization -- it's designed for backup and synchronizing and stores and retrieves multiple related versions of large file systems. It is licensed with the GNU Lesser Public License.

Syncthing synchronizes files between two computers. It is licensed with the Mozilla Public License and, according to its website, is secure and private. It works on MacOS, Windows, Linux, FreeBSD, Solaris, and OpenBSD.

Duplicati is a free backup solution that works on Windows, MacOS, and Linux and a variety of standard protocols, such as FTP, SSH, and WebDAV, and cloud services. It features strong encryption and is licensed with the GPL.

Dirvish is a disk-based virtual image backup system licensed under OSL-3.0. It also requires Rsync, Perl5, and SSH to be installed.

Bacula 's website says it "is a set of computer programs that permits the system administrator to manage backup, recovery, and verification of computer data across a network of computers of different kinds." It is supported on Linux, FreeBSD, Windows, MacOS, OpenBSD, and Solaris and the bulk of its source code is licensed under AGPLv3.

BackupPC "is a high-performance, enterprise-grade system for backing up Linux, Windows, and MacOS PCs and laptops to a server's disk," according to its website. It is licensed under the GPLv3.

Amanda is a backup system written in C and Perl that allows a system administrator to back up an entire network of client machines to a single server using tape, disk, or cloud-based systems. It was developed and copyrighted in 1991 at the University of Maryland and has a BSD-style license.

Back in Time is a simple backup utility designed for Linux. It provides a command line client and a GUI, both written in Python. To do a backup, just specify where to store snapshots, what folders to back up, and the frequency of the backups. BackInTime is licensed with GPLv2.

Timeshift is a backup utility for Linux that is similar to System Restore for Windows and Time Capsule for MacOS. According to its GitHub repository, "Timeshift protects your system by taking incremental snapshots of the file system at regular intervals. These snapshots can be restored at a later date to undo all changes to the system."

Kup is a backup solution that was created to help users back up their files to a USB drive, but it can also be used to perform network backups. According to its GitHub repository, "When you plug in your external hard drive, Kup will automatically start copying your latest changes."


[Nov 06, 2019] My 10 Linux and UNIX Command Line Mistakes by Vivek Gite

May 20, 2018 | www.cyberciti.biz

I had only one backup copy of my QT project and I just wanted to extract a directory called functions. I ended up deleting the entire backup (note the -c switch instead of -x):
cd /mnt/bacupusbharddisk
tar -zcvf project.tar.gz functions

I had no backup. Similarly, I ended up running an rsync command that deleted all new files by overwriting them with files from the backup set (now I have switched to rsnapshot):
rsync -av -delete /dest /src
Again, I had no backup.

... ... ...

All men make mistakes, but only wise men learn from their mistakes -- Winston Churchill .
From all those mistakes I have learned that:
  1. You must keep a good set of backups. Test your backups regularly too.
  2. The clear choice for preserving all data on UNIX file systems is dump, which is the only tool that guarantees recovery under all conditions (see the Torture-testing Backup and Archive Programs paper).
  3. Never use rsync with a single backup directory. Create snapshots using rsync or rsnapshot (see the sketch after this list).
  4. Use CVS/git to store configuration files.
  5. Wait and read the command line twice before hitting the damn [Enter] key.
  6. Use your well-tested Perl/shell scripts and open source configuration management software such as Puppet, Ansible, CFEngine or Chef to configure all servers. This also applies to day-to-day jobs such as creating users and more.
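A minimal sketch of the snapshot-style rsync run that item 3 refers to (paths are illustrative; rotation of the daily.N directories is left to a wrapper script, and rsnapshot automates exactly this pattern):

# hard-link unchanged files against the previous snapshot, so each run looks
# like a full copy but only changed files consume additional space
rsync -a --delete --link-dest=/backup/daily.1 /home/ /backup/daily.0/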

Mistakes are inevitable, so have you made any mistakes that have caused some sort of downtime? Please add them in the comments section below.

[Feb 11, 2019] Resuming rsync on an interrupted transfer

May 15, 2013 | stackoverflow.com

Glitches , May 15, 2013 at 18:06

I am trying to back up my file server to a remote file server using rsync. Rsync is not successfully resuming when a transfer is interrupted. I used the partial option, but rsync doesn't find the file it already started because it renames it to a temporary file, and when resumed it creates a new file and starts from the beginning.

Here is my command:

rsync -avztP -e "ssh -p 2222" /volume1/ myaccont@backup-server-1:/home/myaccount/backup/ --exclude "@spool" --exclude "@tmp"

When this command is run, a backup file named OldDisk.dmg from my local machine gets created on the remote machine as something like .OldDisk.dmg.SjDndj23.

Now when the internet connection gets interrupted and I have to resume the transfer, I have to find where rsync left off by finding the temp file like .OldDisk.dmg.SjDndj23 and rename it to OldDisk.dmg so that it sees there already exists a file that it can resume.

How do I fix this so I don't have to manually intervene each time?

Richard Michael , Nov 6, 2013 at 4:26

TL;DR : Use --timeout=X (X in seconds) to change the default rsync server timeout, not --inplace .

The issue is the rsync server processes (of which there are two, see rsync --server ... in ps output on the receiver) continue running, to wait for the rsync client to send data.

If the rsync server processes do not receive data for a sufficient time, they will indeed time out, self-terminate and clean up by moving the temporary file to its "proper" name (e.g., no temporary suffix). You'll then be able to resume.

If you don't want to wait for the long default timeout to cause the rsync server to self-terminate, then when your internet connection returns, log into the server and clean up the rsync server processes manually. However, you must politely terminate rsync -- otherwise, it will not move the partial file into place; but rather, delete it (and thus there is no file to resume). To politely ask rsync to terminate, do not SIGKILL (e.g., -9 ), but SIGTERM (e.g., pkill -TERM -x rsync - only an example, you should take care to match only the rsync processes concerned with your client).

Fortunately there is an easier way: use the --timeout=X (X in seconds) option; it is passed to the rsync server processes as well.

For example, if you specify rsync ... --timeout=15 ... , both the client and server rsync processes will cleanly exit if they do not send/receive data in 15 seconds. On the server, this means moving the temporary file into position, ready for resuming.

I'm not sure of the default timeout value of the various rsync processes will try to send/receive data before they die (it might vary with operating system). In my testing, the server rsync processes remain running longer than the local client. On a "dead" network connection, the client terminates with a broken pipe (e.g., no network socket) after about 30 seconds; you could experiment or review the source code. Meaning, you could try to "ride out" the bad internet connection for 15-20 seconds.

If you do not clean up the server rsync processes (or wait for them to die), but instead immediately launch another rsync client process, two additional server processes will launch (for the other end of your new client process). Specifically, the new rsync client will not re-use/reconnect to the existing rsync server processes. Thus, you'll have two temporary files (and four rsync server processes) -- though, only the newer, second temporary file has new data being written (received from your new rsync client process).

Interestingly, if you then clean up all rsync server processes (for example, stop your client which will stop the new rsync servers, then SIGTERM the older rsync servers, it appears to merge (assemble) all the partial files into the new proper named file. So, imagine a long running partial copy which dies (and you think you've "lost" all the copied data), and a short running re-launched rsync (oops!).. you can stop the second client, SIGTERM the first servers, it will merge the data, and you can resume.

Finally, a few short remarks:

JamesTheAwesomeDude , Dec 29, 2013 at 16:50

Just curious: wouldn't SIGINT (aka ^C ) be 'politer' than SIGTERM ? – JamesTheAwesomeDude Dec 29 '13 at 16:50

Richard Michael , Dec 29, 2013 at 22:34

I didn't test how the server-side rsync handles SIGINT, so I'm not sure it will keep the partial file - you could check. Note that this doesn't have much to do with Ctrl-c ; it happens that your terminal sends SIGINT to the foreground process when you press Ctrl-c , but the server-side rsync has no controlling terminal. You must log in to the server and use kill . The client-side rsync will not send a message to the server (for example, after the client receives SIGINT via your terminal Ctrl-c ) - might be interesting though. As for anthropomorphizing, not sure what's "politer". :-) – Richard Michael Dec 29 '13 at 22:34

d-b , Feb 3, 2015 at 8:48

I just tried this timeout argument rsync -av --delete --progress --stats --human-readable --checksum --timeout=60 --partial-dir /tmp/rsync/ rsync://$remote:/ /src/ but then it timed out during the "receiving file list" phase (which in this case takes around 30 minutes). Setting the timeout to half an hour so kind of defers the purpose. Any workaround for this? – d-b Feb 3 '15 at 8:48

Cees Timmerman , Sep 15, 2015 at 17:10

@user23122 --checksum reads all data when preparing the file list, which is great for many small files that change often, but should be done on-demand for large files. – Cees Timmerman Sep 15 '15 at 17:10
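A practical takeaway from the thread above, combining the --timeout and --partial-dir suggestions into a single command (host, port and paths are the illustrative values from the question):

rsync -avzP --timeout=60 --partial-dir=.rsync-partial -e "ssh -p 2222" /volume1/ myaccount@backup-server-1:/home/myaccount/backup/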

[Feb 11, 2019] prsync command man page - pssh

Originally from Brent N. Chun ~ Intel Research Berkeley
Feb 11, 2019 | www.mankier.com

prsync -- parallel file sync program

Synopsis

prsync [-vAraz] [-h hosts_file] [-H [user@]host[:port]] [-l user] [-p par] [-o outdir] [-e errdir] [-t timeout] [-O options] [-x args] [-X arg] [-S args] local ... remote

Description

prsync is a program for copying files in parallel to a number of hosts using the popular rsync program. It provides features such as passing a password to ssh, saving output to files, and timing out.
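A usage sketch (hosts.txt and the paths are illustrative): push a local directory to the same location on every host listed in the file, ten hosts at a time, using rsync archive mode and compression:

prsync -h hosts.txt -p 10 -a -z -r /etc/myapp/ /etc/myapp/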

Options
-h host_file
--hosts host_file
Read hosts from the given host_file . Lines in the host file are of the form [ user @] host [: port ] and can include blank lines and comments (lines beginning with "#"). If multiple host files are given (the -h option is used more than once), then prsync behaves as though these files were concatenated together. If a host is specified multiple times, then prsync will connect the given number of times.
-H [user@]host[:port]
--host [user@]host[:port]
-H "[user@]host[:port] [ [user@]host[:port] ... ]"
--host "[user@]host[:port] [ [user@]host[:port] ... ]"

Add the given host strings to the list of hosts. This option may be given multiple times, and may be used in conjunction with the -h option.

-l user
--user user
Use the given username as the default for any host entries that don't specifically specify a user.
-p parallelism
--par parallelism
Use the given number as the maximum number of concurrent connections.
-t timeout
--timeout timeout
Make connections time out after the given number of seconds. With a value of 0, prsync will not timeout any connections.
-o outdir
--outdir outdir
Save standard output to files in the given directory. Filenames are of the form [user@]host[:port][.num] where the user and port are only included for hosts that explicitly specify them. The number is a counter that is incremented each time for hosts that are specified more than once.
-e errdir
--errdir errdir
Save standard error to files in the given directory. Filenames are of the same form as with the -o option.
-x args
--extra-args args
Passes extra rsync command-line arguments (see the rsync(1) man page for more information about rsync arguments). This option may be specified multiple times. The arguments are processed to split on whitespace, protect text within quotes, and escape with backslashes. To pass arguments without such processing, use the -X option instead.
-X arg
--extra-arg arg
Passes a single rsync command-line argument (see the rsync(1) man page for more information about rsync arguments). Unlike the -x option, no processing is performed on the argument, including word splitting. To pass multiple command-line arguments, use the option once for each argument.
-O options
--options options
SSH options in the format used in the SSH configuration file (see the ssh_config(5) man page for more information). This option may be specified multiple times.
-A
--askpass
Prompt for a password and pass it to ssh. The password may be used either to unlock a key or for password authentication. The password is transferred in a fairly secure manner (e.g., it will not show up in argument lists). However, be aware that a root user on your system could potentially intercept the password.
-v
--verbose
Include error messages from rsync with the -i and \ options.
-r
--recursive
Recursively copy directories.
-a
--archive
Use rsync archive mode (rsync's -a option).
-z
--compress
Use rsync compression.
-S args
--ssh-args args
Passes extra SSH command-line arguments (see the ssh(1) man page for more information about SSH arguments). The given value is appended to the ssh command (rsync's -e option) without any processing.
Tips

The ssh_config file can include an arbitrary number of Host sections. Each host entry specifies ssh options which apply only to the given host. Host definitions can even behave like aliases if the HostName option is included. This ssh feature, in combination with pssh host files, provides a tremendous amount of flexibility.

Exit Status

The exit status codes from prsync are as follows:

0
Success
1
Miscellaneous error
2
Syntax or usage error
3
At least one process was killed by a signal or timed out.
4
All processes completed, but at least one rsync process reported an error (exit status other than 0).
Authors

Written by Brent N. Chun <[email protected]> and Andrew McNabb <[email protected]>.

https://github.com/lilydjwg/pssh

See Also

rsync(1), ssh(1), ssh_config(5), pssh(1), prsync(1), pslurp(1), pnuke(1)

Referenced By

pnuke(1) , pscp.pssh(1) , pslurp(1) , pssh(1) .

[Feb 04, 2019] Do not play those dangerous games with resizing of partitions unless absolutely necessary

Copying everything to an additional drive (it can be a USB drive), repartitioning, and then copying everything back is a safer bet.
May 07, 2017 | superuser.com
womble

In theory, you could reduce the size of sda1, increase the size of the extended partition, shift the contents of the extended partition down, then increase the size of the PV on the extended partition and you'd have the extra room.

However, the number of possible things that can go wrong there is just astronomical

So I'd recommend either buying a second hard drive (and possibly transferring everything onto it in a more sensible layout, then repartitioning your current drive better) or just making some bind mounts of various bits and pieces out of /home into / to free up a bit more space.

--womble

[Jan 29, 2019] How to Setup DRBD to Replicate Storage on Two CentOS 7 Servers by Aaron Kili

Notable quotes:
"... It mirrors the content of block devices such as hard disks, partitions, logical volumes etc. between servers. ..."
"... It involves a copy of data on two storage devices, such that if one fails, the data on the other can be used. ..."
"... Originally, DRBD was mainly used in high availability (HA) computer clusters, however, starting with version 9, it can be used to deploy cloud storage solutions. In this article, we will show how to install DRBD in CentOS and briefly demonstrate how to use it to replicate storage (partition) on two servers. ..."
DRBD (which stands for Distributed Replicated Block Device) is a distributed, flexible and versatile replicated storage solution for Linux. It mirrors the content of block devices such as hard disks, partitions, logical volumes etc. between servers.

It involves a copy of data on two storage devices, such that if one fails, the data on the other can be used.

You can think of it somewhat like a network RAID 1 configuration with the disks mirrored across servers. However, it operates in a very different way from RAID and even network RAID.

Originally, DRBD was mainly used in high availability (HA) computer clusters, however, starting with version 9, it can be used to deploy cloud storage solutions. In this article, we will show how to install DRBD in CentOS and briefly demonstrate how to use it to replicate storage (partition) on two servers.

... ... ...

For the purpose of this article, we are using a two-node cluster for this setup.

... ... ...
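The elided demonstration boils down to roughly the following commands (a hedged sketch, not quoted from the article; the resource name, backing device and mount point are illustrative, and a resource file is assumed to already exist in /etc/drbd.d/ on both nodes):

# on both nodes: initialize metadata and bring the resource up
drbdadm create-md test
drbdadm up test

# on the node chosen as primary only: force the initial sync direction,
# then create a filesystem on the DRBD device and mount it
drbdadm primary --force test
mkfs.ext4 /dev/drbd0
mount /dev/drbd0 /mnt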

Reference : The DRBD User's Guide .
Summary (Jan 19, 2019 | www.tecmint.com)

DRBD is extremely flexible and versatile, which makes it a storage replication solution suitable for adding HA to just about any application. In this article, we have shown how to install DRBD in CentOS 7 and briefly demonstrated how to use it to replicate storage. Feel free to share your thoughts with us via the feedback form below.

[Jan 29, 2019] Extra security can be a dangerous thing

Viewing backup logs is vital. Often it only looks as if the backup is going fine...
Notable quotes:
"... Things looked fine until someone noticed that a directory with critically important and sensitive data was missing. Turned out that some manager had decided to 'secure' the directory by doing 'chmod 000 dir' to protect the data from inquisitive eyes when the data was not being used. ..."
"... Of course, tar complained about the situation and returned with non-null status, but since the backup procedure had seemed to work fine, no one thought it necessary to view the logs... ..."
Jul 20, 2017 | www.linuxjournal.com

Anonymous, 11/08/2002

At an unnamed location it happened thus... The customer had been using a home built 'tar' -based backup system for a long time. They were informed enough to have even tested and verified that recovery would work also.

Everything had been working fine, and they even had to do a recovery which went fine. Well, one day something evil happened to a disk and they had to replace the unit and do a full recovery.

Things looked fine until someone noticed that a directory with critically important and sensitive data was missing. Turned out that some manager had decided to 'secure' the directory by doing 'chmod 000 dir' to protect the data from inquisitive eyes when the data was not being used.

Of course, tar complained about the situation and returned with non-null status, but since the backup procedure had seemed to work fine, no one thought it necessary to view the logs...
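The moral translates into a simple habit for any tar-based backup script: check (and alert on) tar's exit status instead of trusting that a quiet run means a good backup. A hedged sketch, with illustrative paths and mail alert:

tar -czf /backup/data.tar.gz /data 2>>/var/log/backup.err
if [ $? -ne 0 ]; then
    echo "Backup FAILED or incomplete -- see /var/log/backup.err" \
        | mail -s "backup error on $(hostname)" root
fi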

[Jan 29, 2019] Backing things up with rsync

Notable quotes:
"... I RECURSIVELY DELETED ALL THE LIVE CORPORATE WEBSITES ON FRIDAY AFTERNOON AT 4PM! ..."
"... This is why it's ALWAYS A GOOD IDEA to use Midnight Commander or something similar to delete directories!! ..."
"... rsync with ssh as the transport mechanism works very well with my nightly LAN backups. I've found this page to be very helpful: http://www.mikerubel.org/computers/rsync_snapshots/ ..."
Jul 20, 2017 | www.linuxjournal.com

Anonymous on Fri, 11/08/2002 - 03:00.

The Subject, not the content, really brings back memories.

Imagine this: you're tasked with complete control over the network in a multi-million dollar company. You've had some experience in the real world of network maintenance, but mostly you've learned from breaking things at home.

The time comes to implement (yes, this was a startup company) a backup routine. You carefully consider the best way to do it and decide that copying data to a holding disk before the tape run would be perfect in this situation: faster restores if the holding disk is still alive.

So off you go configuring all your servers for ssh pass through, and create the rsync scripts. Then before the trial run you think it would be a good idea to create a local backup of all the websites.

You log on to the web server, create a temp directory and start testing your newly advanced rsync skills. After a couple of goes, you think you're ready for the real thing, but you decide to run the test one more time.

Everything seems fine so you delete the temp directory. You pause for a second and your mouth drops open wider than it has ever opened before, and a feeling of terror overcomes you. You want to hide in a hole and hope you didn't see what you saw.

I RECURSIVELY DELETED ALL THE LIVE CORPORATE WEBSITES ON FRIDAY AFTERNOON AT 4PM!

Anonymous on Sun, 11/10/2002 - 03:00.

This is why it's ALWAYS A GOOD IDEA to use Midnight Commander or something similar to delete directories!!

...Root for (5) years and never trashed a filesystem yet (knockwoody)...

Anonymous on Fri, 11/08/2002 - 03:00.

rsync with ssh as the transport mechanism works very well with my nightly LAN backups. I've found this page to be very helpful: http://www.mikerubel.org/computers/rsync_snapshots/

[Jan 28, 2019] The ghost of the failed restore

Notable quotes:
"... "Of course! You told me that I had to stay a couple of extra hours to perform that task," I answered. "Exactly! But you preferred to leave early without finishing that task," he said. "Oh my! I thought it was optional!" I exclaimed. ..."
"... "It was, it was " ..."
"... Even with the best solution that promises to make the most thorough backups, the ghost of the failed restoration can appear, darkening our job skills, if we don't make a habit of validating the backup every time. ..."
Nov 01, 2018 | opensource.com

In a well-known data center (whose name I do not want to remember), one cold October night we had a production outage in which thousands of web servers stopped responding due to downtime in the main database. The database administrator asked me, the rookie sysadmin, to recover the database's last full backup and restore it to bring the service back online.

But, at the end of the process, the database was still broken. I didn't worry, because there were other full backup files in stock. However, even after doing the process several times, the result didn't change.

With great fear, I asked the senior sysadmin what to do to fix this behavior.

"You remember when I showed you, a few days ago, how the full backup script was running? Something about how important it was to validate the backup?" responded the sysadmin.

"Of course! You told me that I had to stay a couple of extra hours to perform that task," I answered. "Exactly! But you preferred to leave early without finishing that task," he said. "Oh my! I thought it was optional!" I exclaimed.

"It was, it was "

Moral of the story: Even with the best solution that promises to make the most thorough backups, the ghost of the failed restoration can appear, darkening our job skills, if we don't make a habit of validating the backup every time.

[Jan 28, 2019] The danger of a single backup hard drive (USB or not)

The most typical danger is dropping the hard drive on the floor.
Notable quotes:
"... Also, backing up to another disk in the same computer will probably not save you when lighting strikes, as the backup disk is just as likely to be fried as the main disk. ..."
"... In real life, the backup strategy and hardware/software choices to support it is (as most other things) a balancing act. The important thing is that you have a strategy, and that you test it regularly to make sure it works as intended (as the main point is in the article). Also, realizing that achieving 100% backup security is impossible might save a lot of time in setting up the strategy. ..."
Nov 08, 2002 | www.linuxjournal.com

Anonymous on Fri, 11/08/2002

Why don't you just buy an extra hard disk and have a copy of your important data there? With today's prices it doesn't cost anything.

Anonymous on Fri, 11/08/2002 - 03:00.

A lot of people seem to have this idea, and in many situations it should work fine.

However, there is the human factor. Sometimes simple things go wrong (as simple as copying a file), and it takes a while before anybody notices that the contents of this file are not what is expected. This means you have to have many "generations" of backups of the file in order to be able to restore it, and in order to not put all the "eggs in the same basket", each of the file backups should be on a separate physical device.

Also, backing up to another disk in the same computer will probably not save you when lighting strikes, as the backup disk is just as likely to be fried as the main disk.

In real life, the backup strategy and hardware/software choices to support it is (as most other things) a balancing act. The important thing is that you have a strategy, and that you test it regularly to make sure it works as intended (as the main point is in the article). Also, realizing that achieving 100% backup security is impossible might save a lot of time in setting up the strategy.

(I.e. you have to say that this strategy has certain specified limits, like not being able to restore a file to its intermediate state sometime during a workday, only to the state it had when it was last backed up, which should be a maximum of xxx hours ago and so on...)

Hallvard P

[Nov 13, 2018] Resuming rsync partial (-P/--partial) on a interrupted transfer

Notable quotes:
"... should ..."
May 15, 2013 | stackoverflow.com

Glitches , May 15, 2013 at 18:06

I am trying to backup my file server to a remove file server using rsync. Rsync is not successfully resuming when a transfer is interrupted. I used the partial option but rsync doesn't find the file it already started because it renames it to a temporary file and when resumed it creates a new file and starts from beginning.

Here is my command:

rsync -avztP -e "ssh -p 2222" /volume1/ myaccont@backup-server-1:/home/myaccount/backup/ --exclude "@spool" --exclude "@tmp"

When this command is ran, a backup file named OldDisk.dmg from my local machine get created on the remote machine as something like .OldDisk.dmg.SjDndj23 .

Now when the internet connection gets interrupted and I have to resume the transfer, I have to find where rsync left off by finding the temp file like .OldDisk.dmg.SjDndj23 and rename it to OldDisk.dmg so that it sees there already exists a file that it can resume.

How do I fix this so I don't have to manually intervene each time?

Richard Michael , Nov 6, 2013 at 4:26

TL;DR : Use --timeout=X (X in seconds) to change the default rsync server timeout, not --inplace .

The issue is the rsync server processes (of which there are two, see rsync --server ... in ps output on the receiver) continue running, to wait for the rsync client to send data.

If the rsync server processes do not receive data for a sufficient time, they will indeed time out, self-terminate and clean up by moving the temporary file to its "proper" name (i.e., without the temporary suffix). You'll then be able to resume.

If you don't want to wait for the long default timeout to cause the rsync server to self-terminate, then when your internet connection returns, log into the server and clean up the rsync server processes manually. However, you must politely terminate rsync -- otherwise, it will not move the partial file into place; but rather, delete it (and thus there is no file to resume). To politely ask rsync to terminate, do not SIGKILL (e.g., -9 ), but SIGTERM (e.g., pkill -TERM -x rsync - only an example, you should take care to match only the rsync processes concerned with your client).

Fortunately there is an easier way: use the --timeout=X (X in seconds) option; it is passed to the rsync server processes as well.

For example, if you specify rsync ... --timeout=15 ... , both the client and server rsync processes will cleanly exit if they do not send/receive data in 15 seconds. On the server, this means moving the temporary file into position, ready for resuming.
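
For instance (a sketch based on the question's command; only --timeout is new, host and paths are taken from the question above):

rsync -avztP --timeout=15 -e "ssh -p 2222" /volume1/ myaccont@backup-server-1:/home/myaccount/backup/ --exclude "@spool" --exclude "@tmp"
# if the link drops, both ends exit after ~15 idle seconds; because of --partial
# the partial file keeps its final name, so re-running the same command resumes it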

I'm not sure of the default timeout value, i.e. how long the various rsync processes will try to send/receive data before they die (it might vary with operating system). In my testing, the server rsync processes remain running longer than the local client. On a "dead" network connection, the client terminates with a broken pipe (e.g., no network socket) after about 30 seconds; you could experiment or review the source code. Meaning, you could try to "ride out" the bad internet connection for 15-20 seconds.

If you do not clean up the server rsync processes (or wait for them to die), but instead immediately launch another rsync client process, two additional server processes will launch (for the other end of your new client process). Specifically, the new rsync client will not re-use/reconnect to the existing rsync server processes. Thus, you'll have two temporary files (and four rsync server processes) -- though, only the newer, second temporary file has new data being written (received from your new rsync client process).

Interestingly, if you then clean up all rsync server processes (for example, stop your client, which will stop the new rsync servers, then SIGTERM the older rsync servers), it appears to merge (assemble) all the partial files into the new properly named file. So, imagine a long-running partial copy which dies (and you think you've "lost" all the copied data), and a short-running re-launched rsync (oops!).. you can stop the second client, SIGTERM the first servers, it will merge the data, and you can resume.

Finally, a few short remarks:

JamesTheAwesomeDude , Dec 29, 2013 at 16:50

Just curious: wouldn't SIGINT (aka ^C ) be 'politer' than SIGTERM ? – JamesTheAwesomeDude Dec 29 '13 at 16:50

Richard Michael , Dec 29, 2013 at 22:34

I didn't test how the server-side rsync handles SIGINT, so I'm not sure it will keep the partial file - you could check. Note that this doesn't have much to do with Ctrl-c ; it happens that your terminal sends SIGINT to the foreground process when you press Ctrl-c , but the server-side rsync has no controlling terminal. You must log in to the server and use kill . The client-side rsync will not send a message to the server (for example, after the client receives SIGINT via your terminal Ctrl-c ) - might be interesting though. As for anthropomorphizing, not sure what's "politer". :-) – Richard Michael Dec 29 '13 at 22:34

d-b , Feb 3, 2015 at 8:48

I just tried this timeout argument rsync -av --delete --progress --stats --human-readable --checksum --timeout=60 --partial-dir /tmp/rsync/ rsync://$remote:/ /src/ but then it timed out during the "receiving file list" phase (which in this case takes around 30 minutes). Setting the timeout to half an hour kind of defeats the purpose. Any workaround for this? – d-b Feb 3 '15 at 8:48

Cees Timmerman , Sep 15, 2015 at 17:10

@user23122 --checksum reads all data when preparing the file list, which is great for many small files that change often, but should be done on-demand for large files. – Cees Timmerman Sep 15 '15 at 17:10

[Jul 05, 2018] Can rsync resume after being interrupted

Notable quotes:
"... as if it were successfully transferred ..."
Jul 05, 2018 | unix.stackexchange.com

Tim ,Sep 15, 2012 at 23:36

I used rsync to copy a large number of files, but my OS (Ubuntu) restarted unexpectedly.

After reboot, I ran rsync again, but from the output on the terminal, I found that rsync still copied those already copied before. But I heard that rsync is able to find differences between source and destination, and therefore to just copy the differences. So I wonder in my case if rsync can resume what was left last time?

Gilles ,Sep 16, 2012 at 1:56

Yes, rsync won't copy again files that it's already copied. There are a few edge cases where its detection can fail. Did it copy all the already-copied files? What options did you use? What were the source and target filesystems? If you run rsync again after it's copied everything, does it copy again? – Gilles Sep 16 '12 at 1:56

Tim ,Sep 16, 2012 at 2:30

@Gilles: Thanks! (1) I think I saw rsync copied the same files again from its output on the terminal. (2) Options are same as in my other post, i.e. sudo rsync -azvv /home/path/folder1/ /home/path/folder2 . (3) Source and target are both NTFS, but source is an external HDD, and target is an internal HDD. (4) It is now running and hasn't finished yet. – Tim Sep 16 '12 at 2:30

jwbensley ,Sep 16, 2012 at 16:15

There is also the --partial flag to resume partially transferred files (useful for large files) – jwbensley Sep 16 '12 at 16:15

Tim ,Sep 19, 2012 at 5:20

@Gilles: What are some "edge cases where its detection can fail"? – Tim Sep 19 '12 at 5:20

Gilles ,Sep 19, 2012 at 9:25

@Tim Off the top of my head, there's at least clock skew, and differences in time resolution (a common issue with FAT filesystems which store times in 2-second increments, the --modify-window option helps with that). – Gilles Sep 19 '12 at 9:25

DanielSmedegaardBuus ,Nov 1, 2014 at 12:32

First of all, regarding the "resume" part of your question, --partial just tells the receiving end to keep partially transferred files if the sending end disappears as though they were completely transferred.

While transferring files, they are temporarily saved as hidden files in their target folders (e.g. .TheFileYouAreSending.lRWzDC ), or a specifically chosen folder if you set the --partial-dir switch. When a transfer fails and --partial is not set, this hidden file will remain in the target folder under this cryptic name, but if --partial is set, the file will be renamed to the actual target file name (in this case, TheFileYouAreSending ), even though the file isn't complete. The point is that you can later complete the transfer by running rsync again with either --append or --append-verify .

So, --partial doesn't itself resume a failed or cancelled transfer. To resume it, you'll have to use one of the aforementioned flags on the next run. So, if you need to make sure that the target won't ever contain files that appear to be fine but are actually incomplete, you shouldn't use --partial . Conversely, if you want to make sure you never leave behind stray failed files that are hidden in the target directory, and you know you'll be able to complete the transfer later, --partial is there to help you.

With regards to the --append switch mentioned above, this is the actual "resume" switch, and you can use it whether or not you're also using --partial . Actually, when you're using --append , no temporary files are ever created. Files are written directly to their targets. In this respect, --append gives the same result as --partial on a failed transfer, but without creating those hidden temporary files.

So, to sum up, if you're moving large files and you want the option to resume a cancelled or failed rsync operation from the exact point that rsync stopped, you need to use the --append or --append-verify switch on the next attempt.

As @Alex points out below, since version 3.0.0 rsync now has a new option, --append-verify , which behaves like --append did before that switch existed. You probably always want the behaviour of --append-verify , so check your version with rsync --version . If you're on a Mac and not using rsync from homebrew , you'll (at least up to and including El Capitan) have an older version and need to use --append rather than --append-verify . Why they didn't keep the behaviour on --append and instead named the newcomer --append-no-verify is a bit puzzling. Either way, --append on rsync before version 3 is the same as --append-verify on the newer versions.

--append-verify isn't dangerous: It will always read and compare the data on both ends and not just assume they're equal. It does this using checksums, so it's easy on the network, but it does require reading the shared amount of data on both ends of the wire before it can actually resume the transfer by appending to the target.
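
A minimal sketch of that resume workflow (file and host names are made up, not from the answer):

rsync -av --partial bigfile.img user@remote:/backups/        # link drops mid-transfer
# because of --partial, the partial file keeps its final name on the target;
# the next run verifies the existing part via checksums and appends the missing tail:
rsync -av --append-verify bigfile.img user@remote:/backups/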

Second of all, you said that you "heard that rsync is able to find differences between source and destination, and therefore to just copy the differences."

That's correct, and it's called delta transfer, but it's a different thing. To enable this, you add the -c , or --checksum switch. Once this switch is used, rsync will examine files that exist on both ends of the wire. It does this in chunks, compares the checksums on both ends, and if they differ, it transfers just the differing parts of the file. But, as @Jonathan points out below, the comparison is only done when files are of the same size on both ends -- different sizes will cause rsync to upload the entire file, overwriting the target with the same name.

This requires a bit of computation on both ends initially, but can be extremely efficient at reducing network load if, for example, you're frequently backing up very large fixed-size files that often contain minor changes. Examples that come to mind are virtual hard drive image files used in virtual machines or iSCSI targets.

It is notable that if you use --checksum to transfer a batch of files that are completely new to the target system, rsync will still calculate their checksums on the source system before transferring them. Why I do not know :)

So, in short:

If you're often using rsync to just "move stuff from A to B" and want the option to cancel that operation and later resume it, don't use --checksum , but do use --append-verify .

If you're using rsync to back up stuff often, using --append-verify probably won't do much for you, unless you're in the habit of sending large files that continuously grow in size but are rarely modified once written. As a bonus tip, if you're backing up to storage that supports snapshotting such as btrfs or zfs , adding the --inplace switch will help you reduce snapshot sizes since changed files aren't recreated but rather the changed blocks are written directly over the old ones. This switch is also useful if you want to avoid rsync creating copies of files on the target when only minor changes have occurred.
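
A hedged sketch of that snapshot-friendly setup, assuming a zfs target (dataset and paths below are made up):

# overwrite changed blocks in place so the next snapshot only stores the deltas
rsync -a --inplace /var/lib/vm-images/ /tank/backup/vm-images/
zfs snapshot tank/backup@$(date +%Y-%m-%d)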

When using --append-verify , rsync will behave just like it always does on all files that are the same size. If they differ in modification or other timestamps, it will overwrite the target with the source without scrutinizing those files further. --checksum will compare the contents (checksums) of every file pair of identical name and size.

UPDATED 2015-09-01 Changed to reflect points made by @Alex (thanks!)

UPDATED 2017-07-14 Changed to reflect points made by @Jonathan (thanks!)

Alex ,Aug 28, 2015 at 3:49

According to the documentation --append does not check the data, but --append-verify does. Also, as @gaoithe points out in a comment below, the documentation claims --partial does resume from previous files. – Alex Aug 28 '15 at 3:49

DanielSmedegaardBuus ,Sep 1, 2015 at 13:29

Thank you @Alex for the updates. Indeed, since 3.0.0, --append no longer compares the source to the target file before appending. Quite important, really! --partial does not itself resume a failed file transfer, but rather leaves it there for a subsequent --append(-verify) to append to it. My answer was clearly misrepresenting this fact; I'll update it to include these points! Thanks a lot :) – DanielSmedegaardBuus Sep 1 '15 at 13:29

Cees Timmerman ,Sep 15, 2015 at 17:21

This says --partial is enough. – Cees Timmerman Sep 15 '15 at 17:21

DanielSmedegaardBuus ,May 10, 2016 at 19:31

@CMCDragonkai Actually, check out Alexander's answer below about --partial-dir -- looks like it's the perfect bullet for this. I may have missed something entirely ;) – DanielSmedegaardBuus May 10 '16 at 19:31

Jonathan Y. ,Jun 14, 2017 at 5:48

What's your level of confidence in the described behavior of --checksum ? According to the man it has more to do with deciding which files to flag for transfer than with delta-transfer (which, presumably, is rsync 's default behavior). – Jonathan Y. Jun 14 '17 at 5:48

Alexander O'Mara ,Jan 3, 2016 at 6:34

TL;DR:

Just specify a partial directory as the rsync man page recommends:

--partial-dir=.rsync-partial

Longer explanation:

There is actually a built-in feature for doing this using the --partial-dir option, which has several advantages over the --partial and --append-verify / --append alternative.

Excerpt from the rsync man pages:
--partial-dir=DIR
      A  better way to keep partial files than the --partial option is
      to specify a DIR that will be used  to  hold  the  partial  data
      (instead  of  writing  it  out to the destination file).  On the
      next transfer, rsync will use a file found in this dir  as  data
      to  speed  up  the resumption of the transfer and then delete it
      after it has served its purpose.

      Note that if --whole-file is specified (or  implied),  any  par-
      tial-dir  file  that  is  found for a file that is being updated
      will simply be removed (since rsync  is  sending  files  without
      using rsync's delta-transfer algorithm).

      Rsync will create the DIR if it is missing (just the last dir --
      not the whole path).  This makes it easy to use a relative  path
      (such  as  "--partial-dir=.rsync-partial")  to have rsync create
      the partial-directory in the destination file's  directory  when
      needed,  and  then  remove  it  again  when  the partial file is
      deleted.

      If the partial-dir value is not an absolute path, rsync will add
      an  exclude rule at the end of all your existing excludes.  This
      will prevent the sending of any partial-dir files that may exist
      on the sending side, and will also prevent the untimely deletion
      of partial-dir items on the receiving  side.   An  example:  the
      above  --partial-dir  option would add the equivalent of "-f '-p
      .rsync-partial/'" at the end of any other filter rules.

By default, rsync uses a random temporary file name which gets deleted when a transfer fails. As mentioned, using --partial you can make rsync keep the incomplete file as if it were successfully transferred , so that it is possible to later append to it using the --append-verify / --append options. However there are several reasons this is sub-optimal.

  1. Your backup files may not be complete, and without checking the remote file which must still be unaltered, there's no way to know.
  2. If you are attempting to use --backup and --backup-dir , you've just added a new version of this file that never even existed before to your version history.

However if we use --partial-dir , rsync will preserve the temporary partial file, and resume downloading using that partial file next time you run it, and we do not suffer from the above issues.
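
Applied to the command from the question above, that would look something like this (a sketch; paths are from the question):

sudo rsync -azvv --partial-dir=.rsync-partial /home/path/folder1/ /home/path/folder2
# interrupted chunks land in folder2/.rsync-partial and are picked up on the next run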

trs ,Apr 7, 2017 at 0:00

This is really the answer. Hey everyone, LOOK HERE!! – trs Apr 7 '17 at 0:00

JKOlaf ,Jun 28, 2017 at 0:11

I agree this is a much more concise answer to the question. the TL;DR: is perfect and for those that need more can read the longer bit. Strong work. – JKOlaf Jun 28 '17 at 0:11

N2O ,Jul 29, 2014 at 18:24

You may want to add the -P option to your command.

From the man page:

--partial By default, rsync will delete any partially transferred file if the transfer
         is interrupted. In some circumstances it is more desirable to keep partially
         transferred files. Using the --partial option tells rsync to keep the partial
         file which should make a subsequent transfer of the rest of the file much faster.

  -P     The -P option is equivalent to --partial --progress.   Its  pur-
         pose  is to make it much easier to specify these two options for
         a long transfer that may be interrupted.

So instead of:

sudo rsync -azvv /home/path/folder1/ /home/path/folder2

Do:

sudo rsync -azvvP /home/path/folder1/ /home/path/folder2

Of course, if you don't want the progress updates, you can just use --partial , i.e.:

sudo rsync --partial -azvv /home/path/folder1/ /home/path/folder2

gaoithe ,Aug 19, 2015 at 11:29

@Flimm not quite correct. If there is an interruption (network or receiving side) then when using --partial the partial file is kept AND it is used when rsync is resumed. From the manpage: "Using the --partial option tells rsync to keep the partial file which should make a subsequent transfer of the rest of the file much faster." – gaoithe Aug 19 '15 at 11:29

DanielSmedegaardBuus ,Sep 1, 2015 at 14:11

@Flimm and @gaoithe, my answer wasn't quite accurate, and definitely not up-to-date. I've updated it to reflect version 3 + of rsync . It's important to stress, though, that --partial does not itself resume a failed transfer. See my answer for details :) – DanielSmedegaardBuus Sep 1 '15 at 14:11

guettli ,Nov 18, 2015 at 12:28

@DanielSmedegaardBuus I tried it and the -P is enough in my case. Versions: client has 3.1.0 and server has 3.1.1. I interrupted the transfer of a single large file with ctrl-c. I guess I am missing something. – guettli Nov 18 '15 at 12:28

Yadunandana ,Sep 16, 2012 at 16:07

I think you are forcibly calling the rsync and hence all data is getting downloaded when you recall it again. use --progress option to copy only those files which are not copied and --delete option to delete any files if already copied and now it does not exist in source folder...
rsync -avz --progress --delete -e  /home/path/folder1/ /home/path/folder2

If you are using ssh to login to other system and copy the files,

rsync -avz --progress --delete -e "ssh -o UserKnownHostsFile=/dev/null -o \
StrictHostKeyChecking=no" /home/path/folder1/ /home/path/folder2

let me know if there is any mistake in my understanding of this concept...

Fabien ,Jun 14, 2013 at 12:12

Can you please edit your answer and explain what your special ssh call does, and why you advice to do it? – Fabien Jun 14 '13 at 12:12

DanielSmedegaardBuus ,Dec 7, 2014 at 0:12

@Fabien He tells rsync to set two ssh options (rsync uses ssh to connect). The second one tells ssh to not prompt for confirmation if the host he's connecting to isn't already known (by existing in the "known hosts" file). The first one tells ssh to not use the default known hosts file (which would be ~/.ssh/known_hosts). He uses /dev/null instead, which is of course always empty, and as ssh would then not find the host in there, it would normally prompt for confirmation, hence option two. Upon connecting, ssh writes the now known host to /dev/null, effectively forgetting it instantly :) – DanielSmedegaardBuus Dec 7 '14 at 0:12

DanielSmedegaardBuus ,Dec 7, 2014 at 0:23

...but you were probably wondering what effect, if any, it has on the rsync operation itself. The answer is none. It only serves to not have the host you're connecting to added to your SSH known hosts file. Perhaps he's a sysadmin often connecting to a great number of new servers, temporary systems or whatnot. I don't know :) – DanielSmedegaardBuus Dec 7 '14 at 0:23
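
If you want that behaviour without the long -e string, roughly the same options can live in ~/.ssh/config for the hosts in question (a sketch; the Host pattern below is made up):

Host scratch-*
    UserKnownHostsFile /dev/null
    StrictHostKeyChecking no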

moi ,May 10, 2016 at 13:49

"use --progress option to copy only those files which are not copied" What? – moi May 10 '16 at 13:49

Paul d'Aoust ,Nov 17, 2016 at 22:39

There are a couple errors here; one is very serious: --delete will delete files in the destination that don't exist in the source. The less serious one is that --progress doesn't modify how things are copied; it just gives you a progress report on each file as it copies. (I fixed the serious error; replaced it with --remove-source-files .) – Paul d'Aoust Nov 17 '16 at 22:39

[Jun 13, 2018] parsync - a parallel rsync wrapper for large data transfers by Harry Mangalam

Jan 22, 2017 | nac.uci.edu

[email protected]

[email protected]

v1.67 (Mac Beta) Table of Contents

  1. Download
  2. Dependencies
  3. Overview
  4. parsync help

1. Download

If you already know you want it, get it here: parsync+utils.tar.gz (contains parsync plus the kdirstat-cache-writer , stats , and scut utilities below). Extract it into a dir on your $PATH and, after verifying the other dependencies below, give it a shot.

While parsync is developed for and tested on Linux, the latest version has been modified to (mostly) work on the Mac (tested on OSX 10.9.5). A number of the Linux-specific dependencies have been removed and there are a number of Mac-specific workarounds.

Thanks to Phil Reese < [email protected] > for the code mods needed to get it started. It's the same package and instructions for both platforms.

2. Dependencies

parsync requires the following utilities to work:

non-default Perl utility: URI::Escape qw(uri_escape)
sudo yum install perl-URI  # CentOS-like

sudo apt-get install liburi-perl  # Debian-like
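A quick way to check that the Perl dependency is in place (just a sanity check, not from the parsync docs):

perl -MURI::Escape -e 'print uri_escape("ok"), "\n"'
# prints "ok" if URI::Escape is installed; dies with "Can't locate URI/Escape.pm" otherwise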
parsync needs to be installed only on the SOURCE end of the transfer and uses whatever rsync is available on the TARGET. It uses a number of Linux-specific utilities, so if you're transferring between Linux and a FreeBSD host, install parsync on the Linux side. In fact, as currently written, it will only PUSH data to remote targets; it will not pull data as rsync itself can do. This will probably change in the near future.

3. Overview

rsync is a fabulous data mover. Possibly more bytes have been moved (or have been prevented from being moved) by rsync than by any other application.

So what's not to love? For transferring large, deep file trees, rsync will pause while it generates lists of files to process. Since version 3, it does this pretty fast, but on sluggish filesystems, it can take hours or even days before it will start to actually exchange rsync data.

Second, due to various bottlenecks, rsync will tend to use less than the available bandwidth on high-speed networks. Starting multiple instances of rsync can improve this significantly. However, on such transfers, it is also easy to overload the available bandwidth, so it would be nice to both limit the bandwidth used if necessary and also to limit the load on the system.

parsync tries to satisfy all these conditions and more; the mechanisms it uses are spelled out in its built-in help below.
Important: Only use parsync for LARGE data transfers. The main use case for parsync is really only very large data transfers thru fairly fast network connections (>1Gb/s). Below this speed, a single rsync can saturate the connection, so there's little reason to use parsync; in fact, the overhead of testing for and starting more rsyncs tends to worsen its performance on small transfers to slightly less than rsync alone.
Beyond this introduction, parsync's internal help is about all you'll need to figure out how to use it; below is what you'll see when you type parsync -h . There are still edge cases where parsync will fail or behave oddly, especially with small data transfers, so I'd be happy to hear of such misbehavior or suggestions to improve it. Download the complete tarball of parsync, plus the required utilities here: parsync+utils.tar.gz Unpack it, move the contents to a dir on your $PATH , chmod it executable, and try it out.
parsync --help
or just
parsync
Below is what you should see:

4. parsync help

parsync version 1.67 (Mac compatibility beta) Jan 22, 2017
by Harry Mangalam <[email protected]> || <[email protected]>

parsync is a Perl script that wraps Andrew Tridgell's miraculous 'rsync' to
provide some load balancing and parallel operation across network connections
to increase the amount of bandwidth it can use.

parsync is primarily tested on Linux, but (mostly) works on MaccOSX
as well.

parsync needs to be installed only on the SOURCE end of the
transfer and only works in local SOURCE -> remote TARGET mode
(it won't allow remote local SOURCE <- remote TARGET, emitting an
error and exiting if attempted).

It uses whatever rsync is available on the TARGET.  It uses a number
of Linux-specific utilities so if you're transferring between Linux
and a FreeBSD host, install parsync on the Linux side.

The only native rsync option that parsync uses is '-a' (archive) &
'-s' (respect bizarro characters in filenames).
If you need more, then it's up to you to provide them via
'--rsyncopts'. parsync checks to see if the current system load is
too heavy and tries to throttle the rsyncs during the run by
monitoring and suspending / continuing them as needed.

It uses the very efficient (also Perl-based) kdirstat-cache-writer
from kdirstat to generate lists of files which are summed and then
crudely divided into NP jobs by size.

It appropriates rsync's bandwidth throttle mechanism, using '--maxbw'
as a passthru to rsync's 'bwlimit' option, but divides it by NP so
as to keep the total bw the same as the stated limit.  It monitors and
shows network bandwidth, but can't change the bw allocation mid-job.
It can only suspend rsyncs until the load decreases below the cutoff.
If you suspend parsync (^Z), all rsync children will suspend as well,
regardless of current state.

Unless changed by '--interface', it tried to figure out how to set the
interface to monitor.  The transfer will use whatever interface routing
provides, normally set by the name of the target.  It can also be used for
non-host-based transfers (between mounted filesystems) but the network
bandwidth continues to be (usually pointlessly) shown.

[[NB: Between mounted filesystems, parsync sometimes works very poorly for
reasons still mysterious.  In such cases (monitor with 'ifstat'), use 'cp'
or 'tnc' (https://goo.gl/5FiSxR) for the initial data movement and a single
rsync to finalize.  I believe the multiple rsync chatter is interfering with
the transfer.]]

It only works on dirs and files that originate from the current dir (or
specified via "--rootdir").  You cannot include dirs and files from
discontinuous or higher-level dirs.

** the ~/.parsync files **
The ~/.parsync dir contains the cache (*.gz), the chunk files (kds*), and the
time-stamped log files. The cache files can be re-used with '--reusecache'
(which will re-use ALL the cache and chunk files.  The log files are
datestamped and are NOT overwritten.

** Odd characters in names **
parsync will sometimes refuse to transfer some oddly named files, altho
recent versions of rsync allow the '-s' flag (now a parsync default)
which tries to respect names with spaces and properly escaped shell
characters.  Filenames with embedded newlines, DOS EOLs, and other
odd chars will be recorded in the log files in the ~/.parsync dir.

** Because of the crude way that files are chunked, NP may be
adjusted slightly to match the file chunks. ie '--NP 8' -> '--NP 7'.
If so, a warning will be issued and the rest of the transfer will be
automatically adjusted.

OPTIONS
=======
[i] = integer number
[f] = floating point number
[s] = "quoted string"
( ) = the default if any

--NP [i] (sqrt(#CPUs)) ...............  number of rsync processes to start
      optimal NP depends on many vars.  Try the default and incr as needed
--startdir [s] (`pwd`)  .. the directory it works relative to. If you omit
                           it, the default is the CURRENT dir. You DO have
                           to specify target dirs.  See the examples below.
--maxbw [i] (unlimited) ..........  in KB/s max bandwidth to use (--bwlimit
       passthru to rsync).  maxbw is the total BW to be used, NOT per rsync.
--maxload [f] (NP+2)  ........ max total system load - if sysload > maxload,
                                               sleeps an rsync proc for 10s
--checkperiod [i] (5) .......... sets the period in seconds between updates
--rsyncopts [s]  ...  options passed to rsync as a quoted string (CAREFUL!)
           this opt triggers a pause before executing to verify the command.
--interface [s]  .............  network interface to /monitor/, not nec use.
      default: `/sbin/route -n | grep "^0.0.0.0" | rev | cut -d' ' -f1 | rev`
      above works on most simple hosts, but complex routes will confuse it.
--reusecache  ..........  don't re-read the dirs; re-use the existing caches
--email [s]  .....................  email address to send completion message
                                      (requires working mail system on host)
--barefiles   .....  set to allow rsync of individual files, as oppo to dirs
--nowait  ................  for scripting, sleep for a few s instead of wait
--version  .................................  dumps version string and exits
--help  .........................................................  this help

Examples
========
-- Good example 1 --
% parsync  --maxload=5.5 --NP=4 --startdir='/home/hjm' dir1 dir2 dir3
hjm@remotehost:~/backups

where
  = "--startdir='/home/hjm'" sets the working dir of this operation to
      '/home/hjm' and dir1 dir2 dir3 are subdirs from '/home/hjm'
  = the target "hjm@remotehost:~/backups" is the same target rsync would use
  = "--NP=4" forks 4 instances of rsync
  = -"-maxload=5.5" will start suspending rsync instances when the 5m system
      load gets to 5.5 and then unsuspending them when it goes below it.

  It uses 4 instances to rsync dir1 dir2 dir3 to hjm@remotehost:~/backups

-- Good example 2 --
% parsync --rsyncopts="--ignore-existing" --reusecache  --NP=3
  --barefiles  *.txt   /mount/backups/txt

where
  =  "--rsyncopts='--ignore-existing'" is an option passed thru to rsync
     telling it not to disturb any existing files in the target directory.
  = "--reusecache" indicates that the filecache shouldn't be re-generated,
    uses the previous filecache in ~/.parsync
  = "--NP=3" for 3 copies of rsync (with no "--maxload", the default is 4)
  = "--barefiles" indicates that it's OK to transfer barefiles instead of
    recursing thru dirs.
  = "/mount/backups/txt" is the target - a local disk mount instead of a network host.

  It uses 3 instances to rsync *.txt from the current dir to "/mount/backups/txt".

-- Error Example 1 --
% pwd
/home/hjm  # executing parsync from here

% parsync --NP4 --compress /usr/local  /media/backupdisk

why this is an error:
  = '--NP4' is not an option (parsync will say "Unknown option: np4")
    It should be '--NP=4'
  = if you were trying to rsync '/usr/local' to '/media/backupdisk',
    it will fail since there is no /home/hjm/usr/local dir to use as
    a source. This will be shown in the log files in
    ~/.parsync/rsync-logfile-<datestamp>_#
    as a spew of "No such file or directory (2)" errors
  = the '--compress' is a native rsync option, not a native parsync option.
    You have to pass it to rsync with "--rsyncopts='--compress'"

The correct version of the above command is:

% parsync --NP=4  --rsyncopts='--compress' --startdir=/usr  local
/media/backupdisk

-- Error Example 2 --
% parsync --start-dir /home/hjm  mooslocal  [email protected]:/usr/local

why this is an error:
  = this command is trying to PULL data from a remote SOURCE to a
    local TARGET.  parsync doesn't support that kind of operation yet.

The correct version of the above command is:

# ssh to hjm@moo, install parsync, then:
% parsync  --startdir=/usr  local  hjm@remote:/home/hjm/mooslocal

[Jun 03, 2018] What is the best way to transfer a single large file over a high-speed, high-latency WAN link

Notable quotes:
"... I've been dealing with a similar situation, with ~200GB of SQL .bak, except the only way I've been able to get the WAN link to saturate is with FTP. I ended up using 7-zip with zero compression to break it into 512MB chunks. ..."
Jun 03, 2018 | serverfault.com

This looks related to this one , but it's somewhat different.

There is this WAN link between two company sites, and we need to transfer a single very large file (Oracle dump, ~160 GB).

We've got full 100 Mbps bandwidth (tested), but it looks like a single TCP connection just can't max it out due to how TCP works (ACKs, etc.). We tested the link with iperf , and results change dramatically when increasing the TCP Window Size: with base settings we get ~5 Mbps throughput, with a bigger WS we can get up to ~45 Mbps, but not any more than that. The network latency is around 10 ms.

Out of curiosity, we ran iperf using more than a single connection, and we found that, when running four of them, they would indeed achieve a speed of ~25 Mbps each, filling up all the available bandwidth; so the key looks to be in running multiple simultaneous transfers.

With FTP, things get worse: even with optimized TCP settings (high Window Size, max MTU, etc.) we can't get more than 20 Mbps on a single transfer. We tried FTPing some big files at the same time, and indeed things got a lot better than when transferring a single one; but then the culprit became disk I/O, because reading and writing four big files from the same disk bottlenecks very soon; also, we don't seem to be able to split that single large file into smaller ones and then merge it back, at least not in acceptable times (obviously we can't spend an amount of time comparable to the transfer itself splitting and merging the file back).

The ideal solution here would be a multithreaded tool that could transfer various chunks of the file at the same time; sort of like peer-to-peer programs like eMule or BitTorrent already do, but from a single source to a single destination. Ideally, the tool would allow us to choose how many parallel connections to use, and of course optimize disk I/O to not jump (too) madly between various sections of the file.

Does anyone know of such a tool?

Or, can anyone suggest a better solution and/or something we already didn't try?
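
For illustration only (not one of the answers that follow): the "parallel chunks" idea can be crudely approximated with standard tools, with each slice travelling over its own ssh/TCP connection. Host, paths and slice sizes below are made up, and the sketch assumes the dump is exactly 160 GiB; a real script would compute the last slice size from the actual file length:

# pull four 40 GiB slices of the dump concurrently, one ssh/TCP connection each
for i in 0 1 2 3; do
  ssh user@source-host \
    "dd if=/data/oracle.dmp bs=1M skip=$((i * 40960)) count=40960" > slice.$i &
done
wait
cat slice.0 slice.1 slice.2 slice.3 > oracle.dmp   # reassemble in order
md5sum oracle.dmp                                  # compare against the source's checksum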

P.S. We already thought of backing that up to tape/disk and physically sending it to destination; that would be our extreme measure if WAN just doesn't cut it, but, as A.S. Tanenbaum said, "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." – Massimo, Feb 11, 2010



Searching for "high latency file transfer" brings up a lot of interesting hits. Clearly, this is a problem that both the CompSci community and the commercial community has put thougth into.

A few commercial offerings that appear to fit the bill:

In the open-source world, the uftp project looks promising. You don't particularly need its multicast capabilities, but the basic idea of blasting out a file to receivers, receiving NAKs for missed blocks at the end of the transfer, and then blasting out the NAK'd blocks (lather, rinse, repeat) sounds like it would do what you need, since there's no ACK'ing (or NAK'ing) from the receiver until after the file transfer has completed once. Assuming the network is just latent, and not lossy, this might do what you need, too.

– Evan Anderson

Really odd suggestion, this one... Set up a simple web server to host the file on your network (I suggest nginx, incidentally), then set up a PC with Firefox on the other end, and install the DownThemAll extension.

It's a download accelerator that supports chunking and re-assembly.
You can break each download into 10 chunks for re-assembly, and it does actually make things quicker!

(caveat: I've never tried it on anything as big as 160GB, but it does work well with 20GB iso files) – Tom O'Connor, Feb 11, 2010

The UDT transport is probably the most popular transport for high-latency communications. This leads on to their other software called Sector/Sphere, a "High Performance Distributed File System and Parallel Data Processing Engine", which might be worthwhile to have a look at. – Steve-o, Mar 18, 2011

My answer is a bit late, but I just found this question while looking for fasp. During that search I also found this: http://tsunami-udp.sourceforge.net/ , the "Tsunami UDP Protocol".

From their website :

A fast user-space file transfer protocol that uses TCP control and UDP data for transfer over very high speed long distance networks (≥ 1 Gbps and even 10 GE), designed to provide more throughput than possible with TCP over the same networks.

As far as speed goes, the page mentions this result (using a link between Helsinki, Finland and Bonn, Germany over a 1 Gbit link):

Figure 1 - international transfer over the Internet, averaging 800 Mbit/second

If you want to use a download accelerator, have a look at lftp ; this is the only download accelerator that can do a recursive mirror, as far as I know. – Jan van Haarst, Jun 24, 2010

The bbcp utility from the very relevant page 'How to transfer large amounts of data via network' seems to be the simplest solution. – Robert Polson, Jun 27, 2012


[Jun 02, 2018] Low-latency continuous rsync

Notable quotes:
"... Low-latency continuous rsync ..."
Jun 02, 2018 | www.danplanet.com


Okay, so " lowish -latency" would be more appropriate.

I regularly work on systems that are fairly distant, over relatively high-latency links. That means that I don't want to run my editor there because 300ms between pressing a key and seeing it show up is maddening. Further, with something as large as the Linux kernel, editor integration with cscope is a huge time saver and pushing enough configuration to do that on each box I work on is annoying. Lately, the speed of the notebook I'm working from often outpaces that of the supposedly-fast machine I'm working on. For many tasks, a four-core, two threads per core, 10GB RAM laptop with an Intel SSD will smoke a 4GHz PowerPC LPAR with 2GB RAM.

I don't really want to go to the trouble of cross-compiling the kernels on my laptop, so that's the only piece I want to do remotely. Thus, I want to have high-speed access to the tree I'm working on from my local disk for editing, grep'ing, and cscope'ing. But, I want the changes to be synchronized (without introducing any user-perceived delay) to the distant machine in the background for when I'm ready to compile. Ideally, this would be some sort of rsync-like tool that uses inotify to notice changes and keep them synchronized to the remote machine over a persistent connection. However, I know of no such tool and haven't been sufficiently annoyed to sit down and write one.

One can, however, achieve a reasonable approximation of this by gluing existing components together. The inotifywait tool from the inotify-tools provides a way to watch a directory and spit out a live list of changed files without much effort. Of course, rsync can handle the syncing for you, but not with a persistent connection. This script mostly does what I want:

#!/bin/bash

# destination in user@host:path form (see usage example below)
DEST="$1"

if [ -z "$DEST" ]; then exit 1; fi

# watch the current tree recursively and push each file as soon as it is written
inotifywait -r -m -e close_write --format '%w%f' . |\
while read file
do
        echo $file
        rsync -azvq $file ${DEST}/$file
        echo -n 'Completed at '
        date
done

That will monitor the local directory and synchronize it to the remote host every time a file changes. I run it like this:

sync.sh [email protected]:my-kernel-tree/

It's horribly inefficient of course, but it does the job. The latency for edits to show up on the other end, although not intolerable, is higher than I'd like. The boxes I'm working on these days are in Minnesota, and I have to access them over a VPN which terminates in New York. That means packets leave Portland for Seattle, jump over to Denver, Chicago, Washington DC, then up to New York before they bounce back to Minnesota. Initiating an SSH connection every time the script synchronizes a file requires some chatting back and forth over that link, and thus is fairly slow.

Looking at how I might reduce the setup time for the SSH links, I stumbled across an incredibly cool feature available in recent versions of OpenSSH: connection multiplexing. With this enabled, you pay the high setup cost only the first time you connect to a host. Subsequent connections re-use the same tunnel as the first one, making the process nearly instant. To get this enabled for just the host I'm using, I added this to my ~/.ssh/config file:

Host myhost.domain.com
    ControlMaster auto
    ControlPath /tmp/%h%p%r
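
A small optional addition (not in the original post): on OpenSSH versions that support it, ControlPersist keeps the master connection open in the background even after the first interactive session exits, so the sync script never pays the setup cost:

Host myhost.domain.com
    ControlMaster auto
    ControlPath /tmp/%h%p%r
    ControlPersist 10m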

Now, all I do is ssh to the box each time I boot it (which I would do anyway) and the sync.sh script from above re-uses that connection for file synchronization. It's still not the same as a shared filesystem, but it's pretty dang close, especially for a few lines of config and shell scripting. Kernel development on these distant boxes is now much less painful.

4 Responses to Low-latency continuous rsync

  1. Christof Schmitt says: June 25, 2012 at 22:06 This is a great approach. I had the same problem when editing code locally and testing the changes on a remote system. Thanks for sharing, i will give it a try.
  2. Callum says: May 12, 2013 at 15:02 Are you familiar with lsyncd? I think it might do exactly what you want but potentially more easily. It uses inotify or libnotify or something or other to watch a local directory, and then pushes changes every X seconds to a remote host. It's pretty powerful and can even be set up to sync mv commands with a remote ssh mv instead of rsync, which can be expensive. It's fairly neat in theory, although I've never used it in practice myself.
  3. johan says: March 29, 2015 at 16:44 Have you tried mosh? It's different protocol from ssh, more suited to your use-case.

    https://mosh.mit.edu/

    Since it's different approach to solving your problem, it has different pros and cons. E.g. jumping and instant searching would still be slow. It's effectively trying to hide the problem by being a bit more intelligent. (It does this by using UDP, previewing keystrokes, robust reconnection, and only updating visible screen so as to avoid freezes due to 'cat my_humongous_log.txt'.)

    -- -- -- -- -- –

    (copy paste)

    Mosh
    (mobile shell)

    Remote terminal application that allows roaming, supports intermittent connectivity, and provides intelligent local echo and line editing of user keystrokes.

    Mosh is a replacement for SSH. It's more robust and responsive, especially over Wi-Fi, cellular, and long-distance links.

[Jun 02, 2018] Parallelise rsync using GNU Parallel

Jun 02, 2018 | unix.stackexchange.com


Mandar Shinde ,Mar 13, 2015 at 6:51

I have been using a rsync script to synchronize data at one host with the data at another host. The data has numerous small-sized files that contribute to almost 1.2TB.

In order to sync those files, I have been using rsync command as follows:

rsync -avzm --stats --human-readable --include-from proj.lst /data/projects REMOTEHOST:/data/

The contents of proj.lst are as follows:

+ proj1
+ proj1/*
+ proj1/*/*
+ proj1/*/*/*.tar
+ proj1/*/*/*.pdf
+ proj2
+ proj2/*
+ proj2/*/*
+ proj2/*/*/*.tar
+ proj2/*/*/*.pdf
...
...
...
- *

As a test, I picked up two of those projects (8.5GB of data) and I executed the command above. Being a sequential process, it took 14 minutes 58 seconds to complete. So, for 1.2TB of data it would take several hours.

If I could run multiple rsync processes in parallel (using & , xargs or parallel ), it would save time.

I tried with below command with parallel (after cd ing to source directory) and it took 12 minutes 37 seconds to execute:

parallel --will-cite -j 5 rsync -avzm --stats --human-readable {} REMOTEHOST:/data/ ::: .

This should have taken 5 times less time, but it didn't. I think, I'm going wrong somewhere.

How can I run multiple rsync processes in order to reduce the execution time?

Ole Tange ,Mar 13, 2015 at 7:25

Are you limited by network bandwidth? Disk iops? Disk bandwidth? – Ole Tange Mar 13 '15 at 7:25

Mandar Shinde ,Mar 13, 2015 at 7:32

If possible, we would want to use 50% of total bandwidth. But, parallelising multiple rsync s is our first priority. – Mandar Shinde Mar 13 '15 at 7:32

Ole Tange ,Mar 13, 2015 at 7:41

Can you let us know your: Network bandwidth, disk iops, disk bandwidth, and the bandwidth actually used? – Ole Tange Mar 13 '15 at 7:41

Mandar Shinde ,Mar 13, 2015 at 7:47

In fact, I do not know about above parameters. For the time being, we can neglect the optimization part. Multiple rsync s in parallel is the primary focus now. – Mandar Shinde Mar 13 '15 at 7:47

Mandar Shinde ,Apr 11, 2015 at 13:53

Following steps did the job for me:
  1. Run rsync with --dry-run first in order to get the list of files that would be affected.

rsync -avzm --stats --safe-links --ignore-existing --dry-run --human-readable /data/projects REMOTE-HOST:/data/ > /tmp/transfer.log

  2. I fed the output of cat transfer.log to parallel in order to run 5 rsync s in parallel, as follows:

cat /tmp/transfer.log | parallel --will-cite -j 5 rsync -avzm --relative --stats --safe-links --ignore-existing --human-readable {} REMOTE-HOST:/data/ > result.log

Here, --relative option ( link ) ensured that the directory structure for the affected files, at the source and destination, remains the same (inside /data/ directory), so the command must be run in the source folder (in example, /data/projects ).

Sandip Bhattacharya ,Nov 17, 2016 at 21:22

That would do an rsync per file. It would probably be more efficient to split up the whole file list using split and feed those filenames to parallel. Then use rsync's --files-from to get the filenames out of each file and sync them. rm backups.* split -l 3000 backup.list backups. ls backups.* | parallel --line-buffer --verbose -j 5 rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/ – Sandip Bhattacharya Nov 17 '16 at 21:22
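
Sandip's one-liner is easier to read split across lines (a sketch; the paths and the 3000-files-per-chunk figure are taken from his comment):

rm -f backups.*                        # remove chunk lists from a previous run
split -l 3000 backup.list backups.     # one chunk list per 3000 file names
ls backups.* | parallel --line-buffer --verbose -j 5 \
    rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/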

Mike D ,Sep 19, 2017 at 16:42

How does the second rsync command handle the lines in result.log that are not files? i.e. receiving file list ... done created directory /data/ . – Mike D Sep 19 '17 at 16:42

Cheetah ,Oct 12, 2017 at 5:31

On newer versions of rsync (3.1.0+), you can use --info=name in place of -v , and you'll get just the names of the files and directories. You may want to use --protect-args to the 'inner' transferring rsync too if any files might have spaces or shell metacharacters in them. – Cheetah Oct 12 '17 at 5:31

Mikhail ,Apr 10, 2017 at 3:28

I would strongly discourage anybody from using the accepted answer. A better solution is to crawl the top-level directory and launch a proportional number of rsync operations.

I have a large zfs volume and my source was a cifs mount. Both are linked with 10G, and in some benchmarks it can saturate the link. Performance was evaluated using zpool iostat 1 .

The source drive was mounted like:

mount -t cifs -o username=,password= //static_ip/70tb /mnt/Datahoarder_Mount/ -o vers=3.0

Using a single rsync process:

rsync -h -v -r -P -t /mnt/Datahoarder_Mount/ /StoragePod

the io meter reads:

StoragePod  30.0T   144T      0  1.61K      0   130M
StoragePod  30.0T   144T      0  1.61K      0   130M
StoragePod  30.0T   144T      0  1.62K      0   130M

In synthetic benchmarks (CrystalDiskMark), sequential write performance approaches 900 MB/s, which means the link is saturated. 130 MB/s is not very good, and it is the difference between waiting a weekend and waiting two weeks.

So, I built the file list and tried to run the sync again (I have a 64 core machine):

cat /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount.log | parallel --will-cite -j 16 rsync -avzm --relative --stats --safe-links --size-only --human-readable {} /StoragePod/ > /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount_result.log

and it had the same performance!

StoragePod  29.9T   144T      0  1.63K      0   130M
StoragePod  29.9T   144T      0  1.62K      0   130M
StoragePod  29.9T   144T      0  1.56K      0   129M

As an alternative I simply ran rsync on the root folders:

rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/Marcello_zinc_bone /StoragePod/Marcello_zinc_bone
rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/fibroblast_growth /StoragePod/fibroblast_growth
rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/QDIC /StoragePod/QDIC
rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/sexy_dps_cell /StoragePod/sexy_dps_cell

This actually boosted performance:

StoragePod  30.1T   144T     13  3.66K   112K   343M
StoragePod  30.1T   144T     24  5.11K   184K   469M
StoragePod  30.1T   144T     25  4.30K   196K   373M

In conclusion, as @Sandip Bhattacharya suggested, write a small script to get the directories and run parallel over them. Alternatively, pass a file list to rsync. But don't create a new rsync instance for each file.
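
A minimal sketch of that per-directory approach, assuming the interesting directories sit one level below the source root (paths and job count are illustrative):

    # one rsync per top-level directory, four at a time
    cd /mnt/Datahoarder_Mount/Mikhail
    ls -d */ | parallel -j 4 rsync -a -P {} /StoragePod/{}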

Julien Palard ,May 25, 2016 at 14:15

I personally use this simple one:
ls -1 | parallel rsync -a {} /destination/directory/

This is only useful when you have more than a few non-nearly-empty directories; otherwise almost every rsync will finish quickly and the last one will end up doing all the work alone.

Ole Tange ,Mar 13, 2015 at 7:25

A tested way to do the parallelized rsync is: http://www.gnu.org/software/parallel/man.html#EXAMPLE:-Parallelizing-rsync

rsync is a great tool, but sometimes it will not fill up the available bandwidth. This is often a problem when copying several big files over high speed connections.

The following will start one rsync per big file in src-dir to dest-dir on the server fooserver:

cd src-dir; find . -type f -size +100000 | \
parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; \
  rsync -s -Havessh {} fooserver:/dest-dir/{}

The directories created may end up with wrong permissions and smaller files are not being transferred. To fix those run rsync a final time:

rsync -Havessh src-dir/ fooserver:/dest-dir/

If you are unable to push data, but need to pull them and the files are called digits.png (e.g. 000000.png) you might be able to do:

seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/

Mandar Shinde ,Mar 13, 2015 at 7:34

Any other alternative in order to avoid find ? – Mandar Shinde Mar 13 '15 at 7:34

Ole Tange ,Mar 17, 2015 at 9:20

Limit the -maxdepth of find. – Ole Tange Mar 17 '15 at 9:20

Mandar Shinde ,Apr 10, 2015 at 3:47

If I use --dry-run option in rsync , I would have a list of files that would be transferred. Can I provide that file list to parallel in order to parallelise the process? – Mandar Shinde Apr 10 '15 at 3:47

Ole Tange ,Apr 10, 2015 at 5:51

cat files | parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; rsync -s -Havessh {} fooserver:/dest-dir/{} – Ole Tange Apr 10 '15 at 5:51

Mandar Shinde ,Apr 10, 2015 at 9:49

Can you please explain the mkdir -p /dest-dir/{//}\; part? Especially the {//} thing is a bit confusing. – Mandar Shinde Apr 10 '15 at 9:49
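
For what it is worth, {//} in GNU parallel expands to the directory part of the argument (like dirname), so the mkdir -p pre-creates the matching directory tree on the destination before the per-file rsync runs. A quick way to preview the generated commands without executing anything (the file name and host are illustrative):

    echo ./sub/dir/file.iso | parallel --dry-run \
        ssh fooserver mkdir -p /dest-dir/{//}\; rsync -s -Havessh {} fooserver:/dest-dir/{}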


For multi destination syncs, I am using
parallel rsync -avi /path/to/source ::: host1: host2: host3:

Hint: All ssh connections are established with public keys in ~/.ssh/authorized_keys


[Apr 29, 2018] How not to do system bare-metal backup with tar

He excluded /dev. This is a mistake if we are talking about bare metal recovery.
Apr 29, 2018 | serverfault.com

The backup is made with Tar. I backup the whole system into the Tar file.

If the HDD on my webserver dies, I got all my backups in a safe place.

But what would be the best way to do a bare-metal restore on a new HDD with a differential backup made the previous day? Can I boot with a boot CD, then format the new HDD and untar the backup file onto it? How do I do that exactly?

EDIT:

This is my backup script:

#!/bin/sh
# Backup script

BACKUPDIR="/backups"
BACKUPFILE=$BACKUPDIR/backup_$(date +%y-%m-%d).tgz

if [ ! -d $BACKUPDIR ]; then
        mkdir $BACKUPDIR
fi

if [ -f $BACKUPFILE ]; then
        echo "Backup file already exists and will be replaced."
        rm $BACKUPFILE
fi

apt-get clean

tar czpf $BACKUPFILE --same-owner \
--exclude=$BACKUPDIR \
--exclude=/boot/grub/menu.lst* \
--exclude=/home/error.log \
--exclude=/proc \
--exclude=/media \
--exclude=/dev/* \
--exclude=/mnt \
--exclude=/sys/* \
--exclude=/cdrom \
--exclude=/lost+found \
--exclude=/var/cache/* \
--exclude=/tmp / 2>/home/error.log
– Jonathan Rioux, Dec 22 '11 at 3:44

Simply restoring the HDD will not be enough; you will probably also want your boot record, which I doubt exists in your backup (am I wrong? it's better for you if I am!)...

Let's assume you got the server to the point where it can boot (I personally prefer creating an additional partition mounted on /boot which holds the kernel and an initrd with busybox or something similar, to allow basic maintenance tasks). You can also use a live CD of your Linux distribution.

Mount your future root partition somewhere and restore your backup.

tar was created for tapes, so it supports appending to archive files with the same name. If you used this method, just untar the archive with tar -xvpf backup.tar -C /mnt; if not, you'll need to restore the "last Sunday" full backup and apply the differential parts up to the needed day.
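
A rough outline of that restore sequence from a live CD, under the assumption that the new disk has already been partitioned, the future root is /dev/sda1, and the backup archive is reachable at /mnt/backup/backup.tar (all of these names are illustrative):

    # format and mount the future root partition
    mkfs.ext4 /dev/sda1
    mkdir -p /mnt/target
    mount /dev/sda1 /mnt/target

    # unpack the full backup, preserving permissions and ownership
    tar -xvpf /mnt/backup/backup.tar -C /mnt/target

    # recreate the directories that were excluded from the backup
    mkdir -p /mnt/target/{proc,sys,dev,media,mnt,tmp}

    # reinstall the boot loader from inside the restored system
    mount --bind /dev  /mnt/target/dev
    mount --bind /proc /mnt/target/proc
    mount --bind /sys  /mnt/target/sys
    chroot /mnt/target grub-install /dev/sda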

You should keep in mind that there is a lot of stuff you should not back up, things like /proc , /dev , /sys , /media , /mnt (and probably some more, depending on your needs). You'll need to take care of this before creating the backup, or it may become a severe pain during the restore process!

There are many points that you can easily miss with that backup method for a whole server.

Some good points on that exact method can be found on Ubuntu Wiki:BackupYourSystem/TAR . Look for Restoring.


P.P.S. I recommend reading a couple of Jeff Atwood's posts about backups: http://www.codinghorror.com/blog/2008/01/whats-your-backup-strategy.html and http://www.codinghorror.com/blog/2009/12/international-backup-awareness-day.html

[Apr 29, 2018] Bare-metal server restore using tar by Keith Winston

The idea of restoring only selected directories after creating a "skeleton" Linux OS from the Red Hat DVD is viable. But this is not an optimal bare-metal restore method with tar.
Apr 29, 2018 | www.linux.com

... ... ...

The backup tape from the previous night was still on site (our off-site rotations happen once a week). Once I restored the filelist.txt file, I browsed through the list to determine the order that the directories were written to the tape. Then, I placed that list in this restore script:

#!/bin/sh

# Restore everything
# This script restores all system files from tape.
#
# Initialize the tape drive
if /bin/mt -f "/dev/nst0" tell > /dev/null 2>&1
then
    # Rewind before restore
    /bin/mt -f "/dev/nst0" rewind > /dev/null 2>&1
else
    echo "Restore aborted: No tape loaded"
    exit 1
fi

# Do restore
# The directory order must match the order on the tape.
#
/bin/tar --extract --verbose --preserve --file=/dev/nst0 var etc root usr lib boot bin home sbin backup

# note: in many cases, these directories don't need to be restored:
# initrd opt misc tmp mnt

# Rewind tape when done
/bin/mt -f "/dev/nst0" rewind

In the script, the list of directories to restore is passed as parameters to tar. Just as in the backup script, it is important to use the --preserve switch so that file permissions are restored to the way they were before the backup. I could have just restored the / directory, but there were a couple of directories I wanted to exclude, so I decided to be explicit about what to restore. If you want to use this script for your own restores, be sure the list of directories matches the order they were backed up on your system.

Although it is listed in the restore script, I removed the /boot directory from my restore, because I suspected my file system problem was related to a kernel upgrade I had done three days earlier. By not restoring the /boot directory, the system would continue to use the stock kernel that shipped on the CDs until I upgraded it. I also wanted to exclude the /tmp directory and a few other directories that I knew were not important.

The restore ran for a long time, but uneventfully. Finally, I rebooted the system, reloaded the MySQL databases from the dumps, and the system was fully restored and working perfectly. Just over four hours elapsed from total meltdown to complete restore. I probably could trim at least an hour off that time if I had to do it a second time.

Postmortem

I filed a bug report with Red Hat Bugzilla , but I could only provide log files from the day before the crash. All core files and logs from the day of the crash were lost when I tried to repair the file system. I exchanged posts with a Red Hat engineer, but we were not able to nail down the cause. I suspect the problem was either in the RAID driver code or ext3 code. I should note that the server is a relatively new HP ProLiant server with an Intel hyperthreaded Pentium 4 processor. Because the Linux kernel sees a hyperthreaded processor as a dual processor, I was using an SMP kernel when the problem arose. I reasoned that I might squeeze a few percentage points of performance out of the SMP kernel. This bug may only manifest when running on a hyperthreaded processor in SMP mode. I don't have a spare server to try to recreate it.

After the restore, I went back to the uniprocessor kernel and have not yet patched it back up to the level it had been. Happily, the ext3 error has not returned. I scan the logs every day, but it has been well over a month since the restore and there are still no signs of trouble. I am looking forward to my next full restore -- hopefully not until sometime in 2013.

[Apr 29, 2018] Clear unused space with zeros (ext3, ext4)

Notable quotes:
"... Purpose: I'd like to compress partition images, so filling unused space with zeros is highly recommended. ..."
"... Such an utility is zerofree . ..."
"... Be careful - I lost ext4 filesystem using zerofree on Astralinux (Debian based) ..."
"... If the "disk" your filesystem is on is thin provisioned (e.g. a modern SSD supporting TRIM, a VM file whose format supports sparseness etc.) and your kernel says the block device understands it, you can use e2fsck -E discard src_fs to discard unused space (requires e2fsprogs 1.42.2 or higher). ..."
"... If you have e2fsprogs 1.42.9, then you can use e2image to create the partition image without the free space in the first place, so you can skip the zeroing step. ..."
Apr 29, 2018 | unix.stackexchange.com

Grzegorz Wierzowiecki, Jul 29, 2012 at 10:02

How to clear unused space with zeros ? (ext3, ext4)

I'm looking for something smarter than

cat /dev/zero > /mnt/X/big_zero ; sync; rm /mnt/X/big_zero

Something like what FSArchiver does (it looks only at "used space" and ignores the unused space), but the other way round.

Purpose: I'd like to compress partition images, so filling unused space with zeros is highly recommended.

Btw. For btrfs : Clear unused space with zeros (btrfs)

Mat, Jul 29, 2012 at 10:18

Check this out: superuser.com/questions/19326 – Mat Jul 29 '12 at 10:18

Totor, Jan 5, 2014 at 2:57

Two different kinds of answer are possible. What are you trying to achieve? Either 1) security, by forbidding someone to read those data, or 2) optimizing compression of the whole partition, or SSD performance ( en.wikipedia.org/wiki/Trim_(computing) )? – Totor Jan 5 '14 at 2:57

enzotib, Jul 29, 2012 at 11:45

Such a utility is zerofree .

From its description:

Zerofree finds the unallocated, non-zeroed blocks in an ext2 or ext3 file-system and fills them with zeroes. This is useful if the device on which this file-system resides is a disk image. In this case, depending on the type of disk image, a secondary utility may be able to reduce the size of the disk image after zerofree has been run. Zerofree requires the file-system to be unmounted or mounted read-only.

The usual way to achieve the same result (zeroing the unused blocks) is to run "dd" do create a file full of zeroes that takes up the entire free space on the drive, and then delete this file. This has many disadvantages, which zerofree alleviates:

  • it is slow
  • it makes the disk image (temporarily) grow to its maximal extent
  • it (temporarily) uses all free space on the disk, so other concurrent write actions may fail.

Zerofree has been written to be run from GNU/Linux systems installed as guest OSes inside a virtual machine. If this is not your case, you almost certainly don't need this package.

UPDATE #1

The description of the .deb package now contains the following paragraph, which implies this will work fine with ext4 too.

Description: zero free blocks from ext2, ext3 and ext4 file-systems Zerofree finds the unallocated blocks with non-zero value content in an ext2, ext3 or ext4 file-system and fills them with zeroes...
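
Typical usage is straightforward; a minimal sketch, assuming the file system to be cleaned is /dev/sdb1 and can be unmounted (or remounted read-only) first (the device and mount point are illustrative):

    umount /mnt/image                 # zerofree needs the fs unmounted or mounted read-only
    zerofree -v /dev/sdb1             # -v reports how many blocks were zeroed
    mount /dev/sdb1 /mnt/image        # remount read-write when done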

Grzegorz Wierzowiecki, Jul 29, 2012 at 14:08

Is this the official page of the tool: intgat.tigress.co.uk/rmy/uml/index.html ? Do you think it's safe to use with ext4? – Grzegorz Wierzowiecki Jul 29 '12 at 14:08

enzotib, Jul 29, 2012 at 14:12

@GrzegorzWierzowiecki: yes, that is the page, but for Debian and friends it is already in the repos. I used it on an ext4 partition on a virtual disk to successively shrink the disk file image, and had no problem. – enzotib Jul 29 '12 at 14:12

jlh, Mar 4, 2016 at 10:10

This isn't equivalent to the crude dd method in the original question, since it doesn't work on mounted file systems. – jlh Mar 4 '16 at 10:10

endolith, Oct 14, 2016 at 16:33

zerofree page talks about a patch that lets you do "filesystem is mounted with the zerofree option" so that it always zeros out deleted files continuously. does this require recompiling the kernel then? is there an easier way to accomplish the same thing? – endolith Oct 14 '16 at 16:33

Hubbitus, Nov 23, 2016 at 22:20

Be careful - I lost an ext4 filesystem using zerofree on Astralinux (Debian based) – Hubbitus Nov 23 '16 at 22:20

Anon, Dec 27, 2015 at 17:53

Summary of the methods (as mentioned in this question and elsewhere) to clear unused space on ext2/ext3/ext4, covering both the case where the file system is not mounted and the case where it is mounted.

Having the filesystem unmounted will give better results than having it mounted. Discarding tends to be the fastest method when a lot of previously used space needs to be zeroed but using zerofree after the discard process can sometimes zero a little bit extra (depending on how discard is implemented on the "disk").
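
As a concrete illustration of the discard route mentioned above (the device and mount point are illustrative, and this only helps when the underlying "disk" actually honours discard/TRIM):

    # unmounted file system: ask e2fsck to discard unused blocks (needs e2fsprogs 1.42.2+)
    e2fsck -E discard /dev/sdb1

    # mounted file system: fstrim issues the discards through the kernel instead
    fstrim -v /mnt/image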

Making the image file smaller. If the image is in a dedicated VM format:

You will need to use an appropriate disk image tool (such as qemu-img convert src_image dst_image ) to enable the zeroed space to be reclaimed and to allow the file representing the image to become smaller.

If the image is a raw file, there are several techniques for making the file sparse (so runs of zeros stop taking up space). These days it might be easier to use a tool like virt-sparsify to do these steps and more in one go.


cas, Jul 29, 2012 at 11:45

sfill from secure-delete can do this and several other related jobs.

e.g.

sfill -l -l -z /mnt/X
UPDATE #1

There is a source tree on GitHub, apparently used by the Arch Linux project, that contains the source for sfill, which is a tool included in the secure-delete package.

Also, a copy of sfill 's man page is available online.

cas, Jul 29, 2012 at 12:04

That URL is obsolete. No idea where its home page is now (or even if it still has one), but it's packaged for Debian and Ubuntu, and probably other distros too. If you need source code, it can be found in the Debian archives if you can't find it anywhere else. – cas Jul 29 '12 at 12:04

mwfearnley, Jul 31, 2017 at 13:04

The obsolete manpage URL is fixed now. Looks like "Digipedia" is no longer a thing. – mwfearnley Jul 31 '17 at 13:04

psusi, Apr 2, 2014 at 15:27

If you have e2fsprogs 1.42.9, then you can use e2image to create the partition image without the free space in the first place, so you can skip the zeroing step.

mwfearnley, Mar 3, 2017 at 13:36

I couldn't (easily) find any info online about these parameters, but they are indeed given in the 1.42.9 release notes: e2fsprogs.sf.net/e2fsprogs-release.html#1.42.9 – mwfearnley Mar 3 '17 at 13:36

user64219, Apr 2, 2014 at 14:39

You can use sfill . It's a better solution for thin volumes.

Anthon, Apr 2, 2014 at 15:01

If you want to comment on cas answer, wait until you have enough reputation to do so. – Anthon Apr 2 '14 at 15:01

derobert, Apr 2, 2014 at 17:01

I think the answer is referring to manpages.ubuntu.com/manpages/lucid/man1/sfill.1.html ... which is at least an attempt at answering. ("online" in this case meaning "with the filesystem mounted", not "on the web"). – derobert Apr 2 '14 at 17:01

[Apr 28, 2018] GitHub - ch1x0r-LinuxRespin Fork of remastersys - updates

Apr 28, 2018 | github.com

Fork of remastersys - updates

This tool is used to back up your image, create distributions, and create live CDs/DVDs.

If you are using Ubuntu, consider switching to Debian. This is NOT officially for Ubuntu.

We have an Ubuntu GUI version now. Thank you to the members of the Ubuntu Community for working with us!!! We were also featured in LinuxJournal! http://www.linuxjournal.com/content/5-minute-foss-spinning-custom-linux-distribution

Respin

For more information, please visit http://www.linuxrespin.org

See also: 5 Minute FOSS Spinning a custom Linux distribution Linux Journal by Petros Koutoupis on March 23, 2018

[Apr 22, 2018] Happy Sysadmin Appreciation Day 2016 Opensource.com

Apr 22, 2018 | opensource.com

Necessity is frequently the mother of invention. I knew very little about Bash scripting, but that was about to change rapidly. Working with the existing script and using online help forums, search engines, and some printed documentation, I set up a Linux network-attached storage computer running Fedora Core. I learned how to create an SSH keypair and configure that along with rsync to move the backup file from the email server to the storage server. That worked well for a few days until I noticed that the storage server's disk space was rapidly disappearing. What was I going to do?

That's when I learned more about Bash scripting. I modified my rsync command to delete backed-up files older than ten days. In both cases I learned that a little knowledge can be a dangerous thing, but in each case my experience and confidence as a Linux user and system administrator grew, and because of that I became a resource for others. On the plus side, we soon realized that the disk-to-disk backup system was superior to tape when it came to restoring email files. In the long run it was a win, but there was a lot of uncertainty and anxiety along the way.
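
The author mentions handling the cleanup through the rsync command; a find-based equivalent is a minimal sketch of the same idea, assuming the backups are dated .tgz files under /backups (the path, pattern, and retention period are assumptions):

    # delete backup files older than ten days on the storage server
    find /backups -type f -name '*.tgz' -mtime +10 -delete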

[Apr 20, 2018] GitHub - teejee2008-timeshift System restore tool for Linux. Creates filesystem snapshots using rsync+hardlinks, or BTRFS snap

Notable quotes:
"... System Restore ..."
Apr 20, 2018 | github.com

Timeshift

Timeshift for Linux is an application that provides functionality similar to the System Restore feature in Windows and the Time Machine tool in Mac OS. Timeshift protects your system by taking incremental snapshots of the file system at regular intervals. These snapshots can be restored at a later date to undo all changes to the system.

In RSYNC mode, snapshots are taken using rsync and hard-links . Common files are shared between snapshots which saves disk space. Each snapshot is a full system backup that can be browsed with a file manager.

In BTRFS mode, snapshots are taken using the in-built features of the BTRFS filesystem. BTRFS snapshots are supported only on BTRFS systems having an Ubuntu-type subvolume layout (with @ and @home subvolumes).

Timeshift is similar to applications like rsnapshot , BackInTime and TimeVault but with different goals. It is designed to protect only system files and settings. User files such as documents, pictures and music are excluded. This ensures that your files remain unchanged when you restore your system to an earlier date. If you need a tool to back up your documents and files, please take a look at the excellent BackInTime application, which is more configurable and provides options for saving user files.

[Apr 03, 2018] Relax and Recover

Apr 03, 2018 | blog.nasmart.me

The main purpose of ReaR is to create a bootable image, based on what is currently installed on a Linux host, that can be used to partition disks and retrieve a backup of the system. There are options for where to create the bootable image and what to do with it after it has been created.

The bootable image can be a USB device, an ISO file or a number of other options.

If you create a bootable image on a USB device then you may also wish to create a backup of your system on the same device, which ReaR will support.

When creating a bootable image as an ISO file you have a multitude of options for what do to with the file in order to get it off the box so that it can be used for recovery. The two options I have used are rsync and TSM .

The misconception I mentioned earlier is the belief that ReaR will back up your system. It can do that, but it is not a given and depends on your configuration.

... ... ...

Example Procedure

The following is an example of the procedure to protect a system with ReaR and TSM during some operating system patching activities (assumes TSM is already installed):

  1. Install ReaR (rpms are available here ).
  2. Configure ReaR to use TSM and to create an ISO file by updating /etc/rear/local.conf with a line of OUTPUT=ISO and another with BACKUP=TSM (see the snippet after this list).
  3. Run "rear -v mkrescue" to create the bootable ISO and send it to TSM (mkbackup would have the same effect in this case as TSM will be handling the file system backups independently – I feel mkrescue makes it clearer what you're doing).
  4. Perform an incremental backup of your file systems with TSM using "dsmc inc".
  5. Do your patching activity.
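
A minimal /etc/rear/local.conf matching step 2 above might look like the following sketch; any additional TSM client settings are site-specific and not shown:

    # /etc/rear/local.conf -- build a bootable ISO, let TSM handle the file system backups
    OUTPUT=ISO
    BACKUP=TSM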

If all goes well then you don't need to boot from the ReaR ISO and restore your operating system. But let's say it didn't go well: your system will no longer boot and there's no immediately obvious way forward. You decide to restore. The procedure is:

  1. Restore the ReaR ISO to a location that will allow you to present it to the server. This is most likely to be your desktop so you can present the ISO file as a virtual CD-ROM over the ILOM interface.
  2. Present the ISO to the host to be recovered.
  3. Boot the host from the ISO – It is highly likely that you'll need to change the boot order or get a pop-up menu to select the ISO as the boot media.
  4. Select "Recover <hostname>" at the grub prompt.
  5. Log in as root (password not required).
  6. Run "rear -v recover" and answer the interactive prompts.

Issues

Since starting to use ReaR I have encountered two problems:

  1. When recovering a host that used an ext4 file system for /boot I found myself facing a message of "Error 16: Inconsistent filesystem structure." from grub. After a bit of digging around and trying to understand what the issue was I ended up modifying the /var/lib/rear/layout/disklayout.conf ReaR file to change the file system type for /boot from ext4 to ext2. I initially tried ext3, but as the system did not use ext3 for any of the file systems the module was not available.
  2. The version of ReaR that I was using had a bug ( tracked on GitHub ) that affected systems that do not have a separate /boot partition. There is a patch for the bug available, but if like me you're happy to have a manual workaround, you need to perform the following actions after the restore completes:
# chroot /mnt/local
# PATH=/bin:/sbin:/usr/bin
# grub-install <disk path>
# exit
# reboot

Finally, it's worth mentioning that ReaR is written in shell and is open source.

[Mar 13, 2018] GitHub - intoli-exodus Painless relocation of Linux binaries and all of their dependencies without containers.

Mar 13, 2018 | github.com

Painless relocation of Linux binaries–and all of their dependencies–without containers.

The Problem Being Solved

If you simply copy an executable file from one system to another, then you're very likely going to run into problems. Most binaries available on Linux are dynamically linked and depend on a number of external library files. You'll get an error like this when running a relocated binary when it has a missing dependency.

aria2c: error while loading shared libraries: libgnutls.so.30: cannot open shared object file: No such file or directory

You can try to install these libraries manually, or to relocate them and set LD_LIBRARY_PATH to wherever you put them, but it turns out that the locations of the ld-linux linker and the glibc libraries are hardcoded. Things can very quickly turn into a mess of relocation errors,

aria2c: relocation error: /lib/libpthread.so.0: symbol __getrlimit, version
GLIBC_PRIVATE not defined in file libc.so.6 with link time reference

segmentation faults,

Segmentation fault (core dumped)

or, if you're really unlucky, this very confusing symptom of a missing linker.

$ ./aria2c
bash: ./aria2c: No such file or directory
$ ls -lha ./aria2c
-rwxr-xr-x 1 sangaline sangaline 2.8M Jan 30 21:18 ./aria2c

Exodus works around these issues by compiling a small statically linked launcher binary that invokes the relocated linker directly with any hardcoded RPATH library paths overridden. The relocated binary will run with the exact same linker and libraries that it ran with on its origin machine.

[Mar 13, 2018] How To's

Mar 13, 2018 | linuxtechlab.com

Tips & Tricks How to restore deleted files in Linux with Foremost

by Shusain · March 2, 2018

It might have happened to you at one point or another that you deleted a file or an image by mistake and then regretted it immediately. So can we restore such a deleted file or image on a Linux machine? In this tutorial, we are going to discuss just that, i.e. how to restore a deleted file on a Linux machine.

To restore a deleted file on a Linux machine, we will be using an application called 'Foremost'. Foremost is a Linux-based program for recovering deleted files. The program uses a configuration file to specify headers and footers to search for. Intended to be run on disk images, foremost can search through most any kind of data without worrying about the format.

Note: We can only restore deleted files in Linux as long as those sectors have not been overwritten on the hard disk.

We will now discuss how to recover the data with foremost. Let's start tutorial by installation of Foremost on CentOS & Ubuntu systems.


Install Foremost

To install Foremost on CentOS, we will download & install the foremost rpm from official webpage. Open terminal & execute the following command,

$ sudo yum install https://forensics.cert.org/centos/cert/7/x86_64//foremost-1.5.7-13.1.el7.x86_64.rpm -y

With Ubuntu, the foremost package is available with default repository. To install foremost on Ubuntu, run the following command from terminal,

$ sudo apt-get install foremost

Restore deleted files in Linux

For this scenario, we have kept an image named 'dan.jpg ' on our system. We will now delete it from the system with the following command,

$ sudo rm -rf dan.jpg

Now we will use the foremost utility to restore the image, run the following command to restore the file,

$ foremost -t jpeg -i /dev/sda1

Here, with the option '-t', we have defined the type of file that needs to be restored,

-i tells foremost to look for the file in the partition '/dev/sda1'. We can check the partitions with the 'mount' command.

Upon successful execution of the command, the file will be restored into an 'output' sub-directory of the current folder. We can also restore the file into a particular folder with the option '-o':

$ foremost -t jpeg -i /dev/sda1 -o /root/test_folder

Note: The restored file will not have the same file name as the original, because the file name is not stored within the file itself. So the file name will be different, but the data should all be there.

With this we end our tutorial on how to restore deleted files on a Linux machine using Foremost. Please feel free to send in any questions or suggestions using the comment box below.

[Dec 09, 2017] How to rsync only a specific list of files - Stack Overflow

Notable quotes:
"... The filenames that are read from the FILE are all relative to the source dir ..."
Dec 09, 2017 | stackoverflow.com

ash, May 11, 2015 at 20:05

There is a flag --files-from that does exactly what you want. From man rsync :
--files-from=FILE

Using this option allows you to specify the exact list of files to transfer (as read from the specified FILE or - for standard input). It also tweaks the default behavior of rsync to make transferring just the specified files and directories easier:

The filenames that are read from the FILE are all relative to the source dir -- any leading slashes are removed and no ".." references are allowed to go higher than the source dir. For example, take this command:

rsync -a --files-from=/tmp/foo /usr remote:/backup

If /tmp/foo contains the string "bin" (or even "/bin"), the /usr/bin directory will be created as /backup/bin on the remote host. If it contains "bin/" (note the trailing slash), the immediate contents of the directory would also be sent (without needing to be explicitly mentioned in the file -- this began in version 2.6.4). In both cases, if the -r option was enabled, that dir's entire hierarchy would also be transferred (keep in mind that -r needs to be specified explicitly with --files-from, since it is not implied by -a). Also note that the effect of the (enabled by default) --relative option is to duplicate only the path info that is read from the file -- it does not force the duplication of the source-spec path (/usr in this case).

In addition, the --files-from file can be read from the remote host instead of the local host if you specify a "host:" in front of the file (the host must match one end of the transfer). As a short-cut, you can specify just a prefix of ":" to mean "use the remote end of the transfer". For example:

rsync -a --files-from=:/path/file-list src:/ /tmp/copy

This would copy all the files specified in the /path/file-list file that was located on the remote "src" host.

If the --iconv and --protect-args options are specified and the --files-from filenames are being sent from one host to another, the filenames will be translated from the sending host's charset to the receiving host's charset.

NOTE: sorting the list of files in the --files-from input helps rsync to be more efficient, as it will avoid re-visiting the path elements that are shared between adjacent entries. If the input is not sorted, some path elements (implied directories) may end up being scanned multiple times, and rsync will eventually unduplicate them after they get turned into file-list elements.

Nicolas Mattia, Feb 11, 2016 at 11:06

Note that you still have to specify the directory where the files listed are located, for instance: rsync -av --files-from=file-list . target/ for copying files from the current dir. – Nicolas Mattia Feb 11 '16 at 11:06

ash, Feb 12, 2016 at 2:25

Yes, and to reiterate: The filenames that are read from the FILE are all relative to the source dir . – ash Feb 12 '16 at 2:25

Michael ,Nov 2, 2016 at 0:09

if the files-from file has anything starting with .. rsync appears to ignore the .. giving me an error like rsync: link_stat "/home/michael/test/subdir/test.txt" failed: No such file or directory (in this case running from the "test" dir and trying to specify "../subdir/test.txt" which does exist. – Michael Nov 2 '16 at 0:09

xxx,

--files-from= parameter needs trailing slash if you want to keep the absolute path intact. So your command would become something like below:
rsync -av --files-from=/path/to/file / /tmp/

This is useful when, for example, there are a large number of files and you want to copy them all to path x. You would find the files and write the output to a file, like below:

find /var/* -name '*.log' > file
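
A sketch that ties the two together, keeping the man page's point that --files-from paths are interpreted relative to the source directory (paths and host are illustrative):

    # build a list of log files relative to /var, then copy them preserving the tree
    cd /var && find . -name '*.log' > /tmp/loglist
    rsync -av --files-from=/tmp/loglist /var/ remote:/backup/var/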

[Oct 03, 2017] Timeshift A System Restore Utility Tool Review - LinuxAndUbuntu - Linux News Apps Reviews Linux Tutorials HowTo

It looks like, technologically, this is a questionable approach, although the technical details are unclear. Rsync-based snapshotting is done better by other tools, and BTRFS is a niche filesystem.
www.unz.com

TimeShift is a system restore tool for Linux. It provides functionality that is quite similar to the System Restore feature in Windows or the Time Machine tool in MacOS. TimeShift protects your system by making incremental snapshots of the file system manually or at regular automated intervals.

These snapshots can then be restored at a later point to undo all changes to the system and restore it to the previous state. Snapshots are made using rsync and hard links, and the tool shares common files amongst snapshots in order to save disk space. Now that we have an idea about what Timeshift is, let us take a detailed look at setting up and using this tool.

... ... ...

Timeshift supports two snapshot formats. The first uses rsync and the second uses the in-built snapshot features of the BTRFS file system. You can select the BTRFS format if you are using that particular filesystem; otherwise, you have to choose the rsync format.

[Aug 29, 2017] backup-etc.sh -- A script to backup the /etc directory

This is a simple script that prints "dot" progress lines. The backup name includes a timestamp. No rotation is implemented.
Aug 29, 2017 | wpollock.com
#!/bin/bash
# Script to backup the /etc hierarchy
#
# Written 4/2002 by Wayne Pollock, Tampa Florida USA
#
#  $Id: backup-etc,v 1.6 2004/08/25 01:42:26 wpollock Exp $
#
# $Log: backup-etc,v $
# Revision 1.6  2004/08/25 01:42:26  wpollock
# Changed backup name to include the hostname and 4 digit years.
#
# Revision 1.5  2004/01/07 18:07:33  wpollock
# Fixed dots routine to count files first, then calculate files per dot.
#
# Revision 1.4  2003/04/03 08:10:12  wpollock
# Changed how the version number is obtained, so the file
# can be checked out normally.
#
# Revision 1.3  2003/04/03 08:01:25  wpollock
# Added ultra-fancy dots function for verbose mode.
#
# Revision 1.2  2003/04/01 15:03:33  wpollock
# Eliminated the use of find, and discovered that tar was working
# as intended all along!  (Each directory that find found was
# recursively backed-up, so for example /etc, then /etc/mail,
# caused /etc/mail/sendmail.mc to be backuped three times.)
#
# Revision 1.1  2003/03/23 18:57:29  wpollock
# Modified by Wayne Pollock:
#
# Discovered not all files were being backed up, so
# added "-print0 --force-local" to find and "--null -T -"
# to tar (eliminating xargs), to fix the problem when filenames
# contain metacharacters such as whitespace.
# Although this now seems to work, the current version of tar
# seems to have a bug causing it to backup every file two or
# three times when using these options!  This is still better
# than not backing up some files at all.)
#
# Changed the logger level from "warning" to "error".
#
# Added '-v, --verbose' options to display dots every 60 files,
# just to give feedback to a user.
#
# Added '-V, --version' and '-h, --help' options.
#
# Removed the lock file mechanism and backup file renaming
# (from foo to foo.1), in favor of just including a time-stamp
# of the form "yymmdd-hhmm" to the filename.
#
#

PATH=/bin:/usr/bin

# The backups should probably be stored in /var someplace:
REPOSITORY=/root
TIMESTAMP=$(date '+%Y%m%d-%H%M')
HOSTNAME=$(hostname)
FILE="$REPOSITORY/$HOSTNAME-etc-full-backup-$TIMESTAMP.tgz"

ERRMSGS=/tmp/backup-etc.$$
PROG=${0##*/}
VERSION=$(echo $Revision: 1.6 $ |awk '{print$2}')
VERBOSE=off

usage()
{  echo "This script creates a full backup of /etc via tar in $REPOSITORY."
   echo "Usage: $PROG [OPTIONS]"
   echo '  Options:'
   echo '    -v, --verbose   displays some feedback (dots) during backup'
   echo '    -h, --help      displays this message'
   echo '    -V, --version   display program version and author info'
   echo
}

dots()
{  MAX_DOTS=50
   NUM_FILES=`find /etc|wc -l`
   let 'FILES_PER_DOT = NUM_FILES / MAX_DOTS'
   bold=`tput smso`
   norm=`tput rmso`
   tput sc
   tput civis
   echo -n "$bold(00%)$norm"
   while read; do
      let "cnt = (cnt + 1) % FILES_PER_DOT"
      if [ "$cnt" -eq 0 ]
      then
         let '++num_dots'
         let 'percent = (100 * num_dots) / MAX_DOTS'
         [ "$percent" -gt "100" ] && percent=100
         tput rc
         printf "$bold(%02d%%)$norm" "$percent"
         tput smir
         echo -n "."
         tput rmir
      fi
   done
   tput cnorm
   echo
}

# Command line argument processing:
while [ $# -gt 0 ]
do
   case "$1" in
      -v|--verbose)  VERBOSE=on; ;;
      -h|--help)     usage; exit 0; ;;
      -V|--version)  echo -n "$PROG version $VERSION "
                     echo 'Written by Wayne Pollock '
                     exit 0; ;;
      *)             usage; exit 1; ;;
   esac
   shift
done

trap "rm -f $ERRMSGS" EXIT

cd /etc

# create backup, saving any error messages:
if [ "$VERBOSE" != "on" ]
then
    tar -cz --force-local -f $FILE . 2> $ERRMSGS 
else
    tar -czv --force-local -f $FILE . 2> $ERRMSGS | dots
fi

# Log any error messages produced:
if [ -s "$ERRMSGS" ]
then logger -p user.error -t $PROG "$(cat $ERRMSGS)"
else logger -t $PROG "Completed full backup of /etc"
fi

exit 0

[Aug 28, 2017] Rsync over ssh with root access on both sides

Aug 28, 2017 | serverfault.com

I have one older ubuntu server, and one newer debian server and I am migrating data from the old one to the new one. I want to use rsync to transfer data across to make final migration easier and quicker than the equivalent tar/scp/untar process.

As an example, I want to sync the home folders one at a time to the new server. This requires root access at both ends as not all files at the source side are world readable and the destination has to be written with correct permissions into /home. I can't figure out how to give rsync root access on both sides.

I've seen a few related questions, but none quite match what I'm trying to do.

I have sudo set up and working on both servers.

– Tim Abell, Apr 28 '10 at 9:18
Actually you do NOT need to allow root authentication via SSH to run rsync as Antoine suggests. The transport and system authentication can be done entirely over user accounts, as long as you can run rsync with sudo on both ends for reading and writing the files.

As a user on your destination server you can suck the data from your source server like this:

sudo rsync -aPe ssh --rsync-path='sudo rsync' boron:/home/fred /home/

The user you run as on both servers will need passwordless* sudo access to the rsync binary, but you do NOT need to enable ssh login as root anywhere. If the user you are using doesn't match on the other end, you can add user@boron: to specify a different remote user.

Good luck.

*or you will need to have entered the password manually inside the timeout window.

– Caleb, Apr 28 '10 at 22:06
Although this is an old question I'd like to add word of CAUTION to this accepted answer. From my understanding allowing passwordless "sudo rsync" is equivalent to open the root account to remote login. This is because with this it is very easy to gain full root access, e.g. because all system files can be downloaded, modified and replaced without a password. – Ascurion Jan 8 '16 at 16:30
If your data is not highly sensitive, you could use tar and socat. In my experience this is often faster than rsync over ssh.

You need socat or netcat on both sides.

On the target host, go to the directory where you would like to put your data, after that run: socat TCP-LISTEN:4444 - | tar xzf -

If the target host is listening, start it on the source like: tar czf - /home/fred /home/ | socat - TCP:ip-of-remote-server:4444

For this setup you'll need a reliably connection between the 2 servers.

– Jeroen Moors, Apr 28 '10 at 21:20
Good point. In a trusted environment, you'll pick up a lot of speed by not encrypting. It might not matter on small files, but with GBs of data it will. – pboin May 18 '10 at 10:53
Ok, I've pieced together all the clues to get something that works for me.

Let's call the servers "src" and "dest".

Set up a key pair for root on the destination server, and copy the public key to the source server:

dest $ sudo -i
dest # ssh-keygen
dest # scp /root/.ssh/id_rsa.pub tim@src:
dest # exit

Add the public key to root's authorized keys on the source server

src $ sudo -i
src # cp /home/tim/id_rsa.pub .ssh/authorized_keys

Back on the destination server, pull the data across with rsync:

dest $ sudo -i
dest # rsync -aP src:/home/fred /home/

[Aug 28, 2017] Unix Rsync Copy Hidden Dot Files and Directories Only by Vivek Gite

Feb 06, 2014 | www.cyberciti.biz

How do I use the rsync tool to copy only the hidden files and directory (such as ~/.ssh/, ~/.foo, and so on) from /home/jobs directory to the /mnt/usb directory under Unix like operating system?

The rsync program is used for synchronizing files over a network or local disks. To view or display only hidden files with ls command:

ls -ld ~/.??*

OR

ls -ld ~/.[^.]*

Sample output: Fig.01 – ls command to view only hidden files

rsync not synchronizing all hidden .dot files?

In this example, you used the pattern .[^.]* or .??* to select and display only hidden files using the ls command. You can use the same pattern with any Unix command, including the rsync command. The syntax is as follows to copy hidden files with rsync:

rsync -av /path/to/dir/.??* /path/to/dest
rsync -avzP /path/to/dir/.??* /mnt/usb
rsync -avzP $HOME/.??* [email protected]:/path/to/backup/users/u/user1
rsync -avzP ~/.[^.]* [email protected]:/path/to/backup/users/u/user1


In this example, copy all hidden files from my home directory to /mnt/test:

rsync -avzP ~/.[^.]* /mnt/test


Sample output: Fig.02 – Rsync example to copy only hidden files


[Aug 28, 2017] rsync doesn't copy files with restrictive permissions

Aug 28, 2017 | superuser.com
Trying to copy files with rsync, it complains:
rsync: send_files failed to open "VirtualBox/Machines/Lubuntu/Lubuntu.vdi" \
(in media): Permission denied (13)

That file is not copied. Indeed the file permissions of that file are very restrictive on the server side:

-rw-------    1 1000     1000     3133181952 Nov  1  2011 Lubuntu.vdi

I call rsync with

sudo rsync -av --fake-super root@sheldon::media /mnt/media

The rsync daemon runs as root on the server. root can copy that file (of course). rsyncd has "fake super = yes" set in /etc/rsyncd.conf.

What can I do so that the file is copied without changing the permissions of the file on the server?

– Torsten Bronger, Dec 29 '12 at 10:15
If you use RSync as daemon on destination, please post grep rsync /var/log/daemon to improve your question – F. Hauri Dec 29 '12 at 13:23
As you appear to have root access to both servers, have you tried --force ?

Alternatively you could bypass the rsync daemon and try a direct sync e.g.

rsync -optg --rsh=/usr/bin/ssh --rsync-path=/usr/bin/rsync --verbose --recursive --delete-after --force  root@sheldon::media /mnt/media
– arober11, Dec 29 '12 at 13:21
Using ssh means encryption, which makes things slower. --force does only affect directories, if I read the man page correctly. – Torsten Bronger Jan 1 '13 at 23:08
Unless you're using ancient kit, the CPU overhead of encrypting/decrypting the traffic shouldn't be noticeable, but you will lose 10-20% of your bandwidth through the encapsulation process. Then again, 80% of a working link is better than 100% of a non-working one :) – arober11 Jan 2 '13 at 10:52
do have an "ancient kit". ;-) (Slow ARM CPU on a NAS.) But I now mount the NAS with NFS and use rsync (with "sudo") locally. This solves the problem (and is even faster). However, I still think that my original problem must be solvable using the rsync protocol (remote, no ssh). – Torsten Bronger Jan 4 '13 at 7:55

[Aug 28, 2017] Using rsync under target user to copy home directories

Aug 28, 2017 | unix.stackexchange.com


nixnotwin , asked Sep 21 '12 at 5:11

On my Ubuntu server there are about 150 shell accounts. All usernames begin with the prefix u12.. I have root access and I am trying to copy a directory named "somefiles" to all the home directories. After copying the directory, the user and group ownership of the directory should be changed to the user's. Username, group and home-dir name are the same. How can this be done?

Gilles , answered Sep 21 '12 at 23:44

Do the copying as the target user. This will automatically make the target files owned by that user. Make sure that the original files are world-readable (or at least readable by all the target users). Run chmod afterwards if you don't want the copied files to be world-readable.
getent passwd |
awk -F : '$1 ~ /^u12/ {print $1}' |
while IFS= read -r user; do
  su "$user" -c 'cp -Rp /original/location/somefiles ~/'
done

[Aug 28, 2017] rsync over SSH preserve ownership only for www-data owned files

Aug 28, 2017 | stackoverflow.com

jeffery_the_wind , asked Mar 6 '12 at 15:36

I am using rsync to replicate a web folder structure from a local server to a remote server. Both servers are ubuntu linux. I use the following command, and it works well:
rsync -az /var/www/ [email protected]:/var/www/

The usernames for the local system and the remote system are different. From what I have read it may not be possible to preserve all file and folder owners and groups. That is OK, but I would like to preserve owners and groups just for the www-data user, which does exist on both servers.

Is this possible? If so, how would I go about doing that?

Thanks!

** EDIT **

There is some mention of rsync being able to preserve ownership and groups on remote file syncs here: http://lists.samba.org/archive/rsync/2005-August/013203.html

** EDIT 2 **

I ended up getting the desired effect thanks to many of the helpful comments and answers here. Assuming the IP of the source machine is 10.1.1.2 and the IP of the destination machine is 10.1.1.1, I can use this line from the destination machine:

sudo rsync -az [email protected]:/var/www/ /var/www/

This preserves the ownership and groups of the files that have a common user name, like www-data. Note that using rsync without sudo does not preserve these permissions.

ghoti , answered Mar 6 '12 at 19:01

You can also sudo the rsync on the target host by using the --rsync-path option:
# rsync -av --rsync-path="sudo rsync" /path/to/files user@targethost:/path

This lets you authenticate as user on targethost, but still get privileged write permission through sudo . You'll have to modify your sudoers file on the target host to avoid sudo's request for your password. man sudoers or run sudo visudo for instructions and samples.
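
For the --rsync-path="sudo rsync" route, the sudoers entry on the target host could look roughly like this (the user name and rsync path are assumptions; edit it with visudo):

    # allow the unprivileged transfer account to run only rsync as root, without a password
    user    ALL=(root) NOPASSWD: /usr/bin/rsync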

You mention that you'd like to retain the ownership of files owned by www-data, but not other files. If this is really true, then you may be out of luck unless you implement chown or a second run of rsync to update permissions. There is no way to tell rsync to preserve ownership for just one user .

That said, you should read about rsync's --files-from option.

rsync -av /path/to/files user@targethost:/path
find /path/to/files -user www-data -print | \
  rsync -av --files-from=- --rsync-path="sudo rsync" /path/to/files user@targethost:/path

I haven't tested this, so I'm not sure exactly how piping find's output into --files-from=- will work. You'll undoubtedly need to experiment.

xato , answered Mar 6 '12 at 15:39

As far as I know, you cannot chown files to somebody other than yourself if you are not root. So you would have to rsync using the www-data account, as all files will be created with the specified user as owner. So you need to chown the files afterwards.

user2485267 , answered Jun 14 '13 at 8:22

I had a similar problem and cheated the rsync command,

rsync -avz --delete [email protected]:/home//domains/site/public_html/ /home/domains2/public_html && chown -R wwwusr:wwwgrp /home/domains2/public_html/

the && runs the chown against the folder when the rsync completes successfully (1x '&' would run the chown regardless of the rsync completion status)

Graham , answered Mar 6 '12 at 15:51

The root users for the local system and the remote system are different.

What does this mean? The root user is uid 0. How are they different?

Any user with read permission to the directories you want to copy can determine what usernames own what files. Only root can change the ownership of files being written .

You're currently running the command on the source machine, which restricts your writes to the permissions associated with [email protected]. Instead, you can try to run the command as root on the target machine. Your read access on the source machine isn't an issue.

So on the target machine (10.1.1.1), assuming the source is 10.1.1.2:

# rsync -az [email protected]:/var/www/ /var/www/

Make sure your groups match on both machines.

Also, set up access to [email protected] using a DSA or RSA key, so that you can avoid having passwords floating around. For example, as root on your target machine, run:

# ssh-keygen -d

Then take the contents of the file /root/.ssh/id_dsa.pub and add it to ~user/.ssh/authorized_keys on the source machine. You can ssh [email protected] as root from the target machine to see if it works. If you get a password prompt, check your error log to see why the key isn't working.

ghoti , answered Mar 6 '12 at 18:54

Well, you could skip the challenges of rsync altogether, and just do this through a tar tunnel.
sudo tar zcf - /path/to/files | \
  ssh user@remotehost "cd /some/path; sudo tar zxf -"

You'll need to set up your SSH keys as Graham described.

Note that this handles full directory copies, not incremental updates like rsync.

The idea here is that tar runs under sudo on both ends, so it can read every file on the source and recreate ownership and permissions exactly on the target.

[Aug 28, 2017] rsync and file permissions

Aug 28, 2017 | superuser.com
I'm trying to use rsync to copy a set of files from one system to another. I'm running the command as a normal user (not root). On the remote system, the files are owned by apache and when copied they are obviously owned by the local account (fred).

My problem is that every time I run the rsync command, all files are re-synched even though they haven't changed. I think the issue is that rsync sees the file owners are different and my local user doesn't have the ability to change ownership to apache, but I'm not including the -a or -o options so I thought this would not be checked. If I run the command as root, the files come over owned by apache and do not come a second time if I run the command again. However I can't run this as root for other reasons. Here is the command:

/usr/bin/rsync --recursive --rsh=/usr/bin/ssh --rsync-path=/usr/bin/rsync --verbose [email protected]:/src/dir/ /local/dir
– Fred Snertz, May 2 '11 at 23:43
Why can't you run rsync as root? On the remote system, does fred have read access to the apache-owned files? – chrishiestand May 3 '11 at 0:32
Ah, I left out the fact that there are ssh keys set up so that local fred can become remote root, so yes fred/root can read them. I know this is a bit convoluted but its real. – Fred Snertz May 3 '11 at 14:50
Always be careful when root can ssh into the machine. But if you have password and challenge response authentication disabled it's not as bad. – chrishiestand May 3 '11 at 17:32
Here's the answer to your problem:
-c, --checksum
      This changes the way rsync checks if the files have been changed and are in need of a  transfer.   Without  this  option,
      rsync  uses  a "quick check" that (by default) checks if each file's size and time of last modification match between the
      sender and receiver.  This option changes this to compare a 128-bit checksum for each file  that  has  a  matching  size.
      Generating  the  checksums  means  that both sides will expend a lot of disk I/O reading all the data in the files in the
      transfer (and this is prior to any reading that will be done to transfer changed files), so this  can  slow  things  down
      significantly.

      The  sending  side  generates  its checksums while it is doing the file-system scan that builds the list of the available
      files.  The receiver generates its checksums when it is scanning for changed files, and will checksum any file  that  has
      the  same  size  as the corresponding sender's file:  files with either a changed size or a changed checksum are selected
      for transfer.

      Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by  checking
      a  whole-file  checksum  that is generated as the file is transferred, but that automatic after-the-transfer verification
      has nothing to do with this option's before-the-transfer "Does this file need to be updated?" check.

      For protocol 30 and beyond (first supported in 3.0.0), the checksum used is MD5.  For older protocols, the checksum  used
      is MD4.

So run:

/usr/bin/rsync -c --recursive --rsh=/usr/bin/ssh --rsync-path=/usr/bin/rsync --verbose [email protected]:/src/dir/ /local/dir

Note there may be a time+disk churn tradeoff by using this option. Personally, I'd probably just sync the file's mtimes too:

/usr/bin/rsync -t --recursive --rsh=/usr/bin/ssh --rsync-path=/usr/bin/rsync --verbose [email protected]:/src/dir/ /local/dir
chrishiestand, answered May 3 '11 at 17:48 (edited May 3 '11 at 17:55)
Awesome. Thank you. Looks like the second option is going to work for me and I found the first very interesting. – Fred Snertz May 3 '11 at 18:40
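
As an aside for anyone debugging a similar case: a dry run with itemized changes shows, per file, which attribute rsync thinks differs. This is a generic sketch (host and paths are placeholders), not part of the original thread:

# -n does a dry run, -i prints an "itemized" change string for every file;
# letters such as s, t, o, g in that string flag size, mtime, owner or group differences.
rsync -n -i --recursive -e ssh user@remotehost:/src/dir/ /local/dir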

[Aug 28, 2017] Why does rsync fail to copy files from /sys in Linux?

Notable quotes:
"... pseudo file system ..."
"... pseudo filesystems ..."
Aug 28, 2017 | unix.stackexchange.com


Eugene Yarmash , asked Apr 24 '13 at 16:35

I have a bash script which uses rsync to backup files in Archlinux. I noticed that rsync failed to copy a file from /sys , while cp worked just fine:
# rsync /sys/class/net/enp3s1/address /tmp    
rsync: read errors mapping "/sys/class/net/enp3s1/address": No data available (61)
rsync: read errors mapping "/sys/class/net/enp3s1/address": No data available (61)
ERROR: address failed verification -- update discarded.
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1052) [sender=3.0.9]

# cp  /sys/class/net/enp3s1/address /tmp   ## this works

I wonder why rsync fails, and whether it is possible to copy the file with it at all.

mattdm , answered Apr 24 '13 at 18:20

Rsync has code which specifically checks if a file is truncated during read and gives this error: ENODATA. I don't know why the files in /sys have this behavior, but since they're not real files, I guess it's not too surprising. There doesn't seem to be a way to tell rsync to skip this particular check.

I think you're probably better off not rsyncing /sys at all and using specific scripts to cherry-pick the particular information you want (like the network card address).
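
A minimal sketch of that cherry-picking approach might look like the following (the chosen /sys paths and the output directory are only examples):

# Copy a few selected /sys values with cat; reading the whole pseudo file
# sidesteps rsync's size-vs-data consistency check.
outdir=/tmp/sys-backup
mkdir -p "$outdir"
for f in /sys/class/net/*/address; do
    [ -r "$f" ] || continue
    # flatten the path into a file name, e.g. sys_class_net_enp3s1_address
    cat "$f" > "$outdir/$(echo "${f#/}" | tr / _)"
done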

Runium , answered Apr 25 '13 at 0:23

First off, /sys is a pseudo file system. If you look at /proc/filesystems you will find a list of registered file systems where quite a few have nodev in front. This indicates they are pseudo filesystems. This means they exist on a running kernel as a RAM-based filesystem; further, they do not require a block device.
$ cat /proc/filesystems
nodev   sysfs
nodev   rootfs
nodev   bdev
...

At boot the kernel mounts this filesystem and updates entries when appropriate, e.g. when new hardware is found during boot or by udev.

In /etc/mtab you typically find the mount by:

sysfs /sys sysfs rw,noexec,nosuid,nodev 0 0

For a nice paper on the subject, read Patrick Mochel's "The sysfs Filesystem".


stat of /sys files

If you go into a directory under /sys and do an ls -l, you will notice that all files have the same size, typically 4096 bytes. This size is reported by sysfs.

:/sys/devices/pci0000:00/0000:00:19.0/net/eth2$ ls -l
-r--r--r-- 1 root root 4096 Apr 24 20:09 addr_assign_type
-r--r--r-- 1 root root 4096 Apr 24 20:09 address
-r--r--r-- 1 root root 4096 Apr 24 20:09 addr_len
...

Further, you can do a stat on a file and notice another distinct feature: it occupies 0 blocks. Also, the inode of the root (stat /sys) is 1, whereas the root of an on-disk filesystem typically has inode 2, etc.

rsync vs. cp

The easiest explanation for rsync's failure to synchronize pseudo files is perhaps by example.

Say we have a file named address that is 18 bytes. An ls or stat of the file reports 4096 bytes.


rsync
  1. Opens file descriptor, fd.
  2. Uses fstat(fd) to get information such as size.
  3. Set out to read size bytes, i.e. 4096. That would be line 253 of the code linked by @mattdm . read_size == 4096
    1. Ask; read: 4096 bytes.
    2. A short string is read i.e. 18 bytes. nread == 18
    3. read_size = read_size - nread (4096 - 18 = 4078)
    4. Ask; read: 4078 bytes
    5. 0 bytes read (as first read consumed all bytes in file).
    6. nread == 0 , line 255
    7. Unable to read 4096 bytes. Zero out buffer.
    8. Set error ENODATA .
    9. Return.
  4. Report error.
  5. Retry. (Above loop).
  6. Fail.
  7. Report error.
  8. FINE.

During this process it actually reads the entire file. But with no reliable size available it cannot validate the result – thus failure is the only option.

cp
  1. Opens file descriptor, fd.
  2. Uses fstat(fd) to get information such as st_size (also uses lstat and stat).
  3. Check if file is likely to be sparse. That is the file has holes etc.
    copy.c:1010
    /* Use a heuristic to determine whether SRC_NAME contains any sparse
     * blocks.  If the file has fewer blocks than would normally be
     * needed for a file of its size, then at least one of the blocks in
     * the file is a hole.  */
    sparse_src = is_probably_sparse (&src_open_sb);
    

    As stat reports the file to have zero blocks, it is categorized as sparse.

  4. Tries to read file by extent-copy (a more efficient way to copy normal sparse files), and fails.
  5. Copy by sparse-copy.
    1. Starts out with max read size of MAXINT.
      Typically 18446744073709551615 bytes on a 64-bit system.
    2. Ask; read 4096 bytes. (Buffer size allocated in memory from stat information.)
    3. A short string is read i.e. 18 bytes.
    4. Check if a hole is needed, nope.
    5. Write buffer to target.
    6. Subtract 18 from max read size.
    7. Ask; read 4096 bytes.
    8. 0 bytes as all got consumed in first read.
    9. Return success.
  6. All OK. Update flags for file.
  7. FINE.


Might be related, but extended attribute calls will fail on sysfs:

[root@hypervisor eth0]# lsattr address

lsattr: Inappropriate ioctl for device While reading flags on address

[root@hypervisor eth0]#

Looking at my strace it looks like rsync tries to pull in extended attributes by default:

22964 <... getxattr resumed> , 0x7fff42845110, 132) = -1 ENODATA (No data available)

I tried finding a flag to give rsync to see if skipping extended attributes resolves the issue, but wasn't able to find anything ( --xattrs turns them on at the destination).
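
More generally, when rsync is used to back up a whole system, the usual practice is to exclude the pseudo filesystems rather than fight with them. The following is a common pattern, sketched here with an illustrative destination path:

# Back up / while skipping /sys, /proc and other kernel-populated or volatile trees.
rsync -aAXH --delete \
    --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' \
    --exclude='/tmp/*' --exclude='/run/*' --exclude='/mnt/*' --exclude='/media/*' \
    / /mnt/backup/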

[Aug 28, 2017] Rsync doesn't copy everything

Aug 28, 2017 | ubuntuforums.org




Scormen May 31st, 2009, 10:09 AM Hi all,

I'm having some trouble with rsync. I'm trying to sync my local /etc directory to a remote server, but this won't work.

The problem is that it seems it doesn't copy all the files.
The local /etc dir contains 15MB of data; after an rsync, the remote backup contains only 4.6MB of data.

Rsync is being run by root. I'm using this command:

rsync --rsync-path="sudo rsync" -e "ssh -i /root/.ssh/backup" -avz --delete --delete-excluded -h --stats /etc [email protected]:/home/kris/backup/laptopkris

I hope someone can help.
Thanks!

Kris


Scormen May 31st, 2009, 11:05 AM I found that if I do a local sync, everything goes fine.
But if I do a remote sync, it copies only 4.6MB.

Any idea?


LoneWolfJack May 31st, 2009, 05:14 PM never used rsync on a remote machine, but "sudo rsync" looks wrong. you probably can't call sudo like that so the ssh connection needs to have the proper privileges for executing rsync.

just an educated guess, though.


Scormen May 31st, 2009, 05:24 PM Thanks for your answer.

In /etc/sudoers I have added the following line, so "sudo rsync" will work.

kris ALL=NOPASSWD: /usr/bin/rsync

I also tried without --rsync-path="sudo rsync", but without success.

I have also tried on the server to pull the files from the laptop, but that doesn't work either.


LoneWolfJack May 31st, 2009, 05:30 PM in the rsync help file it says that --rsync-path is for the path to rsync on the remote machine, so my guess is that you can't use sudo there as it will be interpreted as a path.

so you will have to do --rsync-path="/path/to/rsync" and make sure the ssh login has root privileges if you need them to access the files you want to sync.

--rsync-path="sudo rsync" probably fails because
a) sudo is interpreted as a path
b) the space isn't escaped
c) sudo probably won't allow itself to be called remotely

again, this is not more than an educated guess.


Scormen May 31st, 2009, 05:45 PM I understand what you mean, so I tried also:

rsync -Cavuhzb --rsync-path="/usr/bin/rsync" -e "ssh -i /root/.ssh/backup" /etc [email protected]:/home/kris/backup/laptopkris

Then I get this error:

sending incremental file list
rsync: recv_generator: failed to stat "/home/kris/backup/laptopkris/etc/chatscripts/pap": Permission denied (13)
rsync: recv_generator: failed to stat "/home/kris/backup/laptopkris/etc/chatscripts/provider": Permission denied (13)
rsync: symlink "/home/kris/backup/laptopkris/etc/cups/ssl/server.crt" -> "/etc/ssl/certs/ssl-cert-snakeoil.pem" failed: Permission denied (13)
rsync: symlink "/home/kris/backup/laptopkris/etc/cups/ssl/server.key" -> "/etc/ssl/private/ssl-cert-snakeoil.key" failed: Permission denied (13)
rsync: recv_generator: failed to stat "/home/kris/backup/laptopkris/etc/ppp/peers/provider": Permission denied (13)
rsync: recv_generator: failed to stat "/home/kris/backup/laptopkris/etc/ssl/private/ssl-cert-snakeoil.key": Permission denied (13)

sent 86.85K bytes received 306 bytes 174.31K bytes/sec
total size is 8.71M speedup is 99.97
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1058) [sender=3.0.5]

And the same command with "root" instead of "kris".
Then, I get no errors, but I still don't have all the files synced.


Scormen June 1st, 2009, 09:00 AM Sorry for this bump.
I'm still having the same problem.

Any idea?

Thanks.


binary10 June 1st, 2009, 10:36 AM I understand what you mean, so I tried also:

rsync -Cavuhzb --rsync-path="/usr/bin/rsync" -e "ssh -i /root/.ssh/backup" /etc [email protected]:/home/kris/backup/laptopkris

Then I get this error:

And the same command with "root" instead of "kris".
Then, I get no errors, but I still don't have all the files synced.

Maybe there's a nicer way, but you could place a copy of /usr/bin/rsync into a private protected area, set its owner to root, set the setuid bit on it, and change your rsync-path argument like this:

# on the remote side, aka [email protected]
mkdir priv-area
# protect it from normal users running a priv version of rsync
chmod 700 priv-area
cd priv-area
cp -p /usr/local/bin/rsync ./rsync-priv
sudo chown 0:0 ./rsync-priv
sudo chmod +s ./rsync-priv
ls -ltra # rsync-priv should now be 'bold-red' in bash

Looking at your flags, you've specified CVS-style excludes (-C), skipping files that are newer on the target (-u), and keeping backups of removed files (-b).

rsync -Cavuhzb --rsync-path="/home/kris/priv-area/rsync-priv" -e "ssh -i /root/.ssh/backup" /etc [email protected]:/home/kris/backup/laptopkris

From those qualifiers you're not going to be getting everything sync'd. It's doing what you're telling it to do.

If you really wanted to perform a like-for-like backup (not keeping stuff that's been changed or deleted on the source), I'd go for something like the following.

rsync --archive --delete --hard-links --one-file-system --acls --xattrs --dry-run -i --rsync-path="/home/kris/priv-area/rsync-priv" --rsh="ssh -i /root/.ssh/backup" /etc/ [email protected]:/home/kris/backup/laptopkris/etc/

Remove the --dry-run and -i when you're happy with the output, and it should do what you want. A word of warning: I get a bit nervous when I don't see trailing slashes (/) on directories, as it can lead to all sorts of funnies if you end up using rsync on soft links.


Scormen June 1st, 2009, 12:19 PM Thanks for your help, binary10.

I've tried what you have said, but still, I only receive 4.6MB on the remote server.
Thanks for the warning, I'll note that!

Has anyone else already tried to rsync their own /etc to a remote system? Just to know if this strange thing only happens to me...

Thanks.


binary10 June 1st, 2009, 01:22 PM Thanks for your help, binary10.

I've tried what you have said, but still, I only receive 4.6MB on the remote server.
Thanks for the warning, I'll note that!

Has anyone else already tried to rsync their own /etc to a remote system? Just to know if this strange thing only happens to me...

Thanks.

Ok so I've gone back and looked at your original post, how are you calculating 15MB of data under etc - via a du -hsx /etc/ ??

I do daily drive to drive backup copies via rsync and drive to network copies.. and have used them recently for restoring.

Sure my du -hsx /etc/ reports 17MB of data of which 10MB gets transferred via an rsync. My backup drives still operate.

rsync 3.0.6 has some fixes to do with ACLs and special devices when rsyncing between Solaris systems, but I think 3.0.5 is still OK for Ubuntu-to-Ubuntu systems.

Here is my test doing exactly what you're probably trying to do. I even check the remote end...

binary10@jsecx25:~/bin-priv$ ./rsync --archive --delete --hard-links --one-file-system --stats --acls --xattrs --human-readable --rsync-path="~/bin/rsync-priv-os-specific" --rsh="ssh" /etc/ [email protected]:/home/kris/backup/laptopkris/etc/

Number of files: 3121
Number of files transferred: 1812
Total file size: 10.04M bytes
Total transferred file size: 10.00M bytes
Literal data: 10.00M bytes
Matched data: 0 bytes
File list size: 109.26K
File list generation time: 0.002 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10.20M
Total bytes received: 38.70K

sent 10.20M bytes received 38.70K bytes 4.09M bytes/sec
total size is 10.04M speedup is 0.98

binary10@jsecx25:~/bin-priv$ sudo du -hsx /etc/
17M /etc/
binary10@jsecx25:~/bin-priv$

And then on the remote system I do the du -hsx

binary10@lenovo-n200:/home/kris/backup/laptopkris/etc$ cd ..
binary10@lenovo-n200:/home/kris/backup/laptopkris$ sudo du -hsx etc
17M etc
binary10@lenovo-n200:/home/kris/backup/laptopkris$


Scormen June 1st, 2009, 01:35 PM How are you calculating 15MB of data under etc - via a du -hsx /etc/ ??
Indeed, on my laptop I see:

root@laptopkris:/home/kris# du -sh /etc/
15M /etc/

If I do the same thing after a fresh sync to the server, I see:

root@server:/home/kris# du -sh /home/kris/backup/laptopkris/etc/
4.6M /home/kris/backup/laptopkris/etc/

On both sides, I have installed Ubuntu 9.04, with version 3.0.5 of rsync.
So strange...


binary10 June 1st, 2009, 01:45 PM it does seem a bit odd.

I'd start doing a few diffs of the outputs of: find etc/ -printf "%f %s %p %Y\n" | sort

And see what type of files are missing.

- edit - Added the %Y file type.


Scormen June 1st, 2009, 01:58 PM Hmm, it's going stranger.
Now I see that I have all my files on the server, but they don't have their full size (bytes).

I have uploaded the files, so you can look into them.

Laptop: http://www.linuxontdekt.be/files/laptop.files
Server: http://www.linuxontdekt.be/files/server.files


binary10 June 1st, 2009, 02:16 PM If you look at the files that are different aka the ssl's they are links to local files else where aka linked to /usr and not within /etc/

aka they are different on your laptop and the server


Scormen June 1st, 2009, 02:25 PM I understand that soft links are just copied, and not the "full file".

But, you have run the same command to test, a few posts ago.
How is it possible that you can see the full 15MB?


binary10 June 1st, 2009, 02:34 PM I was starting to think that this was a bug with du.

The de-referencing is a bit topsy.

If you rsync the remote backup back to a new location on the laptop and do the du command, I wonder if you'll end up with 15MB again.


Scormen June 1st, 2009, 03:20 PM Good tip.

On the server side, the backup of the /etc was still 4.6MB.
I have rsynced it back to the laptop, to a new directory.

If I go on the laptop to that new directory and do a du, it says 15MB.


binary10 June 1st, 2009, 03:34 PM Good tip.

On the server side, the backup of the /etc was still 4.6MB.
I have rsynced it back to the laptop, to a new directory.

If I go on the laptop to that new directory and do a du, it says 15MB.

I think you've now confirmed that RSYNC DOES copy everything... it's just that du confused what you had expected by counting the sizes of the link targets.

You might also think about what you're copying; maybe you need more than just /etc. Of course, it depends on what you are trying to do with the backup :)

enjoy.


Scormen June 1st, 2009, 03:37 PM Yeah, it seems to work well.
So, the "problem" where just the soft links, that couldn't be counted on the server side?
binary10 June 1st, 2009, 04:23 PM Yeah, it seems to work well.
So, the "problem" where just the soft links, that couldn't be counted on the server side?

The links were copied as links as per the design of the --archive in rsync.

The targets of those links were different between your two systems, these being files that reside outside of /etc/, in /usr, and so du reports them differently.


Scormen June 1st, 2009, 05:36 PM Okay, I got it.
Many thanks for the support, binary10!
Scormen June 1st, 2009, 05:59 PM Just to know, is it possible to copy the data from these links as real, hard data?
Thanks.
binary10 June 2nd, 2009, 09:54 AM Just to know, is it possible to copy the data from these links as real, hard data?
Thanks.

Yep absolutely

You should then look at other possibilities of:

-L, --copy-links transform symlink into referent file/dir
--copy-unsafe-links only "unsafe" symlinks are transformed
--safe-links ignore symlinks that point outside the source tree
-k, --copy-dirlinks transform symlink to a dir into referent dir
-K, --keep-dirlinks treat symlinked dir on receiver as dir

but then you'll have to start questioning why you are backing them up like that, especially stuff under /etc/. If you ever wanted to restore it, you'd be restoring full files and not symlinks; the restore result could be a nightmare as well as create future issues (upgrades etc.), let alone that your backup will be significantly larger, possibly 150MB instead of 4MB.
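
For illustration only (not from the thread), the difference looks roughly like this, with a made-up destination:

# Default behaviour of -a: symlinks under /etc are copied as symlinks, so their
# targets (e.g. certificates living under /usr) are not included in the backup.
rsync -a /etc/ /backup/etc/
# With -L (--copy-links) every symlink is replaced by a copy of the file it points
# to, so a restore yields plain files and the backup grows accordingly.
rsync -aL /etc/ /backup/etc-dereferenced/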


Scormen June 2nd, 2009, 10:04 AM Okay, now I'm sure what it's doing :)
Is it also possible to show on a system the "real disk usage" of e.g. that /etc directory? So, without following the links, so that we get an output of 4.6MB.

Thank you very much for your help!


binary10 June 2nd, 2009, 10:22 AM What does the following respond with.

sudo du --apparent-size -hsx /etc

If you want the real answer, then the result of a dry-run rsync is what you need.

sudo rsync --dry-run --stats -h --archive /etc/ /tmp/etc/

[Jul 20, 2017] The ULTIMATE Horrors story with recovery!

Notable quotes:
"... Have you ever left your terminal logged in, only to find when you came back to it that a (supposed) friend had typed "rm -rf ~/*" and was hovering over the keyboard with threats along the lines of "lend me a fiver 'til Thursday, or I hit return"? Undoubtedly the person in question would not have had the nerve to inflict such a trauma upon you, and was doing it in jest. So you've probably never experienced the worst of such disasters.... ..."
"... I can't remember what happened in the succeeding minutes; my memory is just a blur. ..."
"... (We take dumps of the user files every Thursday; by Murphy's Law this had to happen on a Wednesday). ..."
"... By yet another miracle of good fortune, the terminal from which the damage had been done was still su'd to root (su is in /bin, remember?), so at least we stood a chance of all this working. ..."
Nov 08, 2002 | www.linuxjournal.com

Anonymous on Fri, 11/08/2002 - 03:00.

It's here... Unbelievable...

[I had intended to leave the discussion of "rm -r *" behind after the compendium I sent earlier, but I couldn't resist this one.

I also received a response from rutgers!seismo!hadron!jsdy (Joseph S. D. Yao) that described building a list of "dangerous" commands into a shell and dropping into a query when a glob turns up. They built it in so it couldn't be removed, like an alias. Anyway, on to the story! RWH.] I didn't see the message that opened up the discussion on rm, but thought you might like to read this sorry tale about the perils of rm....

(It was posted to net.unix some time ago, but I think our postnews didn't send it as far as it should have!)

----------------------------------------------------------------

Have you ever left your terminal logged in, only to find when you came back to it that a (supposed) friend had typed "rm -rf ~/*" and was hovering over the keyboard with threats along the lines of "lend me a fiver 'til Thursday, or I hit return"? Undoubtedly the person in question would not have had the nerve to inflict such a trauma upon you, and was doing it in jest. So you've probably never experienced the worst of such disasters....

It was a quiet Wednesday afternoon. Wednesday, 1st October, 15:15 BST, to be precise, when Peter, an office-mate of mine, leaned away from his terminal and said to me, "Mario, I'm having a little trouble sending mail." Knowing that msg was capable of confusing even the most capable of people, I sauntered over to his terminal to see what was wrong. A strange error message of the form (I forget the exact details) "cannot access /foo/bar for userid 147" had been issued by msg.

My first thought was "Who's userid 147?; the sender of the message, the destination, or what?" So I leant over to another terminal, already logged in, and typed

grep 147 /etc/passwd

only to receive the response

/etc/passwd: No such file or directory.

Instantly, I guessed that something was amiss. This was confirmed when in response to

ls /etc

I got

ls: not found.

I suggested to Peter that it would be a good idea not to try anything for a while, and went off to find our system manager. When I arrived at his office, his door was ajar, and within ten seconds I realised what the problem was. James, our manager, was sat down, head in hands, hands between knees, as one whose world has just come to an end. Our newly-appointed system programmer, Neil, was beside him, gazing listlessly at the screen of his terminal. And at the top of the screen I spied the following lines:

# cd 
# rm -rf * 

Oh, *****, I thought. That would just about explain it.

I can't remember what happened in the succeeding minutes; my memory is just a blur. I do remember trying ls (again), ps, who and maybe a few other commands beside, all to no avail. The next thing I remember was being at my terminal again (a multi-window graphics terminal), and typing

cd / 
echo * 

I owe a debt of thanks to David Korn for making echo a built-in of his shell; needless to say, /bin, together with /bin/echo, had been deleted. What transpired in the next few minutes was that /dev, /etc and /lib had also gone in their entirety; fortunately Neil had interrupted rm while it was somewhere down below /news, and /tmp, /usr and /users were all untouched.

Meanwhile James had made for our tape cupboard and had retrieved what claimed to be a dump tape of the root filesystem, taken four weeks earlier. The pressing question was, "How do we recover the contents of the tape?". Not only had we lost /etc/restore, but all of the device entries for the tape deck had vanished. And where does mknod live?

You guessed it, /etc.

How about recovery across Ethernet of any of this from another VAX? Well, /bin/tar had gone, and thoughtfully the Berkeley people had put rcp in /bin in the 4.3 distribution. What's more, none of the Ether stuff wanted to know without /etc/hosts at least. We found a version of cpio in /usr/local, but that was unlikely to do us any good without a tape deck.

Alternatively, we could get the boot tape out and rebuild the root filesystem, but neither James nor Neil had done that before, and we weren't sure that the first thing to happen would be that the whole disk would be re-formatted, losing all our user files. (We take dumps of the user files every Thursday; by Murphy's Law this had to happen on a Wednesday).

Another solution might be to borrow a disk from another VAX, boot off that, and tidy up later, but that would have entailed calling the DEC engineer out, at the very least. We had a number of users in the final throes of writing up PhD theses and the loss of a maybe a weeks' work (not to mention the machine down time) was unthinkable.

So, what to do? The next idea was to write a program to make a device descriptor for the tape deck, but we all know where cc, as and ld live. Or maybe make skeletal entries for /etc/passwd, /etc/hosts and so on, so that /usr/bin/ftp would work. By sheer luck, I had a gnuemacs still running in one of my windows, which we could use to create passwd, etc., but the first step was to create a directory to put them in.

Of course /bin/mkdir had gone, and so had /bin/mv, so we couldn't rename /tmp to /etc. However, this looked like a reasonable line of attack.

By now we had been joined by Alasdair, our resident UNIX guru, and as luck would have it, someone who knows VAX assembler. So our plan became this: write a program in assembler which would either rename /tmp to /etc, or make /etc, assemble it on another VAX, uuencode it, type in the uuencoded file using my gnu, uudecode it (some bright spark had thought to put uudecode in /usr/bin), run it, and hey presto, it would all be plain sailing from there. By yet another miracle of good fortune, the terminal from which the damage had been done was still su'd to root (su is in /bin, remember?), so at least we stood a chance of all this working.

Off we set on our merry way, and within only an hour we had managed to concoct the dozen or so lines of assembler to create /etc. The stripped binary was only 76 bytes long, so we converted it to hex (slightly more readable than the output of uuencode), and typed it in using my editor. If any of you ever have the same problem, here's the hex for future reference:

070100002c000000000000000000000000000000000000000000000000000000 
0000dd8fff010000dd8f27000000fb02ef07000000fb01ef070000000000bc8f 
8800040000bc012f65746300 

I had a handy program around (doesn't everybody?) for converting ASCII hex to binary, and the output of /usr/bin/sum tallied with our original binary. But hang on---how do you set execute permission without /bin/chmod? A few seconds thought (which as usual, lasted a couple of minutes) suggested that we write the binary on top of an already existing binary, owned by me...problem solved.

So along we trotted to the terminal with the root login, carefully remembered to set the umask to 0 (so that I could create files in it using my gnu), and ran the binary. So now we had a /etc, writable by all.

From there it was but a few easy steps to creating passwd, hosts, services, protocols, (etc), and then ftp was willing to play ball. Then we recovered the contents of /bin across the ether (it's amazing how much you come to miss ls after just a few, short hours), and selected files from /etc. The key file was /etc/rrestore, with which we recovered /dev from the dump tape, and the rest is history.

Now, you're asking yourself (as I am), what's the moral of this story? Well, for one thing, you must always remember the immortal words, DON'T PANIC. Our initial reaction was to reboot the machine and try everything as single user, but it's unlikely it would have come up without /etc/init and /bin/sh. Rational thought saved us from this one.

The next thing to remember is that UNIX tools really can be put to unusual purposes. Even without my gnuemacs, we could have survived by using, say, /usr/bin/grep as a substitute for /bin/cat. And the final thing is, it's amazing how much of the system you can delete without it falling apart completely. Apart from the fact that nobody could login (/bin/login?), and most of the useful commands had gone, everything else seemed normal. Of course, some things can't stand life without say /etc/termcap, or /dev/kmem, or /etc/utmp, but by and large it all hangs together.

I shall leave you with this question: if you were placed in the same situation, and had the presence of mind that always comes with hindsight, could you have got out of it in a simpler or easier way?

Answers on a postage stamp to:

Mario Wolczko

------------------------------------------------------------------------

Dept. of Computer Science ARPA: miw%[email protected]

The University USENET: mcvax!ukc!man.cs.ux!miw

Manchester M13 9PL JANET: [email protected]

U.K. 061-273 7121 x 5699

[Jul 20, 2017] These Guys Didn't Back Up Their Files, Now Look What Happened

Notable quotes:
"... Unfortunately, even today, people have not learned that lesson. Whether it's at work, at home, or talking with friends, I keep hearing stories of people losing hundreds to thousands of files, sometimes they lose data worth actual dollars in time and resources that were used to develop the information. ..."
"... "I lost all my files from my hard drive? help please? I did a project that took me 3 days and now i lost it, its powerpoint presentation, where can i look for it? its not there where i save it, thank you" ..."
"... Please someone help me I last week brought a Toshiba Satellite laptop running windows 7, to replace my blue screening Dell vista laptop. On plugged in my sumo external hard drive to copy over some much treasured photos and some of my (work – music/writing.) it said installing driver. it said completed I clicked on the hard drive and found a copy of my documents from the new laptop and nothing else. ..."
Jul 20, 2017 | www.makeuseof.com
Back in college, I used to work just about every day as a computer cluster consultant. I remember a month after getting promoted to a supervisor, I was in the process of training a new consultant in the library computer cluster. Suddenly, someone tapped me on the shoulder, and when I turned around I was confronted with a frantic graduate student – a 30-something year old man who I believe was Eastern European based on his accent – who was nearly in tears.

"Please need help – my document is all gone and disk stuck!" he said as he frantically pointed to his PC.

Now, right off the bat I could have told you three facts about the guy. One glance at the blue screen of the archaic DOS-based version of Wordperfect told me that – like most of the other graduate students at the time – he had not yet decided to upgrade to the newer, point-and-click style word processing software. For some reason, graduate students had become so accustomed to all of the keyboard hot-keys associated with typing in a DOS-like environment that they all refused to evolve into point-and-click users.

The second fact, gathered from a quick glance at his blank document screen and the sweat on his brow told me that he had not saved his document as he worked. The last fact, based on his thick accent, was that communicating the gravity of his situation wouldn't be easy. In fact, it was made even worse by his answer to my question when I asked him when he last saved.

"I wrote 30 pages."

Calculated out at about 600 words a page, that's 18000 words. Ouch.

Then he pointed at the disk drive. The floppy disk was stuck, and from the marks on the drive he had clearly tried to get it out with something like a paper clip. By the time I had carefully fished the torn and destroyed disk out of the drive, it was clear he'd never recover anything off of it. I asked him what was on it.

"My thesis."

I gulped. I asked him if he was serious. He was. I asked him if he'd made any backups. He hadn't.

Making Backups of Backups

If there is anything I learned during those early years of working with computers (and the people that use them), it was how critical it is to not only save important stuff, but also to save it in different places. I would back up floppy drives to those cool new zip drives as well as the local PC hard drive. Never, ever had a single copy of anything.

Unfortunately, even today, people have not learned that lesson. Whether it's at work, at home, or talking with friends, I keep hearing stories of people losing hundreds to thousands of files, sometimes they lose data worth actual dollars in time and resources that were used to develop the information.

To drive that lesson home, I wanted to share a collection of stories that I found around the Internet about some recent cases where people suffered that horrible fate – from thousands of files to entire drives' worth of data completely lost. These are people for whom the only remaining option is to start running recovery software and praying, or in other cases paying thousands of dollars to a data recovery firm and hoping there's something to find.

Not Backing Up Projects

The first example comes from Yahoo Answers , where a user that only provided a "?" for a user name (out of embarrassment probably), posted:

"I lost all my files from my hard drive? help please? I did a project that took me 3 days and now i lost it, its powerpoint presentation, where can i look for it? its not there where i save it, thank you"

The folks answering immediately dove into suggesting that the person run recovery software, and one person suggested that the person run a search on the computer for *.ppt.

... ... ...

Doing Backups Wrong

Then, there's the scenario of actually trying to do a backup and doing it wrong, losing all of the files on the original drive. That was the case for the person who posted on Tech Support Forum who, after purchasing a brand new Toshiba laptop and attempting to transfer old files from an external hard drive, inadvertently wiped the files on that drive.

Please someone help me I last week brought a Toshiba Satellite laptop running windows 7, to replace my blue screening Dell vista laptop. On plugged in my sumo external hard drive to copy over some much treasured photos and some of my (work – music/writing.) it said installing driver. it said completed I clicked on the hard drive and found a copy of my documents from the new laptop and nothing else.

While the description of the problem is a little broken, from the sound of it, the person thought they were backing up from one direction, while they were actually backing up in the other direction. At least in this case not all of the original files were deleted, but a majority were.

[Jul 20, 2017] How Toy Story 2 Almost Got Deleted... Except That One Person Made A Home Backup

Notable quotes:
"... as a general observation, large organizations/corporations tend to opt for incredibly expensive, incredibly complex, incredibly overblown backup "solutions" sold to them by vendors rather than using the stock, well-tested, reliable tools that they already have. ..."
"... in over 30 years of working in the field, the second-worst product I have ever had the misfortune to deal with is Legato (now EMC) NetWorker. ..."
"... Panic can lead to further problems ..."
May 01, 2018 | Techdirt

Here's a random story, found via Kottke , highlighting how Pixar came very close to losing a very large portion of Toy Story 2 , because someone did an rm * (non geek: "remove all" command). And that's when they realized that their backups hadn't been working for a month. Then, the technical director of the film noted that, because she wanted to see her family and kids, she had been making copies of the entire film and transferring it to her home computer. After a careful trip from the Pixar offices to her home and back, they discovered that, indeed, most of the film was saved:

http://www.youtube.com/embed/EL_g0tyaIeE?rel=0

Now, mostly, this is just an amusing little anecdote, but two things struck me:

How in the world do they not have more "official" backups of something as major as Toy Story 2? In the clip they admit that it was potentially 20 to 30 man-years of work that may have been lost. It makes no sense to me that this would rely on a single backup system. I wonder if the copy, made by technical director Galyn Susman, was outside of corporate policy. You would have to imagine that at a place like Pixar, there were significant concerns about things "getting out," and so the policy likely wouldn't have looked all that kindly on copies being used on home computers.

The Mythbusters folks wonder if this story was a little over-dramatized , and others have wondered how the technical director would have "multiple terabytes of source material" on her home computer back in 1999. That resulted in an explanation from someone who was there that what was deleted was actually the database containing the master copies of the characters, sets, animation, etc. rather than the movie itself. Of course, once again, that makes you wonder how it is that no one else had a simple backup. You'd think such a thing would be backed up in dozens of places around the globe for safe keeping...
Hans B PUFAL ( profile ), 18 May 2012 @ 5:53am
Reminds me of .... Some decades ago I was called to a customer site, a bank, to diagnose a computer problem. On my arrival early in the morning I noted a certain panic in the air. On querying my hosts I was told that there had been an "issue" the previous night and that they were trying, unsuccessfully, to recover data from backup tapes. The process was failing and panic ensued.

Though this was not the problem I had been called on to investigate, I asked some probing questions, made a short phone call, and provided the answer, much to the customer's relief.

What I found was that for months if not years the customer had been performing backups of indexed sequential files, that is data files with associated index files, without once verifying that the backed-up data could be recovered. On the first occasion of a problem requiring such a recovery they discovered that they just did not work.

The answer? Simply recreate the index files from the data. For efficiency reasons (this was a LONG time ago) the index files referenced the data files by physical disk addresses. When the backup tapes were restored the data was of course no longer at the original place on the disk and the index files were useless. A simple procedure to recreate the index files solved the problem.

Clearly whoever had designed that system had never tested a recovery, nor read the documentation which clearly stated the issue and its simple solution.

So here is a case of making backups, but then finding them flawed when needed.

Anonymous Coward , 18 May 2012 @ 6:00am
Re: Reminds me of .... That's why, in the IT world, you ALWAYS do a "dry run" when you want to deploy something, and you monitor the heck out of critical systems.
Rich Kulawiec , 18 May 2012 @ 6:30am
Two notes on backups

1. Everyone who has worked in computing for any period of time has their own backup horror story. I'll spare you mine, but note that as a general observation, large organizations/corporations tend to opt for incredibly expensive, incredibly complex, incredibly overblown backup "solutions" sold to them by vendors rather than using the stock, well-tested, reliable tools that they already have. (e.g., "why should we use dump, which is open-source/reliable/portable/tested/proven/efficient/etc., when we could drop $40K on closed-source/proprietary/non-portable/slow/bulky software from a vendor?")

Okay, okay, one comment: in over 30 years of working in the field, the second-worst product I have ever had the misfortune to deal with is Legato (now EMC) NetWorker.

2. Hollywood has a massive backup and archiving problem. How do we know? Because they keep telling us about it. There are a series of self-promoting commercials that they run in theaters before movies, in which they talk about all of the old films that are slowly decaying in their canisters in vast warehouses, and how terrible this is, and how badly they need charitable contributions from the public to save these treasures of cinema before they erode into dust, etc.

Let's skip the irony of Hollywood begging for money while they're paying professional liar Chris Dodd millions and get to the technical point: the easiest and cheapest way to preserve all of these would be to back them up to the Internet. Yes, there's a one-time expense of cleaning up the analog versions and then digitizing them at high resolution, but once that's done, all the copies are free. There's no need for a data center or elaborate IT infrastructure: put 'em on BitTorrent and let the world do the work. Or give copies to the Internet Archive. Whatever -- the point is that once we get past the analog issues, the only reason that this is a problem is that they made it a problem by refusing to surrender control.
saulgoode ( profile ), 18 May 2012 @ 6:38am
Re: Two notes on backups "Real Men don't make backups. They upload it via ftp and let the world mirror it." - Linus Torvalds
Anonymous Coward , 18 May 2012 @ 7:02am
What I suspect is that she was copying the rendered footage. If the footage was rendered at a resolution and rate fitting DVD spec, that'd put the raw footage at around 3GB to 4GB for a full 90 min, which just might fit on the 10GB HDDs that were available back then in a laptop computer (remember how small OSes were back then).

Even losing just the rendered raw footage (or even processed footage), would be a massive setback. It takes a long time across a lot of very powerful computers to render film quality footage. If it was processed footage then it's even more valuable as that takes a lot of man hours of post fx to make raw footage presentable to a consumer audience.

aldestrawk ( profile ), 18 May 2012 @ 8:34am
a retelling by Oren Jacob Oren Jacob, the Pixar director featured in the animation, has made a comment on the Quora post that explains things in much more detail. The narration and animation was telling a story, as in storytelling. Despite the 99% true caption at the end, a lot of details were left out which misrepresented what had happened. Still, it was a fun tale for anyone who had dealt with backup problems. Oren Jacob's retelling in the comment makes it much more realistic and believable.
The terabytes level of data came from whoever posted the video on Quora. The video itself never mentions the actual amount of data lost or the total amount the raw files represent. Oren says, vaguely, that it was much less than a terabyte. There were backups! The last one was from two days previous to the delete event. The backup was flawed in that it produced files that when tested, by rendering, exhibited errors.

They ended up patching a two-month old backup together with the home computer version (two weeks old). This was labor intensive as some 30k files had to be individually checked.

The moral of the story.

Deleting files, under Linux as well as just about any OS, only involves deleting the directory entries. There is software which can recover those files as long as further use of the computer system doesn't end up overwriting what is now free space.

Mason Wheeler , 18 May 2012 @ 10:01am
Re: a retelling by Oren Jacob
Panic can lead to further problems. They could well have introduced corruption in files by abruptly unplugging the computer.

What's worse? Corrupting some files or deleting all files?

aldestrawk ( profile ), 18 May 2012 @ 10:38am
Re: Re: a retelling by Oren Jacob

In this case they were not dealing with unknown malware that was steadily erasing the system as they watched. There was, apparently, a delete event at a single point in time that had repercussions that made things disappear while people worked on the movie.

I'll bet things disappeared when whatever editing was being done required a file to be refreshed.

A refresh operation would make the related object disappear when the underlying file was no longer available.

Apart from the set of files that had already been deleted, more files could have been corrupted when the computer was unplugged.

Having said that, this occurred in 1999 when they were probably using the Ext2 filesystem under Linux. These days most everyone uses a filesystem that includes journaling which protects against corruption that may occur when a computer loses power. Ext3 is a journaling filesystem and was introduced in 2001.

In 1998 I had to rebuild my entire home computer system. A power glitch introduced corruption in a Windows 95 system file and use of a Norton recovery tool rendered the entire disk into a handful of unusable files. It took me ten hours to rebuild the OS and re-install all the added hardware, software, and copy personal files from backup floppies. The next day I went out and bought a UPS. Nowadays, sometimes the UPS for one of my computers will fail during one of the three dozen power outages a year I get here. I no longer have problems with that because of journaling.

Danny ( profile ), 18 May 2012 @ 10:49am
I've gotta story like this too. I've posted in the past on Techdirt that I used to work for Ticketmaster. There is an interesting TM story that I don't think ever made it into the public, so I will tell it now.

Back in the 1980s each TM city was on an independent computer system (PDP Unibus systems with RM05 or CDC9766 disk drives). The drives themselves were fixed boxes about the size of a washing machine, with removable disk platters about the size of the proverbial breadbox. Each platter held 256MB formatted.

Each city had its own operations policies, but generally the systems ran with mirrored drives, the database was backed up every night, and archival copies were made monthly. In Chicago, where I worked, we did not have offsite backup in the 1980s. The Bay Area had the most interesting system for offsite backup.

The Bay Area BASS operation, bought by TM in the mid 1980s, had a deal with a taxi driver. They would make their nightly backup copies in house, and make an extra copy on a spare disk platter. This cabbie would come by the office about 2am each morning, and they'd put the spare disk platter in his trunk, swapping it for the previous day's copy that had been in his trunk for 24 hours. So, for the cost of about two platters ($700 at the time) and whatever cash they'd pay the cabbie, they had a mobile offsite copy of their database circulating the Bay Area at all times.

When the World Series earthquake hit in October 1989, the TM office in downtown Oakland was badly damaged. The only copy of the database that survived was the copy in the taxi cab.

That incident led TM corporate to establish much more sophisticated and redundant data redundancy policies.

aldestrawk ( profile ), 18 May 2012 @ 11:30am
Re: I've gotta story like this too I like that story. Not that it matters anymore, but taxi cab storage was probably a bad idea. The disks were undoubtedly the "Winchester" type and when powered down the head would be parked on a "landing strip". Still, subjecting these drives to jolts from a taxi riding over bumps in the road could damage the head or cause it to be misaligned. You would have known, though, if that actually turned out to be a problem. Also, I wouldn't trust a taxi driver with the company database. Although that is probably due to an unreasonable bias towards cab drivers. I won't mention the numerous arguments with them (not in the U.S.) over fares and the one physical fight with a driver who nearly ran me down while I was walking.
Huw Davies , 19 May 2012 @ 1:20am
Re: Re: I've gotta story like this too RM05s are removable pack drives. The heads stay in the washing machine size unit - all you remove are the platters.
That One Guy ( profile ), 18 May 2012 @ 5:00pm
What I want to know is this... She copied bits of a movie to her home system... how hard did they have to pull in the leashes to keep Disney's lawyers from suing her to infinity and beyond after she admitted she'd done so (never mind the fact that her doing so saved them apparently years of work...)?
Lance , 3 May 2014 @ 8:53am

http://thenextweb.com/media/2012/05/21/how-pixars-toy-story-2-was-deleted-twice-once-by-technology-a nd-again-for-its-own-good/

Evidently, the film data only took up 10 GB in those days. Nowhere near TB...

[Jul 20, 2017] Scary Backup Stories by Paul Barry

A good backup is a backup that has been verified using the actual restore procedure. Anything else is just an approximation, as the devil is often in the details.
Notable quotes:
"... All the tapes were then checked, and they were all ..."
Nov 07, 2002 | Linux Journal

The dangers of not testing your backup procedures and some common pitfalls to avoid.

Backups. We all know the importance of making a backup of our most important systems. Unfortunately, some of us also know that realizing the importance of performing backups often is a lesson learned the hard way. Everyone has their scary backup stories. Here are mine.

Scary Story #1

Like a lot of people, my professional career started out in technical support. In my case, I was part of a help-desk team for a large professional practice. Among other things, we were responsible for performing PC LAN backups for a number of systems used by other departments. For one especially important system, we acquired fancy new tape-backup equipment and a large collection of tapes. A procedure was put in place, and before-you-go-home-at-night backups became a standard. Some months later, a crash brought down the system, and all the data was lost. Shortly thereafter, a call came in for the latest backup tape. It was located and dispatched, and a recovery was attempted. The recovery failed, however, as the tape was blank. A call came in for the next-to-last backup tape. Nervously, it was located and dispatched, and a recovery was attempted. It also failed because this tape also was blank. Amid long silences and pink-slip glares, panic started to set in as the tape from three nights prior was called up. This attempt resulted in a lot of shouting.

All the tapes were then checked, and they were all blank. To add insult to injury, the problem wasn't only that the tapes were blank--they weren't even formatted! The fancy new backup equipment wasn't smart enough to realize the tapes were not formatted, so it allowed them to be used. Note: writing good data to an unformatted tape is never a good idea.

Now, don't get me wrong, the backup procedures themselves were good. The problem was that no one had ever tested the whole process--no one had ever attempted a recovery. Was it no small wonder then that each recovery failed?

For backups to work, you need to do two things: (1) define and implement a good procedure and (2) test that it works.

To this day, I can't fathom how my boss (who had overall responsibility for the backup procedures) managed not to get fired over this incident. And what happened there has always stayed with me.

A Good Solution

When it comes to doing backups on Linux systems, a number of standard tools can help avoid the problems discussed above. Marcel Gagné's excellent book (see Resources) contains a simple yet useful script that not only performs the backup but verifies that things went well. Then, after each backup, the script sends an e-mail to root detailing what occurred.

I'll run through the guts of a modified version of Marcel's script here, to show you how easy this process actually is. This bash script starts by defining the location of a log and an error file. Two mv commands then rename the previous log and error files to allow for the examination of the next-to-last backup (if required):

#! /bin/bash
backup_log=/usr/local/.Backups/backup.log
backup_err=/usr/local/.Backups/backup.err
mv $backup_log $backup_log.old
mv $backup_err $backup_err.old

With the log and error files ready, a few echo commands append messages (note the use of >>) to each of the files. The messages include the current date and time (which is accessed using the back-ticked date command). The cd command then changes to the location of the directory to be backed up. In this example, that directory is /mnt/data, but it could be any location:

echo "Starting backup of /mnt/data: `date`." >> $backup_log
echo "Errors reported for backup/verify: `date`." >> $backup_err
cd /mnt/data

The backup then starts, using the tried and true tar command. The -cvf options request the creation of a new archive (c), verbose mode (v) and the name of the file/device to backup to (f). In this example, we backup to /dev/st0, the location of an attached SCSI tape drive:

    tar -cvf /dev/st0 . 2>>$backup_err

Any errors produced by this command are sent to STDERR (standard error). The above command exploits this behaviour by appending anything sent to STDERR to the error file as well (using the 2>> directive).

When the backup completes, the script then rewinds the tape using the mt command, before listing the files on the tape with another tar command (the -t option lists the files in the named archive). This is a simple way of verifying the contents of the tape. As before, we append any errors reported during this tar command to the error file. Additionally, informational messages are added to the log file at appropriate times:

mt -f /dev/st0 rewind
echo "Verifying this backup: `date`" >>$backup_log
tar -tvf /dev/st0 2>>$backup_err
echo "Backup complete: `date`" >>$backup_log

To conclude the script, we concatenate the error file to the log file (with cat ), then e-mail the log file to root (where the -s option to the mail command allows the specification of an appropriate subject line):

    cat $backup_err >> $backup_log
    mail -s "Backup status report for /mnt/data" root < $backup_log

And there you have it, Marcel's deceptively simple solution to performing a verified backup and e-mailing the results to an interested party. If only we'd had something similar all those years ago.
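
To make the nightly run automatic, a script like this can simply be driven from cron; the script path below is hypothetical:

# Example root crontab entry (edit with "crontab -e" as root): run the backup
# script at 23:30 every night; cron mails any unexpected output to root.
30 23 * * * /usr/local/sbin/backup_mnt_data.sh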

... ... ...

[Jul 18, 2017] Can I copy my Ubuntu OS off my hard drive to a USB stick and boot from that stick with all my programs

user323419
Yes, this is completely possible. First and foremost, you will need at least 2 USB ports available, or 1 USB port and 1 CD-Drive.

You start by booting into a Live-CD version of Ubuntu with your hard-drive where it is and the target device plugged into USB. Mount your internal drive and target USB to any paths you like.

Open up a terminal and enter the following commands:

tar cpf - --xattrs -C /path/to/internal . | tar xpf - --xattrs -C /path/to/target/usb

You can also look into doing this through a live installation and a utility called CloneZilla, but I am unsure of exactly how to use CloneZilla. The above method is what I used to copy my 128GB hard-drive's installation of Ubuntu to a 64GB flash drive.
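
One caveat, added here as an assumption rather than something stated in the original answer: a plain file copy does not by itself make the stick bootable; a bootloader usually still has to be installed and /etc/fstab updated. A rough sketch, assuming the stick is /dev/sdX and is still mounted at /path/to/target/usb:

# Bind-mount the pseudo filesystems the bootloader tools expect, then install GRUB
# onto the stick from inside a chroot (the device name /dev/sdX is a placeholder).
for d in /dev /proc /sys; do sudo mount --bind "$d" "/path/to/target/usb$d"; done
sudo chroot /path/to/target/usb grub-install /dev/sdX
sudo chroot /path/to/target/usb update-grub
# Afterwards, adjust /path/to/target/usb/etc/fstab so the root entry points at the
# stick's own UUID (shown by: blkid /dev/sdX1).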

2) Clone again the internal or external drive in its entirety to another drive:

Use the "Clonezilla" utility, mentioned in the very last paragraph of my original answer, to clone the original internal drive to another external drive, making two such external bootable drives to keep track of.

[Feb 20, 2017] Using rsync to back up your Linux system

Feb 20, 2017 | opensource.com
Another interesting option, and my personal favorite because it increases the power and flexibility of rsync immensely, is the --link-dest option. The --link-dest option allows a series of daily backups that take up very little additional space for each day and also take very little time to create.

Specify the previous day's target directory with this option and a new directory for today. rsync then creates today's new directory and a hard link for each file in yesterday's directory is created in today's directory. So we now have a bunch of hard links to yesterday's files in today's directory. No new files have been created or duplicated. Just a bunch of hard links have been created. Wikipedia has a very good description of hard links . After creating the target directory for today with this set of hard links to yesterday's target directory, rsync performs its sync as usual, but when a change is detected in a file, the target hard link is replaced by a copy of the file from yesterday and the changes to the file are then copied from the source to the target.

So now our command looks like the following.

rsync -aH --delete --link-dest=yesterdaystargetdir sourcedir todaystargetdir

There are also times when it is desirable to exclude certain directories or files from being synchronized. For this, there is the --exclude option. Use this option and the pattern for the files or directories you want to exclude. You might want to exclude browser cache files so your new command will look like this.

rsync -aH --delete --exclude Cache --link-dest=yesterdaystargetdir sourcedir todaystargetdir

Note that each file pattern you want to exclude must have a separate exclude option.
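For example, excluding both a browser Cache directory and a hypothetical .thumbnails directory takes two --exclude options:

rsync -aH --delete --exclude Cache --exclude .thumbnails --link-dest=yesterdaystargetdir sourcedir todaystargetdir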

rsync can sync files with remote hosts as either the source or the target. For the next example, let's assume that the source directory is on a remote computer with the hostname remote1 and the target directory is on the local host. Even though SSH is the default communications protocol used when transferring data to or from a remote host, I always add the ssh option. The command now looks like this.

rsync -aH -e ssh --delete --exclude Cache --link-dest=yesterdaystargetdir remote1:sourcedir todaystargetdir

This is the final form of my rsync backup command.

rsync has a very large number of options that you can use to customize the synchronization process. For the most part, the relatively simple commands that I have described here are perfect for making backups for my personal needs. Be sure to read the extensive man page for rsync to learn about more of its capabilities as well as the options discussed here.
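As a minimal sketch of how that final command might be driven from a daily cron job, assuming date-stamped target directories under a hypothetical /backups tree (none of these paths come from the article):

#!/bin/bash
# Sketch only: wrap the rsync backup command with date-stamped --link-dest directories.
BACKUP_ROOT=/backups
TODAY=$(date +%F)
YESTERDAY=$(date -d yesterday +%F)

rsync -aH -e ssh --delete --exclude Cache \
      --link-dest="$BACKUP_ROOT/$YESTERDAY" \
      remote1:sourcedir "$BACKUP_ROOT/$TODAY"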

[Feb 12, 2017] Easy Automated Snapshot-Style Backups with Linux and Rsync

Notable quotes:
"... illusion ..."
"... only one extra, slightly-larger, hard disk ..."
"... hard link ..."
"... what appears to be ..."
"... Putting it all together ..."
"... If you are rsync'ing from a SAMBA share, you must add --modify-window=10 ..."
Feb 12, 2017 | www.mikerubel.org

page last modified 2004.01.04

Updates: As of rsync-2.5.6 , the --link-dest option is now standard! That can be used instead of the separate cp -al and rsync stages, and it eliminates the ownerships/permissions bug. I now recommend using it. Also, I'm proud to report this article is mentioned in Linux Server Hacks , a new (and very good, in my opinion) O'Reilly book compiled by Rob Flickenger.

Contents
  1. Abstract
  2. Motivation
  3. Using rsync to make a backup
    1. Basics
    2. Using the --delete flag
    3. Be lazy: use cron
  4. Incremental backups with rsync
    1. Review of hard links
    2. Using cp -al
    3. Putting it all together
    4. I'm used to dump or tar ! This seems backward!
  5. Isolating the backup from the rest of the system
    1. The easy (bad) way
    2. Keep it on a separate partition
    3. Keep that partition on a separate disk
    4. Keep that disk on a separate machine
  6. Making the backup as read-only as possible
    1. Bad: mount / unmount
    2. Better: mount read-only most of the time
    3. Tempting but it doesn't seem to work: the 2.4 kernel's mount --bind
    4. My solution: using NFS on localhost
  7. Extensions: hourly, daily, and weekly snapshots
    1. Keep an extra script for each level
    2. Run it all with cron
  8. Known bugs and problems
    1. Maintaining Permissions and Owners in the snapshots
    2. mv updates timestamp bug
    3. Windows-related problems
  9. Appendix: my actual configuration
    1. Listing one: make_snapshot.sh
    2. Listing two: daily_snapshot_rotate.sh
    3. Sample output of ls -l /snapshot/home
  10. Contributed codes
  11. References
  12. Frequently Asked Questions
Abstract

This document describes a method for generating automatic rotating "snapshot"-style backups on a Unix-based system, with specific examples drawn from the author's GNU/Linux experience. Snapshot backups are a feature of some high-end industrial file servers; they create the illusion of multiple, full backups per day without the space or processing overhead. All of the snapshots are read-only, and are accessible directly by users as special system directories. It is often possible to store several hours, days, and even weeks' worth of snapshots with slightly more than 2x storage. This method, while not as space-efficient as some of the proprietary technologies (which, using special copy-on-write filesystems, can operate on slightly more than 1x storage), makes use of only standard file utilities and the common rsync program, which is installed by default on most Linux distributions. Properly configured, the method can also protect against hard disk failure, root compromises, or even back up a network of heterogeneous desktops automatically.

Motivation

Note: what follows is the original sgvlug DEVSIG announcement.

Ever accidentally delete or overwrite a file you were working on? Ever lose data due to hard-disk failure? Or maybe you export shares to your windows-using friends--who proceed to get outlook viruses that twiddle a digit or two in all of their .xls files. Wouldn't it be nice if there were a /snapshot directory that you could go back to, which had complete images of the file system at semi-hourly intervals all day, then daily snapshots back a few days, and maybe a weekly snapshot too? What if every user could just go into that magical directory and copy deleted or overwritten files back into "reality", from the snapshot of choice, without any help from you? And what if that /snapshot directory were read-only, like a CD-ROM, so that nothing could touch it (except maybe root, but even then not directly)?

Best of all, what if you could make all of that happen automatically, using only one extra, slightly-larger, hard disk ? (Or one extra partition, which would protect against all of the above except disk failure).

In my lab, we have a proprietary NetApp file server which provides that sort of functionality to the end-users. It provides a lot of other things too, but it cost as much as a luxury SUV. It's quite appropriate for our heavy-use research lab, but it would be overkill for a home or small-office environment. But that doesn't mean small-time users have to do without!

I'll show you how I configured automatic, rotating snapshots on my $80 used Linux desktop machine (which is also a file, web, and mail server) using only a couple of one-page scripts and a few standard Linux utilities that you probably already have.

I'll also propose a related strategy which employs one (or two, for the wisely paranoid) extra low-end machines for a complete, responsible, automated backup strategy that eliminates tapes and manual labor and makes restoring files as easy as "cp".

Using rsync to make a backup

The rsync utility is a very well-known piece of GPL'd software, written originally by Andrew Tridgell and Paul Mackerras. If you have a common Linux or UNIX variant, then you probably already have it installed; if not, you can download the source code from rsync.samba.org . Rsync's specialty is efficiently synchronizing file trees across a network, but it works fine on a single machine too.

Basics

Suppose you have a directory called source , and you want to back it up into the directory destination . To accomplish that, you'd use:

rsync -a source/ destination/

(Note: I usually also add the -v (verbose) flag too so that rsync tells me what it's doing). This command is equivalent to:

cp -a source/. destination/

except that it's much more efficient if there are only a few differences.

Just to whet your appetite, here's a way to do the same thing as in the example above, but with destination on a remote machine, over a secure shell:

rsync -a -e ssh source/ [email protected]:/path/to/destination/
Trailing Slashes Do Matter...Sometimes

This isn't really an article about rsync , but I would like to take a momentary detour to clarify one potentially confusing detail about its use. You may be accustomed to commands that don't care about trailing slashes. For example, if a and b are two directories, then cp -a a b is equivalent to cp -a a/ b/ . However, rsync does care about the trailing slash, but only on the source argument. For example, let a and b be two directories, with the file foo initially inside directory a . Then this command:

rsync -a a b

produces b/a/foo , whereas this command:

rsync -a a/ b

produces b/foo . The presence or absence of a trailing slash on the destination argument ( b , in this case) has no effect.

Using the --delete flag

If a file was originally in both source/ and destination/ (from an earlier rsync , for example), and you delete it from source/ , you probably want it to be deleted from destination/ on the next rsync . However, the default behavior is to leave the copy at destination/ in place. Assuming you want rsync to delete any file from destination/ that is not in source/ , you'll need to use the --delete flag:

rsync -a --delete source/ destination/
Be lazy: use cron

One of the toughest obstacles to a good backup strategy is human nature; if there's any work involved, there's a good chance backups won't happen. (Witness, for example, how rarely my roommate's home PC was backed up before I created this system). Fortunately, there's a way to harness human laziness: make cron do the work.

To run the rsync-with-backup command from the previous section every morning at 4:20 AM, for example, edit the root cron table: (as root)

crontab -e

Then add the following line:

20 4 * * * rsync -a --delete source/ destination/

Finally, save the file and exit. The backup will happen every morning at precisely 4:20 AM, and root will receive the output by email. Don't copy that example verbatim, though; you should use full path names (such as /usr/bin/rsync and /home/source/ ) to remove any ambiguity.

Incremental backups with rsync

Since making a full copy of a large filesystem can be a time-consuming and expensive process, it is common to make full backups only once a week or once a month, and store only changes on the other days. These are called "incremental" backups, and are supported by the venerable old dump and tar utilities, along with many others.

However, you don't have to use tape as your backup medium; it is both possible and vastly more efficient to perform incremental backups with rsync .

The most common way to do this is by using the rsync -b --backup-dir= combination. I have seen examples of that usage here , but I won't discuss it further, because there is a better way. If you're not familiar with hard links, though, you should first start with the following review.

Review of hard links

We usually think of a file's name as being the file itself, but really the name is a hard link . A given file can have more than one hard link to itself--for example, a directory has at least two hard links: the directory name and . (for when you're inside it). It also has one hard link from each of its sub-directories (the .. file inside each one). If you have the stat utility installed on your machine, you can find out how many hard links a file has (along with a bunch of other information) with the command:

stat filename

Hard links aren't just for directories--you can create more than one link to a regular file too. For example, if you have the file a , you can make a link called b :

ln a b

Now, a and b are two names for the same file, as you can verify by seeing that they reside at the same inode (the inode number will be different on your machine):

ls -i a
  232177 a
ls -i b
  232177 b

So ln a b is roughly equivalent to cp a b , but there are several important differences:

  1. The contents of the file are only stored once, so you don't use twice the space.
  2. If you change a , you're changing b , and vice-versa.
  3. If you change the permissions or ownership of a , you're changing those of b as well, and vice-versa.
  4. If you overwrite a by copying a third file on top of it, you will also overwrite b , unless you tell cp to unlink before overwriting. You do this by running cp with the --remove-destination flag. Notice that rsync always unlinks before overwriting!! . Note, added 2002.Apr.10: the previous statement applies to changes in the file contents only, not permissions or ownership.

But this raises an interesting question. What happens if you rm one of the links? The answer is that rm is a bit of a misnomer; it doesn't really remove a file, it just removes that one link to it. A file's contents aren't truly removed until the number of links to it reaches zero. In a moment, we're going to make use of that fact, but first, here's a word about cp .
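A quick way to watch the link count is GNU stat's %h format; the session below is illustrative (the inode number matches the example above):

stat -c '%h %i %n' a b
2 232177 a
2 232177 b
rm b
stat -c '%h %i %n' a
1 232177 a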

Using cp -al

In the previous section, it was mentioned that hard-linking a file is similar to copying it. It should come as no surprise, then, that the standard GNU coreutils cp command comes with a -l flag that causes it to create (hard) links instead of copies (it doesn't hard-link directories, though, which is good; you might want to think about why that is). Another handy switch for the cp command is -a (archive), which causes it to recurse through directories and preserve file owners, timestamps, and access permissions.

Together, the combination cp -al makes what appears to be a full copy of a directory tree, but is really just an illusion that takes almost no space. If we restrict operations on the copy to adding or removing (unlinking) files--i.e., never changing one in place--then the illusion of a full copy is complete. To the end-user, the only differences are that the illusion-copy takes almost no disk space and almost no time to generate.

2002.05.15: Portability tip: If you don't have GNU cp installed (if you're using a different flavor of *nix, for example), you can use find and cpio instead. Simply replace cp -al a b with cd a && find . -print | cpio -dpl ../b . Thanks to Brage Førland for that tip.

Putting it all together

We can combine rsync and cp -al to create what appear to be multiple full backups of a filesystem without taking multiple disks' worth of space. Here's how, in a nutshell:

rm -rf backup.3
mv backup.2 backup.3
mv backup.1 backup.2
cp -al backup.0 backup.1
rsync -a --delete source_directory/  backup.0/

If the above commands are run once every day, then backup.0 , backup.1 , backup.2 , and backup.3 will appear to each be a full backup of source_directory/ as it appeared today, yesterday, two days ago, and three days ago, respectively--complete, except that permissions and ownerships in old snapshots will get their most recent values (thanks to J.W. Schultz for pointing this out). In reality, the extra storage will be equal to the current size of source_directory/ plus the total size of the changes over the last three days--exactly the same space that a full plus daily incremental backup with dump or tar would have taken.

Update (2003.04.23): As of rsync-2.5.6 , the --link-dest flag is now standard. Instead of the separate cp -al and rsync lines above, you may now write:

mv backup.0 backup.1
rsync -a --delete --link-dest=../backup.1 source_directory/  backup.0/

This method is preferred, since it preserves original permissions and ownerships in the backup. However, be sure to test it--as of this writing some users are still having trouble getting --link-dest to work properly. Make sure you use version 2.5.7 or later.

Update (2003.05.02): John Pelan writes in to suggest recycling the oldest snapshot instead of recursively removing and then re-creating it. This should make the process go faster, especially if your file tree is very large:

mv backup.3 backup.tmp
mv backup.2 backup.3
mv backup.1 backup.2
mv backup.0 backup.1
mv backup.tmp backup.0
cp -al backup.1/. backup.0
rsync -a --delete source_directory/ backup.0/

2003.06.02: OOPS! Rsync's link-dest option does not play well with J. Pelan's suggestion--the approach I previously had written above will result in unnecessarily large storage, because old files in backup.0 will get replaced and not linked. Please only use Dr. Pelan's directory recycling if you use the separate cp -al step; if you plan to use --link-dest , start with backup.0 empty and pristine. Apologies to anyone I've misled on this issue. Thanks to Kevin Everets for pointing out the discrepancy to me, and to J.W. Schultz for clarifying --link-dest 's behavior. Also note that I haven't fully tested the approach written above; if you have, please let me know. Until then, caveat emptor!

I'm used to dump or tar ! This seems backward!

The dump and tar utilities were originally designed to write to tape media, which can only access files in a certain order. If you're used to their style of incremental backup, rsync might seem backward. I hope that the following example will help make the differences clearer.

Suppose that on a particular system, backups were done on Monday night, Tuesday night, and Wednesday night, and now it's Thursday.

With dump or tar , the Monday backup is the big ("full") one. It contains everything in the filesystem being backed up. The Tuesday and Wednesday "incremental" backups would be much smaller, since they would contain only changes since the previous day. At some point (presumably next Monday), the administrator would plan to make another full dump.

With rsync, in contrast, the Wednesday backup is the big one. Indeed, the "full" backup is always the most recent one. The Tuesday directory would contain data only for those files that changed between Tuesday and Wednesday; the Monday directory would contain data for only those files that changed between Monday and Tuesday.

A little reasoning should convince you that the rsync way is much better for network-based backups, since it's only necessary to do a full backup once, instead of once per week. Thereafter, only the changes need to be copied. Unfortunately, you can't rsync to a tape, and that's probably why the dump and tar incremental backup models are still so popular. But in your author's opinion, these should never be used for network-based backups now that rsync is available.

Isolating the backup from the rest of the system

If you take the simple route and keep your backups in another directory on the same filesystem, then there's a very good chance that whatever damaged your data will also damage your backups. In this section, we identify a few simple ways to decrease your risk by keeping the backup data separate.

The easy (bad) way

In the previous section, we treated /destination/ as if it were just another directory on the same filesystem. Let's call that the easy (bad) approach. It works, but it has several serious limitations:

Fortunately, there are several easy ways to make your backup more robust.

Keep it on a separate partition

If your backup directory is on a separate partition, then any corruption in the main filesystem will not normally affect the backup. If the backup process runs out of disk space, it will fail, but it won't take the rest of the system down too. More importantly, keeping your backups on a separate partition means you can keep them mounted read-only; we'll discuss that in more detail in the next chapter.

Keep that partition on a separate disk

If your backup partition is on a separate hard disk, then you're also protected from hardware failure. That's very important, since hard disks always fail eventually, and often take your data with them. An entire industry has formed to service the needs of those whose broken hard disks contained important data that was not properly backed up.

Important : Notice, however, that in the event of hardware failure you'll still lose any changes made since the last backup. For home or small office users, where backups are made daily or even hourly as described in this document, that's probably fine, but in situations where any data loss at all would be a serious problem (such as where financial transactions are concerned), a RAID system might be more appropriate.

RAID is well-supported under Linux, and the methods described in this document can also be used to create rotating snapshots of a RAID system.

Keep that disk on a separate machine

If you have a spare machine, even a very low-end one, you can turn it into a dedicated backup server. Make it standalone, and keep it in a physically separate place--another room or even another building. Disable every single remote service on the backup server, and connect it only to a dedicated network interface on the source machine.

On the source machine, export the directories that you want to back up via read-only NFS to the dedicated interface. The backup server can mount the exported network directories and run the snapshot routines discussed in this article as if they were local. If you opt for this approach, you'll only be remotely vulnerable if:

  1. a remote root hole is discovered in read-only NFS, and
  2. the source machine has already been compromised.

I'd consider this "pretty good" protection, but if you're (wisely) paranoid, or your job is on the line, build two backup servers. Then you can make sure that at least one of them is always offline.

If you're using a remote backup server and can't get a dedicated line to it (especially if the information has to cross somewhere insecure, like the public internet), you should probably skip the NFS approach and use rsync -e ssh instead.

It has been pointed out to me that rsync operates far more efficiently in server mode than it does over NFS, so if the connection between your source and backup server becomes a bottleneck, you should consider configuring the backup machine as an rsync server instead of using NFS. On the downside, this approach is slightly less transparent to users than NFS--snapshots would not appear to be mounted as a system directory, unless NFS is used in that direction, which is certainly another option (I haven't tried it yet though). Thanks to Martin Pool, a lead developer of rsync , for making me aware of this issue.

Here's another example of the utility of this approach--one that I use. If you have a bunch of windows desktops in a lab or office, an easy way to keep them all backed up is to share the relevant files, read-only, and mount them all from a dedicated backup server using SAMBA. The backup job can treat the SAMBA-mounted shares just like regular local directories.

Making the backup as read-only as possible

In the previous section, we discussed ways to keep your backup data physically separate from the data they're backing up. In this section, we discuss the other side of that coin--preventing user processes from modifying backups once they're made.

We want to avoid leaving the snapshot backup directory mounted read-write in a public place. Unfortunately, keeping it mounted read-only the whole time won't work either--the backup process itself needs write access. The ideal situation would be for the backups to be mounted read-only in a public place, but at the same time, read-write in a private directory accessible only by root, such as /root/snapshot .

There are a number of possible approaches to the challenge presented by mounting the backups read-only. After some amount of thought, I found a solution which allows root to write the backups to the directory but only gives the users read permissions. I'll first explain the other ideas I had and why they were less satisfactory.

It's tempting to keep your backup partition mounted read-only as /snapshot most of the time, but unmount that and remount it read-write as /root/snapshot during the brief periods while snapshots are being made. Don't give in to temptation!.

Bad: mount / umount

A filesystem cannot be unmounted if it's busy--that is, if some process is using it. The offending process need not be owned by root to block an unmount request. So if you plan to umount the read-only copy of the backup and mount it read-write somewhere else, don't--any user can accidentally (or deliberately) prevent the backup from happening. Besides, even if blocking unmounts were not an issue, this approach would introduce brief intervals during which the backups would seem to vanish, which could be confusing to users.

Better: mount read-only most of the time

A better but still-not-quite-satisfactory choice is to remount the directory read-write in place:

mount -o remount,rw /snapshot
[ run backup process ]
mount -o remount,ro /snapshot

Now any process that happens to be in /snapshot when the backups start will not prevent them from happening. Unfortunately, this approach introduces a new problem--there is a brief window of vulnerability, while the backups are being made, during which a user process could write to the backup directory. Moreover, if any process opens a backup file for writing during that window, it will prevent the backup from being remounted read-only, and the backups will stay vulnerable indefinitely.

Tempting but doesn't seem to work: the 2.4 kernel's mount --bind

Starting with the 2.4-series Linux kernels, it has been possible to mount a filesystem simultaneously in two different places. "Aha!" you might think, as I did. "Then surely we can mount the backups read-only in /snapshot , and read-write in /root/snapshot at the same time!"

Alas, no. Say your backups are on the partition /dev/hdb1 . If you run the following commands,

mount /dev/hdb1 /root/snapshot
mount --bind -o ro /root/snapshot /snapshot

then (at least as of the 2.4.9 Linux kernel--updated, still present in the 2.4.20 kernel), mount will report /dev/hdb1 as being mounted read-write in /root/snapshot and read-only in /snapshot , just as you requested. Don't let the system mislead you!

It seems that, at least on my system, read-write vs. read-only is a property of the filesystem, not the mount point. So every time you change the mount status, it will affect the status at every point the filesystem is mounted, even though neither /etc/mtab nor /proc/mounts will indicate the change.

In the example above, the second mount call will cause both of the mounts to become read-only, and the backup process will be unable to run. Scratch this one.

Update: I have it on fairly good authority that this behavior is considered a bug in the Linux kernel, which will be fixed as soon as someone gets around to it. If you are a kernel maintainer and know more about this issue, or are willing to fix it, I'd love to hear from you!

My solution: using NFS on localhost

This is a bit more complicated, but until Linux supports mount --bind with different access permissions in different places, it seems like the best choice. Mount the partition where backups are stored somewhere accessible only by root, such as /root/snapshot . Then export it, read-only, via NFS, but only to the same machine. That's as simple as adding the following line to /etc/exports :

/root/snapshot 127.0.0.1(secure,ro,no_root_squash)

then start nfs and portmap from /etc/rc.d/init.d/ . Finally mount the exported directory, read-only, as /snapshot :

mount -o ro 127.0.0.1:/root/snapshot /snapshot

And verify that it all worked:

mount
...
/dev/hdb1 on /root/snapshot type ext3 (rw)
127.0.0.1:/root/snapshot on /snapshot type nfs (ro,addr=127.0.0.1)

At this point, we'll have the desired effect: only root will be able to write to the backup (by accessing it through /root/snapshot ). Other users will see only the read-only /snapshot directory. For a little extra protection, you could keep /root/snapshot mounted read-only most of the time, and only remount it read-write while backups are happening.

Damian Menscher pointed out this CERT advisory which specifically recommends against NFS exporting to localhost, though since I'm not clear on why it's a problem, I'm not sure whether exporting the backups read-only as we do here is also a problem. If you understand the rationale behind this advisory and can shed light on it, would you please contact me? Thanks!

Extensions: hourly, daily, and weekly snapshots

With a little bit of tweaking, we can make multiple-level rotating snapshots. On my system, for example, I keep the last four "hourly" snapshots (which are taken every four hours) as well as the last three "daily" snapshots (which are taken at midnight every day). You might also want to keep weekly or even monthly snapshots too, depending upon your needs and your available space.

Keep an extra script for each level

This is probably the easiest way to do it. I keep one script that runs every four hours to make and rotate hourly snapshots, and another script that runs once a day to rotate the daily snapshots. There is no need to use rsync for the higher-level snapshots; just cp -al from the appropriate hourly one.

Run it all with cron

To make the automatic snapshots happen, I have added the following lines to root's crontab file:

0 */4 * * * /usr/local/bin/make_snapshot.sh
0 13 * * *  /usr/local/bin/daily_snapshot_rotate.sh

They cause make_snapshot.sh to be run every four hours on the hour and daily_snapshot_rotate.sh to be run every day at 13:00 (that is, 1:00 PM). I have included those scripts in the appendix.

If you tire of receiving an email from the cron process every four hours with the details of what was backed up, you can tell it to send the output of make_snapshot.sh to /dev/null , like so:

0 */4 * * * /usr/local/bin/make_snapshot.sh >/dev/null 2>&1

Understand, though, that this will prevent you from seeing errors if make_snapshot.sh cannot run for some reason, so be careful with it. Creating a third script to check for any unusual behavior in the snapshot periodically seems like a good idea, but I haven't implemented it yet. Alternatively, it might make sense to log the output of each run, by piping it through tee , for example. mRgOBLIN wrote in to suggest a better (and obvious, in retrospect!) approach, which is to send stdout to /dev/null but keep stderr, like so:

0 */4 * * * /usr/local/bin/make_snapshot.sh >/dev/null

Presto! Now you only get mail when there's an error. :)

Appendix: my actual configuration

I know that listing my actual backup configuration here is a security risk; please be kind and don't use this information to crack my site. However, I'm not a security expert, so if you see any vulnerabilities in my setup, I'd greatly appreciate your help in fixing them. Thanks!

I actually use two scripts, one for every-four-hours (hourly) snapshots, and one for every-day (daily) snapshots. I am only including the parts of the scripts that relate to backing up /home , since those are relevant ones here.

I use the NFS-to-localhost trick of exporting /root/snapshot read-only as /snapshot , as discussed above.

The system has been running without a hitch for months.

Listing one: make_snapshot.sh
#!/bin/bash
# ----------------------------------------------------------------------
# mikes handy rotating-filesystem-snapshot utility
# ----------------------------------------------------------------------
# this needs to be a lot more general, but the basic idea is it makes
# rotating backup-snapshots of /home whenever called
# ----------------------------------------------------------------------

unset PATH	# suggestion from H. Milz: avoid accidental use of $PATH

# ------------- system commands used by this script --------------------
ID=/usr/bin/id;
ECHO=/bin/echo;

MOUNT=/bin/mount;
RM=/bin/rm;
MV=/bin/mv;
CP=/bin/cp;
TOUCH=/bin/touch;

RSYNC=/usr/bin/rsync;


# ------------- file locations -----------------------------------------

MOUNT_DEVICE=/dev/hdb1;
SNAPSHOT_RW=/root/snapshot;
EXCLUDES=/usr/local/etc/backup_exclude;


# ------------- the script itself --------------------------------------

# make sure we're running as root
if (( `$ID -u` != 0 )); then { $ECHO "Sorry, must be root.  Exiting..."; exit; } fi

# attempt to remount the RW mount point as RW; else abort
$MOUNT -o remount,rw $MOUNT_DEVICE $SNAPSHOT_RW ;
if (( $? )); then
{
	$ECHO "snapshot: could not remount $SNAPSHOT_RW readwrite";
	exit;
}
fi;


# rotating snapshots of /home (fixme: this should be more general)

# step 1: delete the oldest snapshot, if it exists:
if [ -d $SNAPSHOT_RW/home/hourly.3 ] ; then			\
$RM -rf $SNAPSHOT_RW/home/hourly.3 ;				\
fi ;

# step 2: shift the middle snapshots(s) back by one, if they exist
if [ -d $SNAPSHOT_RW/home/hourly.2 ] ; then			\
$MV $SNAPSHOT_RW/home/hourly.2 $SNAPSHOT_RW/home/hourly.3 ;	\
fi;
if [ -d $SNAPSHOT_RW/home/hourly.1 ] ; then			\
$MV $SNAPSHOT_RW/home/hourly.1 $SNAPSHOT_RW/home/hourly.2 ;	\
fi;

# step 3: make a hard-link-only (except for dirs) copy of the latest snapshot,
# if that exists
if [ -d $SNAPSHOT_RW/home/hourly.0 ] ; then			\
$CP -al $SNAPSHOT_RW/home/hourly.0 $SNAPSHOT_RW/home/hourly.1 ;	\
fi;

# step 4: rsync from the system into the latest snapshot (notice that
# rsync behaves like cp --remove-destination by default, so the destination
# is unlinked first.  If it were not so, this would copy over the other
# snapshot(s) too!
$RSYNC								\
	-va --delete --delete-excluded				\
	--exclude-from="$EXCLUDES"				\
	/home/ $SNAPSHOT_RW/home/hourly.0 ;

# step 5: update the mtime of hourly.0 to reflect the snapshot time
$TOUCH $SNAPSHOT_RW/home/hourly.0 ;

# and thats it for home.

# now remount the RW snapshot mountpoint as readonly

$MOUNT -o remount,ro $MOUNT_DEVICE $SNAPSHOT_RW ;
if (( $? )); then
{
	$ECHO "snapshot: could not remount $SNAPSHOT_RW readonly";
	exit;
} fi;

As you might have noticed above, I have added an excludes list to the rsync call. This is just to prevent the system from backing up garbage like web browser caches, which change frequently (so they'd take up space in every snapshot) but would be no loss if they were accidentally destroyed.
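The contents of that excludes file are not shown in the article; as a guess at what typical entries for an rsync --exclude-from file might look like (purely illustrative):

# /usr/local/etc/backup_exclude -- example entries only
.mozilla/*/Cache/
.cache/
.thumbnails/
*.tmp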

Listing two: daily_snapshot_rotate.sh
#!/bin/bash
# ----------------------------------------------------------------------
# mikes handy rotating-filesystem-snapshot utility: daily snapshots
# ----------------------------------------------------------------------
# intended to be run daily as a cron job when hourly.3 contains the
# midnight (or whenever you want) snapshot; say, 13:00 for 4-hour snapshots.
# ----------------------------------------------------------------------

unset PATH

# ------------- system commands used by this script --------------------
ID=/usr/bin/id;
ECHO=/bin/echo;

MOUNT=/bin/mount;
RM=/bin/rm;
MV=/bin/mv;
CP=/bin/cp;

# ------------- file locations -----------------------------------------

MOUNT_DEVICE=/dev/hdb1;
SNAPSHOT_RW=/root/snapshot;

# ------------- the script itself --------------------------------------

# make sure we're running as root
if (( `$ID -u` != 0 )); then { $ECHO "Sorry, must be root.  Exiting..."; exit; } fi

# attempt to remount the RW mount point as RW; else abort
$MOUNT -o remount,rw $MOUNT_DEVICE $SNAPSHOT_RW ;
if (( $? )); then
{
	$ECHO "snapshot: could not remount $SNAPSHOT_RW readwrite";
	exit;
}
fi;


# step 1: delete the oldest snapshot, if it exists:
if [ -d $SNAPSHOT_RW/home/daily.2 ] ; then			\
$RM -rf $SNAPSHOT_RW/home/daily.2 ;				\
fi ;

# step 2: shift the middle snapshots(s) back by one, if they exist
if [ -d $SNAPSHOT_RW/home/daily.1 ] ; then			\
$MV $SNAPSHOT_RW/home/daily.1 $SNAPSHOT_RW/home/daily.2 ;	\
fi;
if [ -d $SNAPSHOT_RW/home/daily.0 ] ; then			\
$MV $SNAPSHOT_RW/home/daily.0 $SNAPSHOT_RW/home/daily.1;	\
fi;

# step 3: make a hard-link-only (except for dirs) copy of
# hourly.3, assuming that exists, into daily.0
if [ -d $SNAPSHOT_RW/home/hourly.3 ] ; then			\
$CP -al $SNAPSHOT_RW/home/hourly.3 $SNAPSHOT_RW/home/daily.0 ;	\
fi;

# note: do *not* update the mtime of daily.0; it will reflect
# when hourly.3 was made, which should be correct.

# now remount the RW snapshot mountpoint as readonly

$MOUNT -o remount,ro $MOUNT_DEVICE $SNAPSHOT_RW ;
if (( $? )); then
{
	$ECHO "snapshot: could not remount $SNAPSHOT_RW readonly";
	exit;
} fi;
Sample output of ls -l /snapshot/home
total 28
drwxr-xr-x   12 root     root         4096 Mar 28 00:00 daily.0
drwxr-xr-x   12 root     root         4096 Mar 27 00:00 daily.1
drwxr-xr-x   12 root     root         4096 Mar 26 00:00 daily.2
drwxr-xr-x   12 root     root         4096 Mar 28 16:00 hourly.0
drwxr-xr-x   12 root     root         4096 Mar 28 12:00 hourly.1
drwxr-xr-x   12 root     root         4096 Mar 28 08:00 hourly.2
drwxr-xr-x   12 root     root         4096 Mar 28 04:00 hourly.3

Notice that the contents of each of the subdirectories of /snapshot/home/ is a complete image of /home at the time the snapshot was made. Despite the w in the directory access permissions, no one--not even root--can write to this directory; it's mounted read-only.

Known bugs and problems

Maintaining Permissions and Owners in the snapshots

The snapshot system above does not properly maintain old ownerships/permissions; if a file's ownership or permissions are changed in place, then the new ownership/permissions will apply to older snapshots as well. This is because rsync does not unlink files prior to changing them if the only changes are ownership/permission. Thanks to J.W. Schultz for pointing this out. Using his new --link-dest option, it is now trivial to work around this problem. See the discussion in the Putting it all together section of Incremental backups with rsync , above.

mv updates timestamp bug

Apparently, a bug in some Linux kernels between 2.4.4 and 2.4.9 causes mv to update timestamps; this may result in inaccurate timestamps on the snapshot directories. Thanks to Claude Felizardo for pointing this problem out. He was able to work around the problem by replacing mv with the following script:

MV=my_mv;
...
function my_mv() {
   REF=/tmp/makesnapshot-mymv-$$;
   touch -r $1 $REF;
   /bin/mv $1 $2;
   touch -r $REF $2;
   /bin/rm $REF;
}
Windows-related problems

I have recently received a few reports of what appear to be interaction issues between Windows and rsync.

One report came from a user who mounts a windows share via Samba, much as I do, and had files mysteriously being deleted from the backup even when they weren't deleted from the source. Tim Burt also used this technique, and was seeing files copied even when they hadn't changed. He determined that the problem was modification time precision; adding --modify-window=10 caused rsync to behave correctly in both cases. If you are rsync'ing from a SAMBA share, you must add --modify-window=10 or you may get inconsistent results. Update: --modify-window=1 should be sufficient. Yet another update: the problem appears to still be there. Please let me know if you use this method and files which should not be deleted are deleted.

Also, for those who use rsync directly on cygwin, there are some known problems, apparently related to cygwin signal handling. Scott Evans reports that rsync sometimes hangs on large directories. Jim Kleckner informed me of an rsync patch, discussed here and here , which seems to work around this problem. I have several reports of this working, and two reports of it not working (the hangs continue). However, one of the users who reported a negative outcome, Greg Boyington, was able to get it working using Craig Barrett's suggested sleep() approach, which is documented here .

Memory use in rsync scales linearly with the number of files being sync'd. This is a problem when syncing large file trees, especially when the server involved does not have a lot of RAM. If this limitation is more of an issue to you than network speed (for example, if you copy over a LAN), you may wish to use mirrordir instead. I haven't tried it personally, but it looks promising. Thanks to Vladimir Vuksan for this tip!

Contributed codes

Several people have been kind enough to send improved backup scripts. There are a number of good ideas here, and I hope they'll save you time when you're ready to design your own backup plan. Disclaimer: I have not necessarily tested these; make sure you check the source code and test them thoroughly before use!

References

Frequently Asked Questions

[Feb 04, 2017] How do I fix mess created by accidentally untarred files in the current dir, aka tar bomb

Highly recommended!
In such cases the UID of the files from the tar bomb is often different from the UID of "legitimate" files in the polluted directories, and you can probably use this fact for quick elimination of the offending files. The idea of using the list of files from the tar bomb to remove them also works, but observe some precautions: some directories that were created can have the same names as existing directories. Never do rm in -exec or via xargs without testing first.
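For example, to exploit that UID difference you could first just list (not delete) the top-level entries that are not owned by the expected user; jdoe below is a hypothetical user name:

find . -maxdepth 1 ! -user jdoe -ls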
Notable quotes:
"... You don't want to just rm -r everything that tar tf tells you, since it might include directories that were not empty before unpacking! ..."
"... Another nice trick by @glennjackman, which preserves the order of files, starting from the deepest ones. Again, remove echo when done. ..."
"... One other thing: you may need to use the tar option --numeric-owner if the user names and/or group names in the tar listing make the names start in an unpredictable column. ..."
"... That kind of (antisocial) archive is called a tar bomb because of what it does. Once one of these "explodes" on you, the solutions in the other answers are way better than what I would have suggested. ..."
"... The easiest (laziest) way to do that is to always unpack a tar archive into an empty directory. ..."
"... The t option also comes in handy if you want to inspect the contents of an archive just to see if it has something you're looking for in it. If it does, you can, optionally, just extract the file(s) you want. ..."
Feb 04, 2017 | superuser.com

linux - Undo tar file extraction mess - Super User

first try to issue

tar tf archive
tar will list the contents line by line.

This can be piped to xargs directly, but beware : do the deletion very carefully. You don't want to just rm -r everything that tar tf tells you, since it might include directories that were not empty before unpacking!

You could do

tar tf archive.tar | xargs -d'\n' rm -v
tar tf archive.tar | sort -r | xargs -d'\n' rmdir -v

to first remove all files that were in the archive, and then the directories that are left empty.

sort -r (glennjackman suggested tac instead of sort -r in the comments to the accepted answer, which also works since tar 's output is regular enough) is needed to delete the deepest directories first; otherwise a case where dir1 contains a single empty directory dir2 will leave dir1 after the rmdir pass, since it was not empty before dir2 was removed.

This will generate a lot of

rm: cannot remove `dir/': Is a directory

and

rmdir: failed to remove `dir/': Directory not empty
rmdir: failed to remove `file': Not a directory

Shut this up with 2>/dev/null if it annoys you, but I'd prefer to keep as much information on the process as possible.

And don't do it until you are sure that you match the right files. And perhaps try rm -i to confirm everything. And have backups, eat your breakfast, brush your teeth, etc.

===

List the contents of the tar file like so:

tar tzf myarchive.tar.gz

Then, delete those file names by iterating over that list:

while IFS= read -r file; do echo "$file"; done < <(tar tzf myarchive.tar.gz)

This will still just list the files that would be deleted. Replace echo with rm if you're really sure these are the ones you want to remove. And maybe make a backup to be sure.

In a second pass, remove the directories that are left over:

while IFS= read -r file; do rmdir "$file"; done < <(tar tzf myarchive.tar.gz)

This prevents directories from being deleted if they are not empty, i.e., if they already existed before the extraction (rmdir refuses to remove non-empty directories).

Another nice trick by @glennjackman, which preserves the order of files, starting from the deepest ones. Again, remove echo when done.

tar tf myarchive.tar | tac | xargs -d'\n' echo rm

This could then be followed by the normal rmdir cleanup.


Here's a possibility that will take the extracted files and move them to a subdirectory, cleaning up your main folder.
#!/usr/bin/perl -w

use strict;
use Getopt::Long;

my $clean_folder = "clean";
my $DRY_RUN;
die "Usage: $0 [--dry] [--clean=dir-name]\n"
    if ( !GetOptions("dry!"    => \$DRY_RUN,
                     "clean=s" => \$clean_folder) );

# Protect the 'clean_folder' string from shell substitution
$clean_folder =~ s/'/'\\''/g;

# Process the "tar tv" listing and output a shell script.
print "#!/bin/sh\n" if ( !$DRY_RUN );
while (<>)
{
    chomp;

    # Strip out the permissions string and the directory entry from the 'tar' list
    my $perms  = substr($_, 0, 10);
    my $dirent = substr($_, 48);

    # Drop entries that are in subdirectories
    next if ( $dirent =~ m:/.: );

    # If we're in "dry run" mode, just list the permissions and the
    # directory entries.
    if ( $DRY_RUN )
    {
        print "$perms|$dirent\n";
        next;
    }

    # Emit the shell code to clean up the folder
    $dirent =~ s/'/'\\''/g;
    print "mv -i '$dirent' '$clean_folder'/.\n";
}

Save this to the file fix-tar.pl and then execute it like this:

$ tar tvf myarchive.tar | perl fix-tar.pl --dry

This will confirm that your tar list is like mine. You should get output like:

-rw-rw-r--|batch
-rw-rw-r--|book-report.png
-rwx------|CaseReports.png
-rw-rw-r--|caseTree.png
-rw-rw-r--|tree.png
drwxrwxr-x|sample/

If that looks good, then run it again like this:

$ mkdir cleanup
$ tar tvf myarchive.tar | perl fix-tar.pl --clean=cleanup > fixup.sh

The fixup.sh script will be the shell commands that will move the top-level files and directories into a "clean" folder (in this instance, the folder called cleanup). Have a peek through this script to confirm that it's all kosher. If it is, you can now clean up your mess with:

$ sh fixup.sh

I prefer this kind of cleanup because it doesn't destroy anything that isn't already destroyed by being overwritten by that initial tar xv.

Note: if that initial dry run output doesn't look right, you should be able to fiddle with the numbers in the two substr function calls until they look proper. The $perms variable is used only for the dry run so really only the $dirent substring needs to be proper.

One other thing: you may need to use the tar option --numeric-owner if the user names and/or group names in the tar listing make the names start in an unpredictable column.


===

That kind of (antisocial) archive is called a tar bomb because of what it does. Once one of these "explodes" on you, the solutions in the other answers are way better than what I would have suggested.

The best "solution", however, is to prevent the problem in the first place.

The easiest (laziest) way to do that is to always unpack a tar archive into an empty directory. If it includes a top level directory, then you just move that to the desired destination. If not, then just rename your working directory (the one that was empty) and move that to the desired location.
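In practice that can be as simple as the following (the archive name is just a placeholder):

mkdir unpack-tmp
tar -xvf archive.tar -C unpack-tmp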

If you just want to get it right the first time, you can run tar -tvf archive-file.tar | less and it will list the contents of the archive so you can see how it is structured and then do what is necessary to extract it to the desired location to start with.

The t option also comes in handy if you want to inspect the contents of an archive just to see if it has something you're looking for in it. If it does, you can, optionally, just extract the file(s) you want.

[Feb 04, 2017] 20 Unix Command Line Tricks – Part I

Feb 04, 2017 | www.cyberciti.biz
Build directory trees in a single command

You can create an entire directory tree with a single mkdir command by passing the -p option:

mkdir -p /jail/{dev,bin,sbin,etc,usr,lib,lib64}
ls -l /jail

[Dec 26, 2016] Here is my top 5 backup tools in Linux

March 26, 2016 OSTechNix

Data is the backbone of a company, so performing backups at regular intervals is one of the vital roles of a system administrator. Here are my favourite five backup tools, the ones I use most. I won't say these are the best, but these are the backup tools I consider first when it comes to data backup.

Let me explain some of my preferred backup tools.

1. BACULA

BACULA is a powerful backup tool. It is easy to use and efficient at recovering lost data and damaged files, both on the local system and remotely. It has a rich user interface (UI) and works across different platforms, including Windows and Mac OS X.

Concerning BACULA's features, I can list the following:

  1. SD-SD replication.
  2. Enterprise binaries available for Univention.
  3. Restore performance improved for hard-linked files.
  4. Periodic status on running jobs in the Director status report.

BACULA has the following components.

2. FWBACKUPS

FWBACKUPS is the easiest of all backup tools in Linux. It has a rich user interface and is also a cross-platform tool.

One of the notable features of FWBACKUPS is remote backup: we can back up data from various systems remotely.

Some of FWBACKUPS' features are listed below.

  1. Simple interface – backing up and restoring documents is simple for the user.
  2. Cross-platform – it supports different platforms such as Windows and Mac OS X; data backed up on one system can be restored on another.
  3. Remote backup – all types of files can be handled remotely.
  4. Scheduled backups – run a backup once or periodically.
  5. Speed – backups move faster by copying only the changes.
  6. Organized and clean – it keeps backups organized, removes expired ones, and lists the backups available for restore by date.
3. RSYNC

RSYNC is a widely used command-line backup tool in Linux. It collects data both remotely and locally and is mainly used for automated backups; backup jobs can be automated with scripts.

Some of the notable features are listed below:

  1. It can update whole directory trees and filesystems.
  2. It uses ssh, rsh or direct sockets as the transport.
  3. Supports anonymous rsync which is ideal for mirroring.
  4. We can set a bandwidth limit and a maximum file size, as shown in the sketch below.
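A sketch of such a job, with a bandwidth cap of roughly 5 MB/s (--bwlimit is in KB/s) and files larger than 500 MB skipped; the host name and paths are made-up examples:

rsync -a --bwlimit=5000 --max-size=500M /srv/data/ backupuser@backuphost:/backups/data/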
4. URBACKUP

URBACKUP is a client/server backup system that works efficiently in both Windows and Linux environments. File and image backups are made while the system is running, without interrupting current processes.

Here are some of the features of this tool:

  1. A whole partition can be saved as a single directory.
  2. Image and file backups are made while the system is running.
  3. Fast file and image transmission.
  4. Clients have the flexibility to change settings such as backup frequency; next to no configuration is required.
  5. The web interface of URBACKUP is good at showing the status of the clients and of current backup issues.
5. BACKUP PC

BACKUP PC is a high-performance, enterprise-grade backup tool. It is highly configurable and easy to install, use and maintain.

It reduces the cost of disks and RAID systems. BACKUP PC is written in Perl and extracts data using the Samba service.

It is robust, reliable, well documented and freely available as open source on Sourceforge .

Features:

  1. No client-side software is needed; the standard SMB protocol is used to extract backup data.
  2. A powerful web interface lets you view log files, configuration and current status, and allows users to initiate and cancel backups and to browse and restore files from backups.
  3. It supports mobile environments where laptops are only intermittently connected to the network and have dynamic IP addresses.
  4. Users will receive email reminders if their PC has not been backed up recently.
  5. Open source and freely available under the GPL.

These are the top backup tools that I use most. What's your favourite? Let us know in the comment section below.

Thanks for stopping by.

Cheers!

[Nov 06, 2016] Backup and restore using tar

www.unix.com
Q:

tar -cjpf /backup /bin /etc /home /opt /root /sbin /usr /var /boot

When i include the / directory it also tar's the /lib /sys /proc /dev filesystems too (and more but these seem to be problem directories.)

Although i have never tried to restore the /sys /proc and /dev directories I have not seen anyone mention that your cant restore /lib but when i tried the server crashed and would not even start the kernel (not even in single user mode).

Can anyone let me know why this happened and provide a more comprehensive list of directories than the 4 mentioned as to what should and shouldn't be backed up and restored? Or point me to a useful site that might explain why you should or shouldn't backup each one?

A:
There's no point in backing-up things like /proc because that's the dynamic handling of processes and memory working sets (virtual memory).

However, directories like /lib, although problematic to restore on a running system, you would definitely need them in a disaster recovery situation. You would restore /lib to hard disk in single user or cd boot mode.

So you need to backup all non-process, non-memory files for the backup to be sufficient to recover. It doesn't mean, however, that you should attempt to restore them on a running (multi-user) system.
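Putting the answer into practice, a command along the lines of the question's, but with the pseudo-filesystems excluded, might look like this (treat the exclude list as a sketch; the exact set depends on the distribution):

tar -cjpf /backup/fullsys.tar.bz2 \
    --exclude=/proc --exclude=/sys --exclude=/dev \
    --exclude=/tmp --exclude=/run --exclude=/backup \
    /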

Full Hard-Drive Backup with Linux Tar

[Nov 06, 2016] How to restore a backup from a tgz file in linux

serverfault.com

Antonio Alimba Jun 9 '14 at 13:01

How can I restore from a backup.tgz file generated on another Linux server onto my own server? I tried the following command:
tar xvpfz backup.tgz -C /

The above command worked, but it replaced the existing system files, which made my Linux server stop working properly.

How can i restore without running into trouble?

You can use the --skip-old-files option to tell tar not to overwrite existing files.
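Applied to the question's archive, that would be something like:

tar --skip-old-files -xvpzf backup.tgz -C /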

You could still run into problems with the backup files if the software versions differ between the two servers: data file structures may have changed, and things might stop working.

A more refined backup process should be developed.

[Nov 05, 2016] Relax-and-Recover – Freecode

Nov 05, 2016 | freecode.com

Relax-and-Recover (Rear) is a bare metal disaster recovery and system migration solution, similar to AIX mksysb or HP-UX ignite. It is composed of a modular framework and ready-to-go workflows for many common situations to produce a bootable image and restore from backup using this image. It can restore to different hardware, and can therefore be used as a migration tool as well. It supports various boot media (including tape, USB, or eSATA storage, ISO, PXE, etc.), a variety of network protocols (including SFTP, FTP, HTTP, NFS, and CIFS), as well as a multitude of backup strategies (including IBM TSM, HP DataProtector, Symantec NetBackup, Bacula, and rsync). It was designed to be easy to set up, requires no maintenance, and is there to assist when disaster strikes. Recovering from disaster is made very straight-forward by a 2-step recovery process so that it can be executed by operational teams when required. When used interactively (e.g. when used for migrating systems), menus help make decisions to restore to a new (hardware) environment.

Release Notes: Integrated with duply/duplicity support. systemd support has been added. Various small fixes and improvements to tape support, Xen, PPC, Gentoo, Fedora, multi-arch, storage ... layout configuration, and serial console integration.


Release Notes: This release adds support for multipathing, adds several improvements to distribution backward compatibility, improves ext4 support, makes various bugfixes, migrates HWADDR ... after recovery, and includes better systemd support.


Release Notes: Multi-system and multi-copy support on USB storage devices. Basic rsync backup support. More extensive exclude options. The new layout code is enabled by default. Support ... for Arch Linux. Improved multipath support. Experimental btrfs support.


Release Notes: Standardization of the command line. The default is quiet output; use the option -v for the old behavior. Boot images now have a comprehensive boot menu. Support for IPv6 ... addresses. Restoring NBU backup from a point in time is supported. Support for Fedora 15 (systemd) and RHEL6/SL6. Improved handling of HP SmartArray. Support for ext4 on RHEL5/SL5. Support for Xen paravirtualization. Integration with the local GRUB menu. Boot images can now be centralized through network transfers. Support for udev on RHEL4. Many small improvements and performance enhancements.


Release Notes: This release supports many recent distributions, including "upstart" (Ubuntu 7.10). It has more IA-64 support (RHEL5 only at the moment), better error reporting and catching, ... Debian packages (mkdeb), and improved TSM support.


[Nov 05, 2016] Relax and Recover – How Did I Do That

www.howdididothat.info

21 August 2014

Start a backup on the CentOS machine

Add the following lines to /etc/rear/local.conf:

OUTPUT=ISO
BACKUP=NETFS
BACKUP_TYPE=incremental
BACKUP_PROG=tar
FULLBACKUPDAY="Mon"
BACKUP_URL="nfs://NFSSERVER/path/to/nfs/export/servername"
BACKUP_PROG_COMPRESS_OPTIONS="--gzip"
BACKUP_PROG_COMPRESS_SUFFIX=".gz"
BACKUP_PROG_EXCLUDE=( '/tmp/*' '/dev/shm/*' )
BACKUP_OPTIONS="nfsvers=3,nolock"
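With a configuration like this, rear mkbackup can also be scheduled from cron; a minimal sketch (the rear path and the times are assumptions, adjust as needed):

# /etc/cron.d/rear -- nightly ReaR run; Monday produces a full backup
# (FULLBACKUPDAY="Mon" above), the other nights produce incrementals
30 1 * * * root /usr/sbin/rear mkbackup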



Now make a backup

[root@centos7 ~]# rear mkbackup -v
Relax-and-Recover 1.16.1 / Git
Using log file: /var/log/rear/rear-centos7.log
mkdir: created directory '/var/lib/rear/output'
Creating disk layout
Creating root filesystem layout
TIP: To login as root via ssh you need to set up /root/.ssh/authorized_keys or SSH_ROOT_PASSWORD in your configuration file
Copying files and directories
Copying binaries and libraries
Copying kernel modules
Creating initramfs
Making ISO image
Wrote ISO image: /var/lib/rear/output/rear-centos7.iso (90M)
Copying resulting files to nfs location
Encrypting disabled
Creating tar archive '/tmp/rear.QnDt1Ehk25Vqurp/outputfs/centos7/2014-08-21-1548-F.tar.gz'
Archived 406 MiB [avg 3753 KiB/sec]OK
Archived 406 MiB in 112 seconds [avg 3720 KiB/sec]

Now look on your NFS server

You'll see all the files you'll need to perform the disaster recovery.

total 499M
drwxr-x---  2 root root 4.0K Aug 21 23:51 .
drwxr-xr-x  3 root root 4.0K Aug 21 23:48 ..
-rw-------  1 root root 407M Aug 21 23:51 2014-08-21-1548-F.tar.gz
-rw-------  1 root root 2.2M Aug 21 23:51 backup.log
-rw-------  1 root root  202 Aug 21 23:49 README
-rw-------  1 root root  90M Aug 21 23:49 rear-centos7.iso
-rw-------  1 root root 161K Aug 21 23:49 rear.log
-rw-------  1 root root    0 Aug 21 23:51 selinux.autorelabel
-rw-------  1 root root  277 Aug 21 23:49 VERSION
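To restore, boot the target machine from the generated rear-centos7.iso (or via PXE) and start the recovery workflow from the rescue shell; roughly:

# inside the ReaR rescue system booted from the ISO
rear recover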


Author: masterdam79, posted on 21 August 2014

dheeraj says:

31 August 2016 at 02:26


Is it possible to give a list of directories or mount points to exclude from the backup when running mkbackup, for example by pointing to a file that lists all the directories to be excluded?

masterdam79 says:

26 September 2016 at 21:50

Have a look at https://github.com/rear/rear/issues/216
Should be possible if you ask me.
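One way to approach this (a sketch only, not tested against any particular rear version) relies on the fact that /etc/rear/local.conf is sourced as a bash script, so BACKUP_PROG_EXCLUDE can be extended from a hypothetical list file such as /etc/rear/exclude.list with one path per line:

# in /etc/rear/local.conf: append every line of the (hypothetical) file
# /etc/rear/exclude.list to the BACKUP_PROG_EXCLUDE array
while read -r path ; do
    BACKUP_PROG_EXCLUDE+=( "$path" )
done < /etc/rear/exclude.list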

Creating a restorable archive of the SGE configuration by Johnson, Glenn P

ICTS Biomedical Informatics - Institute for Clinical and Translational

With the following backup configuration:

/opt/gridengine/util/install_modules/backup_template.conf

##################################################
# Autobackup Configuration File Template
##################################################

# Please, enter your SGE_ROOT here (mandatory)
SGE_ROOT="/opt/gridengine"

# Please, enter your SGE_CELL here (mandatory)
SGE_CELL="default"

# Please, enter your Backup Directory here
# After backup you will find your backup files here (mandatory)
# The autobackup will add a time/date combination to this dirname
# to prevent an overwriting!
BACKUP_DIR="/opt/gridengine/backup"

# Please, enter true to get a tar/gz package
# and false to copy the files only (mandatory)
TAR="true"

# Please, enter the backup file name here. (mandatory)
BACKUP_FILE="backup.tar"

the SGE configuration can be backed up with the following command:

(cd /opt/gridengine/;./inst_sge -bup -auto /opt/gridengine/util/install_modules/backup_template.conf)

This is run from cron every month.
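A crontab entry along these lines (the schedule itself is just an example) covers the monthly run:

# /etc/cron.d/sge-backup -- back up the SGE configuration at 03:00 on the 1st of each month
0 3 1 * * root cd /opt/gridengine && ./inst_sge -bup -auto /opt/gridengine/util/install_modules/backup_template.conf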

Grid Engine configuration backup by Dave Love

Liverpool University, 2013–02
Introduction

It is advisable to make backups of the configuration as it changes, for the usual reasons of disaster recovery and recording/undoing configuration changes that might, for instance, prove ill-advised. Two methods of backup for different purposes are supplied with the installation, described below. Note that neither of them backs up the job spool as supplied.

The configuration is stored in the qmaster spool area (see 'spooling_params' in bootstrap(5)) which is likely subject to normal system backups. In the case of 'classic' spooling (into simple text files), normal system backups may be enough, but the methods below may still be useful, especially for more frequent backups and easy checking of changes. Also there are extra considerations if Berkeley DB spooling is used, when it is not safe simply to copy the DB files in a normal backup, and it may not be safe to back up at all in some circumstances.

Using 'upgrade modules'

The scripts in $SGE_ROOT/util/upgrade_modules, intended for distribution updates and changing the spool type etc. (inst_sge -upd), are also useful for backup/restore and recording the configuration. They use qconf(5) to load and restore the configuration. That requires a running qmaster and operator/manager privileges, typically as user sgeadmin. After setting $SGE_ROOT and $SGE_CELL as usual, and with $SGE_ROOT as the current directory, use

    util/upgrade_modules/save_sge_config.sh directory

to back up the configuration into the fresh directory. Note that some data are not present in the configuration saved this way, particularly usage information for fair share purposes, and jobs. The inst_sge method below preserves more.

Restoring the configuration (which also requires a running qmaster) is done using util/upgrade_modules/load_sge_config.sh. Use the -help option for more information.
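A minimal sketch of a save/restore cycle (the backup path and date format are examples; this assumes $SGE_ROOT and $SGE_CELL are already set, e.g. by sourcing $SGE_ROOT/$SGE_CELL/common/settings.sh):

    cd $SGE_ROOT
    # save into a fresh, dated directory (a running qmaster is required)
    util/upgrade_modules/save_sge_config.sh /var/backups/sge/$(date +%Y-%m-%d)

    # before restoring (also with the qmaster running), check the exact options
    util/upgrade_modules/load_sge_config.sh -help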

Note that the CSP configuration isn't backed up, but a new configuration can be re-created on restoring with inst_sge -upd.

Revision control

Backup directories made by running save_sge_config.sh under cron(1) may conveniently be kept under revision control to have a record of changes, not necessarily as an actual backup. There is a version of the technique using git ("scripts/GridEngine-git-config") in the flex-grid repository, based on a subversion one. A simple darcs-based script is also available. As in these examples, it is usual to delete the accounting file, and possibly arseqnum and jobseqnum (recording the advance reservation and job sequence numbers), and also to expunge the execution host load_values. These can be expected to change for each backup, contributing noise to the history. Similarly, it might be useful to expunge the delete_time parameter for any auto-created users in the users sub-directory of the backup. However, if backups are modified like that, they may not be able to be re-loaded directly.
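A minimal sketch of such a cron job, assuming a hypothetical git working tree in /var/lib/sge-config (see scripts/GridEngine-git-config in the flex-grid repository for a complete implementation):

    #!/bin/bash
    # Sketch only: snapshot the SGE configuration into a git-controlled directory.
    # SGE_ROOT, SGE_CELL and the repository path are assumptions.
    export SGE_ROOT=/opt/gridengine SGE_CELL=default
    repo=/var/lib/sge-config        # hypothetical working tree; 'git init' it once
    tmp=$(mktemp -d)
    cd "$SGE_ROOT" || exit 1
    util/upgrade_modules/save_sge_config.sh "$tmp/config" || exit 1
    # drop files that change on every run and only add noise to the history
    rm -f "$tmp"/config/{accounting,arseqnum,jobseqnum}
    rsync -a --delete "$tmp/config/" "$repo/config/"
    rm -rf "$tmp"
    cd "$repo" && git add -A && git commit -q -m "SGE config snapshot $(date +%F)"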

Using inst_sge

Backups using inst_sge work by copying (part of) the spool area, rather than using qconf, preserving more information than the upgrade script method, such as share tree information. With Berkeley DB spooling, some of the copying involves the db_dump and db_load utilities, which must be available in the $SGE_ROOT/utilbin/$ARCH/ directory. If they were not installed, copy or symbolically link your operating system's versions there. Some systems support more than one BDB version, and may call the utilities something like db5.1_dump, in which case it may be necessary to select one corresponding to the version of the BDB library used by gridengine. For a packaged version, consult the package dependencies to determine that. E.g.

    ln -s /usr/bin/db5.1_dump utilbin/lx-amd64/db_dump

Caution: This method is not safe if Berkeley DB spooling is in use, the qmaster is running, and it opens the spool database in 'private' mode to allow spooling to a network filesystem. (See the BDB documentation.) Some gridengine implementations do this unconditionally, and others may provide it as an option to allow use of either network filesystems or this backup method on a live system (with a local spool directory). The database is opened in private mode if qmaster runs with no files of the form __db.nnn in the directory specified by spooling_params of bootstrap(5). It is done optionally (in the Son of Grid Engine development version at the time of writing) by appending the private option to spooling_params in the bootstrap file. In that case, inst_sge should refuse to do the backup if the qmaster is running.

To back up manually using this method, cd to $SGE_ROOT, run

    ./inst_sge -bup

and answer the questions. This can be done with or without the qmaster running, including automatically on a live system (e.g. under cron), by supplying a configuration file:

    ./inst_sge -bup -auto conf-file

An example configuration is installed as $SGE_ROOT/util/install_modules/backup_template.conf. In auto mode, a timestamp is appended to the backup directory name, so you may want to reap old directories with something like tmpwatch(1). The -auto method may fail with some Bourne shell implementations; if so, try running it under bash (assuming you have it installed) as

    bash inst_sge ...
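Old timestamped backup directories can then be reaped from cron with tmpwatch(1) or, more portably, with find; a sketch assuming the BACKUP_DIR used in the template above and a 90-day retention:

    # remove automatic SGE configuration backups older than 90 days
    find /opt/gridengine/backup -mindepth 1 -maxdepth 1 -mtime +90 -exec rm -rf {} +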

See also documentation from Oracle.

To restore the configuration (which must be done with the qmaster shut down) use

    ./inst_sge -rst directory

Like the upgrade script, this method doesn't back up the job spool (BDB sge_jobs database or classic spool/qmaster/{jobs,job_scripts}), but it might usefully be extended to back it up if throughput is low compared with the backup frequency, and if potentially re-running jobs isn't problematic. For classic spooling, see the case in the DoBackup function, and for Berkeley DB spooling see the use of db_dump/db_load in the SwitchArchBup and SwitchArchRst functions (all in $SGE_ROOT/util/install_modules/inst_common.sh).

Any CSP configuration also isn't included in the backup, because of the risk of exposing private keys if the backup is stored insecurely. If you do want to make a copy, the configuration resides in $SGE_ROOT/$SGE_CELL/common/sgeCA and, usually, /var/lib/sgeCA (where the private keys live).
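If you decide to copy it anyway, something along these lines keeps the archive readable by root only (the paths are the ones given above; the destination is just an example):

    # archive the CSP directories separately and keep the result private
    umask 077
    tar czpf /root/sge-csp-$(date +%F).tar.gz $SGE_ROOT/$SGE_CELL/common/sgeCA /var/lib/sgeCA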

[Nov 04, 2016] Coding Style rear-rear Wiki

Reading the rear sources is an interesting exercise. They really demonstrate an attempt to use a "reasonable" style of shell programming, and you can learn a lot from them.
Nov 04, 2016 | github.com

Relax-and-Recover is written in Bash (at least bash version 3 is needed), a language that can be used in many styles. We want to make it easier for everybody to understand the Relax-and-Recover code and subsequently to contribute fixes and enhancements.

Here is a collection of coding hints that should help to get a more consistent code base.

Don't be afraid to contribute to Relax-and-Recover even if your contribution does not fully match all these coding hints. Currently large parts of the Relax-and-Recover code are not yet in compliance with these coding hints. This is an ongoing step-by-step process. Nevertheless, try to understand the idea behind these coding hints so that you know how to break them properly (i.e. "learn the rules so you know how to break them properly").

The overall idea behind these coding hints is:

Make yourself understood

Make yourself understood to enable others to fix and enhance your code properly as needed.

From this overall idea the following coding hints are derived.

For the fun of it, here is an extreme example of what coding style should be avoided:

#!/bin/bash
for i in `seq 1 2 $((2*$1-1))`;do echo $((j+=i));done

Try to find out what that code is about - it does a useful thing.

Code must be easy to read
Code should be easy to understand

Do not only tell what the code does (i.e. the implementation details) but also explain the intent behind it (i.e. why) to make the code maintainable.

Here is the initial example rewritten so that one can understand what it is about:

#!/bin/bash
# output the first N square numbers
# by summing up the first N odd numbers 1 3 ... 2*N-1
# where each nth partial sum is the nth square number
# see https://en.wikipedia.org/wiki/Square_number#Properties
# this way it is a little bit faster for big N compared to
# calculating each square number on its own via multiplication
N=$1
if ! [[ $N =~ ^[0-9]+$ ]] ; then
    echo "Input must be non-negative integer." 1>&2
    exit 1
fi
square_number=0
for odd_number in $( seq 1 2 $(( 2 * N - 1 )) ) ; do
    (( square_number += odd_number )) && echo $square_number
done

Now the intent behind is clear and now others can easily decide if that code is really the best way to do it and easily improve it if needed.

Try to care about possible errors

By default bash proceeds with the next command when something fails. Do not let your code blindly proceed in case of errors: that can make it hard to find the root cause of a failure when it errors out somewhere later, at an unrelated place, with a weird error message, and it can lead to false fixes that cure only a particular symptom but not the root cause.
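A minimal illustration in plain bash (Relax-and-Recover itself provides helper functions for this, such as the Error call used in the dirty-hack example further below):

# stop immediately and say why, instead of silently continuing with a broken state
cp important.conf /etc/myapp/ || { echo "ERROR: cannot install important.conf" 1>&2 ; exit 1 ; }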

Maintain Backward Compatibility

Implement adaptions and enhancements in a backward compatible way so that your changes do not cause regressions for others.

Dirty hacks welcome

When there are special issues on particular systems, it is more important that the Relax-and-Recover code works than having nice-looking clean code that sometimes fails. In such special cases any dirty hack that makes it work everywhere is welcome, but for dirty hacks the above coding hints become mandatory rules.

For example a dirty hack like the following is perfectly acceptable:

# FIXME: Dirty hack to make it work
# on "FUBAR Linux version 666"
# where COMMAND sometimes inexplicably fails
# but always works after at most 3 attempts
# see http://example.org/issue12345
# Retries should have no bad effect on other systems
# where the first run of COMMAND works.
COMMAND || COMMAND || COMMAND || Error "COMMAND failed."

Character Encoding

Use only traditional (7-bit) ASCII characters. In particular do not use UTF-8 encoded multi-byte characters.

Relax-and-Recover functions

Use the available Relax-and-Recover functions when possible instead of re-implementing basic functionality again and again. The Relax-and-Recover functions are implemented in various lib/*-functions.sh files.


Recommended Links


[Feb 04, 2017] How do I fix mess created by accidentally untarred files in the current dir, aka tar bomb Published on Feb 04, 2017 | superuser.com


UpgradeToSGE62 - GridWiki





Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

You can use PayPal to buy a cup of coffee for the authors of this site

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.

Last modified: November 09, 2019