I used rsync to copy a large number of files, but my OS (Ubuntu) restarted
unexpectedly.
After reboot, I ran rsync again, but from the output on the terminal I found
that rsync was copying files it had already copied before. I had heard that
rsync can find the differences between source and destination, and therefore
copy just those differences. So I wonder: in my case, can rsync resume what
was left unfinished last time?
Yes, rsync won't copy again files that it's already copied. There are a few edge cases where
its detection can fail. Did it copy all the already-copied files? What options did you use?
What were the source and target filesystems? If you run rsync again after it's copied
everything, does it copy again? – Gilles
Sep 16 '12 at 1:56
@Gilles: Thanks! (1) I think I saw rsync copy the same files again from its output on the
terminal. (2) Options are the same as in my other post, i.e. sudo rsync -azvv
/home/path/folder1/ /home/path/folder2 . (3) Source and target are both NTFS, but the
source is an external HDD and the target is an internal HDD. (4) It is now running and hasn't
finished yet. – Tim
Sep 16 '12 at 2:30
@Tim Off the top of my head, there's at least clock skew, and differences in time resolution
(a common issue with FAT filesystems which store times in 2-second increments, the
--modify-window option helps with that). – Gilles
Sep 19 '12 at 9:25
First of all, regarding the "resume" part of your question, --partial just tells
the receiving end to keep partially transferred files if the sending end disappears, as though
they were completely transferred.
While transferring files, they are temporarily saved as hidden files in their target
folders (e.g. .TheFileYouAreSending.lRWzDC ), or a specifically chosen folder if
you set the --partial-dir switch. When a transfer fails and
--partial is not set, this hidden file will remain in the target folder under
this cryptic name, but if --partial is set, the file will be renamed to the
actual target file name (in this case, TheFileYouAreSending ), even though the
file isn't complete. The point is that you can later complete the transfer by running rsync
again with either --append or --append-verify .
So, --partial doesn't itself resume a failed or cancelled transfer.
To resume it, you'll have to use one of the aforementioned flags on the next run. So, if you
need to make sure that the target won't ever contain files that appear to be fine but are
actually incomplete, you shouldn't use --partial . Conversely, if you want to
make sure you never leave behind stray failed files that are hidden in the target directory,
and you know you'll be able to complete the transfer later, --partial is there
to help you.
With regards to the --append switch mentioned above, this is the actual
"resume" switch, and you can use it whether or not you're also using --partial .
Actually, when you're using --append , no temporary files are ever created.
Files are written directly to their targets. In this respect, --append gives the
same result as --partial on a failed transfer, but without creating those hidden
temporary files.
So, to sum up, if you're moving large files and you want the option to resume a cancelled
or failed rsync operation from the exact point that rsync stopped, you need to
use the --append or --append-verify switch on the next attempt.
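For the question's paths, a minimal sketch of a cancel-and-resume transfer (any further flags are up to you):
rsync -av --partial /home/path/folder1/ /home/path/folder2
# interrupted? resume by appending to the kept partial file:
rsync -av --append-verify /home/path/folder1/ /home/path/folder2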
As @Alex points out below, since version 3.0.0 rsync now has a new option,
--append-verify , which behaves like --append did before that
switch existed. You probably always want the behaviour of --append-verify , so
check your version with rsync --version . If you're on a Mac and not using
rsync from homebrew , you'll (at least up to and including El
Capitan) have an older version and need to use --append rather than
--append-verify . Why they didn't keep the behaviour on --append
and instead named the newcomer --append-no-verify is a bit puzzling. Either way,
--append on rsync before version 3 is the same as
--append-verify on the newer versions.
--append-verify isn't dangerous: It will always read and compare the data on
both ends and not just assume they're equal. It does this using checksums, so it's easy on
the network, but it does require reading the shared amount of data on both ends of the wire
before it can actually resume the transfer by appending to the target.
Second of all, you said that you "heard that rsync is able to find differences between
source and destination, and therefore to just copy the differences."
That's correct, and it's called delta transfer, but it's a different thing. To enable
this, you add the -c , or --checksum switch. Once this switch is
used, rsync will examine files that exist on both ends of the wire. It does this in chunks,
compares the checksums on both ends, and if they differ, it transfers just the differing
parts of the file. But, as @Jonathan points out below, the comparison is only done when files
are of the same size on both ends -- different sizes will cause rsync to upload the entire
file, overwriting the target with the same name.
This requires a bit of computation on both ends initially, but can be extremely efficient
at reducing network load if, for example, you're frequently backing up very large
fixed-size files that often contain minor changes. Examples that come to mind are virtual
hard drive image files used in virtual machines or iSCSI targets.
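For example, a sketch of a recurring copy of a large disk image, with hypothetical paths and host:
rsync -av --checksum /var/lib/vm/disk.img backuphost:/backups/vm/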
It is notable that if you use --checksum to transfer a batch of files that
are completely new to the target system, rsync will still calculate their checksums on the
source system before transferring them. Why I do not know :)
So, in short:
If you're often using rsync to just "move stuff from A to B" and want the option to cancel
that operation and later resume it, don't use --checksum , but do use
--append-verify .
If you're using rsync to back up stuff often, using --append-verify probably
won't do much for you, unless you're in the habit of sending large files that continuously
grow in size but are rarely modified once written. As a bonus tip, if you're backing up to
storage that supports snapshotting such as btrfs or zfs , adding
the --inplace switch will help you reduce snapshot sizes since changed files
aren't recreated but rather the changed blocks are written directly over the old ones. This
switch is also useful if you want to avoid rsync creating copies of files on the target when
only minor changes have occurred.
When using --append-verify , rsync will behave just like it always does on
all files that are the same size. If they differ in modification or other timestamps, it will
overwrite the target with the source without scrutinizing those files further.
--checksum will compare the contents (checksums) of every file pair of identical
name and size.
UPDATED 2015-09-01 Changed to reflect points made by @Alex (thanks!)
UPDATED 2017-07-14 Changed to reflect points made by @Jonathan (thanks!)
According to the documentation, --append does not check the data, but --append-verify does.
Also, as @gaoithe points out in a comment below, the documentation claims
--partial does resume from previous files. – Alex
Aug 28 '15 at 3:49
Thank you @Alex for the updates. Indeed, since 3.0.0, --append no longer
compares the source to the target file before appending. Quite important, really!
--partial does not itself resume a failed file transfer, but rather leaves it
there for a subsequent --append(-verify) to append to it. My answer was clearly
misrepresenting this fact; I'll update it to include these points! Thanks a lot :) –
DanielSmedegaardBuus
Sep 1 '15 at 13:29
@CMCDragonkai Actually, check out Alexander's answer below about --partial-dir
-- looks like it's the perfect bullet for this. I may have missed something entirely ;)
– DanielSmedegaardBuus
May 10 '16 at 19:31
What's your level of confidence in the described behavior of --checksum ?
According to the man page it has more to do with deciding
which files to flag for transfer than with delta-transfer (which, presumably, is
rsync 's default behavior). – Jonathan Y.
Jun 14 '17 at 5:48
Just specify a partial directory as the rsync man pages recommends:
--partial-dir=.rsync-partial
Longer explanation:
There is actually a built-in feature for doing this using the --partial-dir
option, which has several advantages over the --partial and
--append-verify / --append alternative.
Excerpt from the
rsync man pages:
--partial-dir=DIR
A better way to keep partial files than the --partial option is
to specify a DIR that will be used to hold the partial data
(instead of writing it out to the destination file). On the
next transfer, rsync will use a file found in this dir as data
to speed up the resumption of the transfer and then delete it
after it has served its purpose.
Note that if --whole-file is specified (or implied), any
partial-dir file that is found for a file that is being updated
will simply be removed (since rsync is sending files without
using rsync's delta-transfer algorithm).
Rsync will create the DIR if it is missing (just the last dir --
not the whole path). This makes it easy to use a relative path
(such as "--partial-dir=.rsync-partial") to have rsync create
the partial-directory in the destination file's directory when
needed, and then remove it again when the partial file is
deleted.
If the partial-dir value is not an absolute path, rsync will add
an exclude rule at the end of all your existing excludes. This
will prevent the sending of any partial-dir files that may exist
on the sending side, and will also prevent the untimely deletion
of partial-dir items on the receiving side. An example: the
above --partial-dir option would add the equivalent of "-f '-p
.rsync-partial/'" at the end of any other filter rules.
By default, rsync uses a random temporary file name which gets deleted when a transfer
fails. As mentioned, using --partial you can make rsync keep the incomplete file
as if it were successfully transferred, so that it is possible to later append to
it using the --append-verify / --append options. However, there are
several reasons this is sub-optimal.
Your backup files may not be complete, and without checking the remote file, which must
still be unaltered, there's no way to know.
If you are attempting to use --backup and --backup-dir,
you've just added a new version of this file, one that never even existed before, to your version
history.
However if we use --partial-dir , rsync will preserve the temporary partial
file, and resume downloading using that partial file next time you run it, and we do not
suffer from the above issues.
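A minimal sketch with placeholder paths; the same command both starts the transfer and, after an interruption, resumes from the kept partial data:
rsync -av --partial-dir=.rsync-partial /source/dir/ host:/dest/dir/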
I agree this is a much more concise answer to the question. The TL;DR is perfect, and
those who need more can read the longer bit. Strong work. – JKOlaf
Jun 28 '17 at 0:11
You may want to add the -P option to your command.
From the man page:
--partial By default, rsync will delete any partially transferred file if the transfer
is interrupted. In some circumstances it is more desirable to keep partially
transferred files. Using the --partial option tells rsync to keep the partial
file which should make a subsequent transfer of the rest of the file much faster.
-P The -P option is equivalent to --partial --progress. Its purpose
is to make it much easier to specify these two options for
a long transfer that may be interrupted.
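In other words, with hypothetical paths, these two invocations are equivalent:
rsync -avP /source/ host:/dest/
rsync -av --partial --progress /source/ host:/dest/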
@Flimm not quite correct. If there is an interruption (network or receiving side) then when
using --partial the partial file is kept AND it is used when rsync is resumed. From the
manpage: "Using the --partial option tells rsync to keep the partial file which should
make a subsequent transfer of the rest of the file much faster." –
gaoithe
Aug 19 '15 at 11:29
@Flimm and @gaoithe, my answer wasn't quite accurate, and definitely not up-to-date. I've
updated it to reflect version 3 + of rsync . It's important to stress, though,
that --partial does not itself resume a failed transfer. See my answer
for details :) – DanielSmedegaardBuus
Sep 1 '15 at 14:11
@DanielSmedegaardBuus I tried it and the -P is enough in my case. Versions:
client has 3.1.0 and server has 3.1.1. I interrupted the transfer of a single large file with
ctrl-c. I guess I am missing something. – guettli
Nov 18 '15 at 12:28
I think you are forcibly re-running rsync, and hence all data is getting
downloaded when you recall it again. Use the --progress option to copy only those
files which are not copied, and the --remove-source-files option to delete files
from the source once they have been copied...
@Fabien He tells rsync to set two ssh options (rsync uses ssh to connect). The second one
tells ssh to not prompt for confirmation if the host he's connecting to isn't already known
(by existing in the "known hosts" file). The first one tells ssh to not use the default known
hosts file (which would be ~/.ssh/known_hosts). He uses /dev/null instead, which is of course
always empty, and as ssh would then not find the host in there, it would normally prompt for
confirmation, hence option two. Upon connecting, ssh writes the now known host to /dev/null,
effectively forgetting it instantly :) – DanielSmedegaardBuus
Dec 7 '14 at 0:12
...but you were probably wondering what effect, if any, it has on the rsync operation itself.
The answer is none. It only serves to not have the host you're connecting to added to your
SSH known hosts file. Perhaps he's a sysadmin often connecting to a great number of new
servers, temporary systems or whatnot. I don't know :) – DanielSmedegaardBuus
Dec 7 '14 at 0:23
There are a couple errors here; one is very serious: --delete will delete files
in the destination that don't exist in the source. The less serious one is that
--progress doesn't modify how things are copied; it just gives you a progress
report on each file as it copies. (I fixed the serious error; replaced it with
--remove-source-files .) – Paul d'Aoust
Nov 17 '16 at 22:39
If you already know you want it, get it here:
parsync+utils.tar.gz (contains parsync
plus the kdirstat-cache-writer , stats , and scut utilities below). Extract it into a dir on your $PATH and,
after verifying the other dependencies below, give it a shot.
While parsync is developed for and tested on Linux, the latest version of parsync has been modified to (mostly) work on the Mac
(tested on OSX 10.9.5). A number of the Linux-specific dependencies have been removed and there are a number of Mac-specific
workarounds.
Thanks to Phil Reese < [email protected] > for the code mods needed to get it started.
It's the same package and instructions for both platforms.
2. Dependencies
parsync requires the following utilities to work:
stats - self-writ Perl utility for providing
descriptive stats on STDIN
scut - self-writ Perl utility like cut
that allows regex split tokens
kdirstat-cache-writer (included in the tarball mentioned above), requires a
parsync needs to be installed only on the SOURCE end of the transfer and uses whatever rsync is available on the TARGET.
It uses a number of Linux- specific utilities so if you're transferring between Linux and a FreeBSD host, install parsync on the
Linux side. In fact, as currently written, it will only PUSH data to remote targets ; it will not pull data as rsync itself
can do. This will probably change in the near future.
3. Overview
rsync is a fabulous data mover. Possibly more bytes have been moved (or have been prevented from being moved) by rsync than by any other application.
So what's not to love? For transferring large, deep file trees, rsync will pause while it generates lists of files to process. Since
Version 3, it does this pretty fast, but on sluggish filesystems, it can take hours or even days before it will start to actually
exchange rsync data. Second, due to various bottlenecks, rsync will tend to use less than the available bandwidth on high speed networks.
Starting multiple instances of rsync can improve this significantly. However, on such transfers, it is also easy to overload the
available bandwidth, so it would be nice to both limit the bandwidth used if necessary and also to limit the load on the system.
parsync tries to satisfy all these conditions and more by:
using the kdirstat-cache-writer
utility from the beautiful kdirstat directory browser, which can
produce lists of files very rapidly
allowing re-use of the cache files so generated.
doing crude loadbalancing of the number of active rsyncs, suspending and un-suspending the processes as necessary.
using rsync's own bandwidth limiter (--bwlimit) to throttle the total bandwidth.
making rsync's own vast option selection available as a pass-thru (tho limited to those compatible with the --files-from
option).
Only use for LARGE data transfers The main use case for parsync is really only very large data transfers thru fairly fast
network connections (>1Gb/s). Below this speed, a single rsync can saturate the connection, so there's little reason to use
parsync and in fact the overhead of testing the existence of and starting more rsyncs tends to worsen its performance on small
transfers to slightly less than rsync alone.
Beyond this introduction, parsync's internal help is about all you'll need to figure out how to use it; below is what you'll see
when you type parsync -h . There are still edge cases where parsync will fail or behave oddly, especially with small data
transfers, so I'd be happy to hear of such misbehavior or suggestions to improve it. Download the complete tarball of parsync, plus
the required utilities here: parsync+utils.tar.gz
Unpack it, move the contents to a dir on your $PATH , chmod it executable, and try it out.
parsync --help
or just
parsync
Below is what you should see:
4. parsync help
parsync version 1.67 (Mac compatibility beta) Jan 22, 2017
by Harry Mangalam <[email protected]> || <[email protected]>
parsync is a Perl script that wraps Andrew Tridgell's miraculous 'rsync' to
provide some load balancing and parallel operation across network connections
to increase the amount of bandwidth it can use.
parsync is primarily tested on Linux, but (mostly) works on MacOSX
as well.
parsync needs to be installed only on the SOURCE end of the
transfer and only works in local SOURCE -> remote TARGET mode
(it won't allow local SOURCE <- remote TARGET, emitting an
error and exiting if attempted).
It uses whatever rsync is available on the TARGET. It uses a number
of Linux-specific utilities so if you're transferring between Linux
and a FreeBSD host, install parsync on the Linux side.
The only native rsync option that parsync uses is '-a' (archive) &
'-s' (respect bizarro characters in filenames).
If you need more, then it's up to you to provide them via
'--rsyncopts'. parsync checks to see if the current system load is
too heavy and tries to throttle the rsyncs during the run by
monitoring and suspending / continuing them as needed.
It uses the very efficient (also Perl-based) kdirstat-cache-writer
from kdirstat to generate lists of files which are summed and then
crudely divided into NP jobs by size.
It appropriates rsync's bandwidth throttle mechanism, using '--maxbw'
as a passthru to rsync's 'bwlimit' option, but divides it by NP so
as to keep the total bw the same as the stated limit. It monitors and
shows network bandwidth, but can't change the bw allocation mid-job.
It can only suspend rsyncs until the load decreases below the cutoff.
If you suspend parsync (^Z), all rsync children will suspend as well,
regardless of current state.
Unless changed by '--interface', it tries to figure out how to set the
interface to monitor. The transfer will use whatever interface routing
provides, normally set by the name of the target. It can also be used for
non-host-based transfers (between mounted filesystems) but the network
bandwidth continues to be (usually pointlessly) shown.
[[NB: Between mounted filesystems, parsync sometimes works very poorly for
reasons still mysterious. In such cases (monitor with 'ifstat'), use 'cp'
or 'tnc' (https://goo.gl/5FiSxR) for the initial data movement and a single
rsync to finalize. I believe the multiple rsync chatter is interfering with
the transfer.]]
It only works on dirs and files that originate from the current dir (or
specified via "--rootdir"). You cannot include dirs and files from
discontinuous or higher-level dirs.
** the ~/.parsync files **
The ~/.parsync dir contains the cache (*.gz), the chunk files (kds*), and the
time-stamped log files. The cache files can be re-used with '--reusecache'
(which will re-use ALL the cache and chunk files). The log files are
datestamped and are NOT overwritten.
** Odd characters in names **
parsync will sometimes refuse to transfer some oddly named files, altho
recent versions of rsync allow the '-s' flag (now a parsync default)
which tries to respect names with spaces and properly escaped shell
characters. Filenames with embedded newlines, DOS EOLs, and other
odd chars will be recorded in the log files in the ~/.parsync dir.
** Because of the crude way that files are chunked, NP may be
adjusted slightly to match the file chunks. ie '--NP 8' -> '--NP 7'.
If so, a warning will be issued and the rest of the transfer will be
automatically adjusted.
OPTIONS
=======
[i] = integer number
[f] = floating point number
[s] = "quoted string"
( ) = the default if any
--NP [i] (sqrt(#CPUs)) ............... number of rsync processes to start
optimal NP depends on many vars. Try the default and incr as needed
--startdir [s] (`pwd`) .. the directory it works relative to. If you omit
it, the default is the CURRENT dir. You DO have
to specify target dirs. See the examples below.
--maxbw [i] (unlimited) .......... in KB/s max bandwidth to use (--bwlimit
passthru to rsync). maxbw is the total BW to be used, NOT per rsync.
--maxload [f] (NP+2) ........ max total system load - if sysload > maxload,
sleeps an rsync proc for 10s
--checkperiod [i] (5) .......... sets the period in seconds between updates
--rsyncopts [s] ... options passed to rsync as a quoted string (CAREFUL!)
this opt triggers a pause before executing to verify the command.
--interface [s] ............. network interface to /monitor/, not nec use.
default: `/sbin/route -n | grep "^0.0.0.0" | rev | cut -d' ' -f1 | rev`
above works on most simple hosts, but complex routes will confuse it.
--reusecache .......... don't re-read the dirs; re-use the existing caches
--email [s] ..................... email address to send completion message
(requires working mail system on host)
--barefiles ..... set to allow rsync of individual files, as oppo to dirs
--nowait ................ for scripting, sleep for a few s instead of wait
--version ................................. dumps version string and exits
--help ......................................................... this help
Examples
========
-- Good example 1 --
% parsync --maxload=5.5 --NP=4 --startdir='/home/hjm' dir1 dir2 dir3
hjm@remotehost:~/backups
where
= "--startdir='/home/hjm'" sets the working dir of this operation to
'/home/hjm' and dir1 dir2 dir3 are subdirs from '/home/hjm'
= the target "hjm@remotehost:~/backups" is the same target rsync would use
= "--NP=4" forks 4 instances of rsync
= -"-maxload=5.5" will start suspending rsync instances when the 5m system
load gets to 5.5 and then unsuspending them when it goes below it.
It uses 4 instances to rsync dir1 dir2 dir3 to hjm@remotehost:~/backups
-- Good example 2 --
% parsync --rsyncopts="--ignore-existing" --reusecache --NP=3
--barefiles *.txt /mount/backups/txt
where
= "--rsyncopts='--ignore-existing'" is an option passed thru to rsync
telling it not to disturb any existing files in the target directory.
= "--reusecache" indicates that the filecache shouldn't be re-generated,
uses the previous filecache in ~/.parsync
= "--NP=3" for 3 copies of rsync (with no "--maxload", the default is 4)
= "--barefiles" indicates that it's OK to transfer barefiles instead of
recursing thru dirs.
= "/mount/backups/txt" is the target - a local disk mount instead of a network host.
It uses 3 instances to rsync *.txt from the current dir to "/mount/backups/txt".
-- Error Example 1 --
% pwd
/home/hjm # executing parsync from here
% parsync --NP4 --compress /usr/local /media/backupdisk
why this is an error:
= '--NP4' is not an option (parsync will say "Unknown option: np4")
It should be '--NP=4'
= if you were trying to rsync '/usr/local' to '/media/backupdisk',
it will fail since there is no /home/hjm/usr/local dir to use as
a source. This will be shown in the log files in
~/.parsync/rsync-logfile-<datestamp>_#
as a spew of "No such file or directory (2)" errors
= the '--compress' is a native rsync option, not a native parsync option.
You have to pass it to rsync with "--rsyncopts='--compress'"
The correct version of the above command is:
% parsync --NP=4 --rsyncopts='--compress' --startdir=/usr local
/media/backupdisk
-- Error Example 2 --
% parsync --start-dir /home/hjm mooslocal [email protected]:/usr/local
why this is an error:
= this command is trying to PULL data from a remote SOURCE to a
local TARGET. parsync doesn't support that kind of operation yet.
The correct version of the above command is:
# ssh to hjm@moo, install parsync, then:
% parsync --startdir=/usr local hjm@remote:/home/hjm/mooslocal
I have been using an rsync script to synchronize data at one host with the data
at another host. The data has numerous small-sized files that contribute to almost 1.2TB.
In order to sync those files, I have been using rsync command as follows:
As a test, I picked up two of those projects (8.5GB of data) and I executed the command
above. Being a sequential process, it took 14 minutes 58 seconds to complete. So, for 1.2TB
of data it would take several hours.
If I could run multiple rsync processes in parallel (using
& , xargs or parallel ), it would save my
time.
I tried with below command with parallel (after cd ing to source
directory) and it took 12 minutes 37 seconds to execute:
If possible, we would want to use 50% of total bandwidth. But, parallelising multiple
rsync s is our first priority. – Mandar Shinde
Mar 13 '15 at 7:32
In fact, I do not know about above parameters. For the time being, we can neglect the
optimization part. Multiple rsync s in parallel is the primary focus now.
– Mandar Shinde
Mar 13 '15 at 7:47
Here, the --relative option ensured that the directory structure for the affected files,
at the source and destination, remains the same (inside the /data/ directory),
so the command must be run in the source folder (in this example, /data/projects ).
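A minimal sketch of the parallel approach, assuming GNU parallel is installed (job count and remote host are placeholders):
cd /data/projects
find . -type f > /tmp/file.list
parallel -j 5 rsync -av --relative {} user@remotehost:/data/projects/ :::: /tmp/file.list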
That would do an rsync per file. It would probably be more efficient to split up the whole
file list using split and feed those filenames to parallel. Then use rsync's
--files-from to get the filenames out of each file and sync them:
rm backups.*
split -l 3000 backup.list backups.
ls backups.* | parallel --line-buffer --verbose -j 5 rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/
– Sandip Bhattacharya
Nov 17 '16 at 21:22
How does the second rsync command handle the lines in result.log that are not files? i.e.
receiving file list ... done
created directory /data/ . –
Mike D
Sep 19 '17 at 16:42
On newer versions of rsync (3.1.0+), you can use --info=name in place of
-v , and you'll get just the names of the files and directories. You may want to
use --protect-args to the 'inner' transferring rsync too if any files might have spaces or
shell metacharacters in them. – Cheetah
Oct 12 '17 at 5:31
I would strongly discourage anybody from using the accepted answer; a better solution is to
crawl the top-level directory and launch a proportional number of rsync operations.
I have a large zfs volume and my source was a cifs mount. Both are linked with 10G,
and in some benchmarks can saturate the link. Performance was evaluated using zpool
iostat 1 .
The source drive was mounted like:
mount -t cifs -o username=,password= //static_ip/70tb /mnt/Datahoarder_Mount/ -o vers=3.0
In synthetic benchmarks (CrystalDiskMark), sequential write performance approaches
900 MB/s, which means the link is saturated. 130 MB/s is not very good, and is the
difference between waiting a weekend and waiting two weeks.
So, I built the file list and tried to run the sync again (I have a 64 core machine):
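A sketch of such a run, assuming GNU parallel (the job count and destination mount point are placeholders):
cd /mnt/Datahoarder_Mount
find . -type f > /tmp/file.list
parallel -j 16 rsync -a --relative {} /mnt/zfs_volume/ :::: /tmp/file.list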
In conclusion, as @Sandip Bhattacharya brought up, write a small script to get the
directories and parallel that. Alternatively, pass a file list to rsync. But don't create new
instances for each file.
ls -1 | parallel rsync -a {} /destination/directory/
This is only useful when you have more than a few non-near-empty directories; otherwise
you'll end up having almost every rsync terminating and the last one doing all
the job alone.
rsync is a great tool, but sometimes it will not fill up the available bandwidth. This
is often a problem when copying several big files over high speed connections.
The following will start one rsync per big file in src-dir to dest-dir on the server
fooserver:
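The GNU parallel documentation suggests something along these lines (treat the size cutoff and flags as illustrative): find the big files, create each file's directory on the target, then hand each file to its own rsync:
cd src-dir; find . -type f -size +100000 | \
  parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; \
    rsync -s -Havessh {} fooserver:/dest-dir/{}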
If I use --dry-run option in rsync , I would have a list of files
that would be transferred. Can I provide that file list to parallel in order to
parallelise the process? – Mandar Shinde
Apr 10 '15 at 3:47
Necessity is frequently the mother of invention. I knew very little about BASH scripting but
that was about to change rapidly. Working with the existing script and using online help
forums, search engines, and some printed documentation, I set up a Linux network-attached storage
computer running Fedora Core. I learned how to create an SSH keypair and
configure that along with rsync to move the backup file from the email server
to the storage server. That worked well for a few days until I noticed that the storage server's
disk space was rapidly disappearing. What was I going to do?
That's when I learned more about Bash scripting. I modified my rsync command to delete
backed-up files older than ten days. In both cases I learned that a little knowledge can be a
dangerous thing, but in each case my experience and confidence as a Linux user and system
administrator grew, and due to that I functioned as a resource for others. On the plus side, we
soon realized that the disk to disk backup system was superior to tape when it came to
restoring email files. In the long run it was a win but there was a lot of uncertainty and
anxiety along the way.
There is a flag --files-from that does exactly what you want. From man
rsync :
--files-from=FILE
Using this option allows you to specify the exact list of files to transfer (as read
from the specified FILE or - for standard input). It also tweaks the default behavior of
rsync to make transferring just the specified files and directories easier:
The --relative (-R) option is implied, which preserves the path information that is
specified for each item in the file (use --no-relative or --no-R if you want to turn that
off).
The --dirs (-d) option is implied, which will create directories specified in the
list on the destination rather than noisily skipping them (use --no-dirs or --no-d if you
want to turn that off).
The --archive (-a) option's behavior does not imply --recursive (-r), so specify it
explicitly, if you want it.
These side-effects change the default state of rsync, so the position of the
--files-from option on the command-line has no bearing on how other options are parsed
(e.g. -a works the same before or after --files-from, as does --no-R and all other
options).
The filenames that are read from the FILE are all relative to the source dir -- any
leading slashes are removed and no ".." references are allowed to go higher than the source
dir. For example, take this command:
rsync -a --files-from=/tmp/foo /usr remote:/backup
If /tmp/foo contains the string "bin" (or even "/bin"), the /usr/bin directory will be
created as /backup/bin on the remote host. If it contains "bin/" (note the trailing slash),
the immediate contents of the directory would also be sent (without needing to be
explicitly mentioned in the file -- this began in version 2.6.4). In both cases, if the -r
option was enabled, that dir's entire hierarchy would also be transferred (keep in mind
that -r needs to be specified explicitly with --files-from, since it is not implied by -a).
Also note that the effect of the (enabled by default) --relative option is to duplicate
only the path info that is read from the file -- it does not force the duplication of the
source-spec path (/usr in this case).
In addition, the --files-from file can be read from the remote host instead of the local
host if you specify a "host:" in front of the file (the host must match one end of the
transfer). As a short-cut, you can specify just a prefix of ":" to mean "use the remote end
of the transfer". For example:
rsync -a --files-from=:/path/file-list src:/ /tmp/copy
This would copy all the files specified in the /path/file-list file that was located on
the remote "src" host.
If the --iconv and --protect-args options are specified and the --files-from filenames
are being sent from one host to another, the filenames will be translated from the sending
host's charset to the receiving host's charset.
NOTE: sorting the list of files in the --files-from input helps rsync to be more
efficient, as it will avoid re-visiting the path elements that are shared between adjacent
entries. If the input is not sorted, some path elements (implied directories) may end up
being scanned multiple times, and rsync will eventually unduplicate them after they get
turned into file-list elements.
Note that you still have to specify the directory where the files listed are located, for
instance: rsync -av --files-from=file-list . target/ for copying files from the
current dir. – Nicolas Mattia
Feb 11 '16 at 11:06
if the files-from file has anything starting with .. , rsync appears to ignore the
.. , giving me an error like rsync: link_stat
"/home/michael/test/subdir/test.txt" failed: No such file or directory (in this case
running from the "test" dir and trying to specify "../subdir/test.txt", which does exist).
– Michael
Nov 2 '16 at 0:09
The --files-from= parameter needs a trailing slash if you want to keep the absolute
path intact. So your command would become something like below:
rsync -av --files-from=/path/to/file / /tmp/
This is useful when there are a large number of files and you want to copy all of them
to some path. So you would find the files and write the output to a file, like below:
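For instance, with a hypothetical source directory:
find /some/dir -type f > /path/to/file
rsync -av --files-from=/path/to/file / /tmp/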
I have one older ubuntu server, and one newer debian server and I am migrating data from the old
one to the new one. I want to use rsync to transfer data across to make final migration easier and
quicker than the equivalent tar/scp/untar process.
As an example, I want to sync the home folders one at a time to the new server. This requires
root access at both ends as not all files at the source side are world readable and the destination
has to be written with correct permissions into /home. I can't figure out how to give rsync root
access on both sides.
I've seen a few related questions, but none quite match what I'm trying to do.
Actually you do NOT need to allow root authentication via SSH to run rsync as Antoine suggests.
The transport and system authentication can be done entirely over user accounts as long as
you can run rsync with sudo on both ends for reading and writing the files.
As a user on your destination server you can suck the data from your source server like
this:
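A sketch of such a pull, with placeholder paths (boron being the remote source host, as mentioned below):
sudo rsync -avPe ssh --rsync-path='sudo rsync' boron:/remote/dir/ /local/dir/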
The user you run as on both servers will need passwordless* sudo access to the rsync binary,
but you do NOT need to enable ssh login as root anywhere. If the user you are using doesn't
match on the other end, you can add user@boron: to specify a different remote user.
Good luck.
*or you will need to have entered the password manually inside the timeout window.
Although this is an old question, I'd like to add a word of CAUTION to this
accepted answer. From my understanding, allowing passwordless "sudo rsync" is equivalent
to opening the root account to remote login, because it then becomes very easy
to gain full root access, e.g. because all system files can be downloaded, modified
and replaced without a password. –
Ascurion
Jan 8 '16 at 16:30
Good point. In a trusted environment, you'll pick up a lot of speed by not encrypting.
It might not matter on small files, but with GBs of data it will. –
pboin
May 18 '10 at 10:53
How do I use the rsync tool to copy
only the hidden files and directory (such as ~/.ssh/, ~/.foo, and so on) from /home/jobs directory
to the /mnt/usb directory under Unix like operating system?
The rsync program is used for synchronizing files over a network or local disks. To view or display
only hidden files with ls command:
ls -ld ~/.??*
OR
ls -ld ~/.[^.]*
Sample outputs:
Fig:01 ls command to view only hidden files
rsync not synchronizing all hidden .dot files?
In this example, you used the pattern .[^.]* or .??* to
select and display only hidden files using ls command . You can use the same pattern with any
Unix command including rsync command. The syntax is as follows to copy hidden files with rsync:
In this example, copy all hidden files from my home directory to /mnt/test:
rsync -avzP ~/.[^.]* /mnt/test
Sample outputs:
Fig.02 Rsync example to copy only hidden files
Using ssh means encryption, which makes things slower. --force does only affect
directories, if I read the man page correctly. –
Torsten Bronger
Jan 1 '13 at 23:08
Unless you're using ancient kit, the CPU overhead of encrypting / decrypting the
traffic shouldn't be noticeable, but you will lose 10-20% of your bandwidth
through the encapsulation process. Then again, 80% of a working link is better than
100% of a non-working one :) –
arober11
Jan 2 '13 at 10:52
I do have an "ancient kit". ;-) (Slow ARM CPU on a NAS.) But I now mount
the NAS with NFS and use rsync (with "sudo") locally. This solves the problem (and
is even faster). However, I still think that my original problem must be solvable
using the rsync protocol (remote, no ssh). –
Torsten Bronger
Jan 4 '13 at 7:55
On my Ubuntu server there are about 150 shell accounts. All usernames begin with the prefix
u12.. I have root access and I am trying to copy a directory named "somefiles" to all the
home directories. After copying the directory the user and group ownership of the directory
should be changed to user's. Username, group and home-dir name are same. How can this be
done?
Do the copying as the target user. This will automatically make the target files owned by
that user. Make sure that the original files are world-readable (or at least readable by all
the target users). Run chmod afterwards if you don't want the copied files to be world-readable.
# enumerate accounts, keep usernames beginning with u12,
# then copy the directory as each user so ownership ends up correct
getent passwd |
awk -F : '$1 ~ /^u12/ {print $1}' |
while IFS= read -r user; do
su "$user" -c 'cp -Rp /original/location/somefiles ~/'
done
I am using rsync to replicate a web folder structure from a local server to a remote server.
Both servers are ubuntu linux. I use the following command, and it works well:
The usernames for the local system and the remote system are different. From what I have
read it may not be possible to preserve all file and folder owners and groups. That is OK,
but I would like to preserve owners and groups just for the www-data user, which does exist
on both servers.
Is this possible? If so, how would I go about doing that?
I ended up getting the desired effect thanks to many of the helpful comments and answers
here. Assuming the IP of the source machine is 10.1.1.2 and the IP of the destination machine
is 10.1.1.1, I can use this line from the destination machine:
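A sketch of what that line can look like, with a hypothetical web root (the sudoers tweak it relies on is described below):
sudo rsync -az -e ssh --rsync-path='sudo rsync' [email protected]:/var/www/ /var/www/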
This preserves the ownership and groups of the files that have a common user name, like
www-data. Note that using rsync without sudo does not preserve these permissions.
This lets you authenticate as user on targethost, but still get privileged
write permission through sudo . You'll have to modify your sudoers file on the
target host to avoid sudo's request for your password. man sudoers or run
sudo visudo for instructions and samples.
You mention that you'd like to retain the ownership of files owned by www-data, but not
other files. If this is really true, then you may be out of luck unless you implement
chown or a second run of rsync to update permissions. There is no
way to tell rsync to preserve ownership for just one user .
That said, you should read about rsync's --files-from option.
As far as I know, you cannot chown files to somebody other than yourself if you are
not root. So you would have to rsync using the www-data account, as
all files will be created with the specified user as owner. So you need to
chown the files afterwards.
The root users for the local system and the remote system are different.
What does this mean? The root user is uid 0. How are they different?
Any user with read permission to the directories you want to copy can determine what
usernames own what files. Only root can change the ownership of files being written .
You're currently running the command on the source machine, which restricts your writes to
the permissions associated with [email protected]. Instead, you can try to run the command
as root on the target machine. Your read access on the source machine isn't an issue.
So on the target machine (10.1.1.1), assuming the source is 10.1.1.2:
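For example, run as root on the target; the remote user and paths are placeholders:
rsync -az -e ssh [email protected]:/var/www/ /var/www/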
Also, set up access to [email protected] using a DSA or RSA key, so that you can avoid having
passwords floating around. For example, as root on your target machine, run:
# ssh-keygen -d
Then take the contents of the file /root/.ssh/id_dsa.pub and add it to
~user/.ssh/authorized_keys on the source machine. You can ssh
[email protected] as root from the target machine to see if it works. If you get a
password prompt, check your error log to see why the key isn't working.
I'm trying to use rsync to copy a set of files from one system to another. I'm running
the command as a normal user (not root). On the remote system, the files are owned by
apache and when copied they are obviously owned by the local account (fred).
My problem is that every time I run the rsync command, all files are re-synched even
though they haven't changed. I think the issue is that rsync sees the file owners are
different and my local user doesn't have the ability to change ownership to apache, but
I'm not including the -a or -o options so I thought this would
not be checked. If I run the command as root, the files come over owned by apache and do
not come a second time if I run the command again. However I can't run this as root for
other reasons. Here is the command:
Why can't you run rsync as root? On the remote system, does fred have read
access to the apache-owned files? –
chrishiestand
May 3 '11 at 0:32
Ah, I left out the fact that there are ssh keys set up so that local fred can
become remote root, so yes fred/root can read them. I know this is a bit convoluted
but its real. –
Fred Snertz
May 3 '11 at 14:50
Always be careful when root can ssh into the machine. But if you have password
and challenge response authentication disabled it's not as bad. –
chrishiestand
May 3 '11 at 17:32
-c, --checksum
This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option,
rsync uses a "quick check" that (by default) checks if each file's size and time of last modification match between the
sender and receiver. This option changes this to compare a 128-bit checksum for each file that has a matching size.
Generating the checksums means that both sides will expend a lot of disk I/O reading all the data in the files in the
transfer (and this is prior to any reading that will be done to transfer changed files), so this can slow things down
significantly.
The sending side generates its checksums while it is doing the file-system scan that builds the list of the available
files. The receiver generates its checksums when it is scanning for changed files, and will checksum any file that has
the same size as the corresponding sender's file: files with either a changed size or a changed checksum are selected
for transfer.
Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking
a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification
has nothing to do with this option's before-the-transfer "Does this file need to be updated?" check.
For protocol 30 and beyond (first supported in 3.0.0), the checksum used is MD5. For older protocols, the checksum used
is MD4.
I have a bash script which uses rsync to backup files in Archlinux. I noticed
that rsync failed to copy a file from /sys , while cp worked just fine:
# rsync /sys/class/net/enp3s1/address /tmp
rsync: read errors mapping "/sys/class/net/enp3s1/address": No data available (61)
rsync: read errors mapping "/sys/class/net/enp3s1/address": No data available (61)
ERROR: address failed verification -- update discarded.
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1052) [sender=3.0.9]
# cp /sys/class/net/enp3s1/address /tmp ## this works
I wonder why rsync fails, and whether it is possible to copy the file with it?
Rsync has code which specifically checks if a file is truncated during read and gives this
error: ENODATA . I don't know why the files in /sys have this
behavior, but since they're not real files, I guess it's not too surprising. There doesn't
seem to be a way to tell rsync to skip this particular check.
I think you're probably better off not rsyncing /sys and using specific
scripts to cherry-pick out the particular information you want (like the network card
address).
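For instance, a one-line sketch with a hypothetical backup path:
cat /sys/class/net/enp3s1/address > /backup/enp3s1.address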
First off, /sys is a pseudo file system . If you look at
/proc/filesystems you will find a list of registered file systems, where quite a
few have nodev in front. This indicates they are pseudo filesystems .
This means they exist on a running kernel as a RAM-based filesystem. Further, they do not
require a block device.
Further, you can do a stat on a file and notice another distinct feature: it
occupies 0 blocks. Also, the inode of root (stat /sys) is 1, whereas the root inode of an
ordinary disk filesystem is typically 2, etc.
rsync vs. cp
The easiest explanation for rsync failure of synchronizing pseudo files is perhaps by
example.
Say we have a file named address that is 18 bytes. An ls or
stat of the file reports 4096 bytes.
rsync
Opens file descriptor, fd.
Uses fstat(fd) to get information such as size.
Sets out to read size bytes, i.e. 4096. That would be line 253 of the code linked by
@mattdm . read_size == 4096
Ask; read: 4096 bytes.
A short string is read, i.e. 18 bytes. nread == 18
read_size = read_size - nread (4096 - 18 = 4078)
Ask; read: 4078 bytes
0 bytes read (as first read consumed all bytes in file).
During this process it actually reads the entire file. But with no size available it
cannot validate the result – thus failure is the only option.
cp
Opens file descriptor, fd.
Uses fstat(fd) to get information such as st_size (also uses lstat and stat).
Checks if the file is likely to be sparse, i.e. the file has holes etc.
copy.c:1010
/* Use a heuristic to determine whether SRC_NAME contains any sparse
 * blocks. If the file has fewer blocks than would normally be
 * needed for a file of its size, then at least one of the blocks in
 * the file is a hole. */
sparse_src = is_probably_sparse (&src_open_sb);
As stat reports the file to have zero blocks, it is categorized as sparse.
Tries to read the file by extent-copy (a more efficient way to copy normal
sparse files), and fails.
Copies by sparse-copy.
Starts out with a max read size of MAXINT. Typically 18446744073709551615
bytes on a 64-bit system.
Ask; read 4096 bytes. (Buffer size allocated in memory from stat information.)
A short string is read, i.e. 18 bytes.
Check if a hole is needed; nope.
Write buffer to target.
Subtract 18 from max read size.
Ask; read 4096 bytes.
0 bytes as all got consumed in first read.
Return success.
All OK. Update flags for file.
FINE.
Might be related, but extended attribute calls will fail on sysfs:
[root@hypervisor eth0]# lsattr address
lsattr: Inappropriate ioctl for device While reading flags on address
[root@hypervisor eth0]#
Looking at my strace, it looks like rsync tries to pull in extended attributes by
default:
22964 <... getxattr resumed> , 0x7fff42845110, 132) = -1 ENODATA (No data
available)
I tried finding a flag to give rsync to see if skipping extended attributes resolves the
issue, but wasn't able to find anything ( --xattrs turns them on at the
destination).
I'm having some trouble with rsync. I'm trying to sync my local /etc directory to a remote
server, but this won't work.
The problem is that it seems rsync doesn't copy all the files.
The local /etc dir contains 15MB of data, after a rsync, the remote backup contains only 4.6MB
of data.
Scormen May 31st, 2009, 11:05 AM I found that if I do a local sync, everything goes fine.
But if I do a remote sync, it copies only 4.6MB.
Any idea?
LoneWolfJack May 31st, 2009, 05:14 PM never used rsync on a remote machine, but "sudo rsync"
looks wrong. you probably can't call sudo like that so the ssh connection needs to have the
proper privileges for executing rsync.
just an educated guess, though.
Scormen May 31st, 2009, 05:24 PM Thanks for your answer.
In /etc/sudoers I have added next line, so "sudo rsync" will work.
kris ALL=NOPASSWD: /usr/bin/rsync
I also tried without --rsync-path="sudo rsync", but without success.
I have also tried on the server to pull the files from the laptop, but that doesn't work
either.
LoneWolfJack May 31st, 2009, 05:30 PM in the rsync help file it says that --rsync-path is for
the path to rsync on the remote machine, so my guess is that you can't use sudo there as it
will be interpreted as a path.
so you will have to do --rsync-path="/path/to/rsync" and make sure the ssh login has root
privileges if you need them to access the files you want to sync.
--rsync-path="sudo rsync" probably fails because
a) sudo is interpreted as a path
b) the space isn't escaped
c) sudo probably won't allow itself to be called remotely
again, this is not more than an educated guess.
Scormen May 31st, 2009, 05:45 PM I understand what you mean, so I tried also:
sending incremental file list
rsync: recv_generator: failed to stat "/home/kris/backup/laptopkris/etc/chatscripts/pap":
Permission denied (13)
rsync: recv_generator: failed to stat "/home/kris/backup/laptopkris/etc/chatscripts/provider":
Permission denied (13)
rsync: symlink "/home/kris/backup/laptopkris/etc/cups/ssl/server.crt" ->
"/etc/ssl/certs/ssl-cert-snakeoil.pem" failed: Permission denied (13)
rsync: symlink "/home/kris/backup/laptopkris/etc/cups/ssl/server.key" ->
"/etc/ssl/private/ssl-cert-snakeoil.key" failed: Permission denied (13)
rsync: recv_generator: failed to stat "/home/kris/backup/laptopkris/etc/ppp/peers/provider":
Permission denied (13)
rsync: recv_generator: failed to stat
"/home/kris/backup/laptopkris/etc/ssl/private/ssl-cert-snakeoil.key": Permission denied
(13)
sent 86.85K bytes received 306 bytes 174.31K bytes/sec
total size is 8.71M speedup is 99.97
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at
main.c(1058) [sender=3.0.5]
And the same command with "root" instead of "kris".
Then, I get no errors, but I still don't have all the files synced.
Scormen June 1st, 2009, 09:00 AM Sorry for this bump.
I'm still having the same problem.
Any idea?
Thanks.
binary10 June 1st, 2009, 10:36 AM I understand what you mean, so I tried also:
And the same command with "root" instead of "kris".
Then, I get no errors, but I still don't have all the files synced.
Maybe there's a nicer way, but you could place /usr/bin/rsync into a private protected area,
set the owner to root, set the setuid bit on it, and change your rsync-path argument
like so:
# on the remote side, aka [email protected]
mkdir priv-area
# protect it from normal users running a priv version of rsync
chmod 700 priv-area
cd priv-area
cp -p /usr/local/bin/rsync ./rsync-priv
sudo chown 0:0 ./rsync-priv
sudo chmod +s ./rsync-priv
ls -ltra # rsync-priv should now be 'bold-red' in bash
Looking at your flags, you've specified a cvs ignore factor, ignore files that are updated
on the target, and you're specifying a backup of removed files.
From those qualifiers you're not going to be getting everything sync'd. It's doing what
you're telling it to do.
If you really wanted to perform a like-for-like backup (not keeping stuff that's been
changed/deleted from the source), I'd go for something like the following.
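A sketch of such an invocation, reusing the paths from this thread (flags per the description above):
sudo rsync -av -i --dry-run --delete --rsync-path="sudo rsync" /etc/ kris@server:/home/kris/backup/laptopkris/etc/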
Remove the --dry-run and -i when you're happy with the output, and it should do what you
want. A word of warning: I get a bit nervous when not seeing trailing (/) on directories, as it
could lead to all sorts of funnies if you end up using rsync on softlinks.
Scormen June 1st, 2009, 12:19 PM Thanks for your help, binary10.
I've tried what you have said, but still, I only receive 4.6MB on the remote server.
Thanks for the warning, I'll note that!
Did someone already tried to rsync their own /etc to a remote system? Just to know if this
strange thing only happens to me...
Thanks.
binary10 June 1st, 2009, 01:22 PM Thanks for your help, binary10.
I've tried what you have said, but still, I only receive 4.6MB on the remote server.
Thanks for the warning, I'll note that!
Did someone already tried to rsync their own /etc to a remote system? Just to know if this
strange thing only happens to me...
Thanks.
Ok so I've gone back and looked at your original post, how are you calculating 15MB of data
under etc - via a du -hsx /etc/ ??
I do daily drive to drive backup copies via rsync and drive to network copies.. and have
used them recently for restoring.
Sure my du -hsx /etc/ reports 17MB of data of which 10MB gets transferred via an rsync. My
backup drives still operate.
rsync 3.0.6 has some fixes to do with ACLs and special devices rsyncing between solaris. but
I think 3.0.5 is still ok with ubuntu to ubuntu systems.
Here is my test doing exactly what you you're probably trying to do. I even check the remote
end..
Number of files: 3121
Number of files transferred: 1812
Total file size: 10.04M bytes
Total transferred file size: 10.00M bytes
Literal data: 10.00M bytes
Matched data: 0 bytes
File list size: 109.26K
File list generation time: 0.002 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10.20M
Total bytes received: 38.70K
sent 10.20M bytes received 38.70K bytes 4.09M bytes/sec
total size is 10.04M speedup is 0.98
binary10@jsecx25:~/bin-priv$ sudo du -hsx /etc/
17M /etc/
binary10@jsecx25:~/bin-priv$
And then on the remote system I do the du -hsx
binary10@lenovo-n200:/home/kris/backup/laptopkris/etc$ cd ..
binary10@lenovo-n200:/home/kris/backup/laptopkris$ sudo du -hsx etc
17M etc
binary10@lenovo-n200:/home/kris/backup/laptopkris$
Scormen June 1st, 2009, 01:35 PM How are you calculating 15MB of data under etc - via a du -hsx
/etc/ ??
Indeed, on my laptop I see:
root@laptopkris:/home/kris# du -sh /etc/
15M /etc/
If I do the same thing after a fresh sync to the server, I see:
root@server:/home/kris# du -sh /home/kris/backup/laptopkris/etc/
4.6M /home/kris/backup/laptopkris/etc/
On both sides, I have installed Ubuntu 9.04, with version 3.0.5 of rsync.
So strange...
binary10 June 1st, 2009, 01:45 PM it does seem a bit odd.
I'd start doing a few diffs from the outputs:
find etc/ -printf "%f %s %p %Y\n" | sort
And see what type of files are missing.
- edit - Added the %Y file type.
Scormen June 1st, 2009, 01:58 PM Hmm, it's going stranger.
Now I see that I have all my files on the server, but they don't have their full size (bytes).
I have uploaded the files, so you can look into them.
binary10 June 1st, 2009, 02:16 PM If you look at the files that are different, aka the ssl
ones, they are links to local files elsewhere, aka linked to /usr and not within /etc/ ,
aka they are different on your laptop and the server.
Scormen June 1st, 2009, 02:25 PM I understand that soft links are just copied, and not the
"full file".
But, you have run the same command to test, a few posts ago.
How is it possible that you can see the full 15MB?
binary10 June 1st, 2009, 02:34 PM I was starting to think that this was a bug with du.
The de-referencing is a bit topsy.
If you rsync copy the remote backup back to a new location back onto the laptop and do the
du command. I wonder if you'll end up with 15MB again.
Scormen June 1st, 2009, 03:20 PM Good tip.
On the server side, the backup of the /etc was still 4.6MB.
I have rsynced it back to the laptop, to a new directory.
If I go on the laptop to that new directory and do a du, it says 15MB.
binary10 June 1st, 2009, 03:34 PM
I think you've now confirmed that rsync DOES copy everything; it's just that du confused what
you had expected by counting the sizes of the link targets.
You might also think about what you're copying; maybe you need more than just /etc. Of course,
it depends on what you are trying to do with the backup :)
enjoy.
Scormen June 1st, 2009, 03:37 PM Yeah, it seems to work well.
So, the "problem" where just the soft links, that couldn't be counted on the server side?
binary10 June 1st, 2009, 04:23 PM
The links were copied as links, as per the design of --archive in rsync.
The contents of the link targets were different between your two systems, since those targets
reside outside of /etc/, in /usr, and so du reported them differently.
Scormen June 1st, 2009, 05:36 PM Okay, I got it.
Many thanks for the support, binary10!
Scormen June 1st, 2009, 05:59 PM Just to know, is it possible to copy the data from these links
as real, hard data?
Thanks.
binary10 June 2nd, 2009, 09:54 AM
Yep, absolutely.
You should then look at other possibilities of:
-L, --copy-links transform symlink into referent file/dir
--copy-unsafe-links only "unsafe" symlinks are transformed
--safe-links ignore symlinks that point outside the source tree
-k, --copy-dirlinks transform symlink to a dir into referent dir
-K, --keep-dirlinks treat symlinked dir on receiver as dir
but then you'll have to start questioning why you are backing them up like that, especially
stuff under /etc/. If you ever wanted to restore it, you'd be restoring full files and not
symlinks; the restored result could be a nightmare and create future issues (upgrades etc.),
not to mention that your backup will be significantly larger: it could be 150MB instead of 4MB.
Scormen June 2nd, 2009, 10:04 AM Okay, now I'm sure what it's doing :)
Is it also possible to show the "real disk usage" of, e.g., that /etc directory on a system? So,
without the links, we would get an output of 4.6MB.
Thank you very much for your help!
binary10 June 2nd, 2009, 10:22 AM What does the following respond with?
sudo du --apparent-size -hsx /etc
If you want the real answer, then only the result from a dry-run rsync will be good enough
for you.
Another interesting option, and my personal favorite because it increases the power and
flexibility of rsync immensely, is the --link-dest option. The --link-dest option allows a
series of daily backups that take up very little additional space for each day and also take
very little time to create.
Specify the previous day's target directory with this option and a new directory for today.
rsync then creates today's new directory, and a hard link for each file in yesterday's
directory is created in today's directory. So we now have a bunch of hard links to
yesterday's files in today's directory; no new files have been created or duplicated, just a
bunch of hard links. Wikipedia has a very good description of hard links. After creating the
target directory for today with this set of hard links to yesterday's target directory, rsync
performs its sync as usual, but when a change is detected in a file, the target hard link is
replaced by a copy of the file from yesterday and the changes to the file are then copied
from the source to the target.
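The article's example command is not preserved in this copy; here is a minimal sketch of a daily --link-dest rotation, with the backup paths and date-stamped directory names assumed purely for illustration:
# Yesterday's snapshot provides the hard-link base; unchanged files take no extra space.
rsync -av --link-dest=/mnt/backup/2020-06-01 /home/user/ /mnt/backup/2020-06-02
Files unchanged since yesterday appear in the new directory as hard links to the old copies; only changed files are transferred and stored again.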
There are also times when it is desirable to exclude certain directories or files from being
synchronized. For this, there is the --exclude option. Use this option with the pattern for
the files or directories you want to exclude. You might want to exclude browser cache files,
so your new command will look like this.
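The command itself is not preserved in this copy; a plausible sketch, with the source path, target path, and cache pattern all assumed for illustration:
rsync -av --exclude='.cache/' /home/user/ /mnt/backup/user/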
Note that each file pattern you want to exclude must have a
separate exclude option.
rsync can sync files with remote hosts as either the source or the
target. For the next example, let's assume that the source directory
is on a remote computer with the hostname remote1 and the target
directory is on the local host. Even though SSH is the default
communications protocol used when transferring data to or from a
remote host, I always add the ssh option. The command now looks like
this.
This is the final form of my rsync backup command.
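Neither command survives in this copy of the article; hedged reconstructions, with the paths assumed for illustration (remote1 is the hostname given above):
# Pull from the remote host, explicitly requesting ssh as the transport.
rsync -av -e ssh remote1:/home/user/ /mnt/backup/user/
# A plausible final form, combining archive mode, ssh, and the cache exclusion.
rsync -av -e ssh --exclude='.cache/' remote1:/home/user/ /mnt/backup/user/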
rsync has a very large number of options that you can use to
customize the synchronization process. For the most part, the
relatively simple commands that I have described here are perfect for
making backups for my personal needs. Be sure to read the extensive
man page for rsync to learn about more of its capabilities as well as
the options discussed here.
It can perform differential uploads and downloads (synchronization) of files across the network,
transferring only data that has changed. The rsync remote-update protocol allows rsync to transfer
just the differences between two sets of files across the network connection.
How do I install rsync?
Use any one of the following commands to install rsync. If you are using Debian or Ubuntu Linux,
type the following command: # apt-get install rsync
OR $ sudo apt-get install rsync
If you are using Red Hat Enterprise Linux (RHEL) / CentOS 4.x or older versions, type the following
command: # up2date rsync
RHEL / CentOS 5.x or newer (or Fedora Linux) users should type the following command: # yum install rsync
Always use rsync over ssh
Since rsync does not provide any security while transferring data, it is recommended that you use
rsync over an ssh session. This allows a secure remote connection. Now let us see some examples of
the rsync command.
Common rsync command options
--delete : delete files that don't exist on sender (system)
-v : Verbose (try -vv for more detailed information)
-e "ssh options" : specify the ssh as remote shell
-a : archive mode
-r : recurse into directories
-z : compress file data
Task : Copy file from a local computer to a remote server
Copy file from /www/backup.tar.gz to a remote server called openbsd.nixcraft.in $ rsync -v -e ssh /www/backup.tar.gz jerry@openbsd.nixcraft.in:~
Output:
Password:
sent 19099 bytes received 36 bytes 1093.43 bytes/sec
total size is 19014 speedup is 0.99
Please note that the symbol ~ indicates the user's home directory (/home/jerry).
Task : Copy file from a remote server to a local computer
Copy file /home/jerry/webroot.txt from a remote server openbsd.nixcraft.in to a local computer's
/tmp directory: $ rsync -v -e ssh jerry@openbsd.nixcraft.in:~/webroot.txt /tmp
Task: Synchronize a local directory with a remote directory
Task: Synchronize a local directory with a remote rsync server or vice versa
$ rsync -r -a -v --delete rsync://rsync.nixcraft.in/cvs /home/cvs
OR $ rsync -r -a -v --delete /home/cvs rsync://rsync.nixcraft.in/cvs
Task: Mirror a directory between my "old" and "new" web server/ftp
You can mirror a directory between your "old" (my.old.server.com) and "new" web server with the
command (assuming that ssh keys are set for password-less authentication) $ rsync -zavrR --delete --links --rsh="ssh -l vivek" my.old.server.com:/home/lighttpd /home/lighttpd
The rdiff command uses the rsync algorithm. A utility called rdiff-backup has been created which
is capable of maintaining a backup mirror of a file or directory over the network, on another server.
rdiff-backup stores incremental rdiff deltas with the backup, with which it is possible to recreate
any backup point. Next time I will write about these utilities.
rsync for Windows Server/XP/7/8
Please note if you are using MS-Windows, try any one of the following programs:
The purpose of creating a mirror of your Web Server with Rsync is that if your main web server
fails, your backup server can take over to reduce the downtime of your website. This way of
creating a web server backup is very good and effective for small and medium size web businesses.
Advantages of Syncing Web Servers
The main advantages of creating a web server backup with rsync are as follows:
Rsync syncs only those bytes and blocks of data that have changed.
Rsync has the ability to check and delete those files and directories at backup server that
have been deleted from the main web server.
It takes care of permissions, ownerships and special attributes while copying data remotely.
It also supports SSH protocol to transfer data in an encrypted manner so that you will be
assured that all data is safe.
Rsync uses compression and decompression method while transferring data which consumes less
bandwidth.
How To Sync Two Apache Web Servers
Let's proceed with setting up rsync to create a mirror of your web server. Here, I'll be using
two servers.
Main Server
IP Address : 192.168.0.100
Hostname : webserver.example.com
Backup Server
IP Address : 192.168.0.101
Hostname : backup.example.com
Step 1: Install Rsync Tool
Here in this case, web server data of webserver.example.com will be mirrored on backup.example.com.
To do so, we first need to install rsync on both servers with the help of the following command.
[root@tecmint]# yum install rsync [On Red Hat based systems]
[root@tecmint]# apt-get install rsync [On Debian based systems]
Step 2: Create a User to run Rsync
We can set up rsync with the root user, but for security reasons, you can create an unprivileged
user on the main web server, i.e. webserver.example.com, to run rsync.
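Step 3 of the original article (testing the initial sync) is missing from this copy. A minimal sketch of the kind of pull, run on backup.example.com, that would produce output like the following; the exact options and paths are assumptions:
[root@backup]# rsync -avzhe ssh root@webserver.example.com:/var/www/ /var/www/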
root@webserver.example.com's password:
receiving incremental file list
sent 128 bytes received 32.67K bytes 5.96K bytes/sec
total size is 12.78M speedup is 389.70
You can see that your rsync is now working absolutely fine and syncing data. I have used " /var/www
" to transfer; you can change the folder location according to your needs.
Step 4: Automate Sync with SSH Passwordless Login
Now we are done with the rsync setup, and it's time to set up a cron job for rsync. As we are going
to use rsync over the SSH protocol, ssh will ask for authentication, and if we can't provide the
password from cron, it will not work. For cron to work smoothly, we need to set up passwordless
ssh logins for rsync.
Here in this example, I am doing it as root to preserve file ownerships as well; you can do it
for other users too.
First, we'll generate a public and private key with following commands on backups server (i.e.
backup.example.com ).
[root@backup]# ssh-keygen -t rsa -b 2048
When you enter this command, don't provide a passphrase; just press Enter for an empty passphrase
so that the rsync cron job will not need any password for syncing data.
Sample Output
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
9a:33:a9:5d:f4:e1:41:26:57:d0:9a:68:5b:37:9c:23 root@backup.example.com
The key's randomart image is:
+--[ RSA 2048]----+
| .o. |
| .. |
| ..++ . |
| o=E * |
| .Sooo o |
| =.o o |
| * . o |
| o + |
| . . |
+-----------------+
Now our public and private keys have been generated, and we will have to share the public key with
the main server so that the main web server will recognize this backup machine and allow it to log
in without asking for any password while syncing data.
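The command for installing the key is not preserved here; the standard way to do it, assuming root on both ends as in this example, would be:
[root@backup]# ssh-copy-id root@webserver.example.com
Afterwards, ssh root@webserver.example.com from the backup server should log in without prompting for a password.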
Let's set up a cron job for this. To set up a cron job, please open the crontab file with the following command.
[root@backup ~]# crontab -e
It will open the crontab file in your default editor. Here in this example, I am
writing a cron entry to run every 5 minutes to sync the data.
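The crontab entry itself is missing from this copy; a sketch consistent with the description (every 5 minutes, pulling /var/www/ from the main server; options and paths assumed):
*/5 * * * * rsync -avzhe ssh root@webserver.example.com:/var/www/ /var/www/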
The above cron entry and rsync command simply sync "/var/www/" from the main web server to the
backup server every 5 minutes. You can change the time and folder location according to your
needs. To be more creative and customize Rsync and Cron commands, you can check out our more
detailed articles at:
Great demonstration and very easy to follow Don! Just a
note to anyone who might come across this and start
using it in production based systems is that you
certainly would not want to be rsyncing with root
accounts. In addition you would use key based auth with
SSH as an additional layer of security. Just my 2cents
;-)
curtis shaw, 11 months ago
Best rsync tutorial on the web. Thanks.
These two options allow us to include and exclude files by specifying patterns. They help us
specify those files or directories which you want to include in your sync, and exclude the files
and folders you don't want to be transferred.
Here in this example, the rsync command will include only those files and directories which
start with 'R' and exclude all other files and directories.
[root@tecmint]# rsync -avze ssh --include 'R*' --exclude '*' [email protected]:/var/lib/rpm/ /root/rpm
[email protected]'s password:
receiving incremental file list
created directory /root/rpm
./
Requirename
Requireversion
sent 67 bytes received 167289 bytes 7438.04 bytes/sec
total size is 434176 speedup is 2.59
6. Use of --delete Option
If a file or directory does not exist at the source, but already exists at the destination,
you might want to delete that existing file/directory at the target while syncing.
We can use the '--delete' option to delete files that are not there in the source
directory.
Source and target are in sync. Now creating new file test.txt at the
target.
[root@tecmint]# touch test.txt
[root@tecmint]# rsync -avz --delete [email protected]:/var/lib/rpm/ .
Password:
receiving file list ... done
deleting test.txt
./
sent 26 bytes received 390 bytes 48.94 bytes/sec
total size is 45305958 speedup is 108908.55
The target had the new file called test.txt; when synchronized with the source
with the '--delete' option, rsync removed the file test.txt.
7. Set the Max Size of Files to be Transferred
You can specify the max file size to be transferred or synced. You can
do it with the "--max-size" option. Here in this example, the max file size is
200k, so this command will transfer only those files which are equal to or
smaller than 200k.
[root@tecmint]# rsync -avzhe ssh --max-size='200k' /var/lib/rpm/ [email protected]:/root/tmprpm
[email protected]'s password:
sending incremental file list
created directory /root/tmprpm
./
Conflictname
Group
Installtid
Name
Provideversion
Pubkeys
Requireversion
Sha1header
Sigmd5
Triggername
__db.001
sent 189.79K bytes received 224 bytes 13.10K bytes/sec
total size is 38.08M speedup is 200.43
8. Automatically Delete source Files after successful Transfer
Now, suppose you have a main web server and a data backup server, and you have created a daily
backup and synced it with your backup server; now you don't want to keep that local copy
of the backup on your web server.
So, will you wait for the transfer to complete and then delete the local backup file manually?
Of course not. This automatic deletion can be done using the '--remove-source-files'
option.
[root@tecmint]# rsync --remove-source-files -zvh backup.tar /tmp/backups/
backup.tar
sent 14.71M bytes received 31 bytes 4.20M bytes/sec
total size is 16.18M speedup is 1.10
[root@tecmint]# ll backup.tar
ls: backup.tar: No such file or directory
9. Do a Dry Run with rsync
If you are a newbie using rsync and don't know what exactly your command is going to do,
rsync could really mess up the things in your destination folder, and undoing that
can be a tedious job.
Use of this option will not make any changes, only do a dry run of the command and show
the output of the command; if the output shows exactly what you want to do, then you can
remove the '--dry-run' option from your command and run it on the terminal.
[root@tecmint]# rsync --dry-run --remove-source-files -zvh backup.tar /tmp/backups/
backup.tar
sent 35 bytes received 15 bytes 100.00 bytes/sec
total size is 16.18M speedup is 323584.00 (DRY RUN)
10. Set Bandwidth Limit and Transfer File
You can set the bandwidth limit while transferring data from one machine to another
with the help of the '--bwlimit' option. This option helps us limit
I/O bandwidth.
[root@tecmint]# rsync --bwlimit=100 -avzhe ssh /var/lib/rpm/ [email protected]:/root/tmprpm/
[email protected]'s password:
sending incremental file list
sent 324 bytes received 12 bytes 61.09 bytes/sec
total size is 38.08M speedup is 113347.05
Also, by default rsync syncs changed blocks and bytes only; if you explicitly want
to sync the whole file, then you use the '-W' option with it.
[root@tecmint]# rsync -zvhW backup.tar /tmp/backups/backup.tar
backup.tar
sent 14.71M bytes received 31 bytes 3.27M bytes/sec
total size is 16.18M speedup is 1.10
That's all with rsync for now; you can see the man pages for more options. Stay
connected with Tecmint for more exciting and interesting tutorials in the future.
Do leave your comments and suggestions.
Back-up (push) your photo directories to a second drive: rsync -av ~/Pictures /mnt/drive2
This creates a backup of your photos in /mnt/drive2/Pictures/
Back-up to a USB thumb drive is similar: rsync -av ~/Pictures /media/KINGSTON
When you add new photos, just re-execute this rsync command to backup the latest changes.
Note: The drive name will be dependent on the manufacturer.
[Potential Pitfall]: Do not include the source directory
name in the destination. rsync -av ~/Pictures /mnt/drive2/Pictures
This will result in /mnt/drive2/Pictures/Pictures/
Note that rsync destinations acts just like the cp and rcp commands.
Also note that rsync -av ~/Pictures/ /mnt/drive2/ has a different behavior from rsync
-av ~/Pictures /mnt/drive2
Back-up (push) two source directories to a single destination drive: rsync -av ~/Pictures ~/Images /mnt/drive2
This creates backups of your photos from the two directories in /mnt/drive2/Pictures/ and /mnt/drive2/Images/
Sync directories, and if any files were deleted in ~/Pictures, also delete them in /mnt/drive2/Pictures/:
rsync -a --progress --delete ~/Pictures /mnt/drive2
This creates a backup in /mnt/drive2/Pictures/
Sync one specific file in a directory: rsync -a ~/Pictures/Group-photo-2001-A.jpg /mnt/drive2/Pictures
Sync a group of specific files in a directory: rsync -a ~/Pictures/2001-*.jpg /mnt/drive2/Pictures
This creates a backup in /mnt/drive2/Pictures/
Note that when transferring files only, the directory name has to be provided in the destination
path.
Sync files and directories listed in a file: rsync -ar --files-from=Filelist.txt ~/Data /mnt/drive2
This creates a backup in /mnt/drive2/Data/
Directory paths are included if specified with a closing slash such as pathx/pathy/pathz/. Path
names must be terminated with a "/" or "/."
Back-up (push) your source code and compress to save space. Ignore object files: rsync -avz --delete --exclude='*.o' --exclude='*.so' ~/src /mnt/drive2
This creates a backup in /mnt/drive2/src/ but does not transfer files with the ".o" and
".so" extensions.
Back-up (push) your source code and ignore object and shared object code files: rsync -av --delete --filter='+ *.[ch]' --filter='- *.o' ~/src /mnt/drive2
This transfers files with the extension ".c" and ".h" but does not transfer object files with
the ".o" extensions.
same as rsync.exe -av --exclude='*.o' --filter='+ *.[ch]' ~/src /mnt/drive2
Back-up (push) your source code and ignore CM directories, object and shared object code files:
rsync -artv --progress --delete --filter='+ *.[ch]' --filter='- *.o' --exclude=".svn" ~/src
/mnt/drive2
Note that --exclude overrides the include filter --filter='+ *.[ch]' so that
".c" and ".h" files under .svn/ are not copied.
Referencing directories - errors and subtleties:
rsync -ar dir1 dir2
This will copy dir1 into dir2 to give dir2/dir1 thus dir1/file.txt
gets copied to dir2/dir1/file.txt
rsync -ar dir1/ dir2/
This will copy the contents of dir1 into dir2 to give dir2/contents-of-dir1,
eg dir1/file.txt gets copied to dir2/file.txt
The following all achieve essentially the same results (note that the glob forms skip hidden files at the top level):
rsync -ar dir1/ dir2/
rsync -ar dir1/* dir2
rsync -ar dir1/* dir2/
Rsync Options: (partial list)
Command line argument
Description
-a
(--archive)
Archive.
Includes options:
-r: recursion
-l: preserve symbolic links as symbolic links. Opposite of -L
-p: preserve permissions (Linux/unix only)
-t: preserve file modification time
-g: preserve group ownership
-o: preserve user ownership
-D: preserve special files and devices (Linux/unix only)
-d
(--dirs)
Copy directory tree structure without copying the files within the directories
--existing
Update only existing files from source directory which are already present at the destination.
No new files will be transferred.
-L
(--copy-links)
Transform a symbolic link to a copied file upon transfer
--stats
Print verbose set of statistics on the transfer
Add -h (--human-readable) to print stats in an understandable fashion
-p
(--perms)
Preserve permissions (not relevant for MS/Windows client)
-r
(--recursive)
Recurse through directories and sub-directories
-t
(--times)
Preserve file modification times
-v
(--verbose)
Verbose
-z
(--compress)
Compress files during transfer to reduce network bandwidth. Files are not stored in an altered
or compressed state.
Note that compression will have little or no effect on JPG, PNG and files already using
compression.
Use arguments --skip-compress=gz/bz2/jpg/jpeg/ogg/mp[34]/mov/avi/rpm/deb/ to avoid
compressing files already compressed
--delete
Delete extraneous files from destination directories. Delete files on archive server
if they were also deleted on client.
Use the argument -m (--prune-empty-dirs) to delete empty directories (no
longer useful after their contents are deleted)
--include
--exclude
--filter
Specify a pattern for specific inclusion or exclusion or use the more universal
filter for inclusion (+)/exclusion (-).
Do not transfer files ending with ".o": --exclude='*.o'
Transfer all files ending with ".c" or ".h": --filter='+ *.[ch]'
-i
(--itemize-changes)
Print information about the transfer. List everything (all file copies and file changes)
rsync is going to perform
--list-only
--dry-run
Don't copy anything, just list what rsync would copy if this option was not given. This
helps when debugging the correct exclusion/inclusion filters.
--progress
Shows percent complete, Kb transferred and Kb/s transfer rate. Includes verbose output.
Note that rsync will be able to handle files with blanks in the file name or directory name as well
as with dashes ("-") or underscores ("_").
Rsync Client-Server Configuration and Operation:
Rsync can be configured in multiple client-server modes.
connect client to a server running rsync in daemon mode
connect client to a server using an ssh shell
These configurations are specified with the use of the colon ":"
Double colon refers to a connection to a host running the rsync daemon in the format hostname::module/path
where the module name is identified by the configuration in /etc/rsyncd.conf. The double colon
is equivalent to using the URL prefix rsync://
Single colon refers to the use of a remote shell
No colon then the directory is considered to be local to the system.
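For example, the three addressing forms look like this (hostname, module name, and paths assumed for illustration):
rsync -av server::backups/Proj1 /home/user1/ # rsync daemon, module "backups"
rsync -av server:/srv/backups/Proj1 /home/user1/ # remote shell, e.g. ssh
rsync -av /srv/backups/Proj1 /home/user1/ # purely local copy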
1) Rsync daemon server:
The Rsync server is often referred to as rsyncd or the rsync daemon. This is in fact the same
rsync executable run with the command line argument "--daemon". This can be run stand-alone
or using xinetd as is typically configured on most Linux distributions.
Configure xinetd to manage rsync:
File: /etc/xinetd.d/rsync
Default: "disable = yes". Change to "disable = no"
Typical Linux distributions do not pre-configure rsync for server use. Both Ubuntu and Red Hat
based distributions require that one generate the configuration file "/etc/rsyncd.conf".
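The file's contents are not shown at this point; a minimal sketch that is consistent with the Push/Pull examples below (the module name Proj1 and the path /tmp/Proj1 are taken from those examples, the rest is assumed):
[Proj1]
path = /tmp/Proj1
comment = project data
read only = false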
Push: rsync -avr /home/user1/Proj1/Data server-host-name::Proj1
(eg. update server backup from mobile laptop)
This will initially copy over directory Data and all of its contents to /tmp/Proj1/Data
on the remote server.
Pull: rsync -avr server-host-name::Proj1 /home/user1/Proj1/Data
(eg. update mobile laptop from server backup)
2) Rsync to server using ssh shell:
Using this method does not use the configuration "modules" in /etc/rsyncd.conf but instead
uses the paths as if logged in using ssh.
First configure ssh for "password-less" login:
Note that current Linux distributions use ssh version 2 and rsa.
[user1@myclient ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user1/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user1/.ssh/id_rsa.
Your public key has been saved in /home/user1/.ssh/id_rsa.pub.
The key fingerprint is:
aa:1c:76:33:8a:9c:10:51:............
Note that "Enter" was pressed when asked for a "passphrase" to take the default.
Two files are generated:
Local client (private key): ~/.ssh/id_rsa
Contents (one line) of file (public key): ~/.ssh/id_rsa.pub to be copied into file on server:
~/.ssh/authorized_keys
Now try logging into the machine, with "ssh 'user1@remote-server'", and check in: .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
Test "password-less" ssh connection: ssh remote-server
This command should log you in without asking for a password.
Example 1. Synchronize Two Directories in a Local Server
To sync two directories in a local computer, use the following rsync -zvr command.
$ rsync -zvr /var/opt/installation/inventory/ /root/temp
building file list ... done
sva.xml
svB.xml
.
sent 26385 bytes received 1098 bytes 54966.00 bytes/sec
total size is 44867 speedup is 1.63
$
In the above rsync example:
-z is to enable compression
-v verbose
-r indicates recursive
Now let us see the timestamp on one of the files that was copied from source to destination. As
you see below, rsync didn't preserve timestamps during sync.
$ ls -l /var/opt/installation/inventory/sva.xml /root/temp/sva.xml
-r--r--r-- 1 bin bin 949 Jun 18 2009 /var/opt/installation/inventory/sva.xml
-r--r--r-- 1 root bin 949 Sep 2 2009 /root/temp/sva.xml
Example 2. Preserve timestamps during Sync using rsync -a
The rsync option -a indicates archive mode. The -a option does the following:
Recursive mode
Preserves symbolic links
Preserves permissions
Preserves timestamp
Preserves owner and group
Now, executing the same command provided in example 1 (But with the rsync option -a) as shown
below:
$ rsync -azv /var/opt/installation/inventory/ /root/temp/
building file list ... done
./
sva.xml
svB.xml
.
sent 26499 bytes received 1104 bytes 55206.00 bytes/sec
total size is 44867 speedup is 1.63
$
As you see below, rsync preserved timestamps during sync.
$ ls -l /var/opt/installation/inventory/sva.xml /root/temp/sva.xml
-r--r--r-- 1 root bin 949 Jun 18 2009 /var/opt/installation/inventory/sva.xml
-r--r--r-- 1 root bin 949 Jun 18 2009 /root/temp/sva.xml
Example 3. Synchronize Only One File
To copy only one file, specify the file name to rsync command, as shown below.
$ rsync -v /var/lib/rpm/Pubkeys /root/temp/
Pubkeys
sent 42 bytes received 12380 bytes 3549.14 bytes/sec
total size is 12288 speedup is 0.99
Example 4. Synchronize Files From Local to Remote
rsync allows you to synchronize files/directories between the local and remote system.
$ rsync -avz /root/temp/ [email protected]:/home/thegeekstuff/temp/
Password:
building file list ... done
./
rpm/
rpm/Basenames
rpm/Conflictname
sent 15810261 bytes received 412 bytes 2432411.23 bytes/sec
total size is 45305958 speedup is 2.87
While doing synchronization with the remote server, you need to specify username and ip-address
of the remote server. You should also specify the destination directory on the remote server. The
format is username@machinename:path
As you see above, it asks for password while doing rsync from local to remote server.
Sometimes you don't want to enter the password while backing up files from local to remote server.
For example, If you have a backup shell script, that copies files from local to remote server using
rsync, you need the ability to rsync without having to enter the password.
Example 5. Synchronize Files From Remote to Local
When you want to synchronize files from remote to local, specify the remote path in the source and the local
path in target as shown below.
$ rsync -avz [email protected]:/var/lib/rpm /root/temp
Password:
receiving file list ... done
rpm/
rpm/Basenames
.
sent 406 bytes received 15810230 bytes 2432405.54 bytes/sec
total size is 45305958 speedup is 2.87
Example 6. Remote shell for Synchronization
rsync allows you to specify the remote shell which you want to use. You can use rsync ssh to enable
the secured remote connection.
Use rsync -e ssh to specify which remote shell to use. In this case, rsync will use ssh.
$ rsync -avz -e ssh [email protected]:/var/lib/rpm /root/temp
Password:
receiving file list ... done
rpm/
rpm/Basenames
sent 406 bytes received 15810230 bytes 2432405.54 bytes/sec
total size is 45305958 speedup is 2.87
Example 7. Do Not Overwrite the Modified Files at the Destination
In a typical sync situation, if a file is modified at the destination, we might not want to overwrite
the file with the old file from the source.
Use rsync -u option to do exactly that. (i.e do not overwrite a file at the destination, if it
is modified). In the following example, the file called Basenames is already modified at the destination.
So, it will not be overwritten with rsync -u.
$ ls -l /root/temp/Basenames
total 39088
-rwxr-xr-x 1 root root 4096 Sep 2 11:35 Basenames
$ rsync -avzu [email protected]:/var/lib/rpm /root/temp
Password:
receiving file list ... done
rpm/
sent 122 bytes received 505 bytes 114.00 bytes/sec
total size is 45305958 speedup is 72258.31
$ ls -lrt
total 39088
-rwxr-xr-x 1 root root 4096 Sep 2 11:35 Basenames
Example 8. Synchronize only the Directory Tree Structure (not the files)
Use the rsync -d option to synchronize only the directory tree from the source to the destination.
The example below synchronizes only the directory tree, not the files in the directories.
$ rsync -v -d [email protected]:/var/lib/ .
Password:
receiving file list ... done
logrotate.status
CAM/
YaST2/
acpi/
sent 240 bytes received 1830 bytes 318.46 bytes/sec
total size is 956 speedup is 0.46
Example 9. View the rsync Progress during Transfer
When you use rsync for backup, you might want to know the progress of the backup, i.e. how many
files are copied, at what rate it is copying files, etc.
The rsync --progress option displays the detailed progress of the rsync execution, as shown below.
$ rsync -avz --progress [email protected]:/var/lib/rpm/ /root/temp/
Password:
receiving file list ...
19 files to consider
./
Basenames
5357568 100% 14.98MB/s 0:00:00 (xfer#1, to-check=17/19)
Conflictname
12288 100% 35.09kB/s 0:00:00 (xfer#2, to-check=16/19)
.
.
.
sent 406 bytes received 15810211 bytes 2108082.27 bytes/sec
total size is 45305958 speedup is 2.87
Example 10. Delete the Files Created at the Target
If a file is not present at the source, but present at the target, you might want to delete the
file at the target during rsync.
In that case, use the --delete option as shown below. The rsync --delete option deletes files
that are not there in the source directory.
# Source and target are in sync. Now creating new file at the target.
$ > new-file.txt
$ rsync -avz --delete [email protected]:/var/lib/rpm/ .
Password:
receiving file list ... done
deleting new-file.txt
./
sent 26 bytes received 390 bytes 48.94 bytes/sec
total size is 45305958 speedup is 108908.55
The target had the new file called new-file.txt; when synchronized with the source with the
--delete option, rsync removed the file new-file.txt.
Example 11. Do not Create New File at the Target
If you like, you can update (sync) only the existing files at the target. In case the source has
new files which are not there at the target, you can avoid creating these new files at the target.
If you want this feature, use the --existing option with the rsync command.
First, add a new-file.txt at the source.
[/var/lib/rpm ]$ > new-file.txt
Next, execute the rsync from the target.
$ rsync -avz --existing [email protected]:/var/lib/rpm/ .
[email protected]'s password:
receiving file list ... done
./
sent 26 bytes received 419 bytes 46.84 bytes/sec
total size is 88551424 speedup is 198991.96
If you see the above output, it didn't receive the new file new-file.txt
Example 12. View the Changes Between Source and Destination
This option is useful to view the difference in the files or directories between source and destination.
$ ls -l /root/temp
-rw-r--r-- 1 root root 12288 May 28 2008 Conflictname
-rw-r--r-- 1 bin bin 1179648 Jun 24 05:27 Dirnames
-rw-r--r-- 1 root root 0 Sep 3 06:39 Basenames
In the above example, between the source and destination, there are two differences. First, the
owner and group of the file Dirnames differ. Next, the size differs for the file Basenames.
Now let us see how rsync displays this difference. -i option displays the item changes.
$ rsync -avzi [email protected]:/var/lib/rpm/ /root/temp/
Password:
receiving file list ... done
>f.st.... Basenames
.f....og. Dirnames
sent 48 bytes received 2182544 bytes 291012.27 bytes/sec
total size is 45305958 speedup is 20.76
In the output, rsync displays 9 characters in front of the file name or directory name, indicating
the changes.
In our example, the letters in front of the Basenames (and Dirnames) says the following:
> specifies that a file is being transferred to the local host.
f represents that it is a file.
s represents size changes are there.
t represents timestamp changes are there.
o owner changed
g group changed.
Example 13. Include and Exclude Pattern during File Transfer
rsync allows you to give the pattern you want to include and exclude files or directories while
doing synchronization.
$ rsync -avz --include 'P*' --exclude '*' [email protected]:/var/lib/rpm/ /root/temp/
Password:
receiving file list ... done
./
Packages
Providename
Provideversion
Pubkeys
sent 129 bytes received 10286798 bytes 2285983.78 bytes/sec
total size is 32768000 speedup is 3.19
In the above example, it includes only the files or directories starting with 'P' (using rsync
include) and excludes all other files. (using rsync exclude '*' )
Example 14. Do Not Transfer Large Files
You can tell rsync not to transfer files that are greater than a specific size using the rsync
--max-size option.
$ rsync -avz --max-size='100K' [email protected]:/var/lib/rpm/ /root/temp/
Password:
receiving file list ... done
./
Conflictname
Group
Installtid
Name
Sha1header
Sigmd5
Triggername
sent 252 bytes received 123081 bytes 18974.31 bytes/sec
total size is 45305958 speedup is 367.35
--max-size=100K makes rsync transfer only the files that are less than or equal to 100K. You
can indicate M for megabytes and G for gigabytes.
Example 15. Transfer the Whole File
One of the main feature of rsync is that it transfers only the changed block to the destination,
instead of sending the whole file.
If network bandwidth is not an issue for you (but CPU is), you can transfer the whole file, using
the rsync -W option. This will speed up the rsync process, as it doesn't have to perform the checksum
at the source and destination.
receiving file list ... done
./
Basenames
Conflictname
Dirnames
Filemd5s
Group
Installtid
Name
sent 406 bytes received 15810211 bytes 2874657.64 bytes/sec
total size is 45305958 speedup is 2.87
The main purpose of the rsync command in Linux distributions is to copy contents from an origin
to a destination, but it can also be used to report the differences between two directory trees.
Besides, both origin and destination can be local, or reside on a remote server.
We compare directoryA and directoryB using two rsync commands:
The basic usage of the rsync command to copy directoryA to directoryB is:
$ rsync directoryA/ directoryB/
Warning: It is important to end the origin directory name with the '/' character, because otherwise
rsync would create a subdirectory named 'directoryA' under the destination directory 'directoryB'.
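The two comparison commands themselves are not shown at this point in this copy; reconstructed from the wikiA/wikiB example below, they are:
$ rsync --dry-run -v -r -c --delete directoryA/ directoryB/
$ rsync --dry-run -v -r -c --delete directoryB/ directoryA/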
The options "--dry-run -v" tell rsync not to perform any actual copy, but just to print the
names of the files that it would copy.
Option "-r" tells rsync to execute recursively.
Option "-c" sets that the file comparison is to be performed by computing a checksum of the
content of the files, instead of just comparing the date and size of the files.
Finally, option "--delete" tells rsync to remove existing files in the destination directory
that do not exist in the origin directory (but because of the --dry-run option, it will just print
the names of the files that it would delete, with lines like this: "deleting filename").
The first of the above rsync commands will print:
The names of files existing in both directories, having different content.
The names of files existing only on directoryB and not in directoryA (as files to be deleted)
The second command will print:
The names of files existing in both directories, having different content (these should
be the same as those printed by the first command)
The names of files existing only on directoryA and not in directoryB (as files to be deleted)
Example:
By comparing two mediawiki installations, wikiA and wikiB, we get the output:
$ rsync --dry-run -v -r -c --delete wikiA/ wikiB/
sending incremental file list
LocalSettings.php
deleting imagenes/logo135x135.png
imagenes/Home_icon.jpg
imagenes/Home_icon.png
imagenes/domowiki.gif
imagenes/domowiki_logo.png
sent 112402 bytes received 247 bytes 75099.33 bytes/sec
total size is 68027992 speedup is 603.89 (DRY RUN)
$ rsync --dry-run -v -r -c --delete wikiB/ wikiA/
sending incremental file list
LocalSettings.php
deleting imagenes/domowiki_logo.png
deleting imagenes/domowiki.gif
deleting imagenes/Home_icon.png
deleting imagenes/Home_icon.jpg
imagenes/logo135x135.png
sent 112321 bytes received 244 bytes 225130.00 bytes/sec
total size is 68041474 speedup is 604.46 (DRY RUN)
$
As we see, in this case LocalSettings.php files exist in both directories, but their
content differs, and there are also some images existing only under wikiA, and some images existing
only under wikiB.
Running ubuntu 12.04, I want to compare 2 directories, say folder1/ and folder2/ and copy any files
that are different to folder3/. There are also nested files, so matching subdirectories should be
copied as well
Is there a single command that would help me? I can get the full list of changed files running:
rsync -rcnC --out-format="%f" folder1/ folder2/
But rsync doesn't seem to have the ability to "export" these files to a different target directory.
Can I pipe the list to cp or some other program, so that the files are copied, while the directories
are created as well? For example, I tried
but that wouldn't preserve directories as well, it would simply copy all files inside folder3/
A:
Use --compare-dest.
From the man page:
--compare-dest=DIR - This option instructs rsync to use DIR on the destination machine as an
additional hierarchy to compare destination files against doing transfers (if the files are missing
in the destination directory). If a file is found in DIR that is identical to the sender's file,
the file will NOT be transferred to the destination directory. This is useful for creating a sparse
backup of just files that have changed from an earlier backup.
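Applied to the question above, a hedged sketch (note that a relative --compare-dest path is interpreted relative to the destination directory, so an absolute path is safest):
$ rsync -rcv --compare-dest=/full/path/to/folder2/ folder1/ folder3/
Files in folder1/ that are identical to their counterparts in folder2/ are skipped; only the differing files land in folder3/, with their subdirectory structure created as needed.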
I'm a UNIX dev myself, and wouldn't be asking if this were a UNIX system we're dealing with, but
alas. Also, this is for a custom nightly backup solution, where reliability and data integrity is
a priority, so given that
a few weeks ago I couldn't even figure out a for-loop in a batch script, I'm pretty sure I lack
the experience to do this right, or even determine the best way to do this.
--dry-run # don't actually rsync (touch) any files
--itemize-changes # list changes rsync _would_ have made
--out-format="%i|%n|" # define an output format for the list of changes
File system permissions depend on the umask of the mount-user root and his user- and group-ID.
To gain write access to the vfat partition also as normal user, set "umask=002" to give rights to
read, write and execute to members of the group. All the other users do not have write access to
the partition (can't be bad to treat a multi-user system accordingly!) Now add the parameter "gid=100"
so all the files stored on the vfat partition belong to the group "users" (at least on my Debian
system). Additionally, we'll add the parameter "uid=1000" to make sure that files copied from our
source to the vfat partition don't get "root" but the actual user as owner. (On my system, 1000 is
the user-ID of the main user "t" who is member of the group "users"). On my Fedora Core 3 system,
I used "uid=500" and "gid=500" which is my user and group-ID)
> mount /dev/sda1 /myvfat -t vfat -o shortname=mixed,codepage=850,umask=002,uid=1000,gid=100
> mount | grep sda1
/dev/sda1 on /myvfat type vfat (rw,shortname=mixed,codepage=850,umask=002,uid=1000,gid=100)
> grep sda1 /proc/mounts
/dev/sda1 /myvfat vfat rw,nodiratime,uid=1000,gid=100,fmask=0002,dmask=0002,codepage=cp850,shortname=mixed 0 0
> cd /myvfat
> ls -la
-rwxrwxr-x 1 t users 0 Dec 31 16:05 ABCDEFGH
-rwxrwxr-x 1 t users 0 Dec 31 16:05 ABCDEFGHI
If you want to add these options to your /etc/fstab, you may want to add the parameters "noauto"
and "user", so that the file system does not get mounted automatically at system start and can be
mounted by a normal user. The parameter "user" implies, for security reasons, also "noexec", "nosuid",
and " nodev". This, however, should not be a problem in our example, because we assumed to deal with
an external hard drive for pure data storage. If you want to execute programs on the vfat partition,
add the parameter "exec". As an optimisation you can turn off the updating of the last file access
mit the parameters "noatime" and "nodiratime", if you don't need this information. Personally, I
do use this information, for example to find (with "find -atime -21") the audio files that I listened
to during the last three weeks.
The resulting /etc/fstab-entry looks like this:
/dev/sda1 /myvfat vfat shortname=mixed,codepage=850,umask=002,uid=1000,gid=100,noauto,user 0 0
> mount /myvfat
> mount | grep sda1
/dev/sda1 on /myvfat type vfat (rw,noexec,nosuid,nodev,shortname=mixed,codepage=850,umask=002,uid=1000,gid=100)
> grep sda1 /proc/mounts
/dev/sda1 /myvfat vfat rw,nodiratime,nosuid,nodev,noexec,uid=1000,gid=100,fmask=0002,dmask=0002,codepage=cp850,
shortname=mixed 0 0
Further information can be found in the man-pages of the "mount"-command, especially in the sections
"Mount options for fat" and "Mount options for vfat"
2.2. Adjusting the system time
A very sneaky problem occurs for everyone who uses a time zone like "Europe/Berlin". Contrary
to vfat, Unix file systems do take past leap seconds and daylight saving time into account.
The time of the last file change of a file created in January would be deferred by one hour in June.
As a consequence, rsync would transmit all files on every clock change.
An example:
# On vfat during daylight savings time the date gets supplemented with the current time zone, thus being forged by one hour.
> TZ=Europe/Berlin ls -l --time-style=full-iso /myvfat/testfile
-rwxrwxrwx [...] 2003-11-26 02:53:02.000000000 +0200
# On ext3 the date is displayed correctly also during daylight savings time.
> TZ=Europe/Berlin ls -l --time-style=full-iso /myext3/testfile
-rw-rw-rw- [...] 2003-11-26 02:53:02.000000000 +0100
As I did not find any mount parameter to change the time zone, I adjusted the system time to a
zone that does not have daylight savings time (e.g. UTC) and set the local time zone "Europe/Berlin"
for all users. As a consequence, however, all syslogd time stamps also use UTC instead of the local
time zone.
Debian users can adjust this via "base-config": setting "Configure timezone" to "None of the above"
and "UTC" gets the job done. Afterwards it should look like this:
Linux only supports files up to 2GB size on vfat. You can at least backup
bigger files by splitting them up into smaller parts. In the following example, I am backing up a
5GB-cryptoloop file to a vfat partition by splitting it into 2000MB pieces:
> cd /myvfat
> split -d -b 2000m /myext3/cryptfile cryptfile_backup
> ls
cryptfile_backup00 cryptfile_backup01 cryptfile_backup02
Files bigger than 2GB must be excluded from rsync of course.
Reassembling the original file can be done via:
# Linux
> cat cryptfile_backup00 cryptfile_backup01 cryptfile_backup02 > cryptfile
# DOS
> copy /b cryptfile_backup00 + cryptfile_backup01 + cryptfile_backup02 cryptfile
2.4. Using rsync
The characteristics of vfat must also be considered when calling up rsync. As vfat does not support
symbolic links, file permissions, owners, groups, and devices, usage of the parameter "-a", which
considers all of the above, will not have the desired effect (aside from the error messages). Thus
it's best to use only the parameters that actually work:
-r, --recursive treat folders recursively
-v, --verbose show transmitted files
-t, --times keep time settings
-n, --dry-run test only
--exclude ignore files
As the date of the last file change is very important for rsync, the option "-t" is essential.
It's also very wise to test every rsync change with "-n" beforehand.
The last roadblock to a successful synchronization with rsync is the coarse time resolution
of the vfat time stamp: it amounts to two seconds. You'll have to set the parameter
"--modify-window=1" to gain a tolerance of one second in either direction, so that rsync isn't "on the dot".
In a nutshell, the command to efficiently transmit all files from an ext3 file system to a vfat
file system is:
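The command itself is missing from this copy; assembling the parameters discussed above (with the source and target paths assumed), it would look something like:
> rsync -rtv --modify-window=1 /myext3/data/ /myvfat/data
Add -n first for a dry run, and --exclude patterns for any files (such as those over 2GB) that must be skipped.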
The rapid growth of hard drive size poses big problems for the rather neglected dosfstools and
vfat-driver.
3.1. Lost Clusters
Under Linux Kernel 2.4.x, a limit of the cluster data type results in data loss as soon as the
vfat file system holds around 130GB. In Kernel 2.6.x, this problem was, rather accidentally, solved
when many variables were consistently given a new type. A detailed description of this bug,
including a testsuite and a patch (by Erik Andersen), can be found
here. (The patch also
allows for file sizes up to 4GB.)
If you, however, work with a 2.4.x. Kernel and have a "full" vfat partition, be prepared to lose
data: any written file in a new folder will be lost after unmounting the file system. When you mount
the file system again, these files have a size of 0 and the clusters are in limbo. You can delete
the unassigned clusters via dosfsck.
3.2. dosfsck and high RAM Demand
To conduct file system checks as efficiently as possible, dosfsck copies both FATs to RAM. With
a very large file system on a 250GB drive, the high number of clusters yields a very high demand
for RAM, which surpassed my 350MB (including swap). Thus dosfsck aborted with a malloc error. Roman
Hodek, the maintainer of dosfsck, proposed to convert the program to "mmap()", but also said that
this change would be complex. As long as this situation has not changed, be sure to have sufficient
RAM.
3.3. Executing dosfsck
As long as the vfat file system is mounted, dosfsck can be executed, but all repairs silently
fail. Thus you should make sure that your partition is not mounted before using dosfsck. In the
following example, an unassigned cluster (due to the bug in Kernel 2.4.x) is located and deleted.
By the way, the command fsck.vfat is a symbolic link to dosfsck.
> fsck.vfat -vr /dev/sda1
dosfsck 2.10 (22 Sep 2003)
dosfsck 2.10, 22 Sep 2003, FAT32, LFN
Checking we can access the last sector of the filesystem
Boot sector contents:
System ID "mkdosfs"
Media byte 0xf8 (hard disk)
512 bytes per logical sector
16384 bytes per cluster
32 reserved sectors
First FAT starts at byte 16384 (sector 32)
2 FATs, 32 bit entries
39267840 bytes per FAT (= 76695 sectors)
Root directory start at cluster 2 (arbitrary size)
Data area starts at byte 78552064 (sector 153422)
9816944 data clusters (160840810496 bytes)
63 sectors/track, 255 heads
0 hidden sectors
314295660 sectors total
Checking for unused clusters.
Reclaimed 1 unused cluster (16384 bytes).
Checking free cluster summary.
Free cluster summary wrong (641900 vs. really 641901)
1) Correct
2) Don't correct
? 1
Perform changes ? (y/n) y
/dev/sda1: 143 files, 9175043/9816944 clusters
3.4. Formatting a large vfat file system
When formatting with mkfs.vfat you have to add the option -F 32, so that a 32-bit file system
is created. Without this option, a 12-bit or 16-bit file system is created, depending on the partition
size, or the formatting process aborts (on an oversized partition). Fat16 only supports file systems
up to 2GB, fat32 allows for up to 2TB (terabytes).
> mkfs.vfat -F 32 /dev/sda1
4. Conclusion
Solving the problems described here cost me a lot of time. But to me, being able to perform my
work exclusively with Free Software is a luxury that makes it quite worthwhile. Thus I want to thank
all developers of these programs heartily. If the psychological strain of the aforementioned problems
grows big enough, there will be some volunteers who will approach the remaining problems.
This text is subject to the GNU Free Documentation License (FDL). Free spreading in modified or
unmodified form is allowed. Modifications must be marked unmistakeably and also distributed under
the FDL.
Translated by Mag. Christian Paratschek. More of my work can be found on my
website.
It took several tries, and a lot of poking around, but I finally have my music collection mirrored
to a disk I can take around (most notably to work). The hard part was getting rsync to work
right. Finally I got it working after
finding a helpful article
on the topic. To summarize (in less than 3 pages), I used the following 2 commands:
Now I won't lose them, and maybe they'll help you. The only reason for having problems is
that I was using the vfat filesystem under FC3 Linux (where my custom-built audio archive exists)
to make a disk I could plug in to my work laptop. Windows filesystems aren't so great: they have
problems with mixed case and with being very accurate about times. So this makes it work!
================
On 2013-02-02 03:46, jdmcdaniel3 wrote:
>
> I found this example, perhaps it might help?
>
>> RSYNC AND
>> VFAT (\"HTTP://WWW.KYLEV.COM/2005/03/29/RSYNC-AND-VFAT/\")
>> It took several tries, and a lot of poking around, but I finally have
>> my music collection mirrored to a disk I can take around (most notably
>> to work). The hard part was getting rsync to work right. Finally I got
>> it working after 'finding a helpful article on the topic'
>> (http://www.osnews.com/story.php?news_id=9681&page=1).
To summarize (in
>> less than 3 pages), I used to following 2 commands:
>>
>>>
> Code:
> --------------------
> > > mount -t vfat -o shortname=mixed,iocharset=utf8 /dev/sda1 /mnt
> >
> > rsync --modify-window=1 -rtv --delete /data/mp3/ /mnt/mp3
> --------------------
--modify-window
When comparing two timestamps, rsync treats the timestamps as being equal if they differ
by no more than the modify-window value. This is normally 0 (for an exact match), but you
may find it useful to set this to a larger value in some situations. In particular, when
transferring to or from an MS Windows FAT filesystem (which represents times with a
2-second resolution), --modify-window=1 is useful (allowing times to differ by up to 1 second).
Interesting! That will be it.
-r, --recursive recurse into directories
-t, --times preserve modification times
-v, --verbose increase verbosity
I'll try it.
....
Ok, the first run was slow, and I thought it had failed. But a second run just after
ran in seconds, so it appears to work. I then unmounted the device,
mounted it again (automatic mount under xfce), ran the copy again, and
it was done in seconds. So yes, that is the trick, thank you.
rsync is useful when large amounts of data need to be transmitted regularly while not changing
too much. This is, for example, often the case when creating backups. Another application concerns
staging servers. These are servers that store complete directory trees of Web servers that are regularly
mirrored onto a Web server in a DMZ.
28.4.1. Configuration and Operation
rsync can be operated in two different modes. It can be used to archive or copy data. To accomplish
this, only a remote shell, like ssh, is required on the target system. However, rsync can also be
used as a daemon to provide directories to the network.
The basic mode of operation of rsync does not require any special configuration. rsync directly
allows mirroring complete directories onto another system. As an example, the following command creates
a backup of the home directory of tux on a backup server named sun:
rsync -baz -e ssh /home/tux/ tux@sun:backup
The following command is used to play the directory back:
rsync -az -e ssh tux@sun:backup /home/tux/
Up to this point, the handling does not differ much from that of a regular copying tool, like
scp.
rsync should be operated in "rsync" mode to make all its features fully available. This is done
by starting the rsyncd daemon on one of the systems. Configure it in the file /etc/rsyncd.conf.
For example, to make the directory /srv/ftp available with rsync, use the following
configuration:
gid = nobody
uid = nobody
read only = true
use chroot = no
transfer logging = true
log format = %h %o %f %l %b
log file = /var/log/rsyncd.log
[FTP]
path = /srv/ftp
comment = An Example
Then start rsyncd with rcrsyncd start. rsyncd can also be started
automatically during the boot process. Set this up by activating this service in the runlevel editor
provided by YaST or by manually entering the command insserv rsyncd.
rsyncd can alternatively be started by xinetd. This is, however, only recommended for servers that
rarely use rsyncd.
The example also creates a log file listing all connections. This file is stored in /var/log/rsyncd.log.
It is then possible to test the transfer from a client system. Do this with the following command:
rsync -avz sun::FTP
This command lists all files present in the directory /srv/ftp of the server. This
request is also logged in the log file /var/log/rsyncd.log. To start an actual transfer,
provide a target directory. Use . for the current directory. For example:
rsync -avz sun::FTP .
By default, no files are deleted while synchronizing with rsync. If this should be forced, the
additional option --delete must be stated. To ensure that no newer files are deleted,
the option --update can be used instead. Any conflicts that arise must be resolved manually.
Name the script: server-download.sh and put some comments at the start of the
script before the actual rsync command:
# server-download.sh
#
# Install on Notebook PCs
#
# rsync tool to download server data
# from [server name]
# to [user name's notebook PC]
#
# uses ssh key pairs to login automatically
#
# last edited: [last edit date] by [author]
#
# download only those files on [server name] in [server target directory]
# that are newer than what is already on the notebook PC
rsync -avzu [user name]@[server name]:[server directory] [notebook PC directory]
Notes:
You should add the -n option to rsync the first time this is run to do a
"dry run" which does not do any actual downloading. By doing this you will be able to see if the
script is going to correctly do what you want. Thus, the rsync command becomes:
rsync -avzun [remainder as above]
You can create additional rsync lines in the script file to download data from more than one
directory on the server.
Next you need to create an icon on the desktop that will permit the script to be easily run by
the user so that they won't have to enter the script name in a command window to do the file download.
Distributing software packages to all of our servers is a tedious task. Currently, a release manager
makes a connection to each server and transfers files using ftp. This involves entering passwords
multiple times, waiting for transfers to complete, changing directories, and keeping files organized.
We developed a shell script, distribute_release (
Listing 1
), that makes the job easier.
Our script has some advantages over the ftp process:
Directory trees can be used to organize release modules.
A distribution network defines how files are transferred from server to server.
When a release module is ready to be distributed, it is replicated to all of the servers in
the network using rsync, which helps minimize network traffic.
Various authentication methods can be used to avoid entering passwords for each server.
We'll describe the directory structures including creating the distribution network. Then we'll
talk about the scripts. Finally, we'll discuss an example.
Directory Structures
Each release module is stored in the directory /var/spool/pkg/release/[module]/. A module directory
can be flat, or it can contain subdirectories. Hidden directory trees under the ./release/ directory
define the distribution network. Therefore, the names of these directories cannot be used as module
names.
Transport protocols supported by distribute_release include nfs, rsh, and ssh. If a release module
is distributed using nfs, then the directory /var/spool/pkg/release/.nfs/[module]/ contains symbolic
links corresponding to the hosts in the server's distribution network:
When using nfs, rsync functions like an improved copy command, transferring files between the directories
/var/spool/pkg/release/[module]/ and /var/spool/pkg/release/.nfs/[module]/[host]/[module].
When using rsh or ssh, the directory structures are similar. With rsh, for example, empty files
of the form /var/spool/pkg/release/.rsh/[module]/[host] define the hosts in the distribution network.
The Scripts
Before distribute_release can be called, the directory structures and the distribution network
must be created. The script create_distribution (
Listing 2
) facilitates these tasks.
One argument, the name of a release module, must be passed to create_distribution. When no options
are used, the local host functions as a terminal node in the distribution network. In other words,
the system may receive updates from another host, but it will not propagate those updates to downstream
hosts. Downstream hosts and transport protocols may be specified with the -h and -t
options respectively.
When using distribute_release, the name of a release module must be passed to the script. The
-q and -v options may be used to control the amount of information displayed to the
user. Hosts to be included or excluded from the distribution may be specified using the -i
and -e options. The -r option may be used to determine how many times the program will
recursively call itself to distribute the module to successive levels in a distribution hierarchy.
When using nfs, the recursive calls are made locally. With rsh and ssh, the program calls itself
on a remote server.
Distribute_release first gets the argument and any command-line options. Then, for each transport
protocol, the script builds a distribution list and executes the appropriate rsync command
for each host in the list. If a recursion depth is specified, then another instance of distribute_release
is executed in a detached screen session, allowing the parent instance to continue running while
the child processes propagate the module to other hosts.
An Example
Our example network (see
Figure 1
) contains five servers -- bambi, pongo, pluto, nemo, and goofy. One release module, named TS1,
is located on bambi; another, named TS2, is located on pluto. By executing the create_distributions
script ( Listing 3 ) on each server, the complete distribution network for both modules is built using the proper
create_distribution calls.
Consider the TS1 release module; after the module has been distributed to all of the systems in
the network, the directory /var/spool/pkg/release/TS1/ contains the following files and subdirectories:
On bambi, the directory /var/spool/pkg/release/.ssh/TS1/ contains a file named pongo. So, executing
"distribute_release TS1" on bambi synchronizes the TS1 module with pongo using ssh as the transport
protocol. The TS1 module can be distributed from pongo to all servers in the network using the
-r option:
distribute_release -r 2 TS1
When using ssh, passwords can be avoided by using public/private key pairs with empty passphrases.
When using rsh, you can update /etc/hosts.equiv or the appropriate .rhosts file. Obviously, passwords
are not an issue with nfs. Deciding which protocol to use depends on security concerns, potential
performance issues, and configuration complexity.
John Spurgeon is a software developer and systems administrator for Intel's Factory Information
Control Systems, IFICS, in Hillsboro, Oregon. He is currently preparing to ride a single-speed bicycle
in Race Across America in 2007.
Ed Schaefer is a frequent contributor to Sys Admin. He is a software developer and DBA for
Intel's Factory Information Control Systems, IFICS, in Hillsboro, Oregon. Ed also hosts the monthly
Shell Corner column on UnixReview.com. He can be reached at: [email protected].
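The command being described next takes this form (reconstructed from the explanation that follows):
rsync -avz -e ssh ~/web/ mary@example.org:~/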
This will copy the local dir 〔~/web/〕 to the remote dir 〔~/〕 on the machine with domain name "example.org",
using login "mary" thru the ssh protocol. The "-z" is to use compression. The "-a" is for archive
mode, basically making the file's metadata (owner/perm/timestamp) the same as the local file's (when possible)
and doing a recursive copy (i.e. uploading the whole dir). The "-v" is for verbose mode, which basically makes
rsync print out which files are being updated. (rsync does not upload files that are already on the
destination and identical.)
For example, here's what I use to sync/upload my website on my local machine to my server;
I run this command daily. The "--exclude" tells it to disregard any files matching that pattern
(i.e. if it matches, don't upload it or delete it on the remote server).
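A sketch of such a command (the exclude pattern is illustrative):
rsync -rvz -e ssh --exclude='*~' ~/web/ mary@example.org:~/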
Note that "-r" is used intstead of "-a". The "-r" means recursive, all sub directories and files.
Don't use "-a" because that will sync file owner, group, permissions, andbecause Windows and unix
has different permission systems and file systems, so "-a" is usually not what you want. (For a short
intro to perm systems on unix and Windows, see:
Unix And Windows File Permission
Systems)
You can create a bash alias for the long command, e.g. alias l="ls -al";, or use bash's
history search by pressing 【Ctrl+r】 and then typing rsync.
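For example (the alias name is arbitrary):
alias upweb='rsync -rvz -e ssh --exclude="*~" ~/web/ mary@example.org:~/'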
Much like cp, rsync copies files from a source to a destination. Unlike
cp, the source and destination of an rsync operation can be local or remote.
For instance, the command in Listing 1 copies the directory
/tmp/photos and its entire contents verbatim to a home directory.
$ rsync -n -av /tmp/photos ~
building file list ... done
photos/
photos/Photo 2.jpg
photos/Photo 3.jpg
photos/Photo 6.jpg
photos/Photo 9.jpg
sent 218 bytes received 56 bytes 548.00 bytes/sec
total size is 375409 speedup is 1370.11
The -v option enables verbose messages. The -a option (where a stands
for archive) is shorthand for -rlptgoD (recurse, copy symbolic links as symbolic
links, preserve permissions, preserve file times, preserve group, preserve owner, and preserve devices
and special files, respectively). Typically, -a mirrors files; exceptions occur when the
destination cannot or does not support the same attributes. For example, copying a directory from
UNIX to Windows® does not map perfectly. Some suggestions for unusual cases appear below.
rsync has a lot of options. If you are unsure whether your options or your source and destination specifications are correct, use -n to perform
a dry run. A dry run previews what will happen to each file but does not move a single byte. When you are confident of all the settings, drop the
-n and proceed.
Listing 2 provides an example where -n is
invaluable. The command in Listing 1 and the following
command yield different results.
$ rsync -av /tmp/photos/ ~
./
Photo 2.jpg
Photo 3.jpg
Photo 6.jpg
Photo 9.jpg
sent 210 bytes received 56 bytes 532.00 bytes/sec
total size is 375409 speedup is 1411.31
What is the difference? The difference is the trailing slash on the source argument. If the source
has a trailing slash, the contents of the named directory but not the directory itself are
copied. A slash on the end of the destination is immaterial.
And Listing 3 provides an example of moving the same directory to another system.
$ rsync -av /tmp/photos example.com:album
created directory album
Photo 2.jpg
Photo 3.jpg
Photo 6.jpg
Photo 9.jpg
sent 210 bytes received 56 bytes 21.28 bytes/sec
total size is 375409 speedup is 1411.31
Assuming that you have the same login name on the remote machine, rsync prompts you
for a password and, given the proper credentials, creates the directory album and copies the
images to that directory. By default, rsync uses Secure Shell (SSH) as its transport mechanism;
you can reuse your machine aliases and public keys with rsync.
The examples in Listing 2 and
Listing 3 demonstrate two of rsync's four modes.
The first example was shell mode, also dubbed local mode. The second sample was
remote shell mode and is so named because SSH powers the underlying connection and transfers.
rsync has two additional modes. List mode acts like ls: It lists the
contents of source, as shown in Listing 4.
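List mode is triggered whenever you give rsync a source but no destination. For example, this lists the directory much as ls -l would:
$ rsync /tmp/photos/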
The fourth mode is server mode. Here, the rsync daemon runs persistently on a machine,
accepting requests to transfer files. A transfer can send files to the daemon or request files from
it. Server mode is ideal for creating a central backup server or project repository.
To differentiate between remote shell mode and server mode, the latter employs two colons (:)
in the source and destination names. Assuming that whatever.example.com exists, the next command
copies files from the source to a local destination:
$ rsync -av whatever.example.com::src /tmp
And what exactly is src? It's an rsync module that you define and configure
on the daemon's host. A module has a name, a path that contains its files, and some other parameters,
such as read only, which protects the contents from modification.
To run an rsync daemon, type:
$ sudo rsync --daemon
Running the rsync daemon as the superuser, root, is not strictly necessary, but the practice
protects other files on your machine. Running as root, rsync restricts itself to the module's
directory hierarchy (its path) using chroot. After a chroot, all other files
and directories seem to vanish. If you choose to run the rsync daemon with your own privileges,
choose an unused socket and make sure its modules have sufficient permissions to allow download and/or
upload. Listing 5 shows a minimal configuration to share
some files in your home directory without the need for sudo. The configuration is stored
in file rsyncd.conf.
motd file = /home/strike/rsyncd/rsync.motd_file
pid file = /home/strike/rsyncd/rsyncd.pid
port = 7777
use chroot = no
[demo]
path = /home/strike
comment = Martin home directory
list = no
[dropbox]
path = /home/strike/public/dropbox
comment = A place to leave things for Martin
read only = no
[pickup]
path = /home/strike/public/pickup
comment = Get your files here!
The file has two segments. The first segment (here, the first four lines) configures the operation
of the rsync daemon. (Other options are available, too.) The first line points to a file
with a friendly message to identify your server. The second line points to another file to record
the process ID of the server. This is a convenience in the event you must manually kill the
rsync daemon:
kill -INT `cat /home/strike/rsyncd/rsyncd.pid`
The two files are in a home directory, because this example does not use superuser privileges
to run the software. Similarly, the port chosen for the daemon is above 1024, so unprivileged users can claim it
for any application. The fourth line turns off chroot.
The remaining segment is subdivided into small sections, one section per module. Each section,
in turn, has a header line and a list of key-value pairs to set options for the module. By default,
all modules are read only; set read only = no to allow write operations. Also by default,
all modules are listed in the module catalog; set list = no to hide the module.
To start the daemon, run:
$ rsync --daemon --config=rsyncd.conf
Now, connect to the daemon from another machine, and omit a module name. You should see this:
$ rsync --port=7777 mymachine.example.com::
Hello! Welcome to Martin's rsync server.
dropbox A place to leave things for Martin
pickup Get your files here!
If you do not name a module after the colons (::), the daemon responds with a list
of available modules. If you name a module but do not name a specific file or directory within the
module, the daemon provides a catalog of the module's contents, as shown in
Listing 6.
You can also perform an upload by reversing the source and destination, then pointing to the module
for writes, as shown in Listing 8.
Listing 8. Reverse source and destination directories
$ rsync -v --port=7777 application.js mymachine.example.com::dropbox
Hello! Welcome to Martin's rsync server.
application.js
sent 245 bytes received 38 bytes 113.20 bytes/sec
total size is 164 speedup is 0.58
That's a quick but thorough review. Next, let's see how you can apply rsync to daily tasks.
rsync is especially useful for backups. And because it can synchronize a local file with
its remote counterpart (and can do that for an entire file system, too), it's ideal for managing large
clusters of machines that must be (at least partially) identical.
Performing backups on a frequent basis is a critical but typically ignored chore. Perhaps it's
the demands of running a lengthy backup each day or the need to have large external media to store
files; never mind the excuse, copying data somewhere for safekeeping should be an everyday practice.
To make the task painless, use rsync and point to a remote server-perhaps one that
your service provider hosts and backs up. Each of your UNIX machines can use the same technique,
and it's ideal for keeping the data on your laptop safe.
Establish SSH keys and an rsync daemon on the remote machine, and create a backup module
to permit writes. Once established, run rsync to create a daily backup that takes hardly
any space, as shown in Listing 9.
#!/bin/sh
# This script based on work by Michael Jakl (jakl.michael AT gmail DOTCOM) and used
# with express permission.
HOST=mymachine.example.com
SOURCE=$HOME
PATHTOBACKUP=home-backup
date=`date "+%Y-%m-%dT%H:%M:%S"`
rsync -az --link-dest=$PATHTOBACKUP/current $SOURCE $HOST:$PATHTOBACKUP/back-$date
ssh $HOST "rm $PATHTOBACKUP/current && ln -s back-$date $PATHTOBACKUP/current"
Replace HOST with the name of your backup host and SOURCE with the directory
you want to save. Change PATHTOBACKUP to the name of your module. (You can also embed
the three final lines of the script in a loop, dynamically change SOURCE, and back up
a series of separate directories on the same system; a sketch follows.)
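For instance, a loop variant might look like this (the directory list here is hypothetical):
for SOURCE in $HOME/documents $HOME/photos $HOME/projects; do
    rsync -az --link-dest=$PATHTOBACKUP/current $SOURCE $HOST:$PATHTOBACKUP/back-$date
done
ssh $HOST "rm $PATHTOBACKUP/current && ln -s back-$date $PATHTOBACKUP/current"
Here's how the backup works: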
To begin, date is set to the current date and time and yields a string like
2009-08-23T12:32:18, which identifies the backup uniquely.
The rsync command performs the heavy lifting. -az preserves all file
information and compresses the transfers. The magic lies in --link-dest=$PATHTOBACKUP/current,
which specifies that if a file has not changed, do not copy it to the new backup. Instead, create
a hard link from the new backup to the same file in the existing backup. In other words, the new
backup only contains files that have changed; the rest are links.
More specifically (and expanding all variables), mymachine.example.com::home-backup/current
is the current archive. The new archive for /home/strike is targeted to mymachine.example.com::home-backup/back-2009-08-23T12:32:18.
If a file in /home/strike has not changed, the file is represented in the new backup by a hard
link to the current archive. Otherwise, the new file is copied to the new archive.
If you touch only a few files or perhaps a handful of directories each day, the additional space
required for what is effectively a full backup is paltry. Moreover, because each daily backup
(except the very first) is so small, you can keep a long history of the files on hand.
The last step is to alter the organization of the backups on the remote machine to promote
the newly created archive to be the current archive. The ssh command
removes the current archive (which is merely a symbolic link) and recreates the same symbolic
link pointing to the new archive.
Keep in mind that a hard link to a hard link points to the same file. Hard links are very cheap
to create and maintain, so a full backup is simulated using only an incremental scheme.
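You can see the hard links at work by comparing inode numbers across two archives; an identical inode means the two backups share a single file on disk (the file name here is hypothetical):
ls -i back-2009-08-23T12:32:18/notes.txt back-2009-08-24T09:15:42/notes.txt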
If you work with both a laptop and a desktop computer, you know you have to synchronize the machines
to keep them up to date. In addition, you probably want to run the synchronization not only at your
home but also from a remote site; in my case, whenever I travel with my laptop, I make sure that
whatever I do on it gets backed up to my desktop computer. (Losing your laptop and thereby losing
all your work isn't nice at all!) Many solutions to this problem exist: this article introduces one
such tool, rsync, and mentions several related tools, all of which provide easy synchronization
procedures.
Rsync::Config is a module that can be used to create rsync configuration files. A configuration
file (from the Rsync::Config point of view) is made of atoms and of modules containing atoms. An atom is the smallest
piece of the configuration file. This module inherits from Rsync::Config::Module.
Rsync, written by Andrew Tridgell and Paul Mackerras, is a very useful alternative to rcp.
This tool lets you copy files and directories between a local host and a remote host (source and
destination can also both be local if you need). The main advantage of using Rsync instead of rcp is
that rsync can use SSH as a secure channel, send/receive only the bytes inside files that changed
since the last replication, and remove files on the destination host if those files were deleted
on the source host, to keep both hosts in sync. In addition to using rsh/ssh for transport, you can
also use Rsync's own protocol, in which case you will connect to TCP port 873.
Whether you rely on SSH or use Rsync explicitly, Rsync still needs to be installed on both hosts.
A Win32 port is available if you need it, so either one or both of the hosts can be NT hosts.
Rsync's web site has some good info and links. There is also a HOWTO.
Configuring /etc/rsyncd.conf
Being co-written by Andrew Tridgell, author of Samba, it's no surprise that Rsync's configuration
file looks just like Samba's (and Windows' :-), and that Rsync lets you create projects that look like
shared directories under Samba. Accessing remote resources through this indirect channel offers more
independence, as it lets you move files around on the source Rsync server without changing anything on the
destination host.
Any parameters listed before any [module] section are global, default parameters.
Each module is a symbolic name for a directory on the local host. Here's an example:
#/etc/rsyncd.conf
secrets file = /etc/rsyncd.secrets
motd file = /etc/rsyncd.motd
# Below are actually defaults, but to be on the safe side...
read only = yes
list = yes
uid = nobody
gid = nobody
[out]
comment = Great stuff from remote.acme.com
path = /home/rsync/out
[confidential]
comment = For your eyes only
path = /home/rsync/secret-out
auth users = joe,jane
hosts allow = *.acme.com
hosts deny = *
list = false
Note: Rsync will not grant access to a protected share if the password file (/etc/rsyncd.secrets,
here) is world-readable.
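The secrets file itself is plain text with one username:password pair per line. A sketch (the passwords are placeholders):
# /etc/rsyncd.secrets
joe:REPLACE-WITH-PASSWORD
jane:REPLACE-WITH-PASSWORD
Protect it with chmod 600 /etc/rsyncd.secrets so it isn't world-readable.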
Running RSYNCd
Per the manual page:
The rsync daemon is launched by specifying the --daemon option to rsync. You can
launch it either via inetd or as a stand-alone daemon. When run via inetd you should
add a line like this to /etc/services:
rsync 873/tcp
... and a single line something like this to /etc/inetd.conf:
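rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
(This example line follows the rsync man page; adjust the path to the rsync binary for your system.)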
You will then need to send inetd a HUP signal to tell it to reread its config file.
Note that you should not send the rsync server a HUP signal to force it to reread the /etc/rsyncd.conf.
The file is re-read on each client connection.
Per the HOWTO:
The rsync daemon is robust, so it is safe to launch it as a stand-alone server. The code that
loops waiting for requests is only a few lines long; it forks a new copy for each connection. If the forked process
dies, it doesn't harm the main daemon.
The big advantage of running as a daemon will come when the planned directory cache system is
implemented. The caching system will probably only be enabled when running as a daemon. For this
reason, busy sites are recommended to run rsync as a daemon. Also, the daemon mode makes it easy
to limit the number of concurrent connections.
Since it's not included in the 2.4.3 RPM package, here's the init script to be copied as
/etc/rc.d/init.d/rsyncd with symlinks to /etc/rc.d/rc3.d:
#!/bin/sh
# rsyncd        This shell script takes care of starting and stopping
#               the rsync daemon.
# description: Rsync is an awesome replication tool.

# Source function library.
. /etc/rc.d/init.d/functions
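A minimal continuation in the usual init-script style, using the daemon and killproc helpers from the functions library (a sketch, to adapt):
case "$1" in
  start)
        echo -n "Starting rsync daemon: "
        daemon rsync --daemon
        echo
        ;;
  stop)
        echo -n "Stopping rsync daemon: "
        killproc rsync
        echo
        ;;
  *)
        echo "Usage: rsyncd {start|stop}"
        exit 1
        ;;
esac
The download command discussed next is of this form (the remote login is a placeholder):
rsync -avz -e ssh rsync@remote.acme.com:/home/rsync/out/ /home/rsync/from_remote
Note the trailing slash on the source.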
An important thing here is that the presence or absence of a trailing "/" in the source directory
determines whether the directory itself is copied, or simply the contents of this source directory.
In other words, the above means that the local host must have a directory available (here,
/home/rsync/from_remote) to receive the contents of /home/rsync/out sitting
on the remote host; otherwise, Rsync will happily download all files into the path given as destination
without asking for confirmation, and you could end up with a big mess.
On the other hand, rsync -avz -e ssh [email protected]:/home/rsync/out /home/rsync/from_remote
means that an "out" sub-directory is first created under /home/rsync/from_remote on
the destination host and will be populated with the contents of the remote directory ./out.
In this case, files will be saved on the local host in /home/rsync/from_remote/out, so
the former command looks like a better choice.
Here's how to replicate an Rsync share from a remote host:
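A sketch of the command, matching the description that follows (options as in the earlier examples):
rsync -avz remote.acme.com::out /home/rsync/from_remote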
Notice that we do not use a path to give the source resource, but instead just a name ("out"),
and that we use :: to separate the server's name and the resource it offers. In the Rsync
configuration that we'll see just below, this is shown as an [out] section. This way, admins on remote.acme.com
can move files around on their server; as long as they remember to update the actual path in the [out] section
(e.g. path = /home/rsync/out to path = /home/outgoing), remote Rsync users are not
affected.
An Rsync server displays the list of available anonymous shares through rsync remote.acme.com::.
Note the ::. For added security, it is possible to prompt for a password when listing
private shares, so that only authorized remote users know about the Rsync shares available from your
server.
Any NT version?
The NT port only requires the latest and greatest RSYNCx.EXE and Cygnus' CYGWIN1.DLL.
The easiest is to keep both in the same directory, but the DLL can be located in any directory found
in your PATH environment variable.
Robert Scholte's excellent tutorial on using the NT port of Rsync can be found
here.
Instructions on how to install rsync as an NT service are
here.
Here's an example based on the sample above:
C:\Rsync>rsync243 -avz [email protected]::confidential ./confidential
Password:
receiving file list ... done
./
./
wrote 109 bytes read 123 bytes 66.29 bytes/sec
total size is 0 speedup is 0.00
Useful command-line switches
-v, --verbose increase verbosity
-q, --quiet decrease verbosity
-c, --checksum always checksum
-a, --archive archive mode. It is a quick way of saying you want recursion
and want to preserve everything.
-r, --recursive recurse into directories
-R, --relative use relative path names
-u, --update update only (don't overwrite newer files)
-t, --times preserve times
-n, --dry-run show what would have been transferred
-W, --whole-file copy whole files, no incremental checks
-I, --ignore-times Normally rsync will skip any files that are already the
same length and have the same time-stamp. This option turns off this behavior.
--existing only update files that already exist
--delete delete files that don't exist on the sending side
--delete-after delete after transferring, not before
--force force deletion of directories even if not empty
--size-only only use file size when determining if a file should be transferred
--progress show progress during transfer
-z, --compress compress file data
--exclude=PATTERN exclude files matching PATTERN
--daemon run as a rsync daemon
--password-file=FILE get password from FILE
Rsnapshot (higher-level backup utility
based on rsync, improved ease-of-use, allows you to keep multiple snapshots in time of your data,
local or remote)
In the last two months I've been traveling a lot. During the same period my main desktop computer
went belly up. I would have been in trouble without rsync at my disposal -- but thanks to my regular
use of this utility, my data (or most of it, anyway) was already copied offsite just waiting to be
used. It takes a little time to become familiar with rsync, but once you are, you should be able
to handle most of your backup needs with just a short script.
What's so great about rsync? First,
it's designed to speed up file transfer by copying the differences between two files rather than
copying an entire file every time. For example, when I'm writing this article, I can make a copy
via rsync now and then another copy later. The second (and third, fourth, fifth, etc.) time I copy
the file, rsync copies the differences only. That takes far less time, which is especially
important when you're doing something like copying a whole directory offsite for daily backup. The
first time may take a long time, but the next will only take a few minutes (assuming you don't change
that much in the directory on a daily basis).
Another benefit is that rsync can preserve permissions and ownership information, copy symbolic
links, and generally is designed to intelligently handle your files.
You shouldn't need to do anything to get rsync installed -- it should be available on almost any
Linux distribution by default. If it's not, you should be able to install it from your distribution's
package repositories. You will need rsync on both machines if you're copying data to a remote system,
of course.
When you're using it to copy files to another host, the rsync utility typically works over a remote
shell, such as Secure Shell (SSH) or Remote Shell (RSH). We'll work with SSH in the following examples,
because RSH is not secure and you probably don't want to be copying your data using it. It's also
possible to connect to a remote host using an rsync daemon, but since SSH is practically ubiquitous
these days, there's no need to bother.
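For example, a typical copy over SSH looks like this (the host and paths here are placeholders):
rsync -av -e ssh /home/user/docs/ user@backuphost:/backup/docs/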
I hate making backups by hand. It costs a lot of time and usually I have far
better things to do. Long ago (in the Windows 98 era) I made backups to CD only before I needed to
reïnstall the OS, which was about once every 18 months, and my code projects maybe twice as often.
A lot has changed since those dark times though. My single PC expanded into a network with multiple
desktops and a server, I installed a mix of Debian and Ubuntu and ditched Windows, and I have a nice
broadband link - just as my friends do. Finally a lazy git like me can set up a decent backup system
that takes care of itself, leaving me time to do the "better" things (such as writing about it :-)
There are already quite a few tutorials on the internet explaining various ways to back up your
Linux system using built-in commands and a script of some sort, but I could not find one that suited
me, so I decided to write another one - one that takes care of backing up my entire network.
[Dec 8, 2006] rsnapshot A
Perl-based filesystem snapshot utility.
rsnapshot is a filesystem snapshot utility based on rsync. It makes it easy to make periodic snapshots
of local machines, and remote machines over ssh. It uses hard links whenever possible, to greatly
reduce the disk space required.
[Dec 8, 2006] Warsync A Perl-based
server replication program based on rsync.
Warsync (Wrapper Around Rsync) is a server replication system mainly used to sync servers in LVS
clusters. It is based on rsync over ssh and has native support for Debian package synchronization.
Rsync is a wonderful little utility that's amazingly easy to set up on your machines. Rather than
have a scripted FTP session, or some other form of file transfer script -- rsync copies only the
diffs of files that have actually changed, compressed and through ssh if you want to for security.
That's a mouthful -- but what it means is:
Diffs - Only actual changed pieces of files are transferred, rather than the whole file. This
makes updates faster, especially over slower links like modems. FTP would transfer the entire
file, even if only one byte changed.
Compression - The tiny pieces of diffs are then compressed on the fly, further saving you
file transfer time and reducing the load on the network.
Secure Shell - The security-conscious among you will like this, and you should all
be using it. The stream from rsync is passed through the ssh protocol to encrypt your session
instead of rsh, which is also an option (and required if you don't use ssh - enable it in your
/etc/inetd.conf and restart your inet daemon if you disabled it for security).
Rsync is rather versatile as a backup/mirroring tool, offering many features above and beyond
the above. I personally use it to synchronize Website trees from staging to production servers and
to backup key areas of the filesystems both automatically through cron and by a CGI script. Here
are some other key features of rsync:
Support for copying links, devices, owners, groups and permissions
Exclude and exclude-from options similar to GNU tar
A CVS exclude mode for ignoring the same files that CVS would ignore
Does not require root privileges
Pipelining of file transfers to minimize latency costs
Support for anonymous or authenticated rsync servers (ideal for mirroring)
How does it work?
You must set up one machine or another of a pair to be an "rsync server" by running rsync in a
daemon mode ("rsync --daemon" at the commandline) and setting up a short, easy configuration
file (/etc/rsyncd.conf). Below I'll detail a sample configuration file. The options are readily
understood, few in number -- yet quite powerful.
Any number of machines with rsync installed may then synchronize to and/or from the machine running
the rsync daemon. You can use this to make backups, mirror filesystems, distribute files or any number
of similar operations. Through the use of the "rsync algorithm," which transfers only the diffs between
files (similar to a patch file) and then compresses them, you are left with a very efficient system.
For those of you new to secure shell ("ssh" for short), you should be using it! There's a very
useful and quite thorough Getting Started
with SSH document available. You may also want to visit the
Secure Shell Web Site. Or, just hit the
Master FTP Site in Finland and snag it for yourself. It
provides a secure, encrypted "pipe" for your network traffic. You should be using it instead of telnet,
rsh or rlogin and use the replacement "scp" command instead of "rcp."
Setting up a Server
You must set up a configuration file on the machine meant to be the server and run the rsync binary
in daemon mode. Even your rsync client machines can run rsync in daemon mode for two-way transfers.
You can do this automatically for each connection via the inet daemon or at the commandline in standalone
mode to leave it running in the background for often repeated rsyncs. I personally use it in standalone
mode, like Apache. I have a crontab entry that synchronizes a Web site directory hourly. Plus there
is a CGI script that folks fire off frequently during the day for immediate updating of content.
This is a lot of rsync calls! If you start off the rsync daemon through your inet daemon, then you
incur much more overhead with each rsync call. You basically restart the rsync daemon for every connection
your server machine gets! It's the same reasoning as starting Apache in standalone mode rather than
through the inet daemon. It's quicker and more efficient to start rsync in standalone mode if you
anticipate a lot of rsync traffic. Otherwise, for the occasional transfer follow the procedure to
fire off rsync via the inet daemon. This way the rsync daemon, as small as it is, doesn't sit in
memory if you only use it once a day or whatever. Your call.
Below is a sample rsync configuration file. It is placed in your /etc directory as rsyncd.conf.
motd file = /etc/rsyncd.motd
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock
[simple_path_name]
path = /rsync_files_here
comment = My Very Own Rsync Server
uid = nobody
gid = nobody
read only = no
list = yes
auth users = username
secrets file = /etc/rsyncd.scrt
Various options that you would modify right from the start are the areas in italics in the sample
above. I'll start at the top, line by line, and go through what you should pay attention to. What
the sample above does is set up a single "path" for rsync transfers to that machine.
Starting at the top are four lines specifying files and their paths for rsync running in daemon
mode. The first is a "message of the day" (motd) file like you would use for an FTP server. This
file's contents get displayed when clients connect to this machine. Use it as a welcome, warning
or simply identification. The next line specifies a log file to send diagnostic and normal run-time
messages to. The PID file contains the "process ID" (PID) number of the running rsync daemon. A lock
file is used to ensure that things run smoothly. These options are global to the rsync daemon.
The next block of lines is specific to a "path" that rsync uses. The options contained therein
have effect only within the block (they're local, not global options). Start with the "path" name.
It's somewhat confusing that rsync uses the term "path" -- as it's not necessarily a full pathname.
It serves as an "rsync area nickname" of sorts. It's a short, easy to remember (and type!) name that
you assign to a true filesystem path with all the options you specify. Here are the things you need
to set up first and foremost:
path - this is the actual filesystem path to where the files are rsync'ed from and/or to.
comment - a short, descriptive explanation of what and where the path points to for listings.
auth users - you really should put this in to restrict access to only a pre-defined user that
you specify in the following secrets file - does not have to be a valid system user.
secrets file - the file containing plaintext key/value pairs of usernames and passwords.
One thing you should seriously consider is the "hosts allow" and "hosts deny" options for your
path. Enter the IPs or hostnames that you wish to specifically allow or deny! If you don't do this,
or at least use the "auth users" option, then basically that area of your filesystem is wide open
to the world by anyone using rsync! Something I seriously think you should avoid...
Check the rsyncd.conf man page with "man rsyncd.conf" and read it very carefully where
security options are concerned. You don't want just anyone to come in and rsync up an empty directory
with the "--delete" option, now do you?
The other options are all explained in the man page for rsyncd.conf. Basically, the above options
specify that transfers run as the given uid/gid, that the filesystem path is read/write and that the rsync
path shows up in rsync listings. The rsync secrets file I keep in /etc/ along with the configuration
and motd files, and I prefix them with "rsyncd." to keep them together.
Using Rsync Itself
Now on to actually using, or initiating an rsync transfer with rsync itself. It's the same binary
as the daemon, just without the "--daemon" flag. Its simplicity is a virtue. I'll start with a commandline
that I use in a script to synchronize a Web tree below.
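The commandline itself, reconstructed from the line-by-line walkthrough that follows (the exclude patterns and names match the explanation below):
rsync --verbose --progress --stats --compress --rsh=ssh \
      --recursive --times --perms --links --delete \
      --exclude "*.bak" --exclude "*~" \
      /www/* webserver:www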
Let's go through it one line at a time. The first line calls rsync itself and specifies the options
"verbose," "progress," and "stats" so that you can see what's going on this first time around. The
"compress" and "rsh" options specify that you want your stream compressed and sent through
ssh (remember from above?) for security's sake.
The next line specifies how rsync itself operates on your files. You're telling rsync here to
go through your source pathname recursively with "recursive" and to preserve the file timestamps
and permissions with "times" and "perms." Copy symbolic links with "links" and delete things from
the remote rsync server that are also deleted locally with "delete."
Now we have a line where there's quite a bit of power and flexibility. You can specify GNU tar-like
include and exclude patterns here. In this example, I'm telling rsync to ignore some backup files
that are common in this Web tree ("*.bak" and "*~" files). You can put whatever you want to match
here, suited to your specific needs. You can leave this line out and rsync will copy all your files
as they are locally to the remote machine. Depends on what you want.
Finally, the line that specifies the source pathname, the remote rsync machine and rsync "path."
The first part "/www/*" specifies where on my local filesytem I want rsync to grab the files
from for transmission to the remote rsync server. The next word, "webserver" should be the DNS name
or IP address of your rsync server. It can be "w.x.y.z" or "rsync.mydomain.com"
or even just "webserver" if you have a nickname defined in your /etc/hosts file,
as I do here. The single colon specifies that you want the whole mess sent through your ssh tunnel
rather than the plain rsh transport. This is an important point to pay attention to! If you use
two colons, then despite the specification of ssh on the commandline previously, you'll bypass the
remote shell entirely and talk straight to the rsync daemon over TCP. Ooops. The last "www" in that line is the rsync "path" that you set up on the server
as in the sample above.
Yes, that's it! If you run the above command on your local rsync client, then you will transfer
the entire "/www/*" tree to the remote "webserver" machine except backup files, preserving file timestamps
and permissions -- compressed and secure -- with visual feedback on what's happening.
Note that in the above example, I used GNU style long options so that you can see what the commandline
is all about. You can also use abbreviations, single letters -- to do the same thing. Try running
rsync with the "--help" option alone and you can see what syntax and options are available.
There are also various pages of information on rsync out there, many of which reside on the rsync
Web site. Below are three documents that you should also read thoroughly before using rsync so that
you understand it well: