Rsync is a program that uses an efficient method for mirroring files between two machines, similar to rdist (although rdist can mirror files to multiple targets). When mirroring to a remote filesystem (for example via ssh), it is important that the clocks on the local and remote servers are synchronized: rsync's default "quick check" compares file sizes and modification times, so clock skew can cause it to miss changed files or re-transfer unchanged ones.
Both tools operate by checking for differences between the files on the source machine and those on the destination. Rdist simply checks the timestamp on each file to see if it needs to be updated, and sends the entire updated file from source to destination when necessary. Rsync, on the other hand, uses a more advanced algorithm to decide whether files need to be updated and to perform the updates. Briefly, rsync operates by:

1. splitting the destination copy of each file into fixed-size blocks and computing a cheap rolling checksum plus a stronger checksum for each block;
2. sending those checksums to the source side;
3. scanning the source file at every byte offset with the rolling checksum to find blocks the destination already has;
4. transferring only the literal data in between, plus references to the matching blocks.
This algorithm favors files that are appended to, or in which changes replace strings of the same length (often true for configuration files). The most common example of files for which rsync works extremely well is logs, which grow in size while the initial part of the file remains unchanged. In this case only the tail is copied, which is very efficient.
Rsync-style algorithms can also be adapted to use UDP instead of TCP, which can speed up transfers on high-latency links several times (reportedly up to 20 times). IBM Aspera is a variation on this theme.
It might even work for some prepended files, although fixed-size blocks might not align; for text files it would work if lines were used as blocks instead of fixed-size blocks.
A more detailed explanation of the rsync algorithm may be found in
Tridgell, A. and Mackerras, P. 1996. The rsync algorithm. Technical Report TR-CS-96-05 (June), Department of Computer Science, Australian National University.
As a result of this algorithm, rsync is able to cut the amount of data it transfers over the network when copying remote filesystems. It also makes sense for copying files into an empty directory on a local filesystem, because it preserves attributes.
Rsync can operate either via ssh or as a client-server solution. In the latter case the remote machine (the server) must have the rsync package installed: one instance of rsync runs as a server (with the --daemon option) and the other as a client. Rsync transfers files in only one direction at a time, so the name is misleading: it is essentially a mirroring tool, not a synchronization tool. For bidirectional synchronization see unison.
Rsync can serve as an alternative or complement to rcp, scp, scripted FTP sessions, NFS mounting, and FTP NetDrive. If a problem can be solved by mirroring information on a remote server instead of maintaining a single copy, this is a tool to consider. Other tools, especially for mirroring websites, also exist.
It is important to note that rsync can also be used locally, as a very powerful copy utility with many options that are not available in cp.
Rsync makes sense mostly for large files and/or large directory trees (for example, web sites). Its advantage is that it copies only a fraction of the data: the blocks that it found changed. It transfers them in compressed form, further adding to efficiency. By default it uses ssh as the transfer protocol.
Rsync is often used to synchronize website trees from staging to production servers, and to back up key areas of the filesystem. It can be run automatically via cron and/or from a CGI script. Note that it needs the clocks on both servers to be synchronized.

Often large websites, like Softpanorama, are first updated on a so-called staging instance, and at night this instance is synchronized with the production instance, running on a different server, using rsync. This mode of operation prevents "small disasters": when something wrong is done to the website tree during the day, it can be detected and corrected immediately.
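The staging-to-production sync described above is, at its core, a single rsync invocation with --delete. Here is a minimal local sketch of the idea, using two scratch directories to stand in for the staging and production trees (all paths are invented for this demo; a real run would point at the production host over ssh):

```shell
# Simulate a staging -> production mirror with two local directories.
# In real use the destination would be something like user@prod:/var/www/html/.
staging=$(mktemp -d)
production=$(mktemp -d)

echo 'index v2'   > "$staging/index.html"
echo 'stale page' > "$production/old.html"   # was deleted from staging

# Trailing slash on the source: copy the *contents* of staging.
# --delete removes destination files that no longer exist on the staging side.
rsync -a --delete "$staging/" "$production/"

ls -A "$production"    # only index.html remains
```

Adding -n (--dry-run) to the command first is cheap insurance: it shows what --delete would remove before anything is actually deleted.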
Rsync has an impressive list of features; notes on the most important options follow.
Note that -a does not preserve hardlinks, because finding multiply-linked files is expensive. You must separately specify -H.
A source directory with a trailing slash is treated differently by rsync. In other words, /media/ISO and /media/ISO/ are not the same thing to rsync:
rsync -av /media/ISO /backup
rsync will create the directory ISO as a subdirectory of the directory /backup.
If instead you use

rsync -av /media/ISO/ /backup/ISO

the result is equivalent to the previous example, but the directory /backup/ISO must already exist for the copy to take place.
Example 1: copy the directory /Apps into the backup directory /backup (the source directory does not have a trailing slash):
rsync -av /Apps /backup
The result will be in /backup/Apps. This is similar to

cp -rp /Apps /backup

and

cp -rp /Apps/. /backup/Apps
Example 2: copy the contents of a mounted ISO into the directory /home/iso/RHEL61 without recreating the source directory (we use a trailing slash to achieve this):
rsync -av /media/ISO/ /home/iso/RHEL61
In this case the result will be directly in /home/iso/RHEL61.
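The trailing-slash rule is easy to verify with scratch directories (paths below are created just for this demo):

```shell
# Without a trailing slash the source directory itself is recreated
# inside the destination; with a trailing slash only its contents are copied.
src=$(mktemp -d)
dst1=$(mktemp -d)
dst2=$(mktemp -d)
touch "$src/file.txt"

rsync -a "$src"  "$dst1"    # creates $dst1/<basename-of-src>/file.txt
rsync -a "$src/" "$dst2"    # creates $dst2/file.txt
```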
Rsync copies hidden files (files whose names begin with a dot) without any special options. If you want to exclude hidden files, use the option --exclude=".*" (or --exclude=".*/" to exclude hidden directories). You can also use --exclude to avoid copying things like Vim's swap files (.swp) and the automatic backups (.bak) created by some programs.
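A quick sketch of hidden-file exclusion with scratch directories (the pattern '.*' matches any name that starts with a dot):

```shell
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/visible.txt" "$src/.hidden"

# Copy the contents of src, skipping anything whose name starts with a dot.
rsync -a --exclude='.*' "$src/" "$dst/"

ls -A "$dst"    # visible.txt only; .hidden was excluded
```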
If you are copying to a mounted FAT32 partition, use -rtv (option -a will work, but will produce a lot of complaints about the inability to change ownership on FAT32 directories):
rsync -rtv /data/Working_set /mnt/backup141022
Like any file-based transfer, rsync cannot copy files that the invoking user is not allowed to read. On a read-only mounted filesystem this cannot be worked around, so duplicating a set of home directories will run into difficulties: typically the content of .ssh is lost, as this directory is readable only by its owner. To copy multiple user directories in such cases, it is safer to assume each user's identity and copy that user's directory as the user, not as root:
cd /home
# We assume that the user ID and the name of the home directory are identical
# and that there are no "extra" entries in /home that do not correspond to a user.
# If this is not the case, the script will be more complex.
for u in * ; do
   if [[ -d /home/$u ]] ; then
      su "$u" -c "rsync -av /home/$u /backup/"
   else
      echo "ERROR -- /home/$u is not a directory. Skipping it"
   fi
done
rsync -av ~ /mnt/backup`date +"%y%m%d"`
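Since it is -a that does the attribute preservation in backup commands like the one above, a small local check confirms that modification times survive the copy (scratch paths; GNU touch and stat assumed, so this sketch is Linux-specific):

```shell
src=$(mktemp -d)
dst=$(mktemp -d)
echo data > "$src/f"
touch -d '2020-01-01 00:00:00' "$src/f"    # give the file a known old mtime

# -a implies -t, so the modification time is carried over to the copy.
rsync -a "$src/" "$dst/"

stat -c %Y "$src/f"
stat -c %Y "$dst/f"    # prints the same epoch timestamp as the source
```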
A simple copy of a directory tree can be achieved with cp -rp, but rsync provides more flexibility, for example the -x option to stay within a single filesystem. This is handy when there are symlinks, different owners, and other complicating circumstances. Rsync is not faster than cp for local transfers unless a previous version of the files already exists at the destination, so that it can move less data (which matters when you copy a lot of data to a USB drive).

Rsync can be used as a regular Unix command, similar to cp -rp in the Linux world, but in the case of directories the source directory is recreated in the target if the trailing "/" is missing from the source path:

rsync -rtv /data/Working_set /mnt/backup141022

Without a trailing slash this copies dir1 into dir2 to give dir2/dir1, so dir1/file.txt gets copied to dir2/dir1/file.txt. With a trailing slash (rsync -rtv dir1/ dir2) it copies the contents of dir1 into dir2, so dir1/file.txt gets copied to dir2/file.txt.
The --rsh option allows you to specify parameters to ssh.
Example:
rsync -avz --rsh="ssh -l mary" ~/web/ mary@example.org:~/
This will copy the local directory ~/web/ to the remote directory ~/ on the machine with the domain name "example.org", logging in as "mary" through the ssh protocol.
For example, here's what I use to sync/upload my website on my local machine to my server:
rsync -z -a -v --exclude="*~" --exclude=".DS_Store" --exclude=".bash_history" --exclude="*/_curves_robert_yates/*.png" --exclude="logs/*" --exclude="xlogs/*" --delete --rsh="ssh -l u40651121" ~/web/ [email protected]:~/
I use this command daily. The --exclude options tell rsync to disregard any files matching the given patterns (i.e., if a file matches, it is neither uploaded nor deleted on the remote server).
Rsync can use ssh to reach a remote server, or its native protocol; in the latter case the remote server must be running rsync with the --daemon option. If a firewall is enabled, it must allow the rsync port (or the ssh port, if rsync runs over ssh).
On Red Hat and derivatives, rsync is a standard package with a server part that installs into xinetd. It is disabled by default; you can enable it by manually editing the /etc/xinetd.d/rsync file, changing the line disable = yes to disable = no. The resulting file should look like this:
# default: off
# description: The rsync server is a good addition to an ftp server, as it \
#       allows crc checksumming etc.
service rsync
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/bin/rsync
        server_args     = --daemon
        log_on_failure  += USERID
}
After that you should restart xinetd daemon:
service xinetd restart
To test, you can start it from the command line as root (or as the user who should be able to perform transfers):
rsync --verbose --daemon
Use "rsync --daemon --help" to see the daemon-mode command-line options
Then, on the client side, create the directory /tmp/test with a couple of files and issue the command:

/usr/bin/rsync -avv --progress /tmp/test testserver:/tmp

You will get output that looks something like this:
opening connection using ssh testserver rsync --server -vvlogDtpr . /tmp
root@testserver's password:
building file list ...
3 files to consider
delta-transmission enabled
test/
test/.fte-history
          33 100%    0.00kB/s    0:00:00 (xfer#1, to-check=1/3)
test/bash_history
        3926 100%    3.74MB/s    0:00:00 (xfer#2, to-check=0/3)
total: matches=0  hash_hits=0  false_alarms=0 data=3959
sent 4155 bytes  received 70 bytes  497.06 bytes/sec
total size is 3959  speedup is 0.94
Taken from Yolinux rsync backup and restore
rsync -av ~/Pictures /mnt/drive2

This creates a backup of your photos in /mnt/drive2/Pictures/.
Back-up to a USB thumb drive is similar: rsync -av ~/Pictures /media/KINGSTON
When you add new photos, just re-execute this rsync command to backup the latest changes.
Note: The drive name will be dependent on the manufacturer.
[Potential Pitfall]: Do not include the source directory name in the destination.

rsync -av ~/Pictures /mnt/drive2/Pictures

This will result in /mnt/drive2/Pictures/Pictures/. Note that the rsync destination acts just like that of the cp and rcp commands. Also note that rsync -av ~/Pictures/ /mnt/drive2/ has different behavior from rsync -av ~/Pictures /mnt/drive2: with the trailing slash, the directory Pictures is not recreated on the target.
Note that when transferring individual files only, the directory name has to be provided in the destination path, e.g. /mnt/drive2/Data/.
Directory paths are included if specified with a closing slash such as pathx/pathy/pathz/. Path names must be terminated with a "/" or "/."
rsync -av --exclude='*.o' --exclude='*.so' ~/src /mnt/drive2

This creates a backup in /mnt/drive2/src/ but does not transfer files with the ".o" and ".so" extensions.

rsync -av --exclude='*.o' --filter='+ *.[ch]' ~/src /mnt/drive2

This transfers files with the extensions ".c" and ".h" but does not transfer object files with the ".o" extension. Note that an --exclude rule overrides the include filter --filter='+ *.[ch]', so that, for example, with --exclude='.svn/' the ".c" and ".h" files under .svn/ are not copied.
| Command line argument | Description |
|---|---|
| -a (--archive) | Archive mode; equivalent to -rlptgoD (recurse, copy symlinks, preserve permissions, times, group, owner, and devices/specials) |
| -d (--dirs) | Copy the directory tree structure without copying the files within the directories |
| --existing | Update only files which are already present at the destination; no new files will be transferred |
| -L (--copy-links) | Transform a symbolic link into a copy of the file it points to upon transfer |
| --stats | Print a verbose set of statistics on the transfer; add -h (--human-readable) to print the stats in an understandable fashion |
| -p (--perms) | Preserve permissions (not relevant for an MS Windows client) |
| -r (--recursive) | Recurse through directories and sub-directories |
| -t (--times) | Preserve file modification times |
| -v (--verbose) | Verbose output |
| -z (--compress) | Compress file data during transfer to reduce network bandwidth; files are not stored in a compressed state. Compression has little or no effect on JPG, PNG and other already-compressed files; use --skip-compress=gz/bz2/jpg/jpeg/ogg/mp[34]/mov/avi/rpm/deb to avoid compressing them |
| --delete | Delete extraneous files from destination directories, i.e. files deleted on the source are also deleted on the destination. Use -m (--prune-empty-dirs) to delete directories left empty after their contents are deleted |
| --include / --exclude / --filter | Specify patterns for inclusion or exclusion, or use the more universal filter syntax for inclusion (+) / exclusion (-). Do not transfer files ending with ".o": --exclude='*.o'. Transfer all files ending with ".c" or ".h": --filter='+ *.[ch]' |
| -i (--itemize-changes) | Print information about the transfer: list all file copies and changes rsync is going to perform |
| --list-only / --dry-run | Don't copy anything, just list what rsync would copy if the option were not given; helps when debugging exclusion/inclusion filters |
| --progress | Show percent complete, KB transferred and KB/s transfer rate; includes verbose output |
Rsync Client-Server Configuration and Operation
Rsync can be configured in multiple client-server modes.
The Rsync server is often referred to as rsyncd or the rsync daemon. This is in fact the same rsync executable run with the command line argument "--daemon". This can be run stand-alone or using xinetd as is typically configured on most Linux distributions.
Configure xinetd to manage rsync. File: /etc/xinetd.d/rsync

Default: "disable = yes". Change to "disable = no".
Typical Linux distributions do not pre-configure rsync for server use. Both Ubuntu and Red Hat based distributions require that one generate the configuration file /etc/rsyncd.conf (example configurations are shown later in this article).
Push: rsync -avr /home/user1/Proj1/Data server-host-name::Proj1

(e.g. update the server backup from a mobile laptop)

This will initially copy over the directory Data and all of its contents to /tmp/Proj1/Data on the remote server.

Pull: rsync -avr server-host-name::Proj1 /home/user1/Proj1/Data

(e.g. update the mobile laptop from the server backup)
First configure ssh for "password-less" login:
Note that current Linux distributions use ssh version 2 and rsa.
[user1@myclient ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user1/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user1/.ssh/id_rsa.
Your public key has been saved in /home/user1/.ssh/id_rsa.pub.
The key fingerprint is:
aa:1c:76:33:8a:9c:10:51:............
[user1@myclient ~]$ ls -l ~/.ssh/id_rsa
-rw-------. 1 user1 user1 1675 Sep  7 14:55 /home/user1/.ssh/id_rsa
[user1@myclient ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub user1@remote-server
user1@remote-server's password:
Test "password-less" ssh connection: ssh remote-server
This command should log you in without asking for a password.
Now try rsync (push) using ssh:
rsync -avr --rsh=/usr/bin/ssh /home/user1/Proj1/Data remote-server:/mnt/supersan/Proj1
Note that if this connection is to be spawned by a cron job (e.g. run by the root user), then the remote user ID must be provided: user1@
rsync -avr --rsh=/usr/bin/ssh /home/user1/Proj1/Data user1@remote-server:/mnt/supersan/Proj1
SSH options may be put in the file ~/.ssh/config
Note that rsync is often run from cron to perform a nightly sync, e.g. to pull the latest updates to the web server at 2:00 am.
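A crontab entry for such a nightly job might look like the fragment below; the paths, hostname, user, and log file are hypothetical and would need to be adjusted:

```
# min hour dom mon dow  command
0    2    *   *   *     rsync -az --delete -e ssh /var/www/staging/ user1@webserver:/var/www/html/ >>/var/log/rsync-nightly.log 2>&1
```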
For a serious discussion of rsync's options, you should study the README file that came with the distribution (found under /usr/doc/rsync on Debian and Red Hat) and the manual page.
rsync supports both long and short options for most flags. Short argument forms are given first, if they exist, with long GNU-style arguments following.

-a (--archive). This is the most useful option, and acts like the corresponding option to cp. With -a, rsync will:
recurse subdirectories (-r);
recreate soft links (-l);
preserve permissions (-p);
preserve timestamps (-t);
attempt to copy devices (if invoked as root) (-D);
preserve group information (-g) (and userid, if invoked as root) (-o).
--exclude=PATTERN exclude files matching PATTERN
--exclude-from=FILE, read exclude patterns from FILE
--include=PATTERN, don't exclude files matching PATTERN
--include-from=FILE, read include patterns from FILE
If the host computer is not running SSH (or RSH), you can run rsync as a daemon on the target computer. This leaves rsync listening on port 873 for incoming connections from other computers running rsync. This is not recommended for transferring files across unsecured networks, such as the Internet, because the actual data transfer is not encrypted.
Our discussion is borrowed from Running rsync as a daemon - Juan Valencia's website
There are two different approaches to running rsync as a daemon. One is to launch the program with the --daemon parameter; the other is to have inetd or xinetd launch rsync and manage it like the other services they handle.

But first, we must configure the file /etc/rsyncd.conf and create a file named rsyncd.secrets in /etc with the usernames and passwords that will be allowed to connect to the rsync daemon.

Configuring rsyncd.conf

This file is located in the directory /etc; if it doesn't already exist, we need to create it there. Open the file in your preferred text editor (gedit is used in the examples, but any editor, such as kate in KDE, nano in a terminal, or Vim, will do):

sudo gedit /etc/rsyncd.conf

In this file we add the following lines:
lock file = /var/run/rsync.lock
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid

[documents]
    path = /home/juan/Documents
    comment = The documents folder of Juan
    uid = juan
    gid = juan
    read only = no
    list = yes
    auth users = rsyncclient
    secrets file = /etc/rsyncd.secrets
    hosts allow = 192.168.1.0/255.255.255.0

We can divide this file into two sections: the global parameters and the modules section. The global parameters define the overall behavior of rsync. Besides the three parameters used here and explained below, we can also configure things such as the port rsync will listen to, but we are going to go with the default, 873.
lock file is the file that rsync uses to handle the maximum number of connections.

log file is where rsync saves information about its activity: when it started running, when and from where other computers connect, and any errors it encounters.

pid file is where the rsync daemon writes the process ID assigned to it; this is useful because we can use this process ID to stop the daemon.

After the global parameters, we have the modules section. Every module is a folder that we share with rsync; the important parts here are:

[name] is the name that we assign to the module. Each module exports a directory tree. The module name cannot contain slashes or a closing square bracket.

path is the path of the folder that we are making available with rsync.

comment is a comment that appears next to the module name when a client obtains the list of all available modules.

uid: when the rsync daemon is run as root, this specifies which user owns the files that are transferred from and to the module.

gid allows us to set the group that owns the transferred files if the daemon is run as root.

read only determines whether the clients who connect to rsync can upload files or not; the default for this parameter is true for all modules.

list allows the module to be listed when clients ask for a list of available modules; setting this to false hides the module from the listing.

auth users is a comma-separated list of users allowed to access the content of this module. The users don't need to exist on the system; they are defined by the secrets file.

secrets file defines the file that contains the usernames and passwords of the valid users for rsync.

hosts allow lists the addresses allowed to connect to the system. Without this parameter all hosts are allowed to connect.

Creating the secrets file
Once rsyncd.conf is properly set, we need to create the secrets file. This file contains all the usernames and passwords that will be able to log in to the rsync daemon. These usernames and passwords are independent of the users that exist on the system, so we can reuse names that already exist on the system without problems. Since we specified the file /etc/rsyncd.secrets in rsyncd.conf, we create and edit this file in our favorite text editor:

sudo gedit /etc/rsyncd.secrets

In this file we add the usernames and the passwords, one per line, separated by a colon (don't actually use passwords this simple):

rsyncclient:passWord
juan:PassWord
backup:Password
user:password

Finally, change the permissions of this file so it can't be read or modified by other users; rsync will fail if the permissions of this file are not appropriately set:

sudo chmod 600 /etc/rsyncd.secrets
Launching rsync with the --daemon attribute
Once everything is set, one way to use rsync as a daemon is to launch it with the --daemon parameter; if you followed the previous instructions you can simply use this command:

sudo rsync --daemon

We can check whether it is running by looking at the log file that we defined in rsyncd.conf, in our example /var/log/rsyncd.log. Additionally, if the daemon is running, the file /var/run/rsyncd.pid will contain the process ID of rsync.

If we launched rsync in this manner, we can stop it by killing its process. We can obtain the process ID by reading the contents of /var/run/rsyncd.pid and then invoke kill with it, or pass it directly to kill using:

sudo kill `cat /var/run/rsyncd.pid`
Using inetd to handle the rsync daemon
inetd, the InterNET Daemon, can handle all the services associated with the Internet, such as FTP, telnet, and e-mail. While inetd is still used, due to security concerns it is being replaced by more modern alternatives, a very popular one being xinetd (the eXtended InterNET Daemon). Since the rsync daemon works over an Internet connection, we can add it to inetd or xinetd and allow either of them to handle it.

To enable rsync in inetd, we need to open the file /etc/inetd.conf in our favorite text editor, assuming rsync is in /usr/bin as it should be in Linux distributions:

sudo gedit /etc/inetd.conf

Then add the following line:

rsync stream tcp nowait root /usr/bin/rsync rsync --daemon

When using inetd we need to make sure that port 873 is appropriately mapped to rsync in the file /etc/services. By default it should be; we can check using:

grep rsync /etc/services

It should show us this:

rsync 873/tcp

If you don't see this, open the file /etc/services in a text editor and add that line. Finally, restart the inetd daemon:

killall -1 inetd
Using xinetd to handle the rsync daemon
xinetd, the eXtended InterNET Daemon, is a widely adopted replacement for inetd, as inetd doesn't offer security mechanisms. The handling of services is different from inetd. xinetd may already have an entry for rsync that just needs to be enabled; the rsync configuration resides in the file /etc/xinetd.d/rsync. Open this file in your text editor:

sudo gedit /etc/xinetd.d/rsync

and change the line disable = yes to disable = no.

If this file doesn't already exist, you can create it, edit it, and add the following lines to it:

service rsync
{
    disable         = no
    socket_type     = stream
    port            = 873
    protocol        = tcp
    wait            = no
    user            = root
    server          = /usr/bin/rsync
    server_args     = --daemon
    log_on_failure  += USERID
}

Unlike inetd, xinetd doesn't need an entry in /etc/services; it can handle the port/protocol by itself. If rsync is defined in /etc/services, the port and protocol lines can be omitted. Now restart the xinetd daemon:

killall -1 xinetd
Connecting to the rsync daemon
To connect to rsync when it is running as a daemon, instead of the single colon we use with SSH, we use a double colon, followed by the module name and the file or folder that we want to copy or synchronize:

rsync -rtv user@host::module/source/ destination/

Another way to access the files is to use rsync:// followed by the host's address, the module, and finally the location of the file or folder that we want to access:

rsync -rtv rsync://user@host/module/source/ destination/

For example, with the parameters given in the rsyncd.conf example above, a way to transfer a folder called "source" inside the folder /home/juan/Documents of the host computer would be either of these two commands, assuming the host is located at 192.168.1.100:

rsync -rtv rsyncclient@192.168.1.100::documents/source/ destination/
rsync -rtv rsync://rsyncclient@192.168.1.100/documents/source/ destination/

Just remember that the user that appears there is one of the users that we defined in /etc/rsyncd.secrets, and not a user of the host computer.
uid = root
gid = root
exclude = /tmp
exclude = /var/run

[home]
    path = /home
    comment = All home directories
    read only = true
    hosts allow = testserver.my.com

This configuration file specifies an rsync daemon that runs as root and offers /home to any client from testserver.my.com. The other options ensure that directories named /tmp or /var/run are never transferred, and that clients have read-only access to the server.
Another example:
gid = nobody
uid = nobody
read only = true
use chroot = no
transfer logging = true
log format = %h %o %f %l %b
log file = /var/log/rsyncd.log
For example, to make the directory /srv/ftp available with rsync using the alias FTP use the following configuration:
[FTP]
    path = /srv/ftp
    comment = An Example
Then start rsyncd with rcrsyncd start. rsyncd can also be started automatically during the boot process. Set this up by activating this service in the runlevel editor provided by YaST or by manually entering the command insserv rsyncd. The daemon rsyncd can alternatively be started by xinetd. This is, however, only recommended for servers that rarely use rsyncd.
The example also creates a log file listing all connections. This file is stored in /var/log/rsyncd.log.
It is then possible to test the transfer from a client system using aliases for directories. Do this with the following command:
rsync -avz sun::FTP
This command lists all files present in the directory /srv/ftp of the server. This request is also logged in the log file /var/log/rsyncd.log. To start an actual transfer, provide a target directory (use . for the current directory). For example:

rsync -avz sun::FTP .
By default, no files are deleted while synchronizing with rsync. If this should be forced, the additional option --delete must be stated. To ensure that no newer files are deleted, the option --update can be used instead. Any conflicts that arise must be resolved manually.
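The effect of --update can be sketched locally with scratch directories (paths invented for this demo; GNU touch assumed): a file that is newer on the receiving side is left alone.

```shell
src=$(mktemp -d)
dst=$(mktemp -d)

echo 'old server copy'  > "$src/f"
echo 'newer local edit' > "$dst/f"
touch -d '2000-01-01 00:00:00' "$src/f"    # make the source copy strictly older

# --update skips files that are newer on the receiver,
# so the local edit survives the sync.
rsync -a --update "$src/" "$dst/"

cat "$dst/f"    # still prints: newer local edit
```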
From SUSE Linux Enterprise Server (SLES 10) Installation and Administration Guide:
rsync is useful when large amounts of data need to be transmitted regularly while not changing too much. This is, for example, often the case when creating backups. Another application concerns staging servers. These are servers that store complete directory trees of Web servers that are regularly mirrored onto a Web server in a DMZ.
40.6.1 Configuration and Operation

rsync can be operated in two different modes. It can be used to archive or copy data; to accomplish this, only a remote shell, like ssh, is required on the target system. However, rsync can also be used as a daemon to provide directories to the network.
The basic mode of operation of rsync does not require any special configuration. rsync directly allows mirroring complete directories onto another system. As an example, the following command creates a backup of the home directory of tux on a backup server named sun:
rsync -baz -e ssh /home/tux/ tux@sun:/backup

The following command is used to play the directory back:

rsync -az -e ssh tux@sun:/backup /home/tux/

Up to this point, the handling does not differ much from that of a regular copying tool, like scp. rsync should be operated in "rsync mode" (as a daemon) to make all its features fully available.
May 06, 2020 | www.howtoforge.com
Configure Lsyncd to Synchronize Local Directories
In this section, we will configure Lsyncd to synchronize /etc/ directory to /mnt/ directory on local system.
First, create a directory for Lsyncd with the following command:
mkdir /etc/lsyncd

Next, create a new Lsyncd configuration file and define the source and destination directories that you want to sync:
nano /etc/lsyncd/lsyncd.conf.lua

Add the following lines:
settings {
    logfile = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
    statusInterval = 20,
    nodaemon = false
}

sync {
    default.rsync,
    source = "/etc/",
    target = "/mnt"
}

Save and close the file when you are finished.
Start the Lsyncd service and enable it to start at boot:

systemctl start lsyncd
systemctl enable lsyncd

You can also check the status of the Lsyncd service with the following command:
systemctl status lsyncd

You should see the following output:
● lsyncd.service - LSB: lsyncd daemon init script
   Loaded: loaded (/etc/init.d/lsyncd; generated)
   Active: active (running) since Fri 2020-05-01 03:31:20 UTC; 9s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 36946 ExecStart=/etc/init.d/lsyncd start (code=exited, status=0/SUCCESS)
    Tasks: 2 (limit: 4620)
   Memory: 12.5M
   CGroup: /system.slice/lsyncd.service
           ├─36921 /usr/bin/lsyncd -pidfile /var/run/lsyncd.pid /etc/lsyncd/lsyncd.conf.lua
           └─36952 /usr/bin/lsyncd -pidfile /var/run/lsyncd.pid /etc/lsyncd/lsyncd.conf.lua

May 01 03:31:20 ubuntu20 systemd[1]: lsyncd.service: Succeeded.
May 01 03:31:20 ubuntu20 systemd[1]: Stopped LSB: lsyncd daemon init script.
May 01 03:31:20 ubuntu20 systemd[1]: Starting LSB: lsyncd daemon init script...
May 01 03:31:20 ubuntu20 lsyncd[36946]:  * Starting synchronization daemon lsyncd
May 01 03:31:20 ubuntu20 lsyncd[36951]: 03:31:20 Normal: --- Startup, daemonizing ---
May 01 03:31:20 ubuntu20 lsyncd[36946]:    ...done.
May 01 03:31:20 ubuntu20 systemd[1]: Started LSB: lsyncd daemon init script.
You can check the Lsyncd log file for more details:

tail -f /var/log/lsyncd/lsyncd.log

You should see the following output:
/lsyncd/lsyncd.conf.lua
Fri May  1 03:30:57 2020 Normal: Finished a list after exitcode: 0
Fri May  1 03:31:20 2020 Normal: --- Startup, daemonizing ---
Fri May  1 03:31:20 2020 Normal: recursive startup rsync: /etc/ -> /mnt/
Fri May  1 03:31:20 2020 Normal: Startup of /etc/ -> /mnt/ finished.

You can also check the syncing status with the following command:
tail -f /var/log/lsyncd/lsyncd.status

You should be able to see the changes in the /mnt directory with the following command:
ls /mnt/
You should see that all the files and directories from the /etc directory are added to the /mnt directory:
acpi dconf hosts logrotate.conf newt rc2.d subuid- adduser.conf debconf.conf hosts.allow logrotate.d nginx rc3.d sudoers alternatives debian_version hosts.deny lsb-release nsswitch.conf rc4.d sudoers.d apache2 default init lsyncd ntp.conf rc5.d sysctl.conf apparmor deluser.conf init.d ltrace.conf openal rc6.d sysctl.d apparmor.d depmod.d initramfs-tools lvm opt rcS.d systemd apport dhcp inputrc machine-id os-release resolv.conf terminfo apt dnsmasq.d insserv.conf.d magic overlayroot.conf rmt timezone at.deny docker iproute2 magic.mime PackageKit rpc tmpfiles.d bash.bashrc dpkg iscsi mailcap pam.conf rsyslog.conf ubuntu-advantage bash_completion e2scrub.conf issue mailcap.order pam.d rsyslog.d ucf.conf bash_completion.d environment issue.net manpath.config passwd screenrc udev bindresvport.blacklist ethertypes kernel mdadm passwd- securetty ufw binfmt.d fonts kernel-img.conf mime.types perl security update-manager byobu fstab landscape mke2fs.conf php selinux update-motd.d ca-certificates fuse.conf ldap modprobe.d pki sensors3.conf update-notifier ca-certificates.conf fwupd ld.so.cache modules pm sensors.d vdpau_wrapper.cfg calendar gai.conf ld.so.conf modules-load.d polkit-1 services vim console-setup groff ld.so.conf.d mtab pollinate shadow vmware-tools cron.d group legal multipath popularity-contest.conf shadow- vtrgb cron.daily group- letsencrypt multipath.conf profile shells vulkan cron.hourly grub.d libaudit.conf mysql profile.d skel wgetrc cron.monthly gshadow libnl-3 nanorc protocols sos.conf X11 crontab gshadow- locale.alias netplan pulse ssh xattr.conf cron.weekly gss locale.gen network python3 ssl xdg cryptsetup-initramfs hdparm.conf localtime networkd-dispatcher python3.8 subgid zsh_command_not_found crypttab host.conf logcheck NetworkManager rc0.d subgid- dbus-1 hostname login.defs networks rc1.d subuid
Configure Lsyncd to Synchronize Remote Directories
In this section, we will configure Lsyncd to synchronize the /etc directory on the local system to the /opt directory on the remote system.
Before starting, you will need to set up SSH key-based authentication between the local system and the remote server so that the local system can connect to the remote server without a password.
On the local system, run the following command to generate a public and private key:
ssh-keygen -t rsa
You should see the following output:
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:c7fhjjhAamFjlk6OkKPhsphMnTZQFutWbr5FnQKSJjE root@ubuntu20
The key's randomart image is:
+---[RSA 3072]----+
| E .. |
| ooo |
| oo= + |
|=.+ % o . . |
|[email protected] oSo. o |
|ooo=B o .o o o |
|=o.... o o |
|+. o .. o |
| . ... . |
+----[SHA256]-----+
The above command will generate a private and public key inside the ~/.ssh directory.
Next, you will need to copy the public key to the remote server. You can copy it with the following command:
ssh-copy-id root@remote-server-ip
You will be asked to provide the password of the remote root user as shown below:
[email protected]'s password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh '[email protected]'"
and check to make sure that only the key(s) you wanted were added.
Once the user is authenticated, the public key will be appended to the remote user's authorized_keys file and the connection will be closed.
Now, you should be able to log in to the remote server without entering a password.
To test it just try to login to your remote server via SSH:
ssh root@remote-server-ip
If everything went well, you will be logged in immediately.
Next, you will need to edit the Lsyncd configuration file and define the rsyncssh and target host variables:
nano /etc/lsyncd/lsyncd.conf.lua
Change the file as shown below:
settings {
    logfile = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
    statusInterval = 20,
    nodaemon = false
}

sync {
    default.rsyncssh,
    source = "/etc/",
    host = "remote-server-ip",
    targetdir = "/opt"
}
Save and close the file when you are finished. Then, restart the Lsyncd service to start the sync.
systemctl restart lsyncd
You can check the status of synchronization with the following command:
tail -f /var/log/lsyncd/lsyncd.log
You should see the following output:
Fri May 1 04:32:05 2020 Normal: --- Startup, daemonizing ---
Fri May 1 04:32:05 2020 Normal: recursive startup rsync: /etc/ -> 45.58.38.21:/opt/
Fri May 1 04:32:06 2020 Normal: Startup of "/etc/" finished: 0
You should be able to see the changes in the /opt directory on the remote server with the following command:
ls /opt
You should see that all the files and directories from the /etc directory are added to the remote server's /opt directory:
acpi dconf hosts logrotate.conf newt rc2.d subuid- adduser.conf debconf.conf hosts.allow logrotate.d nginx rc3.d sudoers alternatives debian_version hosts.deny lsb-release nsswitch.conf rc4.d sudoers.d apache2 default init lsyncd ntp.conf rc5.d sysctl.conf apparmor deluser.conf init.d ltrace.conf openal rc6.d sysctl.d apparmor.d depmod.d initramfs-tools lvm opt rcS.d systemd apport dhcp inputrc machine-id os-release resolv.conf terminfo apt dnsmasq.d insserv.conf.d magic overlayroot.conf rmt timezone at.deny docker iproute2 magic.mime PackageKit rpc tmpfiles.d bash.bashrc dpkg iscsi mailcap pam.conf rsyslog.conf ubuntu-advantage bash_completion e2scrub.conf issue mailcap.order pam.d rsyslog.d ucf.conf bash_completion.d environment issue.net manpath.config passwd screenrc udev bindresvport.blacklist ethertypes kernel mdadm passwd- securetty ufw binfmt.d fonts kernel-img.conf mime.types perl security update-manager byobu fstab landscape mke2fs.conf php selinux update-motd.d ca-certificates fuse.conf ldap modprobe.d pki sensors3.conf update-notifier ca-certificates.conf fwupd ld.so.cache modules pm sensors.d vdpau_wrapper.cfg calendar gai.conf ld.so.conf modules-load.d polkit-1 services vim console-setup groff ld.so.conf.d mtab pollinate shadow vmware-tools cron.d group legal multipath popularity-contest.conf shadow- vtrgb cron.daily group- letsencrypt multipath.conf profile shells vulkan cron.hourly grub.d libaudit.conf mysql profile.d skel wgetrc cron.monthly gshadow libnl-3 nanorc protocols sos.conf X11 crontab gshadow- locale.alias netplan pulse ssh xattr.conf cron.weekly gss locale.gen network python3 ssl xdg cryptsetup-initramfs hdparm.conf localtime networkd-dispatcher python3.8 subgid zsh_command_not_found crypttab host.conf logcheck NetworkManager rc0.d subgid- dbus-1 hostname login.defs networks rc1.d subuid
Conclusion
In the above guide, we learned how to install and configure Lsyncd for local synchronization and remote synchronization.
You can now use Lsyncd in the production environment for backup purposes. Feel free to ask me if you have any questions.
May 06, 2020 | axkibe.github.io
Lsyncd - Live Syncing (Mirror) Daemon
Description
Lsyncd uses a filesystem event interface (inotify or fsevents) to watch for changes to local files and directories. Lsyncd collates these events for several seconds and then spawns one or more processes to synchronize the changes to a remote filesystem. The default synchronization method is rsync. Thus, Lsyncd is a light-weight live mirror solution. Lsyncd is comparatively easy to install and does not require new filesystems or block devices, and it does not hamper local filesystem performance.
As an alternative to rsync, Lsyncd can also push changes via rsync+ssh. Rsync+ssh allows for much more efficient synchronization when a file or directory is renamed or moved to a new location in the local tree. (In contrast, plain rsync performs a move by deleting the old file and then retransmitting the whole file.)
Fine-grained customization can be achieved through the config file. Custom action configs can even be written from scratch in cascading layers ranging from shell scripts to code written in the Lua language . Thus, simple, powerful and flexible configurations are possible.
Lsyncd 2.2.1 requires rsync >= 3.1 on all source and target machines.
License: GPLv2 or any later GPL version.
When to use
Lsyncd is designed to synchronize a slowly changing local directory tree to a remote mirror. Lsyncd is especially useful to sync data from a secure area to a not-so-secure area.
Other synchronization tools
DRBD operates on the block device level. This makes it useful for synchronizing systems that are under heavy load. Lsyncd, on the other hand, does not require you to change block devices and/or mount points, allows you to change the uid/gid of the transferred files, and separates the receiver through the one-way nature of rsync. DRBD is likely the better option if you are syncing databases.
GlusterFS and BindFS use a FUSE-Filesystem to interject kernel/userspace filesystem events.
Mirror is an asynchronous synchronisation tool that makes use of inotify notifications much like Lsyncd. The main differences are: it is developed specifically for master-master use, thus running a daemon on both systems; it uses its own transport layer instead of rsync; and it is written in Java rather than Lsyncd's C core with Lua scripting.
Lsyncd usage examples
This watches and rsyncs the local directory /home with all sub-directories and transfers them to 'remotehost' using the rsync-share 'share'.
This will also rsync/watch '/home', but it uses a ssh connection to make moves local on the remotehost instead of re-transmitting the moved file over the wire.
Disclaimer
Besides the usual disclaimer in the license, we want to specifically emphasize that neither the authors, nor any organization associated with the authors, can or will be held responsible for data-loss caused by possible malfunctions of Lsyncd.
Apr 08, 2020 | www.2daygeek.com
by Magesh Maruthamuthu · Last Updated: April 2, 2020
Typically, you use the rsync command or scp command to copy files from one server to another.
But if you want to perform these commands in reverse mode, how do you do that?
Have you tried this? Have you had a chance to do this?
Why would you want to do that? Under what circumstances should you use it?
Scenario-1:
When you copy a file from "Server-1" to "Server-2", you must use the rsync or scp command in the standard way. Also, you can copy from "Server-2" to "Server-1" if you need to.
To do so, you must have a password for both systems.
Scenario-2:
You have a jump server and only enabled ssh key-based authentication to access other servers (you do not have the password for that). In this case you are only allowed to access the servers from the jump server, and you cannot access the jump server from other servers.
In this scenario, if you want to copy some files from other servers to the jump server, how do you do that?
Yes, you can do this using the reverse mode of the scp or rsync command.
General Syntax of the rsync and scp Command:
The following is a general syntax of the rsync and scp commands.
rsync: rsync [Options] [Source_Location] [Destination_Location]
scp: scp [Options] [Source_Location] [Destination_Location]
General syntax of the reverse rsync and scp command:
The general syntax of the reverse rsync and scp commands is as follows.
rsync: rsync [Options] [Destination_Location] [Source_Location]
scp: scp [Options] [Destination_Location] [Source_Location]
1) How to Use rsync Command in Reverse Mode with Standard Port
We will copy the "2daygeek.tar.gz" file from the "Remote Server" to the "Jump Server" using the reverse rsync command with the standard port.
# rsync -avz -e ssh [email protected]:/root/2daygeek.tar.gz /root/backup
The authenticity of host 'jump.2daygeek.com (jump.2daygeek.com)' can't be established.
RSA key fingerprint is 6f:ad:07:15:65:bf:54:a6:8c:5f:c4:3b:99:e5:2d:34.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'jump.2daygeek.com' (RSA) to the list of known hosts.
[email protected]'s password:
receiving file list ... done
2daygeek.tar.gz

sent 42 bytes  received 23134545 bytes  1186389.08 bytes/sec
total size is 23126674  speedup is 1.00
You can see the file copied using the ls command.
# ls -h /root/backup/*.tar.gz
total 125M
-rw------- 1 root root 23M Oct 26 01:00 2daygeek.tar.gz
2) How to Use rsync Command in Reverse Mode with Non-Standard Port
We will copy the "2daygeek.tar.gz" file from the "Remote Server" to the "Jump Server" using the reverse rsync command with the non-standard port.
# rsync -avz -e "ssh -p 11021" [email protected]:/root/backup/weekly/2daygeek.tar.gz /root/backup The authenticity of host '[jump.2daygeek.com]:11021 ([jump.2daygeek.com]:11021)' can't be established. RSA key fingerprint is 9c:ab:c0:5b:3b:44:80:e3:db:69:5b:22:ba:d6:f1:c9. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '[jump.2daygeek.com]:11021' (RSA) to the list of known hosts. [email protected]'s password: receiving incremental file list 2daygeek.tar.gz sent 30 bytes received 23134526 bytes 1028202.49 bytes/sec total size is 23126674 speedup is 1.003) How to Use scp Command in Reverse Mode on Linux
We will copy the "2daygeek.tar.gz" file from the "Remote Server" to the "Jump Server" using the reverse scp command.
# scp [email protected]:/root/backup/weekly/2daygeek.tar.gz /root/backup
Feb 22, 2020 | www.digitalocean.com
... ... ...
Useful Options for Rsync
Rsync provides many options for altering the default behavior of the utility. We have already discussed some of the most essential flags.
If you are transferring files that have not already been compressed, like text files, you can reduce the network transfer by adding compression with the -z option:
- rsync -az source destination
The -P flag is very helpful. It combines the flags --progress and --partial. The first of these gives you a progress bar for the transfers and the second allows you to resume interrupted transfers:
- rsync -azP source destination
If we run the command again, we will get a shorter output, because no changes have been made. This illustrates rsync's ability to use modification times to determine if changes have been made.
- rsync -azP source destination
We can update the modification time on some of the files and see that rsync intelligently re-copies only the changed files:
- touch dir1/file{1..10}
- rsync -azP source destination
In order to keep two directories truly in sync, it is necessary to delete files from the destination directory if they are removed from the source. By default, rsync does not delete anything from the destination directory.
We can change this behavior with the --delete option. Before using this option, use the --dry-run option and do testing to prevent data loss:
- rsync -a --delete source destination
If you wish to exclude certain files or directories located inside a directory you are syncing, you can do so by specifying them in a comma-separated list following the --exclude= option:
- rsync -a --exclude=pattern_to_exclude source destination
If we have specified a pattern to exclude, we can override that exclusion for files that match a different pattern by using the --include= option. Note that the --include rule must appear before the --exclude rule it overrides, because rsync acts on the first matching filter:
- rsync -a --include=pattern_to_include --exclude=pattern_to_exclude source destination
Finally, rsync's --backup option can be used to store backups of important files. It is used in conjunction with the --backup-dir option, which specifies the directory where the backup files should be stored:
- rsync -a --delete --backup --backup-dir=/path/to/backups /path/to/source destination
Aug 12, 2019 | www.cyberciti.biz
I need to copy all the *.c files from a local laptop named hostA to hostB, including all directories. I am using the following scp command but do not know how to exclude specific files (such as *.out): $ scp -r ~/projects/ user@hostB:/home/delta/projects/ How do I tell the scp command to exclude a particular file or directory at the Linux/Unix command line? One can use the scp command to securely copy files between hosts on a network. It uses ssh for data transfer and authentication purposes. Typical scp command syntax is as follows:
Scp exclude files
scp file1 user@host:/path/to/dest/
scp -r /path/to/source/ user@host:/path/to/dest/
scp [options] /dir/to/source/ user@host:/dir/to/dest/
I don't think you can filter or exclude files when using the scp command. However, there is a great workaround to exclude files and copy them securely using ssh. This page explains how to filter or exclude files when using scp to copy a directory recursively.
How to use rsync command to exclude files
The syntax is:
rsync -av -e ssh --exclude='*.out' /path/to/source/ user@hostB:/path/to/dest/
Where,
Example of rsync command
- -a : Recurse into directories i.e. copy all files and subdirectories. Also, turn on archive mode and all other options (-rlptgoD)
- -v : Verbose output
- -e ssh : Use ssh for remote shell so everything gets encrypted
- --exclude='*.out' : exclude files matching PATTERN e.g. *.out or *.c and so on.
In this example copy all file recursively from ~/virt/ directory but exclude all *.new files:
$ rsync -av -e ssh --exclude='*.new' ~/virt/ root@centos7:/tmp
Jun 23, 2019 | www.fsarchiver.org
An important test is done using rsync. It requires two partitions: the original one, and a spare partition where the archive is restored. It lets you know whether or not there are differences between the original and the restored filesystem. rsync is able to compare both the file contents and the file attributes (timestamps, permissions, owner, extended attributes, ACLs, etc.), so that's a very good test. The following command can be used to find out whether files are the same (data and attributes) on two file-systems:
rsync -axHAXnP /mnt/part1/ /mnt/part2/
May 15, 2013 | stackoverflow.com
Glitches , May 15, 2013 at 18:06
I am trying to back up my file server to a remote file server using rsync. Rsync is not successfully resuming when a transfer is interrupted. I used the partial option but rsync doesn't find the file it already started because it renames it to a temporary file, and when resumed it creates a new file and starts from the beginning.Here is my command:
rsync -avztP -e "ssh -p 2222" /volume1/ myaccont@backup-server-1:/home/myaccount/backup/ --exclude "@spool" --exclude "@tmp"
When this command is run, a backup file named OldDisk.dmg from my local machine gets created on the remote machine as something like .OldDisk.dmg.SjDndj23 .
Now when the internet connection gets interrupted and I have to resume the transfer, I have to find where rsync left off by finding the temp file like .OldDisk.dmg.SjDndj23 and rename it to OldDisk.dmg so that it sees there already exists a file that it can resume.
How do I fix this so I don't have to manually intervene each time?
Richard Michael , Nov 6, 2013 at 4:26
TL;DR : Use --timeout=X (X in seconds) to change the default rsync server timeout, not --inplace .The issue is the rsync server processes (of which there are two, see rsync --server ... in ps output on the receiver) continue running, to wait for the rsync client to send data.
If the rsync server processes do not receive data for a sufficient time, they will indeed timeout, self-terminate and clean up by moving the temporary file to its "proper" name (e.g., no temporary suffix). You'll then be able to resume.
If you don't want to wait for the long default timeout to cause the rsync server to self-terminate, then when your internet connection returns, log into the server and clean up the rsync server processes manually. However, you must politely terminate rsync -- otherwise, it will not move the partial file into place; but rather, delete it (and thus there is no file to resume). To politely ask rsync to terminate, do not SIGKILL (e.g., -9 ), but SIGTERM (e.g., pkill -TERM -x rsync - only an example, you should take care to match only the rsync processes concerned with your client).
Fortunately there is an easier way: use the --timeout=X (X in seconds) option; it is passed to the rsync server processes as well.
For example, if you specify rsync ... --timeout=15 ... , both the client and server rsync processes will cleanly exit if they do not send/receive data in 15 seconds. On the server, this means moving the temporary file into position, ready for resuming.
I'm not sure how long the various rsync processes will try to send/receive data before they die (it might vary with operating system). In my testing, the server rsync processes remain running longer than the local client. On a "dead" network connection, the client terminates with a broken pipe (e.g., no network socket) after about 30 seconds; you could experiment or review the source code. Meaning, you could try to "ride out" the bad internet connection for 15-20 seconds.
If you do not clean up the server rsync processes (or wait for them to die), but instead immediately launch another rsync client process, two additional server processes will launch (for the other end of your new client process). Specifically, the new rsync client will not re-use/reconnect to the existing rsync server processes. Thus, you'll have two temporary files (and four rsync server processes) -- though, only the newer, second temporary file has new data being written (received from your new rsync client process).
Interestingly, if you then clean up all rsync server processes (for example, stop your client, which will stop the new rsync servers, then SIGTERM the older rsync servers), it appears to merge (assemble) all the partial files into the new properly named file. So, imagine a long-running partial copy which dies (and you think you've "lost" all the copied data), and a short-running re-launched rsync (oops!).. you can stop the second client, SIGTERM the first servers, it will merge the data, and you can resume.
Finally, a few short remarks:
- Don't use --inplace to workaround this. You will undoubtedly have other problems as a result, man rsync for the details.
- It's trivial, but -t in your rsync options is redundant, it is implied by -a .
- An already compressed disk image sent over rsync without compression might result in shorter transfer time (by avoiding double compression). However, I'm unsure of the compression techniques in both cases. I'd test it.
- As far as I understand --checksum / -c , it won't help you in this case. It affects how rsync decides if it should transfer a file. Though, after a first rsync completes, you could run a second rsync with -c to insist on checksums, to prevent the strange case that file size and modtime are the same on both sides, but bad data was written.
JamesTheAwesomeDude , Dec 29, 2013 at 16:50
Just curious: wouldn't SIGINT (aka ^C ) be 'politer' than SIGTERM ? – JamesTheAwesomeDude Dec 29 '13 at 16:50Richard Michael , Dec 29, 2013 at 22:34
I didn't test how the server-side rsync handles SIGINT, so I'm not sure it will keep the partial file - you could check. Note that this doesn't have much to do with Ctrl-c ; it happens that your terminal sends SIGINT to the foreground process when you press Ctrl-c , but the server-side rsync has no controlling terminal. You must log in to the server and use kill . The client-side rsync will not send a message to the server (for example, after the client receives SIGINT via your terminal Ctrl-c ) - might be interesting though. As for anthropomorphizing, not sure what's "politer". :-) – Richard Michael Dec 29 '13 at 22:34d-b , Feb 3, 2015 at 8:48
I just tried this timeout argument rsync -av --delete --progress --stats --human-readable --checksum --timeout=60 --partial-dir /tmp/rsync/ rsync://$remote:/ /src/ but then it timed out during the "receiving file list" phase (which in this case takes around 30 minutes). Setting the timeout to half an hour so kind of defers the purpose. Any workaround for this? – d-b Feb 3 '15 at 8:48Cees Timmerman , Sep 15, 2015 at 17:10
@user23122 --checksum reads all data when preparing the file list, which is great for many small files that change often, but should be done on-demand for large files. – Cees Timmerman Sep 15 '15 at 17:10
Feb 11, 2019 | www.mankier.com
prsync -- parallel file sync program
Synopsis
prsync [-vAraz] [-h hosts_file] [-H [user@]host[:port]] [-l user] [-p par] [-o outdir] [-e errdir] [-t timeout] [-O options] [-x args] [-X arg] [-S args] local ... remote
Description
prsync is a program for copying files in parallel to a number of hosts using the popular rsync program. It provides features such as passing a password to ssh, saving output to files, and timing out.
Options
- -h host_file
- --hosts host_file
- Read hosts from the given host_file . Lines in the host file are of the form [ user @] host [: port ] and can include blank lines and comments (lines beginning with "#"). If multiple host files are given (the -h option is used more than once), then prsync behaves as though these files were concatenated together. If a host is specified multiple times, then prsync will connect the given number of times.
- -H
- [ user @] host [: port ]
- --host
- [ user @] host [: port ]
- -H
- "[ user @] host [: port ] [ [ user @] host [: port ] ... ]"
- --host
- "[ user @] host [: port ] [ [ user @] host [: port ] ... ]"
Add the given host strings to the list of hosts. This option may be given multiple times, and may be used in conjunction with the -h option.
- -l user
- --user user
- Use the given username as the default for any host entries that don't specifically specify a user.
- -p parallelism
- --par parallelism
- Use the given number as the maximum number of concurrent connections.
- -t timeout
- --timeout timeout
- Make connections time out after the given number of seconds. With a value of 0, prsync will not timeout any connections.
- -o outdir
- --outdir outdir
- Save standard output to files in the given directory. Filenames are of the form [ user @] host [: port ][. num ] where the user and port are only included for hosts that explicitly specify them. The number is a counter that is incremented each time for hosts that are specified more than once.
- -e errdir
- --errdir errdir
- Save standard error to files in the given directory. Filenames are of the same form as with the -o option.
- -x args
- --extra-args args
- Passes extra rsync command-line arguments (see the rsync(1) man page for more information about rsync arguments). This option may be specified multiple times. The arguments are processed to split on whitespace, protect text within quotes, and escape with backslashes. To pass arguments without such processing, use the -X option instead.
- -X arg
- --extra-arg arg
- Passes a single rsync command-line argument (see the rsync(1) man page for more information about rsync arguments). Unlike the -x option, no processing is performed on the argument, including word splitting. To pass multiple command-line arguments, use the option once for each argument.
- -O options
- --options options
- SSH options in the format used in the SSH configuration file (see the ssh_config(5) man page for more information). This option may be specified multiple times.
- -A
- --askpass
- Prompt for a password and pass it to ssh. The password may be used for either to unlock a key or for password authentication. The password is transferred in a fairly secure manner (e.g., it will not show up in argument lists). However, be aware that a root user on your system could potentially intercept the password.
- -v
- --verbose
- Include error messages from rsync with the -i and \ options.
- -r
- --recursive
- Recursively copy directories.
- -a
- --archive
- Use rsync archive mode (rsync's -a option).
- -z
- --compress
- Use rsync compression.
- -S args
- --ssh-args args
- Passes extra SSH command-line arguments (see the ssh(1) man page for more information about SSH arguments). The given value is appended to the ssh command (rsync's -e option) without any processing.
Tips
The ssh_config file can include an arbitrary number of Host sections. Each host entry specifies ssh options which apply only to the given host. Host definitions can even behave like aliases if the HostName option is included. This ssh feature, in combination with pssh host files, provides a tremendous amount of flexibility.
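A sketch of such an ssh_config fragment (host names and addresses are invented; it is written to a temp file here, though it would normally live in ~/.ssh/config):

```shell
# Sketch: an ssh_config fragment where Host sections act as aliases via
# HostName. Host names and addresses are invented; written to a temp file
# here, though this normally lives in ~/.ssh/config.
set -e
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
Host web1
    HostName 192.0.2.10
    User deploy
    Port 2222

Host web2
    HostName 192.0.2.11
    User deploy
EOF
# A pssh/prsync host file could then simply list the aliases: web1 web2
grep -c '^Host ' "$CFG"
```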
Exit Status
The exit status codes from prsync are as follows:
- 0
- Success
- 1
- Miscellaneous error
- 2
- Syntax or usage error
- 3
- At least one process was killed by a signal or timed out.
- 4
- All processes completed, but at least one rsync process reported an error (exit status other than 0).
Authors
Written by Brent N. Chun <[email protected]> and Andrew McNabb <[email protected]>.
https://github.com/lilydjwg/pssh
See Also
rsync(1), ssh(1), ssh_config(5), pssh(1), pslurp(1), pnuke(1)
Referenced By
pnuke(1), pscp.pssh(1), pslurp(1), pssh(1).
Jul 20, 2017 | www.linuxjournal.com
Anonymous on Fri, 11/08/2002 - 03:00.
Anonymous on Sun, 11/10/2002 - 03:00.
The Subject, not the content, really brings back memories.
Imagine this: you're tasked with complete control over the network in a multi-million dollar company. You've had some experience in the real world of network maintenance, but mostly you've learned from breaking things at home.
Time comes to implement (yes, this was a startup company) a backup routine. You carefully consider the best way to do it and decide that copying data to a holding disk before the tape run would be perfect in this situation: faster restores if the holding disk is still alive.
So off you go configuring all your servers for ssh pass through, and create the rsync scripts. Then before the trial run you think it would be a good idea to create a local backup of all the websites.
You log on to the web server, create a temp directory and start testing your newly advanced rsync skills. After a couple of goes, you think you're ready for the real thing, but you decide to run the test one more time.
Everything seems fine, so you delete the temp directory. You pause for a second and your mouth drops open wider than it has ever opened before, and a feeling of terror overcomes you. You want to hide in a hole and hope you didn't see what you saw.
I RECURSIVELY DELETED ALL THE LIVE CORPORATE WEBSITES ON FRIDAY AFTERNOON AT 4PM!
This is why it's ALWAYS A GOOD IDEA to use Midnight Commander or something similar to delete directories!!
...Root for (5) years and never trashed a filesystem yet (knockwoody)...
Anonymous on Fri, 11/08/2002 - 03:00.
rsync with ssh as the transport mechanism works very well with my nightly LAN backups. I've found this page to be very helpful: http://www.mikerubel.org/computers/rsync_snapshots/
May 15, 2013 | stackoverflow.com
Glitches , May 15, 2013 at 18:06
I am trying to backup my file server to a remove file server using rsync. Rsync is not successfully resuming when a transfer is interrupted. I used the partial option but rsync doesn't find the file it already started because it renames it to a temporary file and when resumed it creates a new file and starts from beginning.Here is my command:
rsync -avztP -e "ssh -p 2222" /volume1/ myaccont@backup-server-1:/home/myaccount/backup/ --exclude "@spool" --exclude "@tmp"
When this command is ran, a backup file named OldDisk.dmg from my local machine get created on the remote machine as something like .OldDisk.dmg.SjDndj23 .
Now when the internet connection gets interrupted and I have to resume the transfer, I have to find where rsync left off by finding the temp file like .OldDisk.dmg.SjDndj23 and rename it to OldDisk.dmg so that it sees there already exists a file that it can resume.
How do I fix this so I don't have to manually intervene each time?
Richard Michael , Nov 6, 2013 at 4:26
TL;DR : Use --timeout=X (X in seconds) to change the default rsync server timeout, not --inplace .
The issue is that the rsync server processes (of which there are two; see rsync --server ... in ps output on the receiver) continue running, to wait for the rsync client to send data. If the rsync server processes do not receive data for a sufficient time, they will indeed timeout, self-terminate and clean up by moving the temporary file to its "proper" name (e.g., no temporary suffix). You'll then be able to resume.
If you don't want to wait for the long default timeout to cause the rsync server to self-terminate, then when your internet connection returns, log into the server and clean up the rsync server processes manually. However, you must politely terminate rsync -- otherwise, it will not move the partial file into place, but rather delete it (and thus there is no file to resume). To politely ask rsync to terminate, do not use SIGKILL (e.g., -9 ), but SIGTERM (e.g., pkill -TERM -x rsync - only an example; you should take care to match only the rsync processes concerned with your client).
Fortunately there is an easier way: use the --timeout=X (X in seconds) option; it is passed to the rsync server processes as well.
For example, if you specify rsync ... --timeout=15 ... , both the client and server rsync processes will cleanly exit if they do not send/receive data in 15 seconds. On the server, this means moving the temporary file into position, ready for resuming.
I'm not sure how long the various rsync processes will try to send/receive data before they die (it might vary with operating system). In my testing, the server rsync processes remain running longer than the local client. On a "dead" network connection, the client terminates with a broken pipe (e.g., no network socket) after about 30 seconds; you could experiment or review the source code. Meaning, you could try to "ride out" the bad internet connection for 15-20 seconds.
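Applied to the question's own command, the fix suggested here would look roughly like the sketch below (host, port and paths are the question's; the 15-second value is only an illustration):

```shell
# --timeout is passed to the server-side rsync processes too, so both
# ends exit cleanly after 15 idle seconds and the partial file is
# moved into place, ready to be resumed on the next run.
rsync -avzP --timeout=15 -e "ssh -p 2222" /volume1/ \
    myaccont@backup-server-1:/home/myaccount/backup/ \
    --exclude "@spool" --exclude "@tmp"
```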
If you do not clean up the server rsync processes (or wait for them to die), but instead immediately launch another rsync client process, two additional server processes will launch (for the other end of your new client process). Specifically, the new rsync client will not re-use/reconnect to the existing rsync server processes. Thus, you'll have two temporary files (and four rsync server processes) -- though, only the newer, second temporary file has new data being written (received from your new rsync client process).
Interestingly, if you then clean up all rsync server processes (for example, stop your client, which will stop the new rsync servers, then SIGTERM the older rsync servers), it appears to merge (assemble) all the partial files into the new properly-named file. So, imagine a long-running partial copy which dies (and you think you've "lost" all the copied data), and a short-running re-launched rsync (oops!).. you can stop the second client, SIGTERM the first servers, it will merge the data, and you can resume.
Finally, a few short remarks:
- Don't use --inplace to work around this. You will undoubtedly have other problems as a result; see man rsync for the details.
- It's trivial, but -t in your rsync options is redundant; it is implied by -a .
- An already compressed disk image sent over rsync without compression might result in shorter transfer time (by avoiding double compression). However, I'm unsure of the compression techniques in both cases. I'd test it.
- As far as I understand --checksum / -c , it won't help you in this case. It affects how rsync decides if it should transfer a file. Though, after a first rsync completes, you could run a second rsync with -c to insist on checksums, to prevent the strange case that file size and modtime are the same on both sides, but bad data was written.
JamesTheAwesomeDude , Dec 29, 2013 at 16:50
Just curious: wouldn't SIGINT (aka ^C ) be 'politer' than SIGTERM ? – JamesTheAwesomeDude Dec 29 '13 at 16:50
Richard Michael , Dec 29, 2013 at 22:34
I didn't test how the server-side rsync handles SIGINT, so I'm not sure it will keep the partial file - you could check. Note that this doesn't have much to do with Ctrl-c ; it happens that your terminal sends SIGINT to the foreground process when you press Ctrl-c , but the server-side rsync has no controlling terminal. You must log in to the server and use kill . The client-side rsync will not send a message to the server (for example, after the client receives SIGINT via your terminal Ctrl-c ) - might be interesting though. As for anthropomorphizing, not sure what's "politer". :-) – Richard Michael Dec 29 '13 at 22:34
d-b , Feb 3, 2015 at 8:48
I just tried this timeout argument rsync -av --delete --progress --stats --human-readable --checksum --timeout=60 --partial-dir /tmp/rsync/ rsync://$remote:/ /src/ but then it timed out during the "receiving file list" phase (which in this case takes around 30 minutes). Setting the timeout to half an hour kind of defeats the purpose. Any workaround for this? – d-b Feb 3 '15 at 8:48
Cees Timmerman , Sep 15, 2015 at 17:10
@user23122 --checksum reads all data when preparing the file list, which is great for many small files that change often, but should be done on-demand for large files. – Cees Timmerman Sep 15 '15 at 17:10
Sep 15, 2012 | unix.stackexchange.com
Tim , Sep 15, 2012 at 23:36
I used rsync to copy a large number of files, but my OS (Ubuntu) restarted unexpectedly.
After reboot, I ran rsync again, but from the output on the terminal, I found that rsync still copied those already copied before. But I heard that rsync is able to find differences between source and destination, and therefore to just copy the differences. So I wonder, in my case, if rsync can resume what was left last time?
Gilles , Sep 16, 2012 at 1:56
Yes, rsync won't copy again files that it's already copied. There are a few edge cases where its detection can fail. Did it copy all the already-copied files? What options did you use? What were the source and target filesystems? If you run rsync again after it's copied everything, does it copy again? – Gilles Sep 16 '12 at 1:56
Tim , Sep 16, 2012 at 2:30
@Gilles: Thanks! (1) I think I saw rsync copied the same files again from its output on the terminal. (2) Options are same as in my other post, i.e. sudo rsync -azvv /home/path/folder1/ /home/path/folder2 . (3) Source and target are both NTFS, but source is an external HDD, and target is an internal HDD. (4) It is now running and hasn't finished yet. – Tim Sep 16 '12 at 2:30
jwbensley , Sep 16, 2012 at 16:15
There is also the --partial flag to resume partially transferred files (useful for large files) – jwbensley Sep 16 '12 at 16:15
Tim , Sep 19, 2012 at 5:20
@Gilles: What are some "edge cases where its detection can fail"? – Tim Sep 19 '12 at 5:20
Gilles , Sep 19, 2012 at 9:25
@Tim Off the top of my head, there's at least clock skew, and differences in time resolution (a common issue with FAT filesystems, which store times in 2-second increments; the --modify-window option helps with that). – Gilles Sep 19 '12 at 9:25
DanielSmedegaardBuus , Nov 1, 2014 at 12:32
First of all, regarding the "resume" part of your question, --partial just tells the receiving end to keep partially transferred files if the sending end disappears, as though they were completely transferred.
While transferring files, they are temporarily saved as hidden files in their target folders (e.g. .TheFileYouAreSending.lRWzDC ), or in a specifically chosen folder if you set the --partial-dir switch. When a transfer fails and --partial is not set, this hidden file will remain in the target folder under this cryptic name, but if --partial is set, the file will be renamed to the actual target file name (in this case, TheFileYouAreSending ), even though the file isn't complete. The point is that you can later complete the transfer by running rsync again with either --append or --append-verify .
So, --partial doesn't itself resume a failed or cancelled transfer. To resume it, you'll have to use one of the aforementioned flags on the next run. So, if you need to make sure that the target won't ever contain files that appear to be fine but are actually incomplete, you shouldn't use --partial . Conversely, if you want to make sure you never leave behind stray failed files that are hidden in the target directory, and you know you'll be able to complete the transfer later, --partial is there to help you.
With regards to the --append switch mentioned above, this is the actual "resume" switch, and you can use it whether or not you're also using --partial . Actually, when you're using --append , no temporary files are ever created. Files are written directly to their targets. In this respect, --append gives the same result as --partial on a failed transfer, but without creating those hidden temporary files.
So, to sum up, if you're moving large files and you want the option to resume a cancelled or failed rsync operation from the exact point that rsync stopped, you need to use the --append or --append-verify switch on the next attempt.
As @Alex points out below, since version 3.0.0 rsync now has a new option, --append-verify , which behaves like --append did before that switch existed. You probably always want the behaviour of --append-verify , so check your version with rsync --version . If you're on a Mac and not using rsync from homebrew , you'll (at least up to and including El Capitan) have an older version and need to use --append rather than --append-verify . Why they didn't keep the behaviour on --append and instead named the newcomer --append-no-verify is a bit puzzling. Either way, --append on rsync before version 3 is the same as --append-verify on the newer versions.
--append-verify isn't dangerous: it will always read and compare the data on both ends and not just assume they're equal. It does this using checksums, so it's easy on the network, but it does require reading the shared amount of data on both ends of the wire before it can actually resume the transfer by appending to the target.
Second of all, you said that you "heard that rsync is able to find differences between source and destination, and therefore to just copy the differences."
That's correct, and it's called delta transfer, but it's a different thing. To enable this, you add the -c , or --checksum , switch. Once this switch is used, rsync will examine files that exist on both ends of the wire. It does this in chunks, compares the checksums on both ends, and if they differ, it transfers just the differing parts of the file. But, as @Jonathan points out below, the comparison is only done when files are of the same size on both ends -- different sizes will cause rsync to upload the entire file, overwriting the target with the same name.
This requires a bit of computation on both ends initially, but can be extremely efficient at reducing network load if, for example, you're frequently backing up very large fixed-size files that often contain minor changes. Examples that come to mind are virtual hard drive image files used in virtual machines or iSCSI targets.
It is notable that if you use --checksum to transfer a batch of files that are completely new to the target system, rsync will still calculate their checksums on the source system before transferring them. Why, I do not know :)
So, in short:
If you're often using rsync to just "move stuff from A to B" and want the option to cancel that operation and later resume it, don't use --checksum , but do use --append-verify .
If you're using rsync to back up stuff often, using --append-verify probably won't do much for you, unless you're in the habit of sending large files that continuously grow in size but are rarely modified once written. As a bonus tip, if you're backing up to storage that supports snapshotting, such as btrfs or zfs , adding the --inplace switch will help you reduce snapshot sizes, since changed files aren't recreated but rather the changed blocks are written directly over the old ones. This switch is also useful if you want to avoid rsync creating copies of files on the target when only minor changes have occurred.
When using --append-verify , rsync will behave just like it always does on all files that are the same size. If they differ in modification or other timestamps, it will overwrite the target with the source without scrutinizing those files further. --checksum will compare the contents (checksums) of every file pair of identical name and size.
UPDATED 2015-09-01 Changed to reflect points made by @Alex (thanks!)
UPDATED 2017-07-14 Changed to reflect points made by @Jonathan (thanks!)
Alex , Aug 28, 2015 at 3:49
According to the documentation --append does not check the data, but --append-verify does. Also, as @gaoithe points out in a comment below, the documentation claims --partial does resume from previous files. – Alex Aug 28 '15 at 3:49
DanielSmedegaardBuus , Sep 1, 2015 at 13:29
Thank you @Alex for the updates. Indeed, since 3.0.0, --append no longer compares the source to the target file before appending. Quite important, really! --partial does not itself resume a failed file transfer, but rather leaves it there for a subsequent --append(-verify) to append to it. My answer was clearly misrepresenting this fact; I'll update it to include these points! Thanks a lot :) – DanielSmedegaardBuus Sep 1 '15 at 13:29
Cees Timmerman , Sep 15, 2015 at 17:21
This says --partial is enough. – Cees Timmerman Sep 15 '15 at 17:21
DanielSmedegaardBuus , May 10, 2016 at 19:31
@CMCDragonkai Actually, check out Alexander's answer below about --partial-dir -- looks like it's the perfect bullet for this. I may have missed something entirely ;) – DanielSmedegaardBuus May 10 '16 at 19:31
Jonathan Y. , Jun 14, 2017 at 5:48
What's your level of confidence in the described behavior of --checksum ? According to the man it has more to do with deciding which files to flag for transfer than with delta-transfer (which, presumably, is rsync 's default behavior). – Jonathan Y. Jun 14 '17 at 5:48
Alexander O'Mara , Jan 3, 2016 at 6:34
TL;DR: Just specify a partial directory as the rsync man pages recommend:
--partial-dir=.rsync-partial
Longer explanation: There is actually a built-in feature for doing this using the --partial-dir option, which has several advantages over the --partial and --append-verify / --append alternative. Excerpt from the rsync man pages:
--partial-dir=DIR A better way to keep partial files than the --partial option is to specify a DIR that will be used to hold the partial data (instead of writing it out to the destination file). On the next transfer, rsync will use a file found in this dir as data to speed up the resumption of the transfer and then delete it after it has served its purpose. Note that if --whole-file is specified (or implied), any partial-dir file that is found for a file that is being updated will simply be removed (since rsync is sending files without using rsync's delta-transfer algorithm). Rsync will create the DIR if it is missing (just the last dir -- not the whole path). This makes it easy to use a relative path (such as "--partial-dir=.rsync-partial") to have rsync create the partial-directory in the destination file's directory when needed, and then remove it again when the partial file is deleted. If the partial-dir value is not an absolute path, rsync will add an exclude rule at the end of all your existing excludes. This will prevent the sending of any partial-dir files that may exist on the sending side, and will also prevent the untimely deletion of partial-dir items on the receiving side. An example: the above --partial-dir option would add the equivalent of "-f '-p .rsync-partial/'" at the end of any other filter rules.
By default, rsync uses a random temporary file name which gets deleted when a transfer fails. As mentioned, using --partial you can make rsync keep the incomplete file as if it were successfully transferred, so that it is possible to later append to it using the --append-verify / --append options. However there are several reasons this is sub-optimal.
- Your backup files may not be complete, and without checking the remote file, which must still be unaltered, there's no way to know.
- If you are attempting to use --backup and --backup-dir , you've just added a new version of this file that never even existed before to your version history.
However, if we use --partial-dir , rsync will preserve the temporary partial file, and resume downloading using that partial file next time you run it, and we do not suffer from the above issues.
trs , Apr 7, 2017 at 0:00
This is really the answer. Hey everyone, LOOK HERE!! – trs Apr 7 '17 at 0:00
JKOlaf , Jun 28, 2017 at 0:11
I agree this is a much more concise answer to the question. The TL;DR is perfect, and those that need more can read the longer bit. Strong work. – JKOlaf Jun 28 '17 at 0:11
N2O , Jul 29, 2014 at 18:24
You may want to add the -P option to your command.
From the man page:
--partial By default, rsync will delete any partially transferred file if the transfer is interrupted. In some circumstances it is more desirable to keep partially transferred files. Using the --partial option tells rsync to keep the partial file which should make a subsequent transfer of the rest of the file much faster. -P The -P option is equivalent to --partial --progress. Its purpose is to make it much easier to specify these two options for a long transfer that may be interrupted.
So instead of:
sudo rsync -azvv /home/path/folder1/ /home/path/folder2
Do:
sudo rsync -azvvP /home/path/folder1/ /home/path/folder2
Of course, if you don't want the progress updates, you can just use --partial , i.e.:
sudo rsync --partial -azvv /home/path/folder1/ /home/path/folder2
gaoithe , Aug 19, 2015 at 11:29
@Flimm not quite correct. If there is an interruption (network or receiving side) then when using --partial the partial file is kept AND it is used when rsync is resumed. From the manpage: "Using the --partial option tells rsync to keep the partial file which should make a subsequent transfer of the rest of the file much faster." – gaoithe Aug 19 '15 at 11:29
DanielSmedegaardBuus , Sep 1, 2015 at 14:11
@Flimm and @gaoithe, my answer wasn't quite accurate, and definitely not up-to-date. I've updated it to reflect version 3+ of rsync . It's important to stress, though, that --partial does not itself resume a failed transfer. See my answer for details :) – DanielSmedegaardBuus Sep 1 '15 at 14:11
guettli , Nov 18, 2015 at 12:28
@DanielSmedegaardBuus I tried it and the -P is enough in my case. Versions: client has 3.1.0 and server has 3.1.1. I interrupted the transfer of a single large file with ctrl-c. I guess I am missing something. – guettli Nov 18 '15 at 12:28
Yadunandana , Sep 16, 2012 at 16:07
I think you are forcibly calling the rsync and hence all data is getting downloaded when you recall it again. Use the --progress option to copy only those files which are not copied and the --delete option to delete any files if already copied and now it does not exist in source folder...
rsync -avz --progress --delete -e /home/path/folder1/ /home/path/folder2
If you are using ssh to login to other system and copy the files,
rsync -avz --progress --delete -e "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no" /home/path/folder1/ /home/path/folder2
let me know if there is any mistake in my understanding of this concept...
Fabien ,Jun 14, 2013 at 12:12
Can you please edit your answer and explain what your special ssh call does, and why you advise to do it? – Fabien Jun 14 '13 at 12:12
DanielSmedegaardBuus , Dec 7, 2014 at 0:12
@Fabien He tells rsync to set two ssh options (rsync uses ssh to connect). The second one tells ssh to not prompt for confirmation if the host he's connecting to isn't already known (by existing in the "known hosts" file). The first one tells ssh to not use the default known hosts file (which would be ~/.ssh/known_hosts). He uses /dev/null instead, which is of course always empty, and as ssh would then not find the host in there, it would normally prompt for confirmation, hence option two. Upon connecting, ssh writes the now known host to /dev/null, effectively forgetting it instantly :) – DanielSmedegaardBuus Dec 7 '14 at 0:12
DanielSmedegaardBuus , Dec 7, 2014 at 0:23
...but you were probably wondering what effect, if any, it has on the rsync operation itself. The answer is none. It only serves to not have the host you're connecting to added to your SSH known hosts file. Perhaps he's a sysadmin often connecting to a great number of new servers, temporary systems or whatnot. I don't know :) – DanielSmedegaardBuus Dec 7 '14 at 0:23
moi , May 10, 2016 at 13:49
"use --progress option to copy only those files which are not copied" What? – moi May 10 '16 at 13:49
Paul d'Aoust , Nov 17, 2016 at 22:39
There are a couple errors here; one is very serious: --delete will delete files in the destination that don't exist in the source. The less serious one is that --progress doesn't modify how things are copied; it just gives you a progress report on each file as it copies. (I fixed the serious error; replaced it with --remove-source-files .) – Paul d'Aoust Nov 17 '16 at 22:39
Jan 22, 2017 | nac.uci.edu
v1.67 (Mac Beta) Table of Contents
1. Download
If you already know you want it, get it here: parsync+utils.tar.gz (contains parsync plus the kdirstat-cache-writer , stats , and scut utilities below) Extract it into a dir on your $PATH and after verifying the other dependencies below, give it a shot.
While parsync is developed for and tested on Linux, the latest version of parsync has been modified to (mostly) work on the Mac (tested on OSX 10.9.5). A number of the Linux-specific dependencies have been removed and there are a number of Mac-specific workarounds.
Thanks to Phil Reese < [email protected] > for the code mods needed to get it started. It's the same package and instructions for both platforms.
2. Dependencies
parsync requires the following utilities to work:
- non-default Perl utility: URI::Escape qw(uri_escape)
sudo yum install perl-URI # CentOS-like
sudo apt-get install liburi-perl # Debian-like
- stats - self-writ Perl utility for providing descriptive stats on STDIN
- scut - self-writ Perl utility like cut that allows regex split tokens
- kdirstat-cache-writer (included in the tarball mentioned above), requires a recent Perl
parsync needs to be installed only on the SOURCE end of the transfer and uses whatever rsync is available on the TARGET. It uses a number of Linux-specific utilities so if you're transferring between Linux and a FreeBSD host, install parsync on the Linux side. In fact, as currently written, it will only PUSH data to remote targets; it will not pull data as rsync itself can do. This will probably change in the near future.
3. Overview
rsync is a fabulous data mover. Possibly more bytes have been moved (or have been prevented from being moved) by rsync than by any other application.
So what's not to love? For transferring large, deep file trees, rsync will pause while it generates lists of files to process. Since version 3, it does this pretty fast, but on sluggish filesystems, it can take hours or even days before it will start to actually exchange rsync data.
Second, due to various bottlenecks, rsync will tend to use less than the available bandwidth on high-speed networks. Starting multiple instances of rsync can improve this significantly. However, on such transfers, it is also easy to overload the available bandwidth, so it would be nice to both limit the bandwidth used if necessary and also to limit the load on the system. parsync tries to satisfy all these conditions and more by:
- using the kdirstat-cache-writer utility from the beautiful kdirstat directory browser, which can produce lists of files very rapidly
- allowing re-use of the cache files so generated.
- doing crude load balancing of the number of active rsyncs, suspending and un-suspending the processes as necessary.
- using rsync's own bandwidth limiter (--bwlimit) to throttle the total bandwidth.
- making rsync's own vast option selection available as a pass-thru (though limited to those compatible with the --files-from option).
Beyond this introduction, parsync's internal help is about all you'll need to figure out how to use it; below is what you'll see when you type parsync -h . There are still edge cases where parsync will fail or behave oddly, especially with small data transfers, so I'd be happy to hear of such misbehavior or suggestions to improve it. Download the complete tarball of parsync, plus the required utilities here: parsync+utils.tar.gz Unpack it, move the contents to a dir on your $PATH , chmod it executable, and try it out.
Only use for LARGE data transfers. The main use case for parsync is really only very large data transfers thru fairly fast network connections (>1 Gb/s). Below this speed, a single rsync can saturate the connection, so there's little reason to use parsync; in fact the overhead of testing the existence of and starting more rsyncs tends to worsen its performance on small transfers to slightly less than rsync alone. Type parsync --help or just parsync; below is what you should see:
4. parsync help
parsync version 1.67 (Mac compatibility beta) Jan 22, 2017
by Harry Mangalam <[email protected]> || <[email protected]>

parsync is a Perl script that wraps Andrew Tridgell's miraculous 'rsync' to provide some load balancing and parallel operation across network connections to increase the amount of bandwidth it can use. parsync is primarily tested on Linux, but (mostly) works on MacOSX as well.

parsync needs to be installed only on the SOURCE end of the transfer and only works in local SOURCE -> remote TARGET mode (it won't allow local SOURCE <- remote TARGET, emitting an error and exiting if attempted). It uses whatever rsync is available on the TARGET. It uses a number of Linux-specific utilities, so if you're transferring between Linux and a FreeBSD host, install parsync on the Linux side.

The only native rsync options that parsync uses are '-a' (archive) & '-s' (respect bizarro characters in filenames). If you need more, then it's up to you to provide them via '--rsyncopts'.

parsync checks to see if the current system load is too heavy and tries to throttle the rsyncs during the run by monitoring and suspending / continuing them as needed. It uses the very efficient (also Perl-based) kdirstat-cache-writer from kdirstat to generate lists of files which are summed and then crudely divided into NP jobs by size.

It appropriates rsync's bandwidth throttle mechanism, using '--maxbw' as a passthru to rsync's 'bwlimit' option, but divides it by NP so as to keep the total bw the same as the stated limit. It monitors and shows network bandwidth, but can't change the bw allocation mid-job. It can only suspend rsyncs until the load decreases below the cutoff. If you suspend parsync (^Z), all rsync children will suspend as well, regardless of current state.

Unless changed by '--interface', it tries to figure out how to set the interface to monitor. The transfer will use whatever interface routing provides, normally set by the name of the target.
It can also be used for non-host-based transfers (between mounted filesystems) but the network bandwidth continues to be (usually pointlessly) shown. [[NB: Between mounted filesystems, parsync sometimes works very poorly for reasons still mysterious. In such cases (monitor with 'ifstat'), use 'cp' or 'tnc' (https://goo.gl/5FiSxR) for the initial data movement and a single rsync to finalize. I believe the multiple rsync chatter is interfering with the transfer.]]

It only works on dirs and files that originate from the current dir (or specified via "--rootdir"). You cannot include dirs and files from discontinuous or higher-level dirs.

** the ~/.parsync files **
The ~/.parsync dir contains the cache (*.gz), the chunk files (kds*), and the time-stamped log files. The cache files can be re-used with '--reusecache' (which will re-use ALL the cache and chunk files). The log files are datestamped and are NOT overwritten.

** Odd characters in names **
parsync will sometimes refuse to transfer some oddly named files, altho recent versions of rsync allow the '-s' flag (now a parsync default) which tries to respect names with spaces and properly escaped shell characters. Filenames with embedded newlines, DOS EOLs, and other odd chars will be recorded in the log files in the ~/.parsync dir.

** Because of the crude way that files are chunked, NP may be adjusted slightly to match the file chunks. ie '--NP 8' -> '--NP 7'. If so, a warning will be issued and the rest of the transfer will be automatically adjusted.

OPTIONS
=======
[i] = integer number
[f] = floating point number
[s] = "quoted string"
( ) = the default if any

--NP [i] (sqrt(#CPUs)) ............... number of rsync processes to start
      optimal NP depends on many vars. Try the default and incr as needed
--startdir [s] (`pwd`) .. the directory it works relative to. If you omit
      it, the default is the CURRENT dir. You DO have to specify target
      dirs. See the examples below.
--maxbw [i] (unlimited) .......... in KB/s max bandwidth to use (--bwlimit
      passthru to rsync). maxbw is the total BW to be used, NOT per rsync.
--maxload [f] (NP+2) ........ max total system load - if sysload > maxload,
      sleeps an rsync proc for 10s
--checkperiod [i] (5) .......... sets the period in seconds between updates
--rsyncopts [s] ... options passed to rsync as a quoted string (CAREFUL!)
      this opt triggers a pause before executing to verify the command.
--interface [s] ............. network interface to /monitor/, not nec use.
      default: `/sbin/route -n | grep "^0.0.0.0" | rev | cut -d' ' -f1 | rev`
      above works on most simple hosts, but complex routes will confuse it.
--reusecache .......... don't re-read the dirs; re-use the existing caches
--email [s] ..................... email address to send completion message
      (requires working mail system on host)
--barefiles ..... set to allow rsync of individual files, as oppo to dirs
--nowait ................ for scripting, sleep for a few s instead of wait
--version ................................. dumps version string and exits
--help ......................................................... this help

Examples
========
-- Good example 1 --
% parsync --maxload=5.5 --NP=4 --startdir='/home/hjm' dir1 dir2 dir3 hjm@remotehost:~/backups

where
= "--startdir='/home/hjm'" sets the working dir of this operation to '/home/hjm' and dir1 dir2 dir3 are subdirs from '/home/hjm'
= the target "hjm@remotehost:~/backups" is the same target rsync would use
= "--NP=4" forks 4 instances of rsync
= "--maxload=5.5" will start suspending rsync instances when the 5m system load gets to 5.5 and then unsuspending them when it goes below it.

It uses 4 instances to rsync dir1 dir2 dir3 to hjm@remotehost:~/backups

-- Good example 2 --
% parsync --rsyncopts="--ignore-existing" --reusecache --NP=3 --barefiles *.txt /mount/backups/txt

where
= "--rsyncopts='--ignore-existing'" is an option passed thru to rsync telling it not to disturb any existing files in the target directory.
= "--reusecache" indicates that the filecache shouldn't be re-generated, uses the previous filecache in ~/.parsync
= "--NP=3" for 3 copies of rsync (with no "--maxload", the default is 4)
= "--barefiles" indicates that it's OK to transfer barefiles instead of recursing thru dirs.
= "/mount/backups/txt" is the target - a local disk mount instead of a network host.

It uses 3 instances to rsync *.txt from the current dir to "/mount/backups/txt".

-- Error Example 1 --
% pwd
/home/hjm   # executing parsync from here
% parsync --NP4 --compress /usr/local /media/backupdisk

why this is an error:
= '--NP4' is not an option (parsync will say "Unknown option: np4"). It should be '--NP=4'
= if you were trying to rsync '/usr/local' to '/media/backupdisk', it will fail since there is no /home/hjm/usr/local dir to use as a source. This will be shown in the log files in ~/.parsync/rsync-logfile-<datestamp>_# as a spew of "No such file or directory (2)" errors
= the '--compress' is a native rsync option, not a native parsync option. You have to pass it to rsync with "--rsyncopts='--compress'"

The correct version of the above command is:
% parsync --NP=4 --rsyncopts='--compress' --startdir=/usr local /media/backupdisk

-- Error Example 2 --
% parsync --start-dir /home/hjm mooslocal [email protected]:/usr/local

why this is an error:
= this command is trying to PULL data from a remote SOURCE to a local TARGET. parsync doesn't support that kind of operation yet.

The correct version of the above command is:
# ssh to hjm@moo, install parsync, then:
% parsync --startdir=/usr local hjm@remote:/home/hjm/mooslocal
Jun 02, 2018 | unix.stackexchange.com
Mandar Shinde ,Mar 13, 2015 at 6:51
I have been using a rsync script to synchronize data at one host with the data at another host. The data has numerous small-sized files that contribute to almost 1.2TB.

In order to sync those files, I have been using the rsync command as follows:

rsync -avzm --stats --human-readable --include-from proj.lst /data/projects REMOTEHOST:/data/

The contents of proj.lst are as follows:

+ proj1
+ proj1/*
+ proj1/*/*
+ proj1/*/*/*.tar
+ proj1/*/*/*.pdf
+ proj2
+ proj2/*
+ proj2/*/*
+ proj2/*/*/*.tar
+ proj2/*/*/*.pdf
...
- *

As a test, I picked up two of those projects (8.5GB of data) and executed the command above. Being a sequential process, it took 14 minutes 58 seconds to complete. So, for 1.2TB of data it would take several hours.
If I could run multiple rsync processes in parallel (using &, xargs or parallel), it would save time.

I tried the below command with parallel (after cd'ing to the source directory) and it took 12 minutes 37 seconds to execute:

parallel --will-cite -j 5 rsync -avzm --stats --human-readable {} REMOTEHOST:/data/ ::: .

This should have taken 5 times less time, but it didn't. I think I'm going wrong somewhere.
How can I run multiple rsync processes in order to reduce the execution time?

Ole Tange, Mar 13, 2015 at 7:25
Are you limited by network bandwidth? Disk iops? Disk bandwidth? – Ole Tange Mar 13 '15 at 7:25

Mandar Shinde, Mar 13, 2015 at 7:32
If possible, we would want to use 50% of total bandwidth. But parallelising multiple rsyncs is our first priority. – Mandar Shinde Mar 13 '15 at 7:32

Ole Tange, Mar 13, 2015 at 7:41
Can you let us know your network bandwidth, disk iops, disk bandwidth, and the bandwidth actually used? – Ole Tange Mar 13 '15 at 7:41

Mandar Shinde, Mar 13, 2015 at 7:47
In fact, I do not know the above parameters. For the time being, we can neglect the optimization part. Multiple rsyncs in parallel is the primary focus now. – Mandar Shinde Mar 13 '15 at 7:47

Mandar Shinde, Apr 11, 2015 at 13:53
The following steps did the job for me:
- Run rsync --dry-run first in order to get the list of files that would be affected:

rsync -avzm --stats --safe-links --ignore-existing --dry-run --human-readable /data/projects REMOTE-HOST:/data/ > /tmp/transfer.log
- I fed the output of cat transfer.log to parallel in order to run 5 rsyncs in parallel, as follows:

cat /tmp/transfer.log | parallel --will-cite -j 5 rsync -avzm --relative --stats --safe-links --ignore-existing --human-readable {} REMOTE-HOST:/data/ > result.log
Here, the --relative option (link) ensured that the directory structure for the affected files, at the source and destination, remains the same (inside the /data/ directory), so the command must be run in the source folder (in this example, /data/projects).

Sandip Bhattacharya, Nov 17, 2016 at 21:22
That would do an rsync per file. It would probably be more efficient to split up the whole file list using split and feed those filenames to parallel. Then use rsync's --files-from to get the filenames out of each file and sync them.

rm backups.*
split -l 3000 backup.list backups.
ls backups.* | parallel --line-buffer --verbose -j 5 rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/

– Sandip Bhattacharya Nov 17 '16 at 21:22

Mike D, Sep 19, 2017 at 16:42
How does the second rsync command handle the lines in result.log that are not files? i.e. 'receiving file list ... done' and 'created directory /data/'. – Mike D Sep 19 '17 at 16:42

Cheetah, Oct 12, 2017 at 5:31
On newer versions of rsync (3.1.0+), you can use --info=name in place of -v, and you'll get just the names of the files and directories. You may want to pass --protect-args to the 'inner' transferring rsync too if any files might have spaces or shell metacharacters in them. – Cheetah Oct 12 '17 at 5:31

Mikhail, Apr 10, 2017 at 3:28
I would strongly discourage anybody from using the accepted answer; a better solution is to crawl the top-level directory and launch a proportional number of rsync operations.

I have a large zfs volume and my source was a cifs mount. Both are linked with 10G, and in some benchmarks can saturate the link. Performance was evaluated using zpool iostat 1.

The source drive was mounted like:
mount -t cifs -o username=,password= //static_ip/70tb /mnt/Datahoarder_Mount/ -o vers=3.0

Using a single rsync process:

rsync -h -v -r -P -t /mnt/Datahoarder_Mount/ /StoragePod

the io meter reads:
StoragePod 30.0T 144T 0 1.61K 0 130M
StoragePod 30.0T 144T 0 1.61K 0 130M
StoragePod 30.0T 144T 0 1.62K 0 130M

In synthetic benchmarks (crystal disk), sequential write performance approaches 900 MB/s, which means the link is saturated. 130 MB/s is not very good, and is the difference between waiting a weekend and waiting two weeks.
So, I built the file list and tried to run the sync again (I have a 64 core machine):
cat /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount.log | parallel --will-cite -j 16 rsync -avzm --relative --stats --safe-links --size-only --human-readable {} /StoragePod/ > /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount_result.log

and it had the same performance!
StoragePod 29.9T 144T 0 1.63K 0 130M
StoragePod 29.9T 144T 0 1.62K 0 130M
StoragePod 29.9T 144T 0 1.56K 0 129M

As an alternative I simply ran rsync on the root folders:
rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/Marcello_zinc_bone /StoragePod/Marcello_zinc_bone
rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/fibroblast_growth /StoragePod/fibroblast_growth
rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/QDIC /StoragePod/QDIC
rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/sexy_dps_cell /StoragePod/sexy_dps_cell

This actually boosted performance:
StoragePod 30.1T 144T 13 3.66K 112K 343M
StoragePod 30.1T 144T 24 5.11K 184K 469M
StoragePod 30.1T 144T 25 4.30K 196K 373M

In conclusion, as @Sandip Bhattacharya brought up, write a small script to get the directories and parallelize over them. Alternatively, pass a file list to rsync. But don't create new instances for each file.
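The per-directory strategy this answer converges on can be sketched generically. This is a hedged sketch using xargs -P instead of GNU parallel (my substitution); the source and target paths and the job count of 8 are assumptions:

```shell
#!/bin/sh
# One rsync per top-level subdirectory, at most 8 running at once.
# /mnt/source and /storage are hypothetical paths.
find /mnt/source -mindepth 1 -maxdepth 1 -type d -printf '%f\n' |
    xargs -P 8 -I{} rsync -a /mnt/source/{}/ /storage/{}/
```

This keeps the number of rsync instances proportional to the number of directories rather than the number of files, which is the point of the conclusion above.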
Julien Palard ,May 25, 2016 at 14:15
I personally use this simple one:

ls -1 | parallel rsync -a {} /destination/directory/

which is only useful when you have more than a few non-near-empty directories; otherwise you'll end up with almost every rsync terminating and the last one doing all the job alone.

Ole Tange, Mar 13, 2015 at 7:25
A tested way to do parallelized rsync is: http://www.gnu.org/software/parallel/man.html#EXAMPLE:-Parallelizing-rsync

rsync is a great tool, but sometimes it will not fill up the available bandwidth. This is often a problem when copying several big files over high speed connections.
The following will start one rsync per big file in src-dir to dest-dir on the server fooserver:
cd src-dir; find . -type f -size +100000 | \
  parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; \
  rsync -s -Havessh {} fooserver:/dest-dir/{}

The directories created may end up with wrong permissions and smaller files are not being transferred. To fix those, run rsync a final time:
rsync -Havessh src-dir/ fooserver:/dest-dir/

If you are unable to push data, but need to pull them, and the files are called digits.png (e.g. 000000.png), you might be able to do:
seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/

Mandar Shinde, Mar 13, 2015 at 7:34
Any other alternative in order to avoid find? – Mandar Shinde Mar 13 '15 at 7:34

Ole Tange, Mar 17, 2015 at 9:20
Limit the -maxdepth of find. – Ole Tange Mar 17 '15 at 9:20

Mandar Shinde, Apr 10, 2015 at 3:47
If I use the --dry-run option in rsync, I would have a list of files that would be transferred. Can I provide that file list to parallel in order to parallelise the process? – Mandar Shinde Apr 10 '15 at 3:47

Ole Tange, Apr 10, 2015 at 5:51
cat files | parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; rsync -s -Havessh {} fooserver:/dest-dir/{} – Ole Tange Apr 10 '15 at 5:51

Mandar Shinde, Apr 10, 2015 at 9:49
Can you please explain the mkdir -p /dest-dir/{//}\; part? Especially the {//} thing is a bit confusing. – Mandar Shinde Apr 10 '15 at 9:49
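For what it's worth, {//} is GNU parallel's replacement string for the directory component of the input line, i.e. what dirname(1) prints, so mkdir -p /dest-dir/{//} pre-creates each file's parent directory on the remote before rsync copies the file into it. A quick local illustration of the equivalence (the path is hypothetical):

```shell
# For an input line "src/icons/a.png", GNU parallel's {//} expands to
# "src/icons"; dirname gives the identical result:
dirname src/icons/a.png    # prints: src/icons
```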
For multi-destination syncs, I am using:

parallel rsync -avi /path/to/source ::: host1: host2: host3:

Hint: all ssh connections are established with public keys in ~/.ssh/authorized_keys
Apr 22, 2018 | opensource.com
Necessity is frequently the mother of invention. I knew very little about Bash scripting, but that was about to change rapidly. Working with the existing script and using online help forums, search engines, and some printed documentation, I set up a Linux network-attached storage computer running Fedora Core. I learned how to create an SSH keypair and configure that along with rsync to move the backup file from the email server to the storage server. That worked well for a few days until I noticed that the storage server's disk space was rapidly disappearing. What was I going to do?
That's when I learned more about Bash scripting. I modified my rsync command to delete backed-up files older than ten days. In both cases I learned that a little knowledge can be a dangerous thing, but in each case my experience and confidence as a Linux user and system administrator grew, and due to that I functioned as a resource for others. On the plus side, we soon realized that the disk-to-disk backup system was superior to tape when it came to restoring email files. In the long run it was a win, but there was a lot of uncertainty and anxiety along the way.
Dec 09, 2017 | stackoverflow.com
ash, May 11, 2015 at 20:05
There is a flag --files-from that does exactly what you want. From man rsync:

--files-from=FILE

Using this option allows you to specify the exact list of files to transfer (as read from the specified FILE or - for standard input). It also tweaks the default behavior of rsync to make transferring just the specified files and directories easier:
- The --relative (-R) option is implied, which preserves the path information that is specified for each item in the file (use --no-relative or --no-R if you want to turn that off).
- The --dirs (-d) option is implied, which will create directories specified in the list on the destination rather than noisily skipping them (use --no-dirs or --no-d if you want to turn that off).
- The --archive (-a) option's behavior does not imply --recursive (-r), so specify it explicitly, if you want it.
- These side-effects change the default state of rsync, so the position of the --files-from option on the command-line has no bearing on how other options are parsed (e.g. -a works the same before or after --files-from, as does --no-R and all other options).
The filenames that are read from the FILE are all relative to the source dir -- any leading slashes are removed and no ".." references are allowed to go higher than the source dir. For example, take this command:
rsync -a --files-from=/tmp/foo /usr remote:/backup

If /tmp/foo contains the string "bin" (or even "/bin"), the /usr/bin directory will be created as /backup/bin on the remote host. If it contains "bin/" (note the trailing slash), the immediate contents of the directory would also be sent (without needing to be explicitly mentioned in the file -- this began in version 2.6.4). In both cases, if the -r option was enabled, that dir's entire hierarchy would also be transferred (keep in mind that -r needs to be specified explicitly with --files-from, since it is not implied by -a). Also note that the effect of the (enabled by default) --relative option is to duplicate only the path info that is read from the file -- it does not force the duplication of the source-spec path (/usr in this case).
In addition, the --files-from file can be read from the remote host instead of the local host if you specify a "host:" in front of the file (the host must match one end of the transfer). As a short-cut, you can specify just a prefix of ":" to mean "use the remote end of the transfer". For example:
rsync -a --files-from=:/path/file-list src:/ /tmp/copy

This would copy all the files specified in the /path/file-list file that was located on the remote "src" host.
If the --iconv and --protect-args options are specified and the --files-from filenames are being sent from one host to another, the filenames will be translated from the sending host's charset to the receiving host's charset.
NOTE: sorting the list of files in the --files-from input helps rsync to be more efficient, as it will avoid re-visiting the path elements that are shared between adjacent entries. If the input is not sorted, some path elements (implied directories) may end up being scanned multiple times, and rsync will eventually unduplicate them after they get turned into file-list elements.
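Following that note, a small sketch of building a sorted list and feeding it to rsync. The directory layout and the remote target "remote:/backup/" are placeholders:

```shell
#!/bin/sh
# Build a sorted file list so rsync visits shared path elements only once.
demo=$(mktemp -d)
mkdir -p "$demo/b" "$demo/a"
touch "$demo/b/2.log" "$demo/a/1.log"
cd "$demo" || exit 1
find . -type f | sort > /tmp/file-list
cat /tmp/file-list   # prints ./a/1.log then ./b/2.log
# rsync -a --files-from=/tmp/file-list . remote:/backup/   (placeholder target)
```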
Nicolas Mattia, Feb 11, 2016 at 11:06
Note that you still have to specify the directory where the files listed are located, for instance:

rsync -av --files-from=file-list . target/

for copying files from the current dir. – Nicolas Mattia Feb 11 '16 at 11:06

ash, Feb 12, 2016 at 2:25
Yes, and to reiterate: the filenames that are read from the FILE are all relative to the source dir. – ash Feb 12 '16 at 2:25

Michael, Nov 2, 2016 at 0:09
If the files-from file has anything starting with .., rsync appears to ignore the .., giving me an error like rsync: link_stat "/home/michael/test/subdir/test.txt" failed: No such file or directory (in this case running from the "test" dir and trying to specify "../subdir/test.txt", which does exist). – Michael Nov 2 '16 at 0:09

xxx,
The --files-from= parameter needs a trailing slash on the source if you want to keep the absolute path intact. So your command would become something like below:

rsync -av --files-from=/path/to/file / /tmp/

This can be used when there are a large number of files and you want to copy all of them to some path. So you would find the files and write the output to a file, like below:

find /var/* -name *.log > file
Aug 28, 2017 | serverfault.com
I have one older ubuntu server, and one newer debian server and I am migrating data from the old one to the new one. I want to use rsync to transfer data across to make final migration easier and quicker than the equivalent tar/scp/untar process.
As an example, I want to sync the home folders one at a time to the new server. This requires root access at both ends as not all files at the source side are world readable and the destination has to be written with correct permissions into /home. I can't figure out how to give rsync root access on both sides.
I've seen a few related questions, but none quite match what I'm trying to do.
I have sudo set up and working on both servers.
Tim Abell, asked Apr 28 '10 at 9:18
Actually you do NOT need to allow root authentication via SSH to run rsync as Antoine suggests. The transport and system authentication can be done entirely over user accounts as long as you can run rsync with sudo on both ends for reading and writing the files.

As a user on your destination server you can suck the data from your source server like this:

sudo rsync -aPe ssh --rsync-path='sudo rsync' boron:/home/fred /home/

The user you run as on both servers will need passwordless* sudo access to the rsync binary, but you do NOT need to enable ssh login as root anywhere. If the user you are using doesn't match on the other end, you can add user@boron: to specify a different remote user.
Good luck.
*or you will need to have entered the password manually inside the timeout window.
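For the passwordless sudo this answer relies on, a sudoers entry limited to the rsync binary might look like the sketch below. The username and file path are assumptions; always edit with visudo:

```
# /etc/sudoers.d/rsync-backup -- install with: visudo -f /etc/sudoers.d/rsync-backup
fred ALL=(root) NOPASSWD: /usr/bin/rsync
```

Note that even restricted to rsync, this effectively grants root-equivalent access, since rsync run as root can read and write arbitrary files.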
Caleb, answered Apr 28 '10 at 22:06, edited Jun 30 '10 at 13:51
Although this is an old question I'd like to add a word of CAUTION to this accepted answer. From my understanding, allowing passwordless "sudo rsync" is equivalent to opening the root account to remote login. This is because with this it is very easy to gain full root access, e.g. because all system files can be downloaded, modified and replaced without a password. – Ascurion Jan 8 '16 at 16:30
If your data is not highly sensitive, you could use tar and socat. In my experience this is often faster than rsync over ssh. You need socat or netcat on both sides.

On the target host, go to the directory where you would like to put your data, after that run:

socat TCP-LISTEN:4444 - | tar xzf -

If the target host is listening, start it on the source like:

tar czf - /home/fred /home/ | socat - TCP:ip-of-remote-server:4444
For this setup you'll need a reliable connection between the 2 servers.
Jeroen Moors, answered Apr 28 '10 at 21:20
Good point. In a trusted environment, you'll pick up a lot of speed by not encrypting. It might not matter on small files, but with GBs of data it will. – pboin May 18 '10 at 10:53
Ok, I've pieced together all the clues to get something that works for me. Let's call the servers "src" & "dest".
Set up a key pair for root on the destination server, and copy the public key to the source server:
dest $ sudo -i
dest # ssh-keygen
dest # exit
dest $ scp /root/.ssh/id_rsa.pub src:

Add the public key to root's authorized keys on the source server:
src $ sudo -i
src # cp /home/tim/id_rsa.pub .ssh/authorized_keys

Back on the destination server, pull the data across with rsync:
dest $ sudo -i
dest # rsync -aP src:/home/fred /home/
Feb 06, 2014 | www.cyberciti.biz
November 9, 2012, last updated February 6, 2014

How do I use the rsync tool to copy only the hidden files and directories (such as ~/.ssh/, ~/.foo, and so on) from the /home/jobs directory to the /mnt/usb directory under a Unix-like operating system?
The rsync program is used for synchronizing files over a network or local disks. To view or display only hidden files with ls command:
ls -ld ~/.??*
OR
ls -ld ~/.[^.]*
Sample outputs:
Fig.01: ls command to view only hidden files

In this example, you used the pattern .[^.]* or .??* to select and display only hidden files using the ls command. You can use the same pattern with any Unix command including the rsync command. The syntax is as follows to copy hidden files with rsync:
rsync -av /path/to/dir/.??* /path/to/dest
rsync -avzP /path/to/dir/.??* /mnt/usb
rsync -avzP $HOME/.??* [email protected]:/path/to/backup/users/u/user1
rsync -avzP ~/.[^.]* [email protected]:/path/to/backup/users/u/user1
In this example, copy all hidden files from my home directory to /mnt/test:
rsync -avzP ~/.[^.]* /mnt/test
Sample outputs:
Fig.02 Rsync example to copy only hidden files
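The two dot-glob patterns used above can be sanity-checked in a scratch directory (the filenames here are hypothetical):

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
touch .bashrc .profile visible.txt
# .[^.]* requires a dot plus one non-dot character, so it skips the
# special entries '.' and '..' (but also names like '..foo');
# .??* requires a dot plus at least two more characters, so it skips
# one-letter dotfiles such as '.a'.
ls -d .[^.]*
```

Neither pattern matches visible.txt, which is why appending them to a source path makes rsync copy only the hidden entries.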
Aug 28, 2017 | superuser.com
Trying to copy files with rsync, it complains:

rsync: send_files failed to open "VirtualBox/Machines/Lubuntu/Lubuntu.vdi" \
    (in media): Permission denied (13)

That file is not copied. Indeed the file permissions of that file are very restrictive on the server side:
-rw------- 1 1000 1000 3133181952 Nov 1 2011 Lubuntu.vdi

I call rsync with
sudo rsync -av --fake-super root@sheldon::media /mnt/media

The rsync daemon runs as root on the server. root can copy that file (of course). rsyncd has "fake super = yes" set in /etc/rsyncd.conf.
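For context, a minimal /etc/rsyncd.conf with fake super enabled might look like this sketch. The module name 'media' comes from the command above; the path is an assumption:

```
uid = root
[media]
    path = /srv/media
    fake super = yes
```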
What can I do so that the file is copied without changing the permissions of the file on the server? rsync file-permissions
Torsten Bronger, asked Dec 29 '12 at 10:15
If you use rsync as a daemon on the destination, please post the output of grep rsync /var/log/daemon to improve your question. – F. Hauri Dec 29 '12 at 13:23
As you appear to have root access to both servers, have you tried --force? Alternatively you could bypass the rsync daemon and try a direct sync, e.g.:
rsync -optg --rsh=/usr/bin/ssh --rsync-path=/usr/bin/rsync --verbose --recursive --delete-after --force root@sheldon::media /mnt/media
arober11, answered Dec 29 '12 at 13:21, edited Jan 2 '13 at 10:55
Using ssh means encryption, which makes things slower. --force only affects directories, if I read the man page correctly. – Torsten Bronger Jan 1 '13 at 23:08
Unless you're using ancient kit, the CPU overhead of encrypting / decrypting the traffic shouldn't be noticeable, but you will lose 10-20% of your bandwidth through the encapsulation process. Then again, 80% of a working link is better than 100% of a non-working one :) – arober11 Jan 2 '13 at 10:52
I do have an "ancient kit". ;-) (Slow ARM CPU on a NAS.) But I now mount the NAS with NFS and use rsync (with "sudo") locally. This solves the problem (and is even faster). However, I still think that my original problem must be solvable using the rsync protocol (remote, no ssh). – Torsten Bronger Jan 4 '13 at 7:55
Aug 28, 2017 | unix.stackexchange.com
nixnotwin , asked Sep 21 '12 at 5:11
On my Ubuntu server there are about 150 shell accounts. All usernames begin with the prefix u12. I have root access and I am trying to copy a directory named "somefiles" to all the home directories. After copying the directory, the user and group ownership of the directory should be changed to the user's. Username, group and home-dir name are the same. How can this be done?

Gilles, answered Sep 21 '12 at 23:44
Do the copying as the target user. This will automatically give the target files the right ownership. Make sure that the original files are world-readable (or at least readable by all the target users). Run chmod afterwards if you don't want the copied files to be world-readable.

getent passwd | awk -F : '$1 ~ /^u12/ {print $1}' | while IFS= read -r user; do
  su "$user" -c 'cp -Rp /original/location/somefiles ~/'
done
Aug 28, 2017 | stackoverflow.com
jeffery_the_wind, asked Mar 6 '12 at 15:36
I am using rsync to replicate a web folder structure from a local server to a remote server. Both servers are Ubuntu Linux. I use the following command, and it works well:

rsync -az /var/www/ [email protected]:/var/www/

The usernames for the local system and the remote system are different. From what I have read it may not be possible to preserve all file and folder owners and groups. That is OK, but I would like to preserve owners and groups just for the www-data user, which does exist on both servers.
Is this possible? If so, how would I go about doing that?
Thanks!
** EDIT **
There is some mention of rsync being able to preserve ownership and groups on remote file syncs here: http://lists.samba.org/archive/rsync/2005-August/013203.html
** EDIT 2 **
I ended up getting the desired effect thanks to many of the helpful comments and answers here. Assuming the IP of the source machine is 10.1.1.2 and the IP of the destination machine is 10.1.1.1, I can use this line from the destination machine:

sudo rsync -az [email protected]:/var/www/ /var/www/

This preserves the ownership and groups of the files that have a common user name, like www-data. Note that using rsync without sudo does not preserve these permissions.

ghoti, answered Mar 6 '12 at 19:01
You can also sudo the rsync on the target host by using the --rsync-path option:

# rsync -av --rsync-path="sudo rsync" /path/to/files user@targethost:/path

This lets you authenticate as user on targethost, but still get privileged write permission through sudo. You'll have to modify your sudoers file on the target host to avoid sudo's request for your password. See man sudoers or run sudo visudo for instructions and samples.

You mention that you'd like to retain the ownership of files owned by www-data, but not other files. If this is really true, then you may be out of luck unless you implement chown or a second run of rsync to update permissions. There is no way to tell rsync to preserve ownership for just one user.

That said, you should read about rsync's --files-from option:

rsync -av /path/to/files user@targethost:/path
find /path/to/files -user www-data -print | \
  rsync -av --files-from=- --rsync-path="sudo rsync" /path/to/files user@targethost:/path

I haven't tested this, so I'm not sure exactly how piping find's output into --files-from=- will work. You'll undoubtedly need to experiment.

xato, answered Mar 6 '12 at 15:39
As far as I know, you cannot chown files to somebody other than yourself if you are not root. So you would have to rsync using the www-data account, as all files will be created with the specified user as owner, or you need to chown the files afterwards.

user2485267, answered Jun 14 '13 at 8:22
I had a similar problem and cheated with the rsync command:

rsync -avz --delete [email protected]:/home//domains/site/public_html/ /home/domains2/public_html && chown -R wwwusr:wwwgrp /home/domains2/public_html/

The && runs the chown against the folder when the rsync completes successfully (a single '&' would run the chown regardless of the rsync completion status).
Graham , answered Mar 6 '12 at 15:51
The root users for the local system and the remote system are different.
What does this mean? The root user is uid 0. How are they different?
Any user with read permission to the directories you want to copy can determine what usernames own what files. Only root can change the ownership of files being written.
You're currently running the command on the source machine, which restricts your writes to the permissions associated with [email protected]. Instead, you can try to run the command as root on the target machine. Your read access on the source machine isn't an issue.
So on the target machine (10.1.1.1), assuming the source is 10.1.1.2:
# rsync -az [email protected]:/var/www/ /var/www/

Make sure your groups match on both machines.
Also, set up access to [email protected] using a DSA or RSA key, so that you can avoid having passwords floating around. For example, as root on your target machine, run:
# ssh-keygen -d

Then take the contents of the file /root/.ssh/id_dsa.pub and add it to ~user/.ssh/authorized_keys on the source machine. You can ssh [email protected] as root from the target machine to see if it works. If you get a password prompt, check your error log to see why the key isn't working.

ghoti, answered Mar 6 '12 at 18:54
Well, you could skip the challenges of rsync altogether, and just do this through a tar tunnel:

sudo tar zcf - /path/to/files | \
  ssh user@remotehost "cd /some/path; sudo tar zxf -"

You'll need to set up your SSH keys as Graham described.
Note that this handles full directory copies, not incremental updates like rsync.
The idea here is that:
- you tar up your directory,
- instead of creating a tar file, you send the tar output to stdout,
- that stdout is piped through an SSH command to a receiving tar on the other host,
- but that receiving tar is run by sudo, so it has privileged write access to set usernames.
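The pipe above can be rehearsed locally, leaving out ssh and sudo, to see the mechanics; the directory names here are invented throwaways:

```shell
# local re-enactment of the tar tunnel: pack on one side, unpack on the other
# (in the real command, the middle of the pipe runs through ssh)
mkdir -p srcdir destdir
echo hello > srcdir/file.txt
tar -C srcdir -cf - . | tar -C destdir -xf -
cat destdir/file.txt   # prints "hello"
```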
Aug 28, 2017 | superuser.com
I'm trying to use rsync to copy a set of files from one system to another. I'm running the command as a normal user (not root). On the remote system, the files are owned by apache, and when copied they are obviously owned by the local account (fred). My problem is that every time I run the rsync command, all files are re-synced even though they haven't changed. I think the issue is that rsync sees the file owners are different and my local user doesn't have the ability to change ownership to apache, but I'm not including the -a or -o options, so I thought this would not be checked. If I run the command as root, the files come over owned by apache and do not come a second time if I run the command again. However I can't run this as root for other reasons. Here is the command:

/usr/bin/rsync --recursive --rsh=/usr/bin/ssh --rsync-path=/usr/bin/rsync --verbose [email protected]:/src/dir/ /local/dir
asked May 2 '11 at 23:43 by Fred Snertz (edited May 2 '11 at 23:53 by Gareth)
Why can't you run rsync as root? On the remote system, does fred have read access to the apache-owned files? – chrishiestand May 3 '11 at 0:32
Ah, I left out the fact that there are ssh keys set up so that local fred can become remote root, so yes fred/root can read them. I know this is a bit convoluted but its real. – Fred Snertz May 3 '11 at 14:50
Always be careful when root can ssh into the machine. But if you have password and challenge response authentication disabled it's not as bad. – chrishiestand May 3 '11 at 17:32
Here's the answer to your problem:

-c, --checksum

    This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a "quick check" that (by default) checks if each file's size and time of last modification match between the sender and receiver. This option changes this to compare a 128-bit checksum for each file that has a matching size.

    Generating the checksums means that both sides will expend a lot of disk I/O reading all the data in the files in the transfer (and this is prior to any reading that will be done to transfer changed files), so this can slow things down significantly. The sending side generates its checksums while it is doing the file-system scan that builds the list of the available files. The receiver generates its checksums when it is scanning for changed files, and will checksum any file that has the same size as the corresponding sender's file: files with either a changed size or a changed checksum are selected for transfer.

    Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option's before-the-transfer "Does this file need to be updated?" check.

    For protocol 30 and beyond (first supported in 3.0.0), the checksum used is MD5. For older protocols, the checksum used is MD4.

So run:
/usr/bin/rsync -c --recursive --rsh=/usr/bin/ssh --rsync-path=/usr/bin/rsync --verbose [email protected]:/src/dir/ /local/dir

Note there may be a time+disk churn tradeoff by using this option. Personally, I'd probably just sync the file's mtimes too:
/usr/bin/rsync -t --recursive --rsh=/usr/bin/ssh --rsync-path=/usr/bin/rsync --verbose [email protected]:/src/dir/ /local/dir
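The "quick check" described in the man page excerpt above can be re-enacted with stat alone. This sketch (file names invented) shows both why syncing mtimes with -t makes the quick check reliable, and how the default check can in principle be fooled, which is what -c guards against:

```shell
# rsync's default quick check compares only size and mtime
same_by_quick_check() {
    test "$(stat -c %s "$1")" = "$(stat -c %s "$2")" &&
    test "$(stat -c %Y "$1")" = "$(stat -c %Y "$2")"
}

echo 12345 > f1
cp -p f1 f2                     # -p keeps the mtime: quick check says "same"
same_by_quick_check f1 f2 && echo "skip transfer"

echo 54321 > f2                 # same size, different content...
touch -r f1 f2                  # ...and a copied mtime: quick check is fooled
same_by_quick_check f1 f2 && echo "skip transfer (wrongly)"
```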
answered May 3 '11 at 17:48 by chrishiestand (edited May 3 '11 at 17:55)
Awesome. Thank you. Looks like the second option is going to work for me and I found the first very interesting. – Fred Snertz May 3 '11 at 18:40
psst, hit the green checkbox to give my answer credit ;-) Thx. – chrishiestand May 12 '11 at 1:56
Aug 28, 2017 | unix.stackexchange.com
Eugene Yarmash , asked Apr 24 '13 at 16:35
I have a bash script which uses rsync to back up files in Arch Linux. I noticed that rsync failed to copy a file from /sys, while cp worked just fine:

# rsync /sys/class/net/enp3s1/address /tmp
rsync: read errors mapping "/sys/class/net/enp3s1/address": No data available (61)
rsync: read errors mapping "/sys/class/net/enp3s1/address": No data available (61)
ERROR: address failed verification -- update discarded.
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1052) [sender=3.0.9]

# cp /sys/class/net/enp3s1/address /tmp  ## this works

I wonder why rsync fails, and is it possible to copy the file with it?

mattdm, answered Apr 24 '13 at 18:20
Rsync has code which specifically checks if a file is truncated during read and gives this error: ENODATA. I don't know why the files in /sys have this behavior, but since they're not real files, I guess it's not too surprising. There doesn't seem to be a way to tell rsync to skip this particular check.

I think you're probably better off not rsyncing /sys and using specific scripts to cherry-pick out the particular information you want (like the network card address).

Runium, answered Apr 25 '13 at 0:23
First off, /sys is a pseudo file system. If you look at /proc/filesystems you will find a list of registered file systems, where quite a few have nodev in front. This indicates they are pseudo filesystems, meaning they exist on a running kernel as a RAM-based filesystem and do not require a block device.

$ cat /proc/filesystems
nodev	sysfs
nodev	rootfs
nodev	bdev
...

At boot the kernel mounts this system and updates entries when suited, e.g. when new hardware is found during boot or by udev. In /etc/mtab you typically find the mount as:

sysfs /sys sysfs rw,noexec,nosuid,nodev 0 0

For a nice paper on the subject read Patrick Mochel's "The sysfs Filesystem".
stat of /sys files

If you go into a directory under /sys and do a ls -l, you will notice that all files have one size, typically 4096 bytes. This is reported by sysfs.

:/sys/devices/pci0000:00/0000:00:19.0/net/eth2$ ls -l
-r--r--r-- 1 root root 4096 Apr 24 20:09 addr_assign_type
-r--r--r-- 1 root root 4096 Apr 24 20:09 address
-r--r--r-- 1 root root 4096 Apr 24 20:09 addr_len
...

Further you can do a stat on a file and notice another distinct feature: it occupies 0 blocks. Also the inode of root (stat /sys) is 1, while /stat/fs typically has inode 2, etc.

rsync vs. cp

The easiest explanation for rsync's failure to synchronize pseudo files is perhaps by example.
Say we have a file named address that is 18 bytes. An ls or stat of the file reports 4096 bytes.
rsync:

- Opens file descriptor, fd.
- Uses fstat(fd) to get information such as size.
- Sets out to read size bytes, i.e. 4096. That would be line 253 of the code linked by @mattdm. read_size == 4096
- Ask; read: 4096 bytes.
- A short string is read, i.e. 18 bytes. nread == 18
- read_size = read_size - nread (4096 - 18 = 4078)
- Ask; read: 4078 bytes.
- 0 bytes read (as the first read consumed all bytes in the file). nread == 0, line 255
- Unable to read 4096 bytes. Zero out buffer.
- Set error ENODATA.
- Return.
- Report error.
- Retry. (Above loop.)
- Fail.
- Report error.
- FINE.
During this process it actually reads the entire file. But with no size available it cannot validate the result; thus failure is the only option.
cp:

- Opens file descriptor, fd.
- Uses fstat(fd) to get information such as st_size (also uses lstat and stat).
- Checks if the file is likely to be sparse, i.e. whether the file has holes:

copy.c:1010
/* Use a heuristic to determine whether SRC_NAME contains any sparse
 * blocks. If the file has fewer blocks than would normally be
 * needed for a file of its size, then at least one of the blocks in
 * the file is a hole. */
sparse_src = is_probably_sparse (&src_open_sb);

As stat reports the file to have zero blocks, it is categorized as sparse.

- Tries to read the file by extent-copy (a more efficient way to copy normal sparse files), and fails.
- Copies by sparse-copy.
- Starts out with a max read size of MAXINT, typically 18446744073709551615 bytes (2^64 - 1) on a 64-bit system.
- Ask; read 4096 bytes. (Buffer size allocated in memory from stat information.)
- A short string is read, i.e. 18 bytes.
- Check if a hole is needed; nope.
- Write buffer to target.
- Subtract 18 from max read size.
- Ask; read 4096 bytes.
- 0 bytes, as all got consumed in the first read.
- Return success.
- All OK. Update flags for file.
- FINE.
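The crux of both walkthroughs is that stat's reported size and the bytes a read actually returns disagree on sysfs. A toy shell re-enactment of the two strategies (the file name and the 4096 figure are taken from the thread; this is not rsync's or cp's actual code):

```shell
# rsync-style logic: trust stat's size, treat a short read as an error;
# cp-style logic: read to EOF and trust the stream instead
read_exactly() {  # $1 = file, $2 = size that stat claims
    got=$(wc -c < "$1")
    test "$got" -ge "$2"      # fails on a short read, as rsync's check does
}

printf '00:11:22:33:44:55\n' > address    # 18 bytes of real content
read_exactly address 4096 || echo "ENODATA-style failure"   # sysfs claims 4096
read_exactly address 18 && echo "reading to EOF is fine"    # what cp relies on
```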
Might be related, but extended attribute calls will fail on sysfs:

[root@hypervisor eth0]# lsattr address
lsattr: Inappropriate ioctl for device While reading flags on address
[root@hypervisor eth0]#
Looking at my strace it looks like rsync tries to pull in extended attributes by default:
22964 <... getxattr resumed> , 0x7fff42845110, 132) = -1 ENODATA (No data available)
I tried finding a flag to give rsync to see if skipping extended attributes resolves the issue, but wasn't able to find anything (--xattrs turns them on at the destination).
Aug 28, 2017 | ubuntuforums.org
View Full Version : [ubuntu] Rsync doesn't copy everyting
Scormen May 31st, 2009, 10:09 AM Hi all, I'm having some trouble with rsync. I'm trying to sync my local /etc directory to a remote server, but this won't work.
The problem is that it seems it doesn't copy all the files.
The local /etc dir contains 15MB of data; after an rsync, the remote backup contains only 4.6MB of data. Rsync is run by root. I'm using this command:
rsync --rsync-path="sudo rsync" -e "ssh -i /root/.ssh/backup" -avz --delete --delete-excluded -h --stats /etc [email protected]:/home/kris/backup/laptopkris
I hope someone can help.
Thanks!Kris
Scormen May 31st, 2009, 11:05 AM I found that if I do a local sync, everything goes fine.
But if I do a remote sync, it copies only 4.6MB. Any idea?
LoneWolfJack May 31st, 2009, 05:14 PM never used rsync on a remote machine, but "sudo rsync" looks wrong. You probably can't call sudo like that, so the ssh connection needs to have the proper privileges for executing rsync. Just an educated guess, though.
Scormen May 31st, 2009, 05:24 PM Thanks for your answer. In /etc/sudoers I have added the next line, so "sudo rsync" will work:
kris ALL=NOPASSWD: /usr/bin/rsync
I also tried without --rsync-path="sudo rsync", but without success.
I have also tried on the server to pull the files from the laptop, but that doesn't work either.
LoneWolfJack May 31st, 2009, 05:30 PM in the rsync help file it says that --rsync-path is for the path to rsync on the remote machine, so my guess is that you can't use sudo there as it will be interpreted as a path. So you will have to do --rsync-path="/path/to/rsync" and make sure the ssh login has root privileges if you need them to access the files you want to sync.

--rsync-path="sudo rsync" probably fails because:

a) sudo is interpreted as a path
b) the space isn't escaped
c) sudo probably won't allow itself to be called remotely

Again, this is not more than an educated guess.
Scormen May 31st, 2009, 05:45 PM I understand what you mean, so I tried also:

rsync -Cavuhzb --rsync-path="/usr/bin/rsync" -e "ssh -i /root/.ssh/backup" /etc [email protected]:/home/kris/backup/laptopkris
Then I get this error:
sending incremental file list
rsync: recv_generator: failed to stat "/home/kris/backup/laptopkris/etc/chatscripts/pap": Permission denied (13)
rsync: recv_generator: failed to stat "/home/kris/backup/laptopkris/etc/chatscripts/provider": Permission denied (13)
rsync: symlink "/home/kris/backup/laptopkris/etc/cups/ssl/server.crt" -> "/etc/ssl/certs/ssl-cert-snakeoil.pem" failed: Permission denied (13)
rsync: symlink "/home/kris/backup/laptopkris/etc/cups/ssl/server.key" -> "/etc/ssl/private/ssl-cert-snakeoil.key" failed: Permission denied (13)
rsync: recv_generator: failed to stat "/home/kris/backup/laptopkris/etc/ppp/peers/provider": Permission denied (13)
rsync: recv_generator: failed to stat "/home/kris/backup/laptopkris/etc/ssl/private/ssl-cert-snakeoil.key": Permission denied (13)

sent 86.85K bytes received 306 bytes 174.31K bytes/sec
total size is 8.71M speedup is 99.97
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1058) [sender=3.0.5]

And the same command with "root" instead of "kris".
Then, I get no errors, but I still don't have all the files synced.
Scormen June 1st, 2009, 09:00 AM Sorry for this bump.
I'm still having the same problem. Any idea?
Thanks.
binary10 June 1st, 2009, 10:36 AM I understand what you mean, so I tried also:

rsync -Cavuhzb --rsync-path="/usr/bin/rsync" -e "ssh -i /root/.ssh/backup" /etc [email protected]:/home/kris/backup/laptopkris
Then I get this error:
And the same command with "root" instead of "kris".
Then, I get no errors, but I still don't have all the files synced.

Maybe there's a nicer way, but you could place /usr/bin/rsync into a private protected area, set the owner to root, place the setuid bit on it, and change your rsync-path argument like so:
# on the remote side, aka [email protected]
mkdir priv-area
# protect it from normal users running a priv version of rsync
chmod 700 priv-area
cd priv-area
cp -p /usr/local/bin/rsync ./rsync-priv
sudo chown 0:0 ./rsync-priv
sudo chmod +s ./rsync-priv
ls -ltra # rsync-priv should now be 'bold-red' in bash

Looking at your flags, you've specified a cvs ignore factor, ignore files that are updated on the target, and you're specifying a backup of removed files.
rsync -Cavuhzb --rsync-path="/home/kris/priv-area/rsync-priv" -e "ssh -i /root/.ssh/backup" /etc [email protected]:/home/kris/backup/laptopkris
From those qualifiers you're not going to be getting everything sync'd. It's doing what you're telling it to do.
If you really wanted to perform a like-for-like backup (not keeping stuff that's been changed/deleted from the source), I'd go for something like the following:
rsync --archive --delete --hard-links --one-file-system --acls --xattrs --dry-run -i --rsync-path="/home/kris/priv-area/rsync-priv" --rsh="ssh -i /root/.ssh/backup" /etc/ [email protected]:/home/kris/backup/laptopkris/etc/
Remove the --dry-run and -i when you're happy with the output, and it should do what you want. A word of warning, I get a bit nervous when not seeing trailing (/) on directories as it could lead to all sorts of funnies if you end up using rsync on softlinks.
Scormen June 1st, 2009, 12:19 PM Thanks for your help, binary10. I've tried what you have said, but still, I only receive 4.6MB on the remote server.
Thanks for the warning, I'll note that! Has someone already tried to rsync their own /etc to a remote system? Just to know if this strange thing only happens to me...
Thanks.
binary10 June 1st, 2009, 01:22 PM Thanks for your help, binary10. I've tried what you have said, but still, I only receive 4.6MB on the remote server.
Thanks for the warning, I'll note that! Has someone already tried to rsync their own /etc to a remote system? Just to know if this strange thing only happens to me...
Thanks.
Ok, so I've gone back and looked at your original post: how are you calculating 15MB of data under etc - via a du -hsx /etc/?
I do daily drive to drive backup copies via rsync and drive to network copies.. and have used them recently for restoring.
Sure my du -hsx /etc/ reports 17MB of data of which 10MB gets transferred via an rsync. My backup drives still operate.
rsync 3.0.6 has some fixes to do with ACLs and special devices when rsyncing between Solaris systems, but I think 3.0.5 is still OK with Ubuntu-to-Ubuntu systems.
Here is my test doing exactly what you're probably trying to do. I even check the remote end:
binary10@jsecx25:~/bin-priv$ ./rsync --archive --delete --hard-links --one-file-system --stats --acls --xattrs --human-readable --rsync-path="~/bin/rsync-priv-os-specific" --rsh="ssh" /etc/ [email protected]:/home/kris/backup/laptopkris/etc/
Number of files: 3121
Number of files transferred: 1812
Total file size: 10.04M bytes
Total transferred file size: 10.00M bytes
Literal data: 10.00M bytes
Matched data: 0 bytes
File list size: 109.26K
File list generation time: 0.002 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10.20M
Total bytes received: 38.70K

sent 10.20M bytes received 38.70K bytes 4.09M bytes/sec
total size is 10.04M speedup is 0.98

binary10@jsecx25:~/bin-priv$ sudo du -hsx /etc/
17M /etc/
binary10@jsecx25:~/bin-priv$

And then on the remote system I do the du -hsx:
binary10@lenovo-n200:/home/kris/backup/laptopkris/etc$ cd ..
binary10@lenovo-n200:/home/kris/backup/laptopkris$ sudo du -hsx etc
17M etc
binary10@lenovo-n200:/home/kris/backup/laptopkris$
Scormen June 1st, 2009, 01:35 PM How are you calculating 15MB of data under etc - via a du -hsx /etc/?

Indeed, on my laptop I see:

root@laptopkris:/home/kris# du -sh /etc/
15M /etc/

If I do the same thing after a fresh sync to the server, I see:

root@server:/home/kris# du -sh /home/kris/backup/laptopkris/etc/
4.6M /home/kris/backup/laptopkris/etc/

On both sides, I have installed Ubuntu 9.04, with version 3.0.5 of rsync.
So strange...
binary10 June 1st, 2009, 01:45 PM it does seem a bit odd. I'd start doing a few diffs from the outputs:

find etc/ -printf "%f %s %p %Y\n" | sort
And see what type of files are missing.
- edit - Added the %Y file type.
Scormen June 1st, 2009, 01:58 PM Hmm, it's getting stranger.
Now I see that I have all my files on the server, but they don't have their full size (bytes). I have uploaded the files, so you can look into them.
Laptop: http://www.linuxontdekt.be/files/laptop.files
Server: http://www.linuxontdekt.be/files/server.files
binary10 June 1st, 2009, 02:16 PM If you look at the files that are different, aka the ssl ones, they are links to local files elsewhere, i.e. linked to /usr and not within /etc/; so they are different on your laptop and the server.
Scormen June 1st, 2009, 02:25 PM I understand that soft links are just copied, and not the "full file". But you have run the same command to test, a few posts ago.
How is it possible that you can see the full 15MB?
binary10 June 1st, 2009, 02:34 PM I was starting to think that this was a bug with du. The de-referencing is a bit topsy.
If you rsync copy the remote backup back to a new location back onto the laptop and do the du command. I wonder if you'll end up with 15MB again.
Scormen June 1st, 2009, 03:20 PM Good tip. On the server side, the backup of the /etc was still 4.6MB.
I have rsynced it back to the laptop, to a new directory. If I go on the laptop to that new directory and do a du, it says 15MB.
binary10 June 1st, 2009, 03:34 PM Good tip. On the server side, the backup of the /etc was still 4.6MB.
I have rsynced it back to the laptop, to a new directory. If I go on the laptop to that new directory and do a du, it says 15MB.
I think you've now confirmed that rsync DOES copy everything; it's just that du confused what you had expected by counting the link-target sizes.
You might also think about what you're copying; maybe you need more than just /etc. Of course it depends on what you are trying to do with the backup :)
enjoy.
Scormen June 1st, 2009, 03:37 PM Yeah, it seems to work well.
So, the "problem" was just the soft links, which couldn't be counted on the server side?
binary10 June 1st, 2009, 04:23 PM Yeah, it seems to work well.
So, the "problem" was just the soft links, which couldn't be counted on the server side?

The links were copied as links, as per the design of --archive in rsync. The targets those links point at differ between your two systems, since they reside outside of /etc/, in /usr, and so du reports them differently.
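The du discrepancy is easy to reproduce in miniature (paths invented): a symlink itself occupies almost nothing, its target does not, so du over a tree of symlinks undercounts unless told to dereference:

```shell
# a symlink's own size vs. the size of what it points to
cd "$(mktemp -d)"
dd if=/dev/zero of=real.dat bs=1024 count=100 2>/dev/null   # a 100 KB "real" file
ln -s "$PWD/real.dat" link.dat

du -k real.dat    # ~100: kilobytes of actual data
du -k link.dat    # ~0: the link itself is tiny
du -kL link.dat   # -L dereferences, so the target's size is counted
```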
Scormen June 1st, 2009, 05:36 PM Okay, I got it.
Many thanks for the support, binary10!
Scormen June 1st, 2009, 05:59 PM Just to know, is it possible to copy the data from these links as real, hard data?
Thanks.
binary10 June 2nd, 2009, 09:54 AM Just to know, is it possible to copy the data from these links as real, hard data?
Thanks.

Yep, absolutely.
You should then look at other possibilities of:
-L, --copy-links transform symlink into referent file/dir
--copy-unsafe-links only "unsafe" symlinks are transformed
--safe-links ignore symlinks that point outside the source tree
-k, --copy-dirlinks transform symlink to a dir into referent dir
-K, --keep-dirlinks treat symlinked dir on receiver as dir

But then you'll have to start questioning why you are backing them up like that, especially stuff under /etc/. If you ever wanted to restore it you'd be restoring full files and not symlinks; the restore result could be a nightmare, as well as create future issues (upgrades etc.), let alone your backup being significantly larger: it could be 150MB instead of 4MB.
Scormen June 2nd, 2009, 10:04 AM Okay, now I'm sure what it's doing :)
Is it also possible to show on a system the "real disk usage" of e.g. that /etc directory? So that, without the links, we get an output of 4.6MB.

Thank you very much for your help!
binary10 June 2nd, 2009, 10:22 AM What does the following respond with?

sudo du --apparent-size -hsx /etc
If you want the real answer then your result from a dry-run rsync will only be enough for you.
sudo rsync --dry-run --stats -h --archive /etc/ /tmp/etc/
Feb 20, 2017 | opensource.com
Another interesting option, and my personal favorite because it increases the power and flexibility of rsync immensely, is the --link-dest option. The --link-dest option allows a series of daily backups that take up very little additional space for each day and also take very little time to create.

Specify the previous day's target directory with this option and a new directory for today. rsync then creates today's new directory, and a hard link for each file in yesterday's directory is created in today's directory. So we now have a bunch of hard links to yesterday's files in today's directory. No new files have been created or duplicated; just a bunch of hard links have been created. Wikipedia has a very good description of hard links. After creating the target directory for today with this set of hard links to yesterday's target directory, rsync performs its sync as usual, but when a change is detected in a file, the target hard link is replaced by a copy of the file from yesterday and the changes to the file are then copied from the source to the target.
So now our command looks like the following.
rsync -aH --delete --link-dest=yesterdaystargetdir sourcedir todaystargetdir
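For an unchanged file, what --link-dest does can be re-enacted with plain coreutils; the directory names match the command above but are otherwise arbitrary:

```shell
# day 1: a normal full copy; day 2: the unchanged file becomes a hard link
cd "$(mktemp -d)"
mkdir -p sourcedir yesterdaystargetdir todaystargetdir
echo unchanged > sourcedir/a.txt
cp -p sourcedir/a.txt yesterdaystargetdir/

# the file is unchanged, so today's "copy" is just another name for yesterday's inode
ln yesterdaystargetdir/a.txt todaystargetdir/a.txt
stat -c %h todaystargetdir/a.txt   # prints 2: two directory entries, one inode, no extra data
```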
There are also times when it is desirable to exclude certain directories or files from being synchronized. For this, there is the --exclude option. Use this option and the pattern for the files or directories you want to exclude. You might want to exclude browser cache files so your new command will look like this.
rsync -aH --delete --exclude Cache --link-dest=yesterdaystargetdir sourcedir todaystargetdir
Note that each file pattern you want to exclude must have a separate exclude option.
rsync can sync files with remote hosts as either the source or the target. For the next example, let's assume that the source directory is on a remote computer with the hostname remote1 and the target directory is on the local host. Even though SSH is the default communications protocol used when transferring data to or from a remote host, I always add the ssh option. The command now looks like this.
rsync -aH -e ssh --delete --exclude Cache --link-dest=yesterdaystargetdir remote1:sourcedir todaystargetdir
This is the final form of my rsync backup command.
rsync has a very large number of options that you can use to customize the synchronization process. For the most part, the relatively simple commands that I have described here are perfect for making backups for my personal needs. Be sure to read the extensive man page for rsync to learn about more of its capabilities as well as the options discussed here.
Feb 12, 2017 | www.cyberciti.biz
So what is unique about the rsync command?

It can perform differential uploads and downloads (synchronization) of files across the network, transferring only data that has changed. The rsync remote-update protocol allows rsync to transfer just the differences between two sets of files across the network connection.
How do I install rsync?

Use any one of the following commands to install rsync. If you are using Debian or Ubuntu Linux, type the following command:

# apt-get install rsync

OR

$ sudo apt-get install rsync

If you are using Red Hat Enterprise Linux (RHEL) / CentOS 4.x or an older version, type the following command:

# up2date rsync

RHEL / CentOS 5.x or newer (or Fedora Linux) users type the following command:

# yum install rsync

Always use rsync over ssh

Since rsync does not provide any security while transferring data, it is recommended that you use rsync over an ssh session. This allows a secure remote connection. Now let us see some examples of the rsync command.
Common rsync command options
- --delete : delete files that don't exist on sender (system)
- -v : Verbose (try -vv for more detailed information)
- -e "ssh options" : specify the ssh as remote shell
- -a : archive mode
- -r : recurse into directories
- -z : compress file data
Task: Copy a file from a local computer to a remote server

Copy file /www/backup.tar.gz to a remote server called openbsd.nixcraft.in:
$ rsync -v -e ssh /www/backup.tar.gz [email protected]:~
Output:

Password:
sent 19099 bytes received 36 bytes 1093.43 bytes/sec
total size is 19014 speedup is 0.99

Please note that the symbol ~ indicates the user's home directory (/home/jerry).
Task: Copy a file from a remote server to a local computer

Copy file /home/jerry/webroot.txt from a remote server openbsd.nixcraft.in to a local computer's /tmp directory:

$ rsync -v -e ssh [email protected]:~/webroot.txt /tmp

Task: Synchronize a local directory with a remote directory

$ rsync -r -a -v -e "ssh -l jerry" --delete /local/webroot openbsd.nixcraft.in:/webroot

Task: Synchronize a remote directory with a local directory

$ rsync -r -a -v -e "ssh -l jerry" --delete openbsd.nixcraft.in:/webroot/ /local/webroot

Task: Synchronize a local directory with a remote rsync server or vice versa

$ rsync -r -a -v --delete rsync://rsync.nixcraft.in/cvs /home/cvs

OR

$ rsync -r -a -v --delete /home/cvs rsync://rsync.nixcraft.in/cvs

Task: Mirror a directory between my "old" and "new" web server/ftp
You can mirror a directory between my "old" (my.old.server.com) and "new" web server with the command (assuming that ssh keys are set for password less authentication)
$ rsync -zavrR --delete --links --rsh="ssh -l vivek" my.old.server.com:/home/lighttpd /home/lighttpd
Read related previous articles

- How do I sync data between two load-balanced Linux/UNIX servers?
- How do I sync data between two load-balanced Windows 2003 servers?

Other options: rdiff and rdiff-backup
The rdiff command uses the rsync algorithm. A utility called rdiff-backup has been created which is capable of maintaining a backup mirror of a file or directory over the network, on another server. rdiff-backup stores incremental rdiff deltas with the backup, with which it is possible to recreate any backup point. Next time I will write about these utilities.
rsync for Windows Server/XP/7/8

Please note that if you are using MS-Windows, try any one of these programs:
Further readings
=> Read rsync man page
=> Official rsync documentation
Feb 12, 2017 | www.tecmint.com
The purpose of creating a mirror of your web server with rsync is that if your main web server fails, your backup server can take over, reducing downtime of your website. This way of creating a web server backup is very good and effective for small and medium size web businesses.

Advantages of Syncing Web Servers

The main advantages of creating a web server backup with rsync are as follows:
How To Sync Two Apache Web Servers
- Rsync syncs only those bytes and blocks of data that have changed.
- Rsync has the ability to check and delete those files and directories at backup server that have been deleted from the main web server.
- It takes care of permissions, ownerships and special attributes while copying data remotely.
- It also supports the SSH protocol, so data is transferred in encrypted form and remains safe in transit.
- Rsync compresses data during transfer, which consumes less bandwidth.
Let's proceed with setting up rsync to create a mirror of your web server. Here, I'll be using two servers.
Main Server
- IP Address : 192.168.0.100
- Hostname : webserver.example.com

Backup Server
- IP Address : 192.168.0.101
- Hostname : backup.example.com

Step 1: Install Rsync Tool
Here, the web server data of webserver.example.com will be mirrored on backup.example.com. To do so, we first need to install rsync on both servers with the following command.
[root@tecmint]# yum install rsync        [On Red Hat based systems]
[root@tecmint]# apt-get install rsync    [On Debian based systems]

Step 2: Create a User to run Rsync

We can set up rsync with the root user, but for security reasons you can create an unprivileged user on the main web server, i.e. webserver.example.com, to run rsync.
[root@tecmint]# useradd tecmint
[root@tecmint]# passwd tecmint

Here I have created a user "tecmint" and assigned a password to the user.
Step 3: Test Rsync Setup

It's time to test your rsync setup on your backup server (i.e. backup.example.com). To do so, type the following command.
[root@backup www]# rsync -avzhe ssh [email protected]:/var/www/ /var/www

Sample Output

[email protected]'s password:
receiving incremental file list
sent 128 bytes  received 32.67K bytes  5.96K bytes/sec
total size is 12.78M  speedup is 389.70

You can see that rsync is now working and syncing data. I have used "/var/www" as the directory to transfer; change the folder location according to your needs.
Step 4: Automate Sync with SSH Passwordless Login

Now that rsync is set up, it's time to schedule it with cron. As we are going to use rsync over the SSH protocol, ssh will ask for authentication, and if we can't provide a password the cron job will not work. For cron to run smoothly, we need to set up passwordless ssh logins for rsync.
Here in this example, I am doing it as root to preserve file ownerships as well, you can do it for alternative users too.
First, we'll generate a public/private key pair with the following command on the backup server (i.e. backup.example.com).
[root@backup]# ssh-keygen -t rsa -b 2048

When you enter this command, don't provide a passphrase; press Enter for an empty passphrase so that the rsync cron job will not need any password for syncing data.
Sample Output

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
9a:33:a9:5d:f4:e1:41:26:57:d0:9a:68:5b:37:9c:23 [email protected]

Now our public and private keys have been generated, and we will have to share the public key with the main server so that the main web server will recognize this backup machine and allow it to log in without asking for any password while syncing data.
[root@backup html]# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]

Now try logging into the machine with "ssh [email protected]" and check .ssh/authorized_keys.
[root@backup html]# [email protected]Now, we are done with sharing keys. To know more in-depth about SSH password less login , you can read our article on it.
Step 5: Schedule Cron To Automate Sync

Let's set up a cron job for this. To do so, open the crontab file with the following command.
[root@backup ~]# crontab -e

This opens the root user's crontab in your default editor. In this example, I am writing a cron entry that runs every 5 minutes to sync the data.
*/5 * * * * rsync -avzhe ssh [email protected]:/var/www/ /var/www/

The above cron entry syncs "/var/www/" from the main web server to the backup server every 5 minutes. Change the schedule and folder locations according to your needs.
Feb 12, 2017 | www.youtube.com
soundtraining.net: Great demonstration and very easy to follow, Don! Just a note to anyone who might come across this and start using it in production-based systems: you certainly would not want to be rsyncing with root accounts. In addition, you would use key-based auth with SSH as an additional layer of security. Just my 2 cents ;-)

curtis shaw, 11 months ago: Best rsync tutorial on the web. Thanks.
Use of --include and --exclude Options

These two options let us include and exclude files by specifying patterns: you name the files or directories you want included in the sync, and exclude the files and folders you don't want transferred.
Here in this example, rsync command will include those files and directory only which starts with 'R' and exclude all other files and directory.
[root@tecmint]# rsync -avze ssh --include 'R*' --exclude '*' [email protected]:/var/lib/rpm/ /root/rpm
[email protected]'s password:
receiving incremental file list
created directory /root/rpm
./
Requirename
Requireversion
sent 67 bytes  received 167289 bytes  7438.04 bytes/sec
total size is 434176  speedup is 2.59

6. Use of --delete Option

If a file or directory does not exist at the source, but already exists at the destination, you might want to delete that existing file/directory at the target while syncing.
We can use the '--delete' option to delete files that are not present in the source directory.
Source and target are in sync. Now creating new file test.txt at the target.
[root@tecmint]# touch test.txt
[root@tecmint]# rsync -avz --delete [email protected]:/var/lib/rpm/ .
Password:
receiving file list ... done
deleting test.txt
./
sent 26 bytes  received 390 bytes  48.94 bytes/sec
total size is 45305958  speedup is 108908.55

The target had the new file called test.txt; when synchronized with the source using the '--delete' option, the file test.txt was removed.
7. Set the Max Size of Files to be Transferred

You can specify the maximum file size to be transferred or synced with the "--max-size" option. In this example the max file size is 200k, so the command will transfer only files that are equal to or smaller than 200k.
[root@tecmint]# rsync -avzhe ssh --max-size='200k' /var/lib/rpm/ [email protected]:/root/tmprpm
[email protected]'s password:
sending incremental file list
created directory /root/tmprpm
./
Conflictname
Group
Installtid
Name
Provideversion
Pubkeys
Requireversion
Sha1header
Sigmd5
Triggername
__db.001
sent 189.79K bytes  received 224 bytes  13.10K bytes/sec
total size is 38.08M  speedup is 200.43

8. Automatically Delete Source Files after a Successful Transfer

Suppose you have a main web server and a data backup server; you create a daily backup and sync it with your backup server, and you don't want to keep the local copy of the backup on your web server.
So, will you wait for the transfer to complete and then delete those local backup files manually? Of course not. This automatic deletion can be done using the '--remove-source-files' option.
[root@tecmint]# rsync --remove-source-files -zvh backup.tar /tmp/backups/
backup.tar
sent 14.71M bytes  received 31 bytes  4.20M bytes/sec
total size is 16.18M  speedup is 1.10

[root@tecmint]# ll backup.tar
ls: backup.tar: No such file or directory

9. Do a Dry Run with rsync

If you are new to rsync and don't know exactly what your command is going to do, rsync could really mess up things in your destination folder, and undoing the damage can be a tedious job.
The '--dry-run' option makes no changes; it only simulates the command and shows what it would do. If the output shows exactly what you want, remove '--dry-run' from your command and run it for real.
[root@tecmint]# rsync --dry-run --remove-source-files -zvh backup.tar /tmp/backups/
backup.tar
sent 35 bytes  received 15 bytes  100.00 bytes/sec
total size is 16.18M  speedup is 323584.00 (DRY RUN)

10. Set Bandwidth Limit and Transfer File

You can set a bandwidth limit while transferring data from one machine to another with the help of the '--bwlimit' option. This option helps us limit I/O bandwidth.
[root@tecmint]# rsync --bwlimit=100 -avzhe ssh /var/lib/rpm/ [email protected]:/root/tmprpm/
[email protected]'s password:
sending incremental file list
sent 324 bytes  received 12 bytes  61.09 bytes/sec
total size is 38.08M  speedup is 113347.05

Also, by default rsync syncs changed blocks and bytes only; if you explicitly want to sync the whole file, use the '-W' option.
[root@tecmint]# rsync -zvhW backup.tar /tmp/backups/backup.tar
backup.tar
sent 14.71M bytes  received 31 bytes  3.27M bytes/sec
total size is 16.18M  speedup is 1.10

That's all with rsync; see the man pages for more options.
yolinux.com
Referencing directories - errors and subtleties:
- Back-up (push) your photo directories to a second drive:
rsync -av ~/Pictures /mnt/drive2
This creates a backup of your photos in /mnt/drive2/Pictures/
Back-up to a USB thumb drive is similar: rsync -av ~/Pictures /media/KINGSTON
When you add new photos, just re-execute this rsync command to backup the latest changes.
Note: The drive name will depend on the manufacturer.

[Potential Pitfall]: Do not include the source directory name in the destination:
rsync -av ~/Pictures /mnt/drive2/Pictures
This will result in /mnt/drive2/Pictures/Pictures/
Note that rsync destinations act just like the cp and rcp commands.
Also note that rsync -av ~/Pictures/ /mnt/drive2/ has a different behavior from rsync -av ~/Pictures /mnt/drive2: the trailing slash on the source copies its contents rather than the directory itself.

- Back-up (push) two source directories to a single directory on the second drive:
rsync -av ~/Pictures ~/Images /mnt/drive2
This creates a backup of your photos from the two directories in /mnt/drive2/Pictures/ and /mnt/drive2/Images/

- Sync directories and, if any files were deleted in ~/Pictures, also delete them in /mnt/drive2/Pictures/:
rsync -a --progress --delete ~/Pictures /mnt/drive2
This creates a backup in /mnt/drive2/Pictures/

- Sync one specific file in a directory:
rsync -a ~/Pictures/Group-photo-2001-A.jpg /mnt/drive2/Pictures

- Sync a group of specific files in a directory:
rsync -a ~/Pictures/2001-*.jpg /mnt/drive2/Pictures
This creates a backup in /mnt/drive2/Pictures/
Note that when transferring files only, the directory name has to be provided in the destination path.

- Sync files and directories listed in a file:
rsync -ar --files-from=Filelist.txt ~/Data /mnt/drive2
This creates a backup in /mnt/drive2/Data/
Directory paths are included if specified with a closing slash such as pathx/pathy/pathz/. Path names must be terminated with a "/" or "/."

- Back-up (push) your source code and compress to save space. Ignore object files:
rsync -avz --delete --exclude='*.o' --exclude='*.so' ~/src /mnt/drive2
This creates a backup in /mnt/drive2/src/ but does not transfer files with the ".o" and ".so" extensions.

- Back-up (push) your source code and ignore object and shared object code files:
rsync -av --delete --filter='+ *.[ch]' --filter='- *.o' ~/src /mnt/drive2
This transfers files with the extension ".c" and ".h" but does not transfer object files with the ".o" extensions.
This is roughly the same as: rsync -av --filter='+ *.c' --filter='+ *.h' --exclude='*.o' ~/src /mnt/drive2

- Back-up (push) your source code and ignore CM directories, object and shared object code files:
rsync -artv --progress --delete --filter='+ *.[ch]' --filter='- *.o' --exclude=".svn" ~/src /mnt/drive2
Note that --exclude overrides the include filter --filter='+ *.[ch]' so that ".c" and ".h" files under .svn/ are not copied.

Rsync Options: (partial list)
- rsync -ar dir1 dir2
This will copy dir1 into dir2 to give dir2/dir1; thus dir1/file.txt gets copied to dir2/dir1/file.txt
- rsync -ar dir1/ dir2/
This will copy the contents of dir1 into dir2; e.g. dir1/file.txt gets copied to dir2/file.txt
The following all achieve essentially the same results (note that the `*` glob in the last two forms is expanded by the shell and skips hidden dot-files):
- rsync -ar dir1/ dir2/
- rsync -ar dir1/* dir2
- rsync -ar dir1/* dir2/
Note that rsync will be able to handle files with blanks in the file name or directory name as well as with dashes ("-") or underscores ("_").
For all options see the rsync man page
Command line arguments:

-a (--archive)
Archive. Includes the options:
- -r: recursion
- -l: preserve symbolic links as symbolic links. Opposite of -L
- -p: preserve permissions (Linux/Unix only)
- -t: preserve file modification time
- -g: preserve group ownership
- -o: preserve user ownership
- -D: preserve special files and devices (Linux/Unix only)

-d (--dirs)
Copy the directory tree structure without copying the files within the directories

--existing
Update only existing files from the source directory which are already present at the destination. No new files will be transferred.

-L (--copy-links)
Transform a symbolic link into a copied file upon transfer

--stats
Print a verbose set of statistics on the transfer.
Add -h (--human-readable) to print stats in an understandable fashion

-p (--perms)
Preserve permissions (not relevant for MS/Windows clients)

-r (--recursive)
Recurse through directories and sub-directories

-t (--times)
Preserve file modification times

-v (--verbose)
Verbose

-z (--compress)
Compress files during transfer to reduce network bandwidth. Files are not stored in an altered or compressed state.
Note that compression will have little or no effect on JPG, PNG and other files already using compression.
Use --skip-compress=gz/bz2/jpg/jpeg/ogg/mp[34]/mov/avi/rpm/deb/ to avoid compressing files that are already compressed

--delete
Delete extraneous files from destination directories, i.e. delete files on the archive server if they were also deleted on the client.
Use -m (--prune-empty-dirs) to delete empty directories (no longer useful after their contents are deleted)

--include / --exclude / --filter
Specify a pattern for specific inclusion or exclusion, or use the more universal filter for inclusion (+) / exclusion (-).
Do not transfer files ending with ".o": --exclude='*.o'
Transfer all files ending with ".c" or ".h": --filter='+ *.[ch]'

-i (--itemize-changes)
Print information about the transfer: list everything (all file copies and file changes) rsync is going to perform

--list-only / --dry-run
Don't copy anything; just list what rsync would copy if this option were not given. This helps when debugging the correct exclusion/inclusion filters.

--progress
Shows percent complete, Kb transferred and Kb/s transfer rate. Includes verbose output.
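The --exclude behavior from the table can be demonstrated locally (throwaway /tmp paths):

```shell
# --exclude: object files are left behind, sources are copied.
rm -rf /tmp/rsdemo_code /tmp/rsdemo_code_dst
mkdir -p /tmp/rsdemo_code /tmp/rsdemo_code_dst
touch /tmp/rsdemo_code/main.c /tmp/rsdemo_code/main.h /tmp/rsdemo_code/main.o
rsync -a --exclude='*.o' /tmp/rsdemo_code/ /tmp/rsdemo_code_dst/
ls /tmp/rsdemo_code_dst
```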
Rsync can be configured in multiple client-server modes.
Rsync Client-Server Configuration and Operation: These configurations are specified with the use of the colon ":"
- connect the client to a server running rsync in daemon mode
- connect the client to a server using an ssh shell
- Double colon refers to a connection to a host running the rsync daemon in the format hostname::module/path where the module name is identified by the configuration in /etc/rsyncd.conf. The double colon is equivalent to using the URL prefix rsync://
- Single colon refers to the use of a remote shell
- No colon then the directory is considered to be local to the system.
1) Rsync daemon server:
The Rsync server is often referred to as rsyncd or the rsync daemon. This is in fact the same rsync executable run with the command line argument "--daemon". This can be run stand-alone or using xinetd as is typically configured on most Linux distributions.
Configure xinetd to manage rsync:

File: /etc/xinetd.d/rsync
Default: "disable = yes". Change to "disable = no"
For more information on xinetd see the YoLinux xinetd tutorial.
Start/Re-start xinetd: /etc/rc.d/init.d/xinetd restart
service rsync
{
        disable         = no
        flags           = IPv6
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/bin/rsync
        server_args     = --daemon
        log_on_failure  += USERID
}
Typical Linux distributions do not pre-configure rsync for server use. Both Ubuntu and Red Hat based distributions require that you generate the configuration file "/etc/rsyncd.conf".
File: /etc/rsyncd.conf
rsyncd.conf man page
log file = /var/log/rsyncd.log
hosts allow = 192.17.39.244, 192.17.39.60
hosts deny = *
list = true
uid = root
gid = root
read only = false

[Proj1]
path = /tmp/Proj1
comment = Project 1 rsync directory

[ProjX]
path = /var/ProjX
comment = Project X rsync directory
Client command to rsync to the server:
Push: rsync -avr /home/user1/Proj1/Data server-host-name::Proj1
(eg. update server backup from mobile laptop)
This will initially copy over directory Data and all of its contents to /tmp/Proj1/Data on the remote server.

Pull: rsync -avr server-host-name::Proj1 /home/user1/Proj1/Data
(eg. update mobile laptop from server backup)
2) Rsync to server using ssh shell:
Using this method does not use the configuration "modules" in /etc/rsyncd.conf but instead uses paths as if logged in using ssh.

First configure ssh for "password-less" login:
Note that current Linux distributions use ssh version 2 and RSA keys.
Note that "Enter" was pressed when asked for a "passphrase" to take the default (empty) value.
[user1@myclient ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user1/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user1/.ssh/id_rsa.
Your public key has been saved in /home/user1/.ssh/id_rsa.pub.
The key fingerprint is:
aa:1c:76:33:8a:9c:10:51:............
Two files are generated:
- Local client (private key): ~/.ssh/id_rsa
- Contents (one line) of file (public key): ~/.ssh/id_rsa.pub to be copied into file on server: ~/.ssh/authorized_keys
Copy public key to server so you can login.
Note the file protection on the private key:

[user1@myclient ~]$ ls -l ~/.ssh/id_rsa
-rw-------. 1 user1 user1 1675 Sep  7 14:55 /home/user1/.ssh/id_rsa

Use the following command:

[user1@myclient ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub user1@remote-server
user1@remote-server's password:

Now try logging into the machine with "ssh user1@remote-server" and check .ssh/authorized_keys to make sure extra keys weren't added that you weren't expecting. Test the "password-less" ssh connection: ssh remote-server
This command should log you in without asking for a password.
Note that if this connection is to be spawned by a cron job (e.g. as the root user) then the shell user ID must be provided: user1@
Now try rsync (push) using ssh:

rsync -avr --rsh=/usr/bin/ssh /home/user1/Proj1/Data remote-server:/mnt/supersan/Proj1
rsync -avr --rsh=/usr/bin/ssh /home/user1/Proj1/Data user1@remote-server:/mnt/supersan/Proj1

SSH options may be put in the file ~/.ssh/config
crontab:
Note that rsync is often used with cron to perform a nightly rsync.
eg. Rsync to get latest updates to the web server at 2:00am:
- File: /etc/crontab
See the crontab man page
0 2 * * * rsync -avr server-host-name::Proj1/html /var/www > /var/log/rsync 2>&1
Sep 1, 2009 | thegeekstuff.com
Example 1. Synchronize Two Directories in a Local Server
To sync two directories in a local computer, use the following rsync -zvr command.
$ rsync -zvr /var/opt/installation/inventory/ /root/temp
building file list ... done
sva.xml
svB.xml
.
sent 26385 bytes  received 1098 bytes  54966.00 bytes/sec
total size is 44867  speedup is 1.63
$

In the above rsync example:
- -z is to enable compression
- -v verbose
- -r indicates recursive
Now let us see the timestamp on one of the files that was copied from source to destination. As you see below, rsync didn't preserve timestamps during sync.
$ ls -l /var/opt/installation/inventory/sva.xml /root/temp/sva.xml
-r--r--r-- 1 bin  bin 949 Jun 18  2009 /var/opt/installation/inventory/sva.xml
-r--r--r-- 1 root bin 949 Sep  2  2009 /root/temp/sva.xml

Example 2. Preserve timestamps during Sync using rsync -a
The rsync option -a indicates archive mode. The -a option does the following:
- Recursive mode
- Preserves symbolic links
- Preserves permissions
- Preserves timestamp
- Preserves owner and group
Now, executing the same command provided in example 1 (But with the rsync option -a) as shown below:
$ rsync -azv /var/opt/installation/inventory/ /root/temp/
building file list ... done
./
sva.xml
svB.xml
.
sent 26499 bytes  received 1104 bytes  55206.00 bytes/sec
total size is 44867  speedup is 1.63
$

As you see below, rsync preserved timestamps during sync.

$ ls -l /var/opt/installation/inventory/sva.xml /root/temp/sva.xml
-r--r--r-- 1 root bin 949 Jun 18  2009 /var/opt/installation/inventory/sva.xml
-r--r--r-- 1 root bin 949 Jun 18  2009 /root/temp/sva.xml

Example 3. Synchronize Only One File
To copy only one file, specify the file name to rsync command, as shown below.
$ rsync -v /var/lib/rpm/Pubkeys /root/temp/
Pubkeys
sent 42 bytes  received 12380 bytes  3549.14 bytes/sec
total size is 12288  speedup is 0.99

Example 4. Synchronize Files From Local to Remote
rsync allows you to synchronize files/directories between the local and remote system.
$ rsync -avz /root/temp/ [email protected]:/home/thegeekstuff/temp/
Password:
building file list ... done
./
rpm/
rpm/Basenames
rpm/Conflictname
sent 15810261 bytes  received 412 bytes  2432411.23 bytes/sec
total size is 45305958  speedup is 2.87

While synchronizing with the remote server, you need to specify the username and IP address of the remote server. You should also specify the destination directory on the remote server. The format is username@machinename:path
As you see above, it asks for password while doing rsync from local to remote server.
Sometimes you don't want to enter the password while backing up files from the local to the remote server. For example, if you have a backup shell script that copies files from local to remote using rsync, you need the ability to rsync without entering a password.
To do that, set up SSH passwordless login as explained earlier.
Example 5. Synchronize Files From Remote to Local
When you want to synchronize files from remote to local, specify remote path in source and local path in target as shown below.
$ rsync -avz [email protected]:/var/lib/rpm /root/temp
Password:
receiving file list ... done
rpm/
rpm/Basenames
.
sent 406 bytes  received 15810230 bytes  2432405.54 bytes/sec
total size is 45305958  speedup is 2.87

Example 6. Remote shell for Synchronization
rsync allows you to specify the remote shell which you want to use. You can use rsync ssh to enable the secured remote connection.
Use rsync -e ssh to specify which remote shell to use. In this case, rsync will use ssh.
$ rsync -avz -e ssh [email protected]:/var/lib/rpm /root/temp
Password:
receiving file list ... done
rpm/
rpm/Basenames
sent 406 bytes  received 15810230 bytes  2432405.54 bytes/sec
total size is 45305958  speedup is 2.87

Example 7. Do Not Overwrite the Modified Files at the Destination
In a typical sync situation, if a file is modified at the destination, we might not want to overwrite the file with the old file from the source.
Use the rsync -u option to do exactly that (i.e. do not overwrite a file at the destination if it has been modified there). In the following example, the file called Basenames has already been modified at the destination, so it will not be overwritten with rsync -u.
$ ls -l /root/temp/Basenames
total 39088
-rwxr-xr-x 1 root root 4096 Sep  2 11:35 Basenames

$ rsync -avzu [email protected]:/var/lib/rpm /root/temp
Password:
receiving file list ... done
rpm/
sent 122 bytes  received 505 bytes  114.00 bytes/sec
total size is 45305958  speedup is 72258.31

$ ls -lrt
total 39088
-rwxr-xr-x 1 root root 4096 Sep  2 11:35 Basenames

Example 8. Synchronize only the Directory Tree Structure (not the files)
Use the rsync -d option to synchronize only the directory tree from the source to the destination. The example below synchronizes only the directory tree in a recursive manner, not the files in the directories.
$ rsync -v -d [email protected]:/var/lib/ .
Password:
receiving file list ... done
logrotate.status
CAM/
YaST2/
acpi/
sent 240 bytes  received 1830 bytes  318.46 bytes/sec
total size is 956  speedup is 0.46

Example 9. View the rsync Progress during Transfer
When you use rsync for backup, you might want to know the progress of the backup: how many files have been copied, at what rate each file is being copied, and so on.

The rsync --progress option displays detailed progress of rsync execution as shown below.
$ rsync -avz --progress [email protected]:/var/lib/rpm/ /root/temp/
Password:
receiving file list ...
19 files to consider
./
Basenames
     5357568 100%   14.98MB/s    0:00:00 (xfer#1, to-check=17/19)
Conflictname
       12288 100%   35.09kB/s    0:00:00 (xfer#2, to-check=16/19)
.
.
.
sent 406 bytes  received 15810211 bytes  2108082.27 bytes/sec
total size is 45305958  speedup is 2.87

You can also use the rsnapshot utility (which uses rsync) to back up a local Linux server, or a remote Linux server.
Example 10. Delete the Files Created at the Target
If a file is not present at the source, but present at the target, you might want to delete the file at the target during rsync.
In that case, use the --delete option as shown below. The rsync --delete option deletes files that are not present in the source directory.
# Source and target are in sync. Now creating a new file at the target.
$ > new-file.txt

$ rsync -avz --delete [email protected]:/var/lib/rpm/ .
Password:
receiving file list ... done
deleting new-file.txt
./
sent 26 bytes  received 390 bytes  48.94 bytes/sec
total size is 45305958  speedup is 108908.55

The target had the new file called new-file.txt; when synchronized with the source using the --delete option, the file was removed.
Example 11. Do not Create New File at the Target
If you like, you can update (sync) only the files that already exist at the target. If the source has new files that are not yet at the target, you can avoid creating them there by using the --existing option with the rsync command.
First, add a new-file.txt at the source.
[/var/lib/rpm ]$ > new-file.txt

Next, execute rsync from the target.
$ rsync -avz --existing [email protected]:/var/lib/rpm/ .
[email protected]'s password:
receiving file list ... done
./
sent 26 bytes  received 419 bytes  46.84 bytes/sec
total size is 88551424  speedup is 198991.96

As the above output shows, it didn't receive the new file new-file.txt.
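The same --existing behavior can be checked locally with two scratch directories (illustrative /tmp paths):

```shell
# --existing: refresh files already present at the target, create nothing new.
rm -rf /tmp/rsdemo_ex_src /tmp/rsdemo_ex_dst
mkdir -p /tmp/rsdemo_ex_src /tmp/rsdemo_ex_dst
echo v2  > /tmp/rsdemo_ex_src/old.txt    # updated at the source
echo new > /tmp/rsdemo_ex_src/new.txt    # exists only at the source
echo v1  > /tmp/rsdemo_ex_dst/old.txt    # stale copy at the target
rsync -a --existing /tmp/rsdemo_ex_src/ /tmp/rsdemo_ex_dst/
```

After the run, old.txt has been updated but new.txt was not created at the target.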
Example 12. View the Changes Between Source and Destination
This option is useful to view the difference in the files or directories between source and destination.
At the source:
$ ls -l /var/lib/rpm
-rw-r--r-- 1 root root 5357568 2010-06-24 08:57 Basenames
-rw-r--r-- 1 root root   12288 2008-05-28 22:03 Conflictname
-rw-r--r-- 1 root root 1179648 2010-06-24 08:57 Dirnames

At the destination:

$ ls -l /root/temp
-rw-r--r-- 1 root root   12288 May 28  2008 Conflictname
-rw-r--r-- 1 bin  bin  1179648 Jun 24 05:27 Dirnames
-rw-r--r-- 1 root root       0 Sep  3 06:39 Basenames

In the above example, there are two differences between the source and destination. First, the owner and group of the file Dirnames differ. Next, the size differs for the file Basenames.
Now let us see how rsync displays this difference. -i option displays the item changes.
$ rsync -avzi [email protected]:/var/lib/rpm/ /root/temp/
Password:
receiving file list ... done
>f.st.... Basenames
.f....og. Dirnames
sent 48 bytes  received 2182544 bytes  291012.27 bytes/sec
total size is 45305958  speedup is 20.76

In the output, rsync displays nine characters in front of the file or directory name indicating the changes.
In our example, the letters in front of the Basenames (and Dirnames) says the following:
> specifies that a file is being transferred to the local host.
f represents that it is a file.
s represents that there are size changes.
t represents that there are timestamp changes.
o means the owner changed.
g means the group changed.

Example 13. Include and Exclude Pattern during File Transfer
rsync allows you to give the pattern you want to include and exclude files or directories while doing synchronization.
$ rsync -avz --include 'P*' --exclude '*' [email protected]:/var/lib/rpm/ /root/temp/
Password:
receiving file list ... done
./
Packages
Providename
Provideversion
Pubkeys
sent 129 bytes  received 10286798 bytes  2285983.78 bytes/sec
total size is 32768000  speedup is 3.19

In the above example, rsync includes only the files and directories starting with 'P' (using --include) and excludes all other files (using --exclude '*').
Example 14. Do Not Transfer Large Files
You can tell rsync not to transfer files that are greater than a specific size using the rsync --max-size option.
$ rsync -avz --max-size='100K' [email protected]:/var/lib/rpm/ /root/temp/
Password:
receiving file list ... done
./
Conflictname
Group
Installtid
Name
Sha1header
Sigmd5
Triggername
sent 252 bytes  received 123081 bytes  18974.31 bytes/sec
total size is 45305958  speedup is 367.35

--max-size=100K makes rsync transfer only the files that are less than or equal to 100K. You can indicate M for megabytes and G for gigabytes.
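A local sketch of the same --max-size behavior (throwaway /tmp paths; 100K here means 100 * 1024 bytes):

```shell
rm -rf /tmp/rsdemo_sz_src /tmp/rsdemo_sz_dst
mkdir -p /tmp/rsdemo_sz_src /tmp/rsdemo_sz_dst
printf 'tiny' > /tmp/rsdemo_sz_src/small.txt           # well under the limit
head -c 200000 /dev/zero > /tmp/rsdemo_sz_src/big.bin  # ~200 KB, over the limit
rsync -a --max-size=100K /tmp/rsdemo_sz_src/ /tmp/rsdemo_sz_dst/
ls /tmp/rsdemo_sz_dst
```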
Example 15. Transfer the Whole File
One of the main features of rsync is that it transfers only the changed blocks to the destination, instead of sending the whole file.
If network bandwidth is not an issue for you (but CPU is), you can transfer the whole file using the rsync -W option. This will speed up the rsync process, as it doesn't have to perform the checksum at the source and destination:

 ... done
./
Basenames
Conflictname
Dirnames
Filemd5s
Group
Installtid
Name
sent 406 bytes  received 15810211 bytes  2874657.64 bytes/sec
total size is 45305958  speedup is 2.87
Additional rsync Tutorials
Apr 26, 2012 | OpenAlfa Blog
Server Administration
The main purpose of the rsync command in Linux distributions is to copy contents from an origin to a destination, but it can also be used to find the differences between two directory trees. Besides, both origin and destination can be local, or reside on a remote server.
We compare directoryA and directoryB using two rsync commands:
$ rsync --dry-run -v -r -c --delete directoryA/ directoryB/
$ rsync --dry-run -v -r -c --delete directoryB/ directoryA/

The basic usage of the rsync command to copy directoryA to directoryB is:
$ rsync directoryA/ directoryB/

Warning: It is important to end the origin directory name with the '/' character, because otherwise rsync would create a subdirectory named 'directoryA' under the destination directory 'directoryB'.
- The options "–dry-run -v" tell rsync not to perform any actual copy, but just to print the names of the files that it would copy.
- Option "-r" tells rsync to execute recursively.
- Option "-c" sets that the file comparison is to be performed by computing a checksum of the content of the files, instead of just comparing the date and size of the files.
- Finally, option "--delete" tells rsync to remove existing files in the destination directory that do not exist in the origin directory (but because of the --dry-run option, it will just print the names of the files that it would delete, with lines like this: "deleting filename").
The first of the above rsync commands will print:
- The names of files existing in both directories, having different content.
- The names of files existing only on directoryB and not in directoryA (as files to be deleted)
The second command will print:
- The names of files existing in both directories, having different content (these should be the same as those printed by the first command)
- The names of files existing only on directoryA and not in directoryB (as files to be deleted)
Example:
By comparing two MediaWiki installations wikiA and wikiB, we get the output:
$ rsync --dry-run -v -r -c --delete wikiA/ wikiB/
sending incremental file list
LocalSettings.php
deleting imagenes/logo135x135.png
imagenes/Home_icon.jpg
imagenes/Home_icon.png
imagenes/domowiki.gif
imagenes/domowiki_logo.png
sent 112402 bytes received 247 bytes 75099.33 bytes/sec
total size is 68027992 speedup is 603.89 (DRY RUN)
$ rsync --dry-run -v -r -c --delete wikiB/ wikiA/
sending incremental file list
LocalSettings.php
deleting imagenes/domowiki_logo.png
deleting imagenes/domowiki.gif
deleting imagenes/Home_icon.png
deleting imagenes/Home_icon.jpg
imagenes/logo135x135.png
sent 112321 bytes received 244 bytes 225130.00 bytes/sec
total size is 68041474 speedup is 604.46 (DRY RUN)
$

As we see, in this case the LocalSettings.php files exist in both directories, but their content differs, and there are also some images existing only under wikiA, and some images existing only under wikiB.
Server Fault
Running ubuntu 12.04, I want to compare 2 directories, say folder1/ and folder2/, and copy any files that are different to folder3/. There are also nested files, so matching subdirectories should be copied as well. Is there a single command that would help me? I can get the full list of changed files by running:
rsync -rcnC --out-format="%f" folder1/ folder2/
But rsync doesn't seem to have the ability to "export" these files on a different target directory. Can I pipe the list to cp or some other program, so that the files are copied, while the directories are created as well? For example, I tried
rsync -rcnC --out-format="%f" folder1/ folder2/ | xargs cp -t folder3/
but that wouldn't preserve the directory structure; it would simply copy all the files directly into folder3/
A:
Use --compare-dest. From the man page:
--compare-dest=DIR - This option instructs rsync to use DIR on the destination machine as an additional hierarchy to compare destination files against doing transfers (if the files are missing in the destination directory). If a file is found in DIR that is identical to the sender's file, the file will NOT be transferred to the destination directory. This is useful for creating a sparse backup of just files that have changed from an earlier backup.
first check your syntax with --dry-run
rsync -aHxv --progress --dry-run --compare-dest=folder2/ folder1/ folder3/
Then once you're satisfied with the output:
rsync -aHxv --progress --compare-dest=folder2/ folder1/ folder3/
this link has a good explanation of --compare-dest scope.
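If --compare-dest is unavailable, or you want the mechanics spelled out, the same effect can be approximated in plain POSIX shell: copy only the files from folder1 that are absent from or different in folder2 into folder3, preserving subdirectories. This is a rough, slower stand-in, not what rsync does internally; the directory names follow the question and sync_diff is a made-up helper name.

```shell
# sync_diff SRC CMP DST: copy files from SRC that are missing in CMP
# or whose content differs, into DST, recreating the directory tree.
sync_diff() {
    src=$1; cmp_dir=$2; dst=$3
    (cd "$src" && find . -type f) | while read -r f; do
        # copy when the file is absent in CMP or its bytes differ
        if [ ! -f "$cmp_dir/$f" ] || ! cmp -s "$src/$f" "$cmp_dir/$f"; then
            mkdir -p "$dst/$(dirname "$f")"
            cp "$src/$f" "$dst/$f"
        fi
    done
}
```

Usage would be `sync_diff folder1 folder2 folder3`; unlike the rsync one-liner, this reads every file in full on each run.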
Super User
I'm a UNIX dev myself, and wouldn't be asking if this were a UNIX system we're dealing with, but alas. Also, this is for a custom nightly backup solution, where reliability and data integrity is a priority, so given that a few weeks ago I couldn't even figure out a for-loop in a batch script, I'm pretty sure I lack the experience to do this right, or even determine the best way to do this.
Reading http://www.howtoforge.com/backing-up-with-rsync-and-managing-previous-versions-history, I learned that rsync can do something like what I'm after, using options like
--dry-run              # don't actually rsync (touch) any files
--itemize-changes      # list changes rsync _would_ have made
--out-format="%i|%n|"  # define an output format for the list of changes
File system permissions depend on the umask of the mounting user root and on the user and group IDs. To gain write access to the vfat partition as a normal user too, set "umask=002" to grant read, write and execute rights to members of the group. All other users do not get write access to the partition (it can't hurt to treat a multi-user system accordingly!). Now add the parameter "gid=100" so that all the files stored on the vfat partition belong to the group "users" (at least on my Debian system). Additionally, we'll add the parameter "uid=1000" to make sure that files copied from our source to the vfat partition are owned by the actual user rather than "root". (On my system, 1000 is the user ID of the main user "t", who is a member of the group "users". On my Fedora Core 3 system, I used "uid=500" and "gid=500", which are my user and group IDs.)
> mount /dev/sda1 /myvfat -t vfat -o shortname=mixed,codepage=850,umask=002,uid=1000,gid=100
> mount | grep sda1
/dev/sda1 on /myvfat type vfat (rw,shortname=mixed,codepage=850,umask=002,uid=1000,gid=100)
> grep sda1 /proc/mounts
/dev/sda1 /myvfat vfat rw,nodiratime,uid=1000,gid=100,fmask=0002,dmask=0002,codepage=cp850,shortname=mixed 0 0
> cd /myvfat
> ls -la
-rwxrwxr-x 1 t users 0 Dec 31 16:05 ABCDEFGH
-rwxrwxr-x 1 t users 0 Dec 31 16:05 ABCDEFGHI

If you want to add these options to your /etc/fstab, you may want to add the parameters "noauto" and "user", so that the file system does not get mounted automatically at system start and can be mounted by a normal user. The parameter "user" implies, for security reasons, also "noexec", "nosuid", and "nodev". This, however, should not be a problem in our example, because we assumed we are dealing with an external hard drive for pure data storage. If you want to execute programs on the vfat partition, add the parameter "exec". As an optimisation you can turn off the updating of the last file access with the parameters "noatime" and "nodiratime", if you don't need this information. Personally, I do use this information, for example to find (with "find -atime -21") the audio files that I listened to during the last three weeks.
The resulting /etc/fstab-entry looks like this:
/dev/sda1 /myvfat vfat shortname=mixed,codepage=850,umask=002,uid=1000,gid=100,noauto,user 0 0

> mount /myvfat
> mount | grep sda1
/dev/sda1 on /myvfat type vfat (rw,noexec,nosuid,nodev,shortname=mixed,codepage=850,umask=002,uid=1000,gid=100)
> grep sda1 /proc/mounts
/dev/sda1 /myvfat vfat rw,nodiratime,nosuid,nodev,noexec,uid=1000,gid=100,fmask=0002,dmask=0002,codepage=cp850,shortname=mixed 0 0

Further information can be found in the man pages of the "mount" command, especially in the sections "Mount options for fat" and "Mount options for vfat".
2.2. Adjusting the system time
A very sneaky problem occurs for everyone who uses a time zone like "Europe/Berlin". Contrary to vfat, Unix file systems do take leap seconds in the past and daylight savings time into account. The time of the last change of a file created in January would appear shifted by one hour in June. As a consequence, rsync would retransmit all files on every clock change.
An example:
# On vfat during daylight savings time the date gets supplemented with the current time zone, thus being shifted by one hour.
> TZ=Europe/Berlin ls -l --time-style=full-iso /myvfat/testfile
-rwxrwxrwx [...] 2003-11-26 02:53:02.000000000 +0200
# On ext3 the date is displayed correctly also during daylight savings time.
> TZ=Europe/Berlin ls -l --time-style=full-iso /myext3/testfile
-rw-rw-rw- [...] 2003-11-26 02:53:02.000000000 +0100

As I did not find any mount parameter to change the time zone, I adjusted the system time to a zone that does not have daylight savings time (e.g. UTC) and set the local time zone "Europe/Berlin" for all users. As a consequence, however, all syslogd time stamps also use UTC instead of the local time zone.
Debian users can adjust this via "base-config". Setting "Configure timezone" to "None of the above" and "UTC" gets the job done. Afterwards it should look like this:
> cat /etc/timezone
Etc/UTC
> ls -l /etc/localtime
[...] /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

# ~/.bash_profile
export TZ=Europe/Berlin
2.3. File size limits

Linux only supports files up to 2GB in size on vfat. You can at least back up bigger files by splitting them into smaller parts. In the following example, I am backing up a 5GB cryptoloop file to a vfat partition by splitting it into 2000MB pieces:
> cd /myvfat
> split -d -b 2000m /myext3/cryptfile cryptfile_backup
> ls
cryptfile_backup00 cryptfile_backup01 cryptfile_backup02

Files bigger than 2GB must be excluded from rsync, of course.
Reassembling the original file can be done via:
# Linux
> cat cryptfile_backup00 cryptfile_backup01 cryptfile_backup02 > cryptfile
# DOS
> copy /b cryptfile_backup00 + cryptfile_backup01 + cryptfile_backup02 cryptfile

2.4. Using rsync
The characteristics of vfat must also be considered when calling rsync. As vfat does not support symbolic links, file permissions, owners, groups or devices, the parameter "-a", which covers all of the above, will not have the desired effect (aside from producing error messages). Thus it's best to use only the parameters that actually work:
-r, --recursive    treat folders recursively
-v, --verbose      show transmitted files
-t, --times        keep time settings
-n, --dry-run      test only
--exclude          ignore files

As the date of the last file change is very important for rsync, the option "-t" is essential. It's also very wise to test every rsync change with "-n" beforehand.
The last roadblock to a successful synchronization with rsync is the time resolution of the vfat time stamp, which is two seconds. You'll have to set the parameter "--modify-window=1" to gain a tolerance of one second, so that rsync isn't "on the dot".
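The effect of --modify-window can be illustrated with a tiny shell computation. This is a toy sketch of the comparison, not rsync's actual code, and the timestamp values are made up: two mtimes one second apart compare as equal once a one-second window is allowed.

```shell
# Toy illustration of the --modify-window=1 tolerance check.
t1=1700000000           # mtime as stored on ext3 (seconds since epoch)
t2=1700000001           # same file's mtime as rounded by vfat
window=1                # the --modify-window value

# absolute difference between the two timestamps
diff=$(( t1 > t2 ? t1 - t2 : t2 - t1 ))

if [ "$diff" -le "$window" ]; then
    echo "timestamps equal within window"    # file is NOT retransmitted
else
    echo "file would be retransmitted"
fi
```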
In a nutshell, the command to efficiently transmit all files from an ext3 file system to a vfat file system is:
> rsync -rvtn --modify-window=1 --exclude "lost+found" /myext3/* /myvfat

3. Problems with large vfat file systems
The rapid growth of hard drive size poses big problems for the rather neglected dosfstools and vfat-driver.
3.1. Lost Clusters
Under Linux Kernel 2.4.x, a limit of the cluster data type results in data loss as soon as the vfat file system holds around 130GB. In Kernel 2.6.x, this problem was solved rather accidentally, when many variables were consistently given a new type. A detailed description of this bug, including a testsuite and a patch (by Erik Andersen), can be found here. (The patch also allows for file sizes up to 4GB.)
If you, however, work with a 2.4.x. Kernel and have a "full" vfat partition, be prepared to lose data: any written file in a new folder will be lost after unmounting the file system. When you mount the file system again, these files have a size of 0 and the clusters are in limbo. You can delete the unassigned clusters via dosfsck.
3.2. dosfsck and high RAM Demand
To conduct file system checks as efficiently as possible, dosfsck copies both FATs to RAM. With a very large file system on a 250GB drive, the high number of clusters yields a very high demand for RAM, which exceeded my 350MB (including swap). Thus dosfsck aborted with a malloc error. Roman Hodek, the maintainer of dosfsck, proposed to convert the program to "mmap()", but also said that this change would be complex. As long as this situation has not changed, be sure to have sufficient RAM.
3.3. Executing dosfsck
As long as the vfat file system is mounted, dosfsck can be executed, but all repairs silently fail. Thus you should make sure that your partition is not mounted before using dosfsck. In the following example, an unassigned cluster (due to the bug in Kernel 2.4.x) is located and deleted. By the way, the command fsck.vfat is a symbolic link to dosfsck.
> fsck.vfat -vr /dev/sda1
dosfsck 2.10 (22 Sep 2003)
dosfsck 2.10, 22 Sep 2003, FAT32, LFN
Checking we can access the last sector of the filesystem
Boot sector contents:
System ID "mkdosfs"
Media byte 0xf8 (hard disk)
       512 bytes per logical sector
     16384 bytes per cluster
        32 reserved sectors
First FAT starts at byte 16384 (sector 32)
         2 FATs, 32 bit entries
  39267840 bytes per FAT (= 76695 sectors)
Root directory start at cluster 2 (arbitrary size)
Data area starts at byte 78552064 (sector 153422)
   9816944 data clusters (160840810496 bytes)
63 sectors/track, 255 heads
         0 hidden sectors
 314295660 sectors total
Checking for unused clusters.
Reclaimed 1 unused cluster (16384 bytes).
Checking free cluster summary.
Free cluster summary wrong (641900 vs. really 641901)
1) Correct
2) Don't correct
? 1
Perform changes ? (y/n) y
/dev/sda1: 143 files, 9175043/9816944 clusters

3.4. Formatting a large vfat file system
When formatting with mkfs.vfat you have to add the option -F 32, so that a 32-bit file system is created. Without this option, a 12-bit or 16-bit file system is created, depending on the partition size, or the formatting process aborts (on an oversized partition). FAT16 only supports file systems up to 2GB; FAT32 allows for up to 2TB (terabytes).
> mkfs.vfat -F 32 /dev/sda1

4. Conclusion
Solving the problems described here cost me a lot of time. But to me, being able to perform my work exclusively with Free Software is a luxury that makes it quite worthwhile. Thus I want to thank all developers of these programs heartily. If the psychological strain of the aforementioned problems grows big enough, there will be volunteers who will tackle the remaining problems.
© Torsten Schenk ([email protected])
License:
This text is subject to the GNU Free Documentation License (FDL). Free spreading in modified or unmodified form is allowed. Modifications must be marked unmistakeably and also distributed under the FDL.
Translated by Mag. Christian Paratschek. More of my work can be found on my website.
rsync and vfat

It took several tries, and a lot of poking around, but I finally have my music collection mirrored to a disk I can take around (most notably to work). The hard part was getting rsync to work right. Finally I got it working after finding a helpful article on the topic. To summarize (in less than 3 pages), I used the following 2 commands:
Code:
mount -t vfat -o shortname=mixed,iocharset=utf8 /dev/sda1 /mnt
rsync --modify-window=1 -rtv --delete /data/mp3/ /mnt/mp3

Now I won't lose them, and maybe they'll help you. The only reason for having problems is that I was using the vfat filesystem under FC3 Linux (where my custom-built audio archive exists) to make a disk I could plug in to my work laptop. Windows filesystems aren't so great; they have problems doing mixed case and being very accurate with times. So this makes it work!
On 2013-02-02 03:46, jdmcdaniel3 wrote:
> I found this example, perhaps it might help?

--modify-window
When comparing two timestamps, rsync treats the timestamps as being equal if they differ by no more than the modify-window value. This is normally 0 (for an exact match), but you may find it useful to set this to a larger value in some situations. In particular, when transferring to or from an MS Windows FAT filesystem (which represents times with a 2-second resolution), --modify-window=1 is useful (allowing times to differ by up to 1 second).

Interesting! That will be it.
-r, --recursive recurse into directories
-t, --times preserve modification times
-v, --verbose increase verbosity
I'll try it.....
Ok, first run was slow, I thought it failed. But a second run just after
did run in seconds, so it appears to work. I then umounted the device,
mounted it again (automatic mount under xfce), run the copy again, and
it was away in seconds. So yes, that is the trick, thank you.
Introduction to rsync
rsync is useful when large amounts of data need to be transmitted regularly while not changing too much. This is, for example, often the case when creating backups. Another application concerns staging servers. These are servers that store complete directory trees of Web servers that are regularly mirrored onto a Web server in a DMZ.
28.4.1. Configuration and Operation

rsync can be operated in two different modes. It can be used to archive or copy data. To accomplish this, only a remote shell, like ssh, is required on the target system. However, rsync can also be used as a daemon to provide directories to the network.
The basic mode of operation of rsync does not require any special configuration. rsync directly allows mirroring complete directories onto another system. As an example, the following command creates a backup of the home directory of tux on a backup server named sun:
rsync -baz -e ssh /home/tux/ tux@sun:backup

The following command is used to restore the directory:
rsync -az -e ssh tux@sun:backup /home/tux/

Up to this point, the handling does not differ much from that of a regular copying tool, like scp.
rsync should be operated in "rsync" mode to make all its features fully available. This is done by starting the rsyncd daemon on one of the systems. Configure it in the file /etc/rsyncd.conf. For example, to make the directory /srv/ftp available with rsync, use the following configuration:

gid = nobody
uid = nobody
read only = true
use chroot = no
transfer logging = true
log format = %h %o %f %l %b
log file = /var/log/rsyncd.log

[FTP]
        path = /srv/ftp
        comment = An Example

Then start rsyncd with rcrsyncd start. rsyncd can also be started automatically during the boot process. Set this up by activating the service in the runlevel editor provided by YaST or by manually entering the command insserv rsyncd. rsyncd can alternatively be started by xinetd. This is, however, only recommended for servers that rarely use rsyncd.

The example also creates a log file listing all connections. This file is stored in /var/log/rsyncd.log.

It is then possible to test the transfer from a client system. Do this with the following command:

rsync -avz sun::FTP

This command lists all files present in the directory /srv/ftp of the server. This request is also logged in the log file /var/log/rsyncd.log. To start an actual transfer, provide a target directory. Use . for the current directory. For example:

rsync -avz sun::FTP .

By default, no files are deleted while synchronizing with rsync. If this should be forced, the additional option --delete must be stated. To ensure that no newer files are deleted, the option --update can be used instead. Any conflicts that arise must be resolved manually.
True Blade Systems
Name the script: server-download.sh and put some comments at the start of the script before the actual rsync command:
# server-download.sh
#
# Install on Notebook PCs
#
# rsync tool to download server data
# from [server name]
# to [user name's notebook PC]
#
# uses ssh key pairs to login automatically
#
# last edited: [last edit date] by [author]
#
# download only those files on [server name] in [server target directory]
# that are newer than what is already on the notebook PC

rsync -avzu [user name]@[server name]:[server directory] [notebook PC directory]

Notes:
- You should add the -n option to rsync the first time this is run to do a "dry run" which does not do any actual downloading. By doing this you will be able to see if the script is going to correctly do what you want. Thus, the rsync command becomes:

rsync -avzun [remainder as above]

- You can create additional rsync lines in the script file to download data from more than one directory on the server.
Next you need to create an icon on the desktop that will permit the script to be easily run by the user so that they won't have to enter the script name in a command window to do the file download.
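The article does not show the launcher itself. On a Linux desktop, one way to provide such an icon is a freedesktop-style .desktop entry; the following is a sketch, with the Name and the Exec path as placeholders to adapt to where server-download.sh is actually installed:

```ini
[Desktop Entry]
# Launcher sketch for the download script above.
# Adjust Exec to the real install path of server-download.sh.
Type=Application
Name=Server Download
Comment=Download server data via rsync
Exec=/home/username/server-download.sh
Terminal=true
```

Placing this file on the user's desktop (or in ~/.local/share/applications/) lets the user run the script with a double click instead of a command window.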
Distributing software packages to all of our servers is a tedious task. Currently, a release manager makes a connection to each server and transfers files using ftp. This involves entering passwords multiple times, waiting for transfers to complete, changing directories, and keeping files organized. We developed a shell script, distribute_release (Listing 1), that makes the job easier.
Our script has some advantages over the ftp process:
- Directory trees can be used to organize release modules.
- A distribution network defines how files are transferred from server to server.
- When a release module is ready to be distributed, it is replicated to all of the servers in the network using rsync, which helps minimize network traffic.
- Various authentication methods can be used to avoid entering passwords for each server.
We'll describe the directory structures including creating the distribution network. Then we'll talk about the scripts. Finally, we'll discuss an example.
Directory Structures
Each release module is stored in the directory /var/spool/pkg/release/[module]/. A module directory can be flat, or it can contain subdirectories. Hidden directory trees under the ./release/ directory define the distribution network. Therefore, the names of these directories cannot be used as module names.
Transport protocols supported by distribute_release include nfs, rsh, and ssh. If a release module is distributed using nfs, then the directory /var/spool/pkg/release/.nfs/[module]/ contains symbolic links corresponding to the hosts in the server's distribution network:
/var/spool/pkg/release/.nfs/[module]/[host] -> \
    /net/[host]/var/spool/pkg/release/

When using nfs, rsync functions like an improved copy command, transferring files between the directories /var/spool/pkg/release/[module]/ and /var/spool/pkg/release/.nfs/[module]/[host]/[module]/. When using rsh or ssh, the directory structures are similar. With rsh, for example, empty files of the form /var/spool/pkg/release/.rsh/[module]/[host] define the hosts in the distribution network.
The Scripts
Before distribute_release can be called, the directory structures and the distribution network must be created. The script create_distribution (Listing 2) facilitates these tasks.
One argument, the name of a release module, must be passed to create_distribution. When no options are used, the local host functions as a terminal node in the distribution network. In other words, the system may receive updates from another host, but it will not propagate those updates to downstream hosts. Downstream hosts and transport protocols may be specified with the -h and -t options respectively.
When using distribute_release, the name of a release module must be passed to the script. The -q and -v options may be used to control the amount of information displayed to the user. Hosts to be included or excluded from the distribution may be specified using the -i and -e options. The -r option may be used to determine how many times the program will recursively call itself to distribute the module to successive levels in a distribution hierarchy. When using nfs, the recursive calls are made locally. With rsh and ssh, the program calls itself on a remote server.
Distribute_release first gets the argument and any command-line options. Then, for each transport protocol, the script builds a distribution list and executes the appropriate rsync command for each host in the list. If a recursion depth is specified, then another instance of distribute_release is executed in a detached screen session, allowing the parent instance to continue running while the child processes propagate the module to other hosts.
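The scripts themselves appear in the Listings and are not reproduced here. As a rough sketch of the idea, the distribution list for a module can be derived from the marker files in the hidden directory trees described above; list_hosts is a made-up helper name, and the rsync call in the comment is illustrative:

```shell
# list_hosts RELEASE_DIR MODULE: print one downstream host per
# marker file under RELEASE_DIR/.ssh/MODULE/ (ssh transport).
list_hosts() {
    dir="$1/.ssh/$2"
    [ -d "$dir" ] || return 0
    for f in "$dir"/*; do
        [ -e "$f" ] && basename "$f"
    done
}

# For each host found, one rsync call would mirror the module, e.g.:
#   rsync -az -e ssh /var/spool/pkg/release/TS1/ \
#       "$host":/var/spool/pkg/release/TS1/
```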
An Example
Our example network (see Figure 1) contains five servers -- bambi, pongo, pluto, nemo, and goofy. One of the release modules is named TS1 (located on bambi), and the other is named TS2 (located on pluto). By executing the create_distributions script (Listing 3) on each server, the complete distribution network for both modules is built using the proper create_distribution calls.
Consider the TS1 release module; after the module has been distributed to all of the systems in the network, the directory /var/spool/pkg/release/TS1/ contains the following files and subdirectories:
./README
./v1/TS1-v1.pkg
./v2/TS1-v2.pkg
./beta/TS1-v3.pkg

On bambi, the directory /var/spool/pkg/release/.ssh/TS1/ contains a file named pongo. So, executing "distribute_release TS1" on bambi synchronizes the TS1 module with pongo using ssh as the transport protocol. The TS1 module can be distributed from pongo to all servers in the network using the -r option:

distribute_release -r 2 TS1

When using ssh, passwords can be avoided by using public/private key pairs with empty passphrases. When using rsh, you can update /etc/hosts.equiv or the appropriate .rhosts file. Obviously, passwords are not an issue with nfs. Deciding which protocol to use depends on security concerns, potential performance issues, and configuration complexity.

John Spurgeon is a software developer and systems administrator for Intel's Factory Information Control Systems, IFICS, in Hillsboro, Oregon. He is currently preparing to ride a single-speed bicycle in Race Across America in 2007.
Ed Schaefer is a frequent contributor to Sys Admin. He is a software developer and DBA for Intel's Factory Information Control Systems, IFICS, in Hillsboro, Oregon. Ed also hosts the monthly Shell Corner column on UnixReview.com. He can be reached at: [email protected].
2007-03 | Xah Lee Web
One-Way Copying a Directory to Server with rsync
How to copy local directory to a remote machine, in one shot?
For one-way copying (or updating), use rsync. The remote machine must have rsync installed.

Example:
rsync -z -a -v --rsh="ssh -l mary" ~/web/ [email protected]:~/
This will copy the local dir 〔~/web/〕 to the remote dir 〔~/〕 on the machine with domain name "example.org", using login "mary" through the ssh protocol. The "-z" is to use compression. The "-a" is for archive mode, basically making the file's metadata (owner/permissions/timestamp) the same as the local file (when possible) and working recursively (i.e. uploading the whole dir). The "-v" is for verbose mode, which basically makes rsync print out which files are being updated. (rsync does not upload files that are already on the destination and identical.)
For example, here's what i use to sync/upload my website on my local machine to my server.
rsync -z -a -v --exclude="*~" --exclude=".DS_Store" --exclude=".bash_history" --exclude="*/_curves_robert_yates/*.png" --exclude="logs/*" --exclude="xlogs/*" --delete --rsh="ssh -l u40651121" ~/web/ [email protected]:~/

I used this command daily. The "--exclude" tells it to disregard any files matching that pattern (i.e. if it matches, don't upload it, nor delete it on the remote server).
Here's an example of syncing Windows and Mac.
rsync -z -r -v --delete --rsh="ssh -l xah" ~/web/ [email protected]:~/web/

Note that "-r" is used instead of "-a". The "-r" means recursive, all sub directories and files. Don't use "-a" because that will sync file owner, group, and permissions, and because Windows and unix have different permission systems and file systems, "-a" is usually not what you want. (For a short intro to permission systems on unix and Windows, see: Unix And Windows File Permission Systems)
You can create a bash alias for the long command, e.g. alias l="ls -al";, or use bash's history search by pressing 【Ctrl+r】 and then typing rsync.
developerWorks

Much like cp, rsync copies files from a source to a destination. Unlike cp, the source and destination of an rsync operation can be local or remote. For instance, the command in Listing 1 copies the directory /tmp/photos and its entire contents verbatim to a home directory.
Listing 1. Copy the contents of a directory verbatim
$ rsync -n -av /tmp/photos ~
building file list ... done
photos/
photos/Photo 2.jpg
photos/Photo 3.jpg
photos/Photo 6.jpg
photos/Photo 9.jpg
sent 218 bytes received 56 bytes 548.00 bytes/sec
total size is 375409 speedup is 1370.11

The -v option enables verbose messages. The -a option (where a stands for archive) is a shorthand for -rlptgoD (recurse, copy symbolic links as symbolic links, preserve permissions, preserve file times, preserve group, preserve owner, and preserve devices and special files, respectively). Typically, -a mirrors files; exceptions occur when the destination cannot or does not support the same attributes. For example, copying a directory from UNIX to Windows® does not map perfectly. Some suggestions for unusual cases appear below.

rsync has a lot of options. If you worry that your options or source or destination specifications are incorrect, use -n to perform a dry run. A dry run previews what will happen to each file but does not move a single byte. When you are confident of all the settings, drop the -n and proceed.

Listing 2 provides an example where -n is invaluable. The command in Listing 1 and the following command yield different results.
Listing 2. Copy the contents of a named directory

$ rsync -av /tmp/photos/ ~
./
Photo 2.jpg
Photo 3.jpg
Photo 6.jpg
Photo 9.jpg
sent 210 bytes received 56 bytes 532.00 bytes/sec
total size is 375409 speedup is 1411.31

What is the difference? The difference is the trailing slash on the source argument. If the source has a trailing slash, the contents of the named directory but not the directory itself are copied. A slash on the end of the destination is immaterial.
And Listing 3 provides an example of moving the same directory to another system.
$ rsync -av /tmp/photos example.com:album
created directory album
Photo 2.jpg
Photo 3.jpg
Photo 6.jpg
Photo 9.jpg
sent 210 bytes received 56 bytes 21.28 bytes/sec
total size is 375409 speedup is 1411.31

Assuming that you have the same login name on the remote machine, rsync prompts you with a password and, given the proper credential, creates the directory album and copies the images to that directory. By default, rsync uses Secure Shell (SSH) as its transport mechanism; you can reuse your machine aliases and public keys with rsync.
The examples in Listing 2 and Listing 3 demonstrate two of rsync's four modes. The first example was shell mode, also dubbed local mode. The second sample was remote shell mode and is so named because SSH powers the underlying connection and transfers. rsync has two additional modes. List mode acts like ls: It lists the contents of source, as shown in Listing 4.
drwxr-xr-x         238 2009/08/22 18:49:50 photos
-rw-r--r--        6148 2008/07/03 01:36:18 photos/.DS_Store
-rw-r--r--       71202 2008/06/18 04:51:36 photos/Photo 2.jpg
-rw-r--r--       69632 2008/06/18 04:51:45 photos/Photo 3.jpg
-rw-r--r--       61046 2008/07/14 00:31:17 photos/Photo 6.jpg
-rw-r--r--      167381 2008/07/14 00:31:56 photos/Photo 9.jpg

The fourth mode is server mode. Here, the rsync daemon runs perennially on a machine, accepting requests to transfer files. A transfer can send files to the daemon or request files from it. Server mode is ideal for creating a central backup server or project repository.

To differentiate between remote shell mode and server mode, the latter employs two colons (:) in the source and destination names. Assuming that whatever.example.com exists, the next command copies files from the source to a local destination:
$ rsync -av whatever.example.com::src /tmp
And what exactly is src? It's an rsync module that you define and configure on the daemon's host. A module has a name, a path that contains its files, and some other parameters, such as read only, which protects the contents from modification.

To run an rsync daemon, type:
$ sudo rsync --daemon
Running the rsync daemon as the superuser, root, is not strictly necessary, but the practice protects other files on your machine. When run as root, rsync restricts itself to the module's directory hierarchy (its path) using chroot. After a chroot, all other files and directories seem to vanish. If you choose to run the rsync daemon with your own privileges, choose an unused port and make sure its modules have sufficient permissions to allow download and/or upload. Listing 5 shows a minimal configuration for sharing some files in your home directory without the need for sudo. The configuration is stored in the file rsyncd.conf.
Listing 5. Simple configuration for sharing files

motd file = /home/strike/rsyncd/rsync.motd_file
pid file = /home/strike/rsyncd/rsyncd.pid
port = 7777
use chroot = no

[demo]
path = /home/strike
comment = Martin home directory
list = no

[dropbox]
path = /home/strike/public/dropbox
comment = A place to leave things for Martin
read only = no

[pickup]
path = /home/strike/public/pickup
comment = Get your files here!

The file has two segments. The first segment (here, the first four lines) configures the operation of the rsync daemon. (Other options are available, too.) The first line points to a file with a friendly message to identify your server. The second line points to another file to record the process ID of the server. This is a convenience in the event you must manually kill the rsync daemon:

kill -INT `cat /home/strike/rsyncd/rsyncd.pid`

The two files are kept in a home directory because this example does not use superuser privileges to run the software. Similarly, the port chosen for the daemon is above 1024, so unprivileged users can claim it for any application. The fourth line turns off chroot.
The remaining segment is subdivided into small sections, one section per module. Each section, in turn, has a header line and a list of key-value pairs to set options for each module. By default, all modules are read only; set read only = no to allow write operations. Also by default, all modules are listed in the module catalog; set list = no to hide the module.
To start the daemon, run:
$ rsync --daemon --config=rsyncd.conf

Now, connect to the daemon from another machine, and omit a module name. You should see this:

$ rsync --port=7777 mymachine.example.com::
Hello! Welcome to Martin's rsync server.

dropbox        A place to leave things for Martin
pickup         Get your files here!

If you do not name a module after the colons (::), the daemon responds with a list of available modules. If you name a module but do not name a specific file or directory within the module, the daemon provides a catalog of the module's contents, as shown in Listing 6.
Listing 6. Catalog output of a module's contents
rsync --port=7777 mymachine.example.com::pickup
Hello! Welcome to Martin's rsync server.

drwxr-xr-x        4096 2009/08/23 08:56:19 .
-rw-r--r--           0 2009/08/23 08:56:19 article21.html
-rw-r--r--           0 2009/08/23 08:56:19 design.txt
-rw-r--r--           0 2009/08/23 08:56:19 figure1.png

Naming a module and a file copies the file locally, as shown in Listing 7.

Listing 7. Name a module to copy files locally
rsync --port=7777 mymachine.example.com::pickup/
Hello! Welcome to Martin's rsync server.

drwxr-xr-x        4096 2009/08/23 08:56:19 .
-rw-r--r--           0 2009/08/23 08:56:19 article21.html
-rw-r--r--           0 2009/08/23 08:56:19 design.txt
-rw-r--r--           0 2009/08/23 08:56:19 figure1.png
You can also perform an upload by reversing the source and destination, then pointing to the module for writes, as shown in Listing 8.

Listing 8. Reverse source and destination directories
$ rsync -v --port=7777 application.js mymachine.example.com::dropbox
Hello! Welcome to Martin's rsync server.

application.js
sent 245 bytes  received 38 bytes  113.20 bytes/sec
total size is 164  speedup is 0.58

That's a quick but thorough review. Next, let's see how you can apply rsync to daily tasks. rsync is especially useful for backups. And because it can synchronize a local file with its remote counterpart (and can do that for an entire file system, too), it's ideal for managing large clusters of machines that must be (at least partially) identical. Performing backups on a frequent basis is a critical but typically ignored chore. Perhaps it's the demands of running a lengthy backup each day or the need for large external media to store files; whatever the excuse, copying data somewhere for safekeeping should be an everyday practice.
To make the task painless, use rsync and point to a remote server-perhaps one that your service provider hosts and backs up. Each of your UNIX machines can use the same technique, and it's ideal for keeping the data on your laptop safe.
Establish SSH keys and an rsync daemon on the remote machine, and create a backup module to permit writes. Once established, run rsync to create a daily backup that takes hardly any space, as shown in Listing 9.
Listing 9. Create daily backups

#!/bin/sh
# This script is based on work by Michael Jakl (jakl.michael AT gmail DOTCOM)
# and used with express permission.
HOST=mymachine.example.com
SOURCE=$HOME
PATHTOBACKUP=home-backup
date=`date "+%Y-%m-%dT%H:%M:%S"`
rsync -az --link-dest=$PATHTOBACKUP/current $SOURCE $HOST:$PATHTOBACKUP/back-$date
ssh $HOST "rm $PATHTOBACKUP/current && ln -s back-$date $PATHTOBACKUP/current"

Replace HOST with the name of your backup host and SOURCE with the directory you want to save. Change PATHTOBACKUP to the name of your module. (You can also embed the three final lines of the script in a loop, dynamically change SOURCE, and back up a series of separate directories on the same system.) Here's how the backup works:
- To begin, date is set to the current date and time and yields a string like 2009-08-23T12:32:18, which identifies the backup uniquely.
- The rsync command performs the heavy lifting. -az preserves all file information and compresses the transfers. The magic lies in --link-dest=$PATHTOBACKUP/current, which specifies that if a file has not changed, do not copy it to the new backup. Instead, create a hard link from the new backup to the same file in the existing backup. In other words, the new backup only contains files that have changed; the rest are links.
More specifically (and expanding all variables), mymachine.example.com::home-backup/current is the current archive. The new archive for /home/strike is targeted to mymachine.example.com::home-backup/back-2009-08-23T12:32:18. If a file in /home/strike has not changed, the file is represented in the new backup by a hard link to the current archive. Otherwise, the new file is copied to the new archive.
If you touch only a few files or perhaps a handful of directories each day, the additional space required for what is effectively a full backup is paltry. Moreover, because each daily backup (except the very first) is so small, you can keep a long history of the files on hand.
- The last step is to alter the organization of the backups on the remote machine to promote the newly created archive to be the current archive. The ssh command removes the current archive (which is merely a symbolic link) and recreates the same symbolic link pointing to the new archive.
Keep in mind that a hard link to a hard link points to the same file. Hard links are very cheap to create and maintain, so a full backup is simulated using only an incremental scheme.
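The hard-link trick that --link-dest relies on can be demonstrated with nothing but coreutils. This is a local sketch (all paths are made up, created in a throwaway temp directory) showing that two directory entries created with ln share one inode, so the "second copy" of an unchanged file costs no extra space:

```shell
# Local sketch of the hard-link trick behind --link-dest (paths are made up):
# linking an unchanged file into the new backup costs no extra space.
set -e
work=$(mktemp -d)
mkdir -p "$work/back-day1" "$work/back-day2"
echo "unchanged data" > "$work/back-day1/file.txt"
# Instead of copying, hard-link the unchanged file into the new backup:
ln "$work/back-day1/file.txt" "$work/back-day2/file.txt"
# Both directory entries now point at the same inode, so the link count is 2:
stat -c '%h' "$work/back-day2/file.txt"
rm -r "$work"
```

Deleting the old backup later does not touch the file's data; it stays on disk as long as at least one backup still links to it.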
Sys Admin v16, i02
Distributing software packages to all of our servers is a tedious task. Currently, a release manager makes a connection to each server and transfers files using ftp. This involves entering passwords multiple times, waiting for transfers to complete, changing directories, and keeping files organized. We developed a shell script, distribute_release ( Listing 1 ), that makes the job easier.
Our script has some advantages over the ftp process:
- Directory trees can be used to organize release modules.
- A distribution network defines how files are transferred from server to server.
- When a release module is ready to be distributed, it is replicated to all of the servers in the network using rsync, which helps minimize network traffic.
- Various authentication methods can be used to avoid entering passwords for each server.
We'll describe the directory structures including creating the distribution network. Then we'll talk about the scripts. Finally, we'll discuss an example.
Directory Structures
Each release module is stored in the directory /var/spool/pkg/release/[module]/. A module directory can be flat, or it can contain subdirectories. Hidden directory trees under the ./release/ directory define the distribution network. Therefore, the names of these directories cannot be used as module names.
Transport protocols supported by distribute_release include nfs, rsh, and ssh. If a release module is distributed using nfs, then the directory /var/spool/pkg/release/.nfs/[module]/ contains symbolic links corresponding to the hosts in the server's distribution network:
/var/spool/pkg/release/.nfs/[module]/[host] -> \
    /net/[host]/var/spool/pkg/release/

When using nfs, rsync functions like an improved copy command, transferring files between the directories /var/spool/pkg/release/[module]/ and /var/spool/pkg/release/.nfs/[module]/[host]/[module]. When using rsh or ssh, the directory structures are similar. With rsh, for example, empty files of the form /var/spool/pkg/release/.rsh/[module]/[host] define the hosts in the distribution network.
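The layout above can be sketched locally. This is a hypothetical reconstruction (the host name pongo is an assumption, and a temp directory stands in for / so nothing real is touched); it shows only the symbolic-link structure, not a working NFS automount:

```shell
# Hypothetical local sketch of the .nfs layout (host "pongo" is an assumption;
# a temp dir stands in for the filesystem root).
set -e
base=$(mktemp -d)
mkdir -p "$base/var/spool/pkg/release/TS1" \
         "$base/var/spool/pkg/release/.nfs/TS1" \
         "$base/net/pongo/var/spool/pkg/release"
# One symlink per host in the distribution network:
ln -s "$base/net/pongo/var/spool/pkg/release" \
      "$base/var/spool/pkg/release/.nfs/TS1/pongo"
readlink "$base/var/spool/pkg/release/.nfs/TS1/pongo"
rm -r "$base"
```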
The Scripts
Before distribute_release can be called, the directory structures and the distribution network must be created. The script create_distribution ( Listing 2 ) facilitates these tasks.
One argument, the name of a release module, must be passed to create_distribution. When no options are used, the local host functions as a terminal node in the distribution network. In other words, the system may receive updates from another host, but it will not propagate those updates to downstream hosts. Downstream hosts and transport protocols may be specified with the -h and -t options respectively.
When using distribute_release, the name of a release module must be passed to the script. The -q and -v options may be used to control the amount of information displayed to the user. Hosts to be included or excluded from the distribution may be specified using the -i and -e options. The -r option may be used to determine how many times the program will recursively call itself to distribute the module to successive levels in a distribution hierarchy. When using nfs, the recursive calls are made locally. With rsh and ssh, the program calls itself on a remote server.
Distribute_release first gets the argument and any command-line options. Then, for each transport protocol, the script builds a distribution list and executes the appropriate rsync command for each host in the list. If a recursion depth is specified, then another instance of distribute_release is executed in a detached screen session, allowing the parent instance to continue running while the child processes propagate the module to other hosts.
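Listing 1 itself is not reproduced here, but the core loop of such a script can be sketched as follows. Everything in this sketch is an assumption (the module name, the host files, the exact rsync flags), and it echoes the rsync commands it would run instead of executing them, so it is safe to try:

```shell
# Hypothetical sketch of the distribute_release core loop (module name, hosts,
# and flags are assumptions). It echoes the rsync commands instead of running them.
set -e
module=TS1
base=$(mktemp -d)                  # stand-in for /var/spool/pkg/release
mkdir -p "$base/.ssh/$module"
touch "$base/.ssh/$module/pongo" "$base/.ssh/$module/nemo"
# Each empty file under .ssh/[module]/ names a downstream host:
for f in "$base/.ssh/$module"/*; do
  host=$(basename "$f")
  echo rsync -az -e ssh "$base/$module/" "$host:/var/spool/pkg/release/$module/"
done
rm -r "$base"
```

Replacing echo with the real rsync call, and adding per-protocol loops for nfs and rsh, gives the shape of the distribution step described above.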
An Example
Our example network (see Figure 1) contains five servers -- bambi, pongo, pluto, nemo, and goofy. One of the release modules is named TS1 (located on bambi), and the other is named TS2 (located on pluto). By executing the create_distributions script (Listing 3) on each server, the complete distribution network for both modules is built using the proper create_distribution calls.
Consider the TS1 release module; after the module has been distributed to all of the systems in the network, the directory /var/spool/pkg/release/TS1/ contains the following files and subdirectories:
./README
./v1/TS1-v1.pkg
./v2/TS1-v2.pkg
./beta/TS1-v3.pkg

On bambi, the directory /var/spool/pkg/release/.ssh/TS1/ contains a file named pongo. So, executing "distribute_release TS1" on bambi synchronizes the TS1 module with pongo using ssh as the transport protocol. The TS1 module can be distributed from pongo to all servers in the network using the -r option:

distribute_release -r 2 TS1

When using ssh, passwords can be avoided by using public/private key pairs with empty passphrases. When using rsh, you can update /etc/hosts.equiv or the appropriate .rhosts file. Obviously, passwords are not an issue with nfs. Deciding which protocol to use depends on security concerns, potential performance issues, and configuration complexity.

John Spurgeon is a software developer and systems administrator for Intel's Factory Information Control Systems, IFICS, in Hillsboro, Oregon. He is currently preparing to ride a single-speed bicycle in Race Across America in 2007.
Ed Schaefer is a frequent contributor to Sys Admin. He is a software developer and DBA for Intel's Factory Information Control Systems, IFICS, in Hillsboro, Oregon. Ed also hosts the monthly Shell Corner column on UnixReview.com. He can be reached at: [email protected].
Apr 14, 2009 | developerworks
If you work with both a laptop and a desktop computer, you know you have to synchronize the machines to keep them up to date. In addition, you probably want to run the synchronization not only at your home but also from a remote site; in my case, whenever I travel with my laptop, I make sure that whatever I do on it gets backed up to my desktop computer. (Losing your laptop and thereby losing all your work isn't nice at all!) Many solutions to this problem exist: This article introduces one such tool-rsync-and mentions several related tools, all of which provide easy synchronization procedures.
Rsync::Config is a module that can be used to create rsync configuration files. A configuration file (from the Rsync::Config point of view) is made of atoms and of modules containing atoms. An atom is the smallest piece of the configuration file. This module inherits from Rsync::Config::Module.
What is Rsync?

Rsync is a very useful alternative to rcp written by Andrew Tridgell and Paul Mackerras. This tool lets you copy files and directories between a local host and a remote host (source and destination can also both be local if you need). The main advantage of using rsync instead of rcp is that rsync can use SSH as a secure channel, send/receive only the bytes inside files that changed since the last replication, and remove files on the destination host that were deleted on the source host to keep both hosts in sync. In addition to using rsh/ssh for transport, you can also use Rsync's own protocol, in which case you connect to TCP port 873.
Whether you rely on SSH or use the rsync protocol explicitly, rsync still needs to be installed on both hosts. A Win32 port is available if you need it, so either or both hosts can be NT hosts. Rsync's web site has some good information and links. There is also a HOWTO.
Configuring /etc/rsyncd.conf
Being co-written by Andrew Tridgell, author of Samba, it's no surprise that Rsync's configuration file looks just like Samba's (and Windows' :-), and that Rsync lets you create projects that look like shared directories under Samba. Accessing remote resources through this indirect channel offers more independence, as it lets you move files on the source Rsync server without changing anything on the destination host.
Any parameters listed before any [module] section are global, default parameters.
Each module is a symbolic name for a directory on the local host. Here's an example:
#/etc/rsyncd.conf
secrets file = /etc/rsyncd.secrets
motd file = /etc/rsyncd.motd

#Below are actually defaults, but to be on the safe side...
read only = yes
list = yes
uid = nobody
gid = nobody

[out]
comment = Great stuff from remote.acme.com
path = /home/rsync/out

[confidential]
comment = For your eyes only
path = /home/rsync/secret-out
auth users = joe,jane
hosts allow = *.acme.com
hosts deny = *
list = false
Note: Rsync will not grant access to a protected share if the password file (/etc/rsyncd.secrets, here) is world-readable.
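A secrets file that satisfies this restriction can be created as follows. The path and the user:password pairs here are illustrative only (a temp file stands in for /etc/rsyncd.secrets):

```shell
# Create a secrets file the daemon will accept (path and credentials are
# illustrative). rsync refuses a world-readable secrets file, so lock it down.
set -e
secrets=$(mktemp)                  # stand-in for /etc/rsyncd.secrets
cat > "$secrets" <<'EOF'
joe:use-a-real-password
jane:use-a-real-password
EOF
chmod 600 "$secrets"               # owner read/write only
stat -c '%a' "$secrets"            # shows the permission bits
rm "$secrets"
```

The user names here match the auth users line in the sample configuration; they need not be system accounts.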
Running RSYNCd
Per the manual page:
The rsync daemon is launched by specifying the --daemon option to rsync. You can launch it either via inetd or as a stand-alone daemon. When run via inetd you should add a line like this to /etc/services:
rsync 873/tcp... and a single line something like this to /etc/inetd.conf:
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon

You will then need to send inetd a HUP signal to tell it to reread its config file. Note that you should not send the rsync server a HUP signal to force it to reread /etc/rsyncd.conf. The file is re-read on each client connection.
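For systems that ship xinetd instead of the classic inetd, the equivalent service definition might look like the following. This is a sketch only; the binary path mirrors the inetd line above and may differ on your distribution:

```
# /etc/xinetd.d/rsync -- sketch of an xinetd equivalent of the inetd line above
service rsync
{
    disable     = no
    socket_type = stream
    wait        = no
    user        = root
    server      = /usr/bin/rsync
    server_args = --daemon
}
```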
Per the HOWTO:
The rsync daemon is robust, so it is safe to launch it as a stand-alone server. The code that loops waiting for requests is only a few lines long then it forks a new copy. If the forked process dies then it doesn't harm the main daemon.
The big advantage of running as a daemon will come when the planned directory cache system is implemented. The caching system will probably only be enabled when running as a daemon. For this reason, busy sites are recommended to run rsync as a daemon. Also, daemon mode makes it easy to limit the number of concurrent connections.

Since it's not included in the 2.4.3 RPM package, here's the init script to be copied as /etc/rc.d/init.d/rsyncd with symlinks to /etc/rc.d/rc3.d:
#!/bin/sh
# Rsyncd This shell script takes care of starting and stopping the rsync daemon
# description: Rsync is an awesome replication tool.

# Source function library.
. /etc/rc.d/init.d/functions

[ -f /usr/bin/rsync ] || exit 0
case "$1" in
start)
action "Starting rsyncd: " /usr/bin/rsync --daemon
;;
stop)
action "Stopping rsyncd: " killall rsync
;;
*)
echo "Usage: rsyncd {start|stop}"
exit 1
esac
exit 0
Here's an example under Linux on how to set up a replication through SSH:
rsync -avz -e ssh [email protected]:/home/rsync/out/ /home/rsync/from_remote

An important thing here is that the presence or absence of a trailing "/" in the source directory determines whether the directory itself is copied, or simply the contents of this source directory.
In other words, the above means that the local host must have a directory available (here, /home/rsync/from_remote) to receive the contents of /home/rsync/out sitting on the remote host; otherwise, Rsync will happily download all files into the path given as destination without asking for confirmation, and you could end up with a big mess.
On the other hand, rsync -avz -e ssh [email protected]:/home/rsync/out /home/rsync/from_remote means that an "out" sub-directory is first created under /home/rsync/from_remote on the destination host and is populated with the contents of the remote directory ./out. In this case, files are saved on the local host in /home/rsync/from_remote/out, so the former command looks like a better choice.
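The trailing-slash rule is easy to verify locally, with no remote host involved. This sketch assumes rsync is installed (it skips cleanly if not) and uses throwaway temp directories:

```shell
# Local demonstration of the trailing-slash rule (temp paths are throwaway).
# "src/" copies the contents of src; "src" copies the directory itself.
set -e
command -v rsync >/dev/null || { echo "rsync not installed; skipping"; exit 0; }
d=$(mktemp -d)
mkdir -p "$d/src"
echo hello > "$d/src/file"
rsync -a "$d/src/" "$d/contents/"   # produces contents/file
rsync -a "$d/src"  "$d/whole/"      # produces whole/src/file
ls -R "$d/contents" "$d/whole"
rm -r "$d"
```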
Here's how to replicate an Rsync share from a remote host:
rsync -avz [email protected]::out /home/rsync/in

Notice that we do not use a path to give the source resource, but instead just a name ("out"), and that we use :: to separate the server's name and the resource it offers. In the Rsync configuration shown above, this appears as an [out] section. This way, admins on remote.acme.com can move files on their server; as long as they remember to update the actual path in the [out] section (e.g. path = /home/rsync/out to path = /home/outgoing), remote Rsync users are not affected.
An Rsync server displays the list of available anonymous shares through rsync remote.acme.com::. Note the ::. For added security, it is possible to prompt for a password when listing private shares, so that only authorized remote users know about the Rsync shares available from your server.
Any NT version?
The NT port only requires the latest and greatest RSYNCx.EXE and Cygnus' CYGWIN1.DLL. The easiest approach is to keep both in the same directory, but the DLL can be located in any directory found in your PATH environment variable.
Robert Scholte's excellent tutorial on using the NT port of Rsync can be found here.
Instructions on how to install rsync as an NT service are here.
Here's an example based on the sample above:
C:\Rsync>rsync243 -avz [email protected]::confidential ./confidential
Password:
receiving file list ... done
./
wrote 109 bytes read 123 bytes 66.29 bytes/sec
total size is 0 speedup is 0.00

Useful command-line switches
-v, --verbose increase verbosity
-q, --quiet decrease verbosity
-c, --checksum always checksum
-a, --archive archive mode. It is a quick way of saying you want recursion and want to preserve everything.
-r, --recursive recurse into directories
-R, --relative use relative path names
-u, --update update only (don't overwrite newer files)
-t, --times preserve times
-n, --dry-run show what would have been transferred
-W, --whole-file copy whole files, no incremental checks
-I, --ignore-times Normally rsync will skip any files that are already the same length and have the same time-stamp. This option turns off this behavior.
--existing only update files that already exist
--delete delete files that don't exist on the sending side
--delete-after delete after transferring, not before
--force force deletion of directories even if not empty
--size-only only use file size when determining if a file should be transferred
--progress show progress during transfer
-z, --compress compress file data
--exclude=PATTERN exclude files matching PATTERN
--daemon run as a rsync daemon
--password-file=FILE get password from FILE

Resources
- http://members.ozemail.com.au/~msteveb/rsync/
- http://www.ccp14.ac.uk/ccp14admin/rsync/
- http://optics.ph.unimelb.edu.au/help/rsync/
- http://sunsite.dk/SunSITE/guides/rsync/
- http://sunsite.dk/SunSITE/guides/rsync/rsync-mirroring02.html
- ftp://ftp.minn.net/usr/mmchen/
- http://www.uwsg.indiana.edu/security/rsync.html
- http://filewatcher.org/sec/rsync.html
- http://freeos.com/printer.php?entryID=4042
- A Tutorial on Using rsync
- Rsnapshot (higher-level backup utility based on rsync, improved ease-of-use, allows you to keep multiple snapshots in time of your data, local or remote)
In the last two months I've been traveling a lot. During the same period my main desktop computer went belly up. I would have been in trouble without rsync at my disposal -- but thanks to my regular use of this utility, my data (or most of it, anyway) was already copied offsite, just waiting to be used. It takes a little time to become familiar with rsync, but once you are, you should be able to handle most of your backup needs with just a short script.

What's so great about rsync? First, it's designed to speed up file transfer by copying the differences between two files rather than copying an entire file every time. For example, while I'm writing this article, I can make a copy via rsync now and then another copy later. The second (and third, fourth, fifth, etc.) time I copy the file, rsync copies the differences only. That takes far less time, which is especially important when you're doing something like copying a whole directory offsite for daily backup. The first time may take a long time, but the next will take only a few minutes (assuming you don't change that much in the directory on a daily basis).
Another benefit is that rsync can preserve permissions and ownership information, copy symbolic links, and generally is designed to intelligently handle your files.
You shouldn't need to do anything to get rsync installed -- it should be available on almost any Linux distribution by default. If it's not, you should be able to install it from your distribution's package repositories. You will need rsync on both machines if you're copying data to a remote system, of course.
When you're using it to copy files to another host, the rsync utility typically works over a remote shell, such as Secure Shell (SSH) or Remote Shell (RSH). We'll work with SSH in the following examples, because RSH is not secure and you probably don't want to be copying your data using it. It's also possible to connect to a remote host using an rsync daemon, but since SSH is practically ubiquitous these days, there's no need to bother.
2006-09-20

I hate making backups by hand. It costs a lot of time and usually I have far better things to do. Long ago (in the Windows 98 era) I made backups to CD only before I needed to reinstall the OS, which was about once every 18 months, and my code projects maybe twice as often. A lot has changed since those dark times, though. My single PC expanded into a network with multiple desktops and a server, I installed a mix of Debian and Ubuntu and ditched Windows, and I have a nice broadband link - just as my friends do. Finally a lazy git like me can set up a decent backup system that takes care of itself, leaving me time to do the "better" things (such as writing about it :-)
There are already quite a few tutorials on the internet explaining various ways to backup your Linux system using built-in commands and a script of some sorts, but I could not find one that suited me so I decided to write another one - one that takes care of backing up my entire network.
rsnapshot is a filesystem snapshot utility based on rsync. It makes it easy to make periodic snapshots of local machines, and remote machines over ssh. It uses hard links whenever possible, to greatly reduce the disk space required.
Warsync (Wrapper Around Rsync) is a server replication system mainly used to sync servers in LVS clusters. It is based on rsync over ssh and has native support for Debian package synchronization.
November 20, 1999
Rsync is a wonderful little utility that's amazingly easy to set up on your machines. Rather than have a scripted FTP session or some other form of file transfer script, rsync copies only the diffs of files that have actually changed, compressed and through ssh if you want it for security. That's a mouthful -- but what it means is:
- Diffs - Only actual changed pieces of files are transferred, rather than the whole file. This makes updates faster, especially over slower links like modems. FTP would transfer the entire file, even if only one byte changed.
- Compression - The tiny pieces of diffs are then compressed on the fly, further saving you file transfer time and reducing the load on the network.
- Secure Shell - The security-conscious among you will like this, and you should all be using it. The stream from rsync is passed through the ssh protocol to encrypt your session instead of rsh, which is also an option (and required if you don't use ssh - enable it in your inetd configuration and restart your inet daemon if you disabled it for security).
Rsync is rather versatile as a backup/mirroring tool, offering many features above and beyond the above. I personally use it to synchronize Website trees from staging to production servers and to backup key areas of the filesystems both automatically through cron and by a CGI script. Here are some other key features of rsync:
- Support for copying links, devices, owners, groups and permissions
- Exclude and exclude-from options similar to GNU tar
- A CVS exclude mode for ignoring the same files that CVS would ignore
- Does not require root privileges
- Pipelining of file transfers to minimize latency costs
- Support for anonymous or authenticated rsync servers (ideal for mirroring)
How does it work?
You must set up one machine or another of a pair to be an "rsync server" by running rsync in a daemon mode ("rsync --daemon" at the commandline) and setting up a short, easy configuration file (/etc/rsyncd.conf). Below I'll detail a sample configuration file. The options are readily understood, few in number -- yet quite powerful.
Any number of machines with rsync installed may then synchronize to and/or from the machine running the rsync daemon. You can use this to make backups, mirror filesystems, distribute files or any number of similar operations. Through the use of the "rsync algorithm" which transfers only the diffs between files (similar to a patch file) and then compressing them -- you are left with a very efficient system.
For those of you new to secure shell ("ssh" for short), you should be using it! There's a very useful and quite thorough Getting Started with SSH document available. You may also want to visit the Secure Shell Web Site. Or, just hit the Master FTP Site in Finland and snag it for yourself. It provides a secure, encrypted "pipe" for your network traffic. You should be using it instead of telnet, rsh, or rlogin, and use the replacement "scp" command instead of "rcp."

Setting up a Server
You must set up a configuration file on the machine meant to be the server and run the rsync binary in daemon mode. Even your rsync client machines can run rsync in daemon mode for two-way transfers. You can do this automatically for each connection via the inet daemon, or at the commandline in standalone mode to leave it running in the background for often-repeated rsyncs.

I personally use it in standalone mode, like Apache. I have a crontab entry that synchronizes a Web site directory hourly. Plus there is a CGI script that folks fire off frequently during the day for immediate updating of content. This is a lot of rsync calls! If you start off the rsync daemon through your inet daemon, then you incur much more overhead with each rsync call. You basically restart the rsync daemon for every connection your server machine gets! It's the same reasoning as starting Apache in standalone mode rather than through the inet daemon. It's quicker and more efficient to start rsync in standalone mode if you anticipate a lot of rsync traffic. Otherwise, for the occasional transfer, follow the procedure to fire off rsync via the inet daemon. This way the rsync daemon, as small as it is, doesn't sit in memory if you only use it once a day or whatever. Your call.
Below is a sample rsync configuration file. It is placed in your /etc directory as rsyncd.conf.
motd file = /etc/rsyncd.motd
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock

[simple_path_name]
path = /rsync_files_here
comment = My Very Own Rsync Server
uid = nobody
gid = nobody
read only = no
list = yes
auth users = username
secrets file = /etc/rsyncd.scrt

Various options that you would modify right from the start are the areas in italics in the sample above. I'll start at the top, line by line, and go through what you should pay attention to. What the sample above does is set up a single "path" for rsync transfers to that machine.
Starting at the top are four lines specifying files and their paths for rsync running in daemon mode. The first is a "message of the day" (motd) file, like you would use for an FTP server. This file's contents get displayed when clients connect to this machine. Use it as a welcome, warning, or simply identification. The next line specifies a log file to send diagnostic and normal run-time messages to. The PID file contains the "process ID" (PID) number of the running rsync daemon. A lock file is used to ensure that things run smoothly. These options are global to the rsync daemon.
The next block of lines is specific to a "path" that rsync uses. The options contained therein have effect only within the block (they're local, not global options). Start with the "path" name. It's somewhat confusing that rsync uses the term "path" -- as it's not necessarily a full pathname. It serves as an "rsync area nickname" of sorts. It's a short, easy-to-remember (and type!) name that you assign to a real filesystem path with all the options you specify. Here are the things you need to set up first and foremost:
- path - this is the actual filesystem path to where the files are rsync'ed from and/or to.
- comment - a short, descriptive explanation of what and where the path points to for listings.
- auth users - you really should put this in to restrict access to only a pre-defined user that you specify in the following secrets file - does not have to be a valid system user.
- secrets file - the file containing plaintext key/value pairs of usernames and passwords.
One thing you should seriously consider is the "hosts allow" and "hosts deny" options for your path. Enter the IPs or hostnames that you wish to specifically allow or deny! If you don't do this, or at least use the "auth users" option, then basically that area of your filesystem is wide open to anyone in the world using rsync! Something I seriously think you should avoid...
Check the rsyncd.conf man page with "man rsyncd.conf" and read it very carefully where security options are concerned. You don't want just anyone to come in and rsync up an empty directory with the "--delete" option, now do you?
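A sketch of how such restrictions might look inside the path block from the sample above (the network addresses and hostname are examples only, not values you should copy):

```
[simple_path_name]
    path = /rsync_files_here
    auth users = username
    secrets file = /etc/rsyncd.scrt
    hosts allow = 192.168.1.0/24 rsync-client.mydomain.com
    hosts deny = *
```

With "hosts deny = *" in place, only the hosts explicitly matched by "hosts allow" can even attempt to connect, and they still have to pass the "auth users" check.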
The other options are all explained in the man page for rsyncd.conf. Basically, the above options specify that files are accessed as the given uid/gid ("nobody" here), that the filesystem path is read/write ("read only = no") and that the rsync path shows up in rsync listings ("list = yes"). I keep the rsync secrets file in /etc/ along with the configuration and motd files, and I prefix them with "rsyncd." to keep them together.
Using Rsync Itself
Now on to actually using, or initiating, an rsync transfer with rsync itself. It's the same binary as the daemon, just without the "--daemon" flag. Its simplicity is a virtue. I'll start with a commandline that I use in a script to synchronize a Web tree, shown below.
    rsync --verbose --progress --stats --compress --rsh=/usr/local/bin/ssh \
          --recursive --times --perms --links --delete \
          --exclude "*bak" --exclude "*~" \
          /www/* webserver:simple_path_name

Let's go through it one line at a time. The first line calls rsync itself and specifies the options "verbose", "progress" and "stats" so that you can see what's going on this first time around. The "compress" and "rsh" options specify that you want your stream compressed and sent through ssh (remember from above?) for security's sake.
The next line specifies how rsync itself operates on your files. You're telling rsync here to go through your source pathname recursively with "recursive" and to preserve the file timestamps and permissions with "times" and "perms." Copy symbolic links with "links" and delete things from the remote rsync server that are also deleted locally with "delete."
Now we have a line where there's quite a bit of power and flexibility. You can specify GNU tar-like include and exclude patterns here. In this example, I'm telling rsync to ignore some backup files that are common in this Web tree ("*.bak" and "*~" files). You can put whatever you want to match here, suited to your specific needs. You can leave this line out and rsync will copy all your files as they are locally to the remote machine. Depends on what you want.
Finally, the line that specifies the source pathname, the remote rsync machine and the rsync "path." The first part, "/www/*", specifies where on my local filesystem I want rsync to grab the files from for transmission to the remote rsync server. The next word, "webserver", should be the DNS name or IP address of your rsync server. It can be "w.x.y.z" or "rsync.mydomain.com" or even just "webserver" if you have a nickname defined in your /etc/hosts file, as I do here. The single colon specifies that you want the transfer sent through the remote shell given by "--rsh" -- ssh in this case. This is an important point to pay attention to! If you use two colons, then despite the specification of ssh on the commandline previously, rsync will instead connect directly to the rsync daemon over its own TCP port, bypassing ssh entirely. Ooops. The last part of that line, "simple_path_name", is the rsync "path" that you set up on the server as in the sample above.
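To summarize the single-colon versus double-colon distinction (destination names as in the example above; "[options]" stands for the flags shown earlier):

```
rsync [options] /www/* webserver:simple_path_name    # one colon: transport via remote shell (ssh, per --rsh)
rsync [options] /www/* webserver::simple_path_name   # two colons: direct TCP to the rsync daemon (port 873)
```

The daemon-mode form can also be written with a URL-style prefix, "rsync://webserver/simple_path_name", which is equivalent to the double-colon syntax.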
Yes, that's it! If you run the above command on your local rsync client, then you will transfer the entire "/www/*" tree to the remote "webserver" machine except backup files, preserving file timestamps and permissions -- compressed and secure -- with visual feedback on what's happening.
Note that in the above example, I used GNU style long options so that you can see what the commandline is all about. You can also use abbreviations -- single letters -- to do the same thing. Try running rsync with the "--help" option alone and you can see what syntax and options are available.
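For instance, most of the long options above have single-letter equivalents (-v for --verbose, -z for --compress, -r, -t, -p and -l for --recursive, --times, --perms and --links). The sketch below exercises them on a purely local copy between two scratch directories, so no daemon or ssh is involved -- assuming only that the rsync binary is on your PATH; the directory names are made up for the demonstration:

```shell
# Build a small source tree, including a backup file we want excluded.
mkdir -p /tmp/rsdemo/src /tmp/rsdemo/dst
echo "index" > /tmp/rsdemo/src/index.html
echo "junk"  > /tmp/rsdemo/src/index.html.bak

# Short-option equivalent of the long command above, run locally:
# -v -z -r -t -p -l = --verbose --compress --recursive --times --perms --links
rsync -vzrtpl --delete --exclude "*.bak" /tmp/rsdemo/src/ /tmp/rsdemo/dst/

ls /tmp/rsdemo/dst
```

The trailing slash on the source directory means "copy the contents of src", not the directory itself -- a detail of rsync path semantics that trips up many first-time users.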
Rsync on the Net
You can find the rsync distribution at the rsync home page or hit the rsync FTP site.
There are also various pages of information on rsync out there, many of which reside on the rsync Web site. Below are three documents that you should also read thoroughly before using rsync so that you understand it well:
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: May 06, 2020