(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix
Jan 22, 2017 | nac.uci.edu
parsync v1.67 (Mac Beta)
1. Download
If you already know you want it, get it here: parsync+utils.tar.gz (contains parsync plus the kdirstat-cache-writer, stats, and scut utilities below). Extract it into a dir on your $PATH and, after verifying the other dependencies below, give it a shot.
While parsync is developed for and tested on Linux, the latest version of parsync has been modified to (mostly) work on the Mac (tested on OS X 10.9.5). A number of the Linux-specific dependencies have been removed and there are a number of Mac-specific workarounds.
Thanks to Phil Reese < [email protected] > for the code mods needed to get it started. It's the same package and instructions for both platforms.
2. Dependencies
parsync requires the following utilities to work:
- stats - self-writ Perl utility for providing descriptive stats on STDIN
- scut - self-writ Perl utility like cut that allows regex split tokens
- kdirstat-cache-writer (included in the tarball mentioned above), requires a non-default Perl utility: URI::Escape qw(uri_escape)
    sudo yum install perl-URI          # CentOS-like
    sudo apt-get install liburi-perl   # Debian-like
parsync needs to be installed only on the SOURCE end of the transfer and uses whatever rsync is available on the TARGET. It uses a number of Linux-specific utilities, so if you're transferring between Linux and a FreeBSD host, install parsync on the Linux side. In fact, as currently written, it will only PUSH data to remote targets; it will not pull data as rsync itself can do. This will probably change in the near future.
3. Overview
rsync is a fabulous data mover. Possibly more bytes have been moved (or have been prevented from being moved) by rsync than by any other application. So what's not to love?
For transferring large, deep file trees, rsync will pause while it generates lists of files to process. Since Version 3, it does this pretty fast, but on sluggish filesystems, it can take hours or even days before it will start to actually exchange rsync data. Second, due to various bottlenecks, rsync will tend to use less than the available bandwidth on high speed networks. Starting multiple instances of rsync can improve this significantly. However, on such transfers, it is also easy to overload the available bandwidth, so it would be nice to both limit the bandwidth used if necessary and also to limit the load on the system.
parsync tries to satisfy all these conditions and more by:
- using the kdirstat-cache-writer utility from the beautiful kdirstat directory browser, which can produce lists of files very rapidly
- allowing re-use of the cache files so generated.
- doing crude load balancing of the number of active rsyncs, suspending and un-suspending the processes as necessary.
- using rsync's own bandwidth limiter (--bwlimit) to throttle the total bandwidth.
- making rsync's own vast option selection available as a pass-thru (tho limited to those compatible with the --files-from option).
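To make the file-list idea concrete, here is a minimal sketch of how a size-annotated list could be divided among several rsync processes. It is not parsync's actual code: the input format (size, a tab, then the path), the function name chunk_by_size and the chunk file naming are all assumptions made for illustration, and paths containing tabs or newlines are not handled.

```shell
# chunk_by_size LIST NP PREFIX
# LIST holds one "size<TAB>path" entry per line (hypothetical format).
# Writes the paths into PREFIX.0 .. PREFIX.(NP-1), always giving the next
# entry to the chunk that currently holds the fewest bytes (greedy balancing).
chunk_by_size() {
    list=$1
    np=$2
    prefix=$3
    awk -F '\t' -v np="$np" -v prefix="$prefix" '
        BEGIN {
            # Start every chunk at zero bytes and create its (empty) file.
            for (i = 0; i < np; i++) { total[i] = 0; printf "" > (prefix "." i) }
        }
        {
            # Find the lightest chunk so far and append this path to it.
            min = 0
            for (i = 1; i < np; i++) if (total[i] < total[min]) min = i
            total[min] += $1
            print $2 > (prefix "." min)
        }
    ' "$list"
}

# Each chunk could then be handed to its own rsync, for example:
#   rsync -a --files-from=chunk.0 /source/dir/ user@host:/target/dir/
```

Because the assignment is greedy and follows the input order, the chunks are only roughly equal in size, which matches the "crude" balancing the text describes.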
Beyond this introduction, parsync's internal help is about all you'll need to figure out how to use it; below is what you'll see when you type parsync -h. There are still edge cases where parsync will fail or behave oddly, especially with small data transfers, so I'd be happy to hear of such misbehavior or suggestions to improve it.
Download the complete tarball of parsync, plus the required utilities, here: parsync+utils.tar.gz. Unpack it, move the contents to a dir on your $PATH, chmod it executable, and try it out.
Only use for LARGE data transfers
The main use case for parsync is really only very large data transfers thru fairly fast network connections (>1Gb/s). Below this speed, a single rsync can saturate the connection, so there's little reason to use parsync, and in fact the overhead of testing the existence of and starting more rsyncs tends to worsen its performance on small transfers to slightly less than rsync alone.
    parsync --help
or just
    parsync
Below is what you should see:
4. parsync help
parsync version 1.67 (Mac compatibility beta) Jan 22, 2017
by Harry Mangalam <[email protected]> || <[email protected]>

parsync is a Perl script that wraps Andrew Tridgell's miraculous 'rsync' to provide some load balancing and parallel operation across network connections to increase the amount of bandwidth it can use.

parsync is primarily tested on Linux, but (mostly) works on MaccOSX as well.

parsync needs to be installed only on the SOURCE end of the transfer and only works in local SOURCE -> remote TARGET mode (it won't allow remote local SOURCE <- remote TARGET, emitting an error and exiting if attempted).

It uses whatever rsync is available on the TARGET. It uses a number of Linux-specific utilities so if you're transferring between Linux and a FreeBSD host, install parsync on the Linux side.

The only native rsync option that parsync uses is '-a' (archive) & '-s' (respect bizarro characters in filenames). If you need more, then it's up to you to provide them via '--rsyncopts'. parsync checks to see if the current system load is too heavy and tries to throttle the rsyncs during the run by monitoring and suspending / continuing them as needed.

It uses the very efficient (also Perl-based) kdirstat-cache-writer from kdirstat to generate lists of files which are summed and then crudely divided into NP jobs by size.

It appropriates rsync's bandwidth throttle mechanism, using '--maxbw' as a passthru to rsync's 'bwlimit' option, but divides it by NP so as to keep the total bw the same as the stated limit. It monitors and shows network bandwidth, but can't change the bw allocation mid-job. It can only suspend rsyncs until the load decreases below the cutoff. If you suspend parsync (^Z), all rsync children will suspend as well, regardless of current state.

Unless changed by '--interface', it tried to figure out how to set the interface to monitor. The transfer will use whatever interface routing provides, normally set by the name of the target. It can also be used for non-host-based transfers (between mounted filesystems) but the network bandwidth continues to be (usually pointlessly) shown.

[[NB: Between mounted filesystems, parsync sometimes works very poorly for reasons still mysterious. In such cases (monitor with 'ifstat'), use 'cp' or 'tnc' (https://goo.gl/5FiSxR) for the initial data movement and a single rsync to finalize. I believe the multiple rsync chatter is interfering with the transfer.]]

It only works on dirs and files that originate from the current dir (or specified via "--rootdir"). You cannot include dirs and files from discontinuous or higher-level dirs.

** the ~/.parsync files **
The ~/.parsync dir contains the cache (*.gz), the chunk files (kds*), and the time-stamped log files. The cache files can be re-used with '--reusecache' (which will re-use ALL the cache and chunk files). The log files are datestamped and are NOT overwritten.

** Odd characters in names **
parsync will sometimes refuse to transfer some oddly named files, altho recent versions of rsync allow the '-s' flag (now a parsync default) which tries to respect names with spaces and properly escaped shell characters. Filenames with embedded newlines, DOS EOLs, and other odd chars will be recorded in the log files in the ~/.parsync dir.

** Because of the crude way that files are chunked, NP may be adjusted slightly to match the file chunks. ie '--NP 8' -> '--NP 7'. If so, a warning will be issued and the rest of the transfer will be automatically adjusted.

OPTIONS
=======
[i] = integer number
[f] = floating point number
[s] = "quoted string"
( ) = the default if any

--NP [i] (sqrt(#CPUs)) ............... number of rsync processes to start
      optimal NP depends on many vars. Try the default and incr as needed
--startdir [s] (`pwd`) .. the directory it works relative to. If you omit
      it, the default is the CURRENT dir. You DO have to specify target
      dirs. See the examples below.
--maxbw [i] (unlimited) .......... in KB/s max bandwidth to use (--bwlimit
      passthru to rsync). maxbw is the total BW to be used, NOT per rsync.
--maxload [f] (NP+2) ........ max total system load - if sysload > maxload,
      sleeps an rsync proc for 10s
--checkperiod [i] (5) .......... sets the period in seconds between updates
--rsyncopts [s] ... options passed to rsync as a quoted string (CAREFUL!)
      this opt triggers a pause before executing to verify the command.
--interface [s] ............. network interface to /monitor/, not nec use.
      default: `/sbin/route -n | grep "^0.0.0.0" | rev | cut -d' ' -f1 | rev`
      above works on most simple hosts, but complex routes will confuse it.
--reusecache .......... don't re-read the dirs; re-use the existing caches
--email [s] ..................... email address to send completion message
      (requires working mail system on host)
--barefiles ..... set to allow rsync of individual files, as oppo to dirs
--nowait ................ for scripting, sleep for a few s instead of wait
--version ................................. dumps version string and exits
--help ......................................................... this help

Examples
========
-- Good example 1 --
% parsync --maxload=5.5 --NP=4 --startdir='/home/hjm' dir1 dir2 dir3 hjm@remotehost:~/backups

where
= "--startdir='/home/hjm'" sets the working dir of this operation to '/home/hjm' and dir1 dir2 dir3 are subdirs from '/home/hjm'
= the target "hjm@remotehost:~/backups" is the same target rsync would use
= "--NP=4" forks 4 instances of rsync
= "--maxload=5.5" will start suspending rsync instances when the 5m system load gets to 5.5 and then unsuspending them when it goes below it.

It uses 4 instances to rsync dir1 dir2 dir3 to hjm@remotehost:~/backups

-- Good example 2 --
% parsync --rsyncopts="--ignore-existing" --reusecache --NP=3 --barefiles *.txt /mount/backups/txt

where
= "--rsyncopts='--ignore-existing'" is an option passed thru to rsync telling it not to disturb any existing files in the target directory.
= "--reusecache" indicates that the filecache shouldn't be re-generated, uses the previous filecache in ~/.parsync
= "--NP=3" for 3 copies of rsync (with no "--maxload", the default is 4)
= "--barefiles" indicates that it's OK to transfer barefiles instead of recursing thru dirs.
= "/mount/backups/txt" is the target - a local disk mount instead of a network host.

It uses 3 instances to rsync *.txt from the current dir to "/mount/backups/txt".

-- Error Example 1 --
% pwd
/home/hjm # executing parsync from here
% parsync --NP4 --compress /usr/local /media/backupdisk

why this is an error:
= '--NP4' is not an option (parsync will say "Unknown option: np4"). It should be '--NP=4'
= if you were trying to rsync '/usr/local' to '/media/backupdisk', it will fail since there is no /home/hjm/usr/local dir to use as a source. This will be shown in the log files in ~/.parsync/rsync-logfile-<datestamp>_# as a spew of "No such file or directory (2)" errors
= the '--compress' is a native rsync option, not a native parsync option. You have to pass it to rsync with "--rsyncopts='--compress'"

The correct version of the above command is:
% parsync --NP=4 --rsyncopts='--compress' --startdir=/usr local /media/backupdisk

-- Error Example 2 --
% parsync --start-dir /home/hjm mooslocal [email protected]:/usr/local

why this is an error:
= this command is trying to PULL data from a remote SOURCE to a local TARGET. parsync doesn't support that kind of operation yet.

The correct version of the above command is:
# ssh to hjm@moo, install parsync, then:
% parsync --startdir=/usr local hjm@remote:/home/hjm/mooslocal
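The help text says that '--maxbw' is divided by NP before being passed to rsync's '--bwlimit', and that rsyncs are suspended while the system load is above '--maxload'. The sketch below illustrates that arithmetic and the load test only. The function names are hypothetical, and the use of SIGSTOP/SIGCONT to suspend and continue a process is an assumption about one way to do it, not a claim about parsync's internals.

```shell
# per_rsync_bwlimit MAXBW NP
# Splits a total bandwidth cap (KB/s) evenly across NP rsync processes,
# rounding down so the sum never exceeds MAXBW.
per_rsync_bwlimit() {
    echo $(( $1 / $2 ))
}

# load_exceeds CURRENT MAXLOAD
# Succeeds (exit status 0) when CURRENT is greater than MAXLOAD.
# awk does the comparison because sh arithmetic is integer-only.
load_exceeds() {
    awk -v cur="$1" -v max="$2" 'BEGIN { exit !(cur > max) }'
}

# throttle_step PID CURRENT MAXLOAD
# Suspends PID while the load is too high, resumes it otherwise
# (assumed mechanism: SIGSTOP / SIGCONT).
throttle_step() {
    if load_exceeds "$2" "$3"; then
        kill -STOP "$1"
    else
        kill -CONT "$1"
    fi
}
```

For example, with '--maxbw=1000 --NP=4' each rsync would get '--bwlimit=250'. The current load would have to come from somewhere such as the first field of /proc/loadavg on Linux; that source is likewise an assumption here.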
Jun 02, 2018 | unix.stackexchange.com
Mandar Shinde, Mar 13, 2015 at 6:51
I have been using an rsync script to synchronize data at one host with the data at another host. The data has numerous small-sized files that contribute to almost 1.2TB.
In order to sync those files, I have been using the rsync command as follows:
    rsync -avzm --stats --human-readable --include-from proj.lst /data/projects REMOTEHOST:/data/
The contents of proj.lst are as follows:
    + proj1
    + proj1/*
    + proj1/*/*
    + proj1/*/*/*.tar
    + proj1/*/*/*.pdf
    + proj2
    + proj2/*
    + proj2/*/*
    + proj2/*/*/*.tar
    + proj2/*/*/*.pdf
    ...
    ...
    ...
    - *
As a test, I picked two of those projects (8.5GB of data) and executed the command above. Being a sequential process, it took 14 minutes 58 seconds to complete. So, for 1.2TB of data it would take several hours.
If I could run multiple rsync processes in parallel (using &, xargs or parallel), it would save time.
I tried the command below with parallel (after cd-ing to the source directory) and it took 12 minutes 37 seconds to execute:
    parallel --will-cite -j 5 rsync -avzm --stats --human-readable {} REMOTEHOST:/data/ ::: .
This should have taken 5 times less time, but it didn't. I think I'm going wrong somewhere.
How can I run multiple rsync processes in order to reduce the execution time?
Ole Tange, Mar 13, 2015 at 7:25
Are you limited by network bandwidth? Disk iops? Disk bandwidth?
Mandar Shinde, Mar 13, 2015 at 7:32
If possible, we would want to use 50% of total bandwidth. But parallelising multiple rsyncs is our first priority.
Ole Tange, Mar 13, 2015 at 7:41
Can you let us know your: Network bandwidth, disk iops, disk bandwidth, and the bandwidth actually used?
Mandar Shinde, Mar 13, 2015 at 7:47
In fact, I do not know about the above parameters. For the time being, we can neglect the optimization part. Multiple rsyncs in parallel is the primary focus now.
Mandar Shinde, Apr 11, 2015 at 13:53
The following steps did the job for me:
- Run rsync --dry-run first in order to get the list of files that would be affected.
    rsync -avzm --stats --safe-links --ignore-existing --dry-run --human-readable /data/projects REMOTE-HOST:/data/ > /tmp/transfer.log
- I fed the output of cat transfer.log to parallel in order to run 5 rsyncs in parallel, as follows:
    cat /tmp/transfer.log | parallel --will-cite -j 5 rsync -avzm --relative --stats --safe-links --ignore-existing --human-readable {} REMOTE-HOST:/data/ > result.log
Here, the --relative option ensured that the directory structure for the affected files, at the source and destination, remains the same (inside the /data/ directory), so the command must be run in the source folder (in this example, /data/projects).
Sandip Bhattacharya, Nov 17, 2016 at 21:22
That would do an rsync per file. It would probably be more efficient to split up the whole file list using split and feed those filenames to parallel. Then use rsync's --files-from to get the filenames out of each file and sync them.
    rm backups.*
    split -l 3000 backup.list backups.
    ls backups.* | parallel --line-buffer --verbose -j 5 rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/
Mike D, Sep 19, 2017 at 16:42
How does the second rsync command handle the lines in result.log that are not files? i.e. "receiving file list ... done", "created directory /data/".
Cheetah, Oct 12, 2017 at 5:31
On newer versions of rsync (3.1.0+), you can use --info=name in place of -v, and you'll get just the names of the files and directories. You may want to use --protect-args to the 'inner' transferring rsync too if any files might have spaces or shell metacharacters in them.
Mikhail, Apr 10, 2017 at 3:28
I would strongly discourage anybody from using the accepted answer; a better solution is to crawl the top-level directory and launch a proportional number of rsync operations.
I have a large zfs volume and my source was a cifs mount. Both are linked with 10G, and in some benchmarks can saturate the link. Performance was evaluated using zpool iostat 1.
The source drive was mounted like:
    mount -t cifs -o username=,password= //static_ip/70tb /mnt/Datahoarder_Mount/ -o vers=3.0
Using a single rsync process:
    rsync -h -v -r -P -t /mnt/Datahoarder_Mount/ /StoragePod
the io meter reads:
    StoragePod 30.0T 144T 0 1.61K 0 130M
    StoragePod 30.0T 144T 0 1.61K 0 130M
    StoragePod 30.0T 144T 0 1.62K 0 130M
In synthetic benchmarks (crystal disk), performance for sequential write approaches 900 MB/s, which means the link is saturated. 130MB/s is not very good, and is the difference between waiting a weekend and two weeks.
So, I built the file list and tried to run the sync again (I have a 64 core machine):
    cat /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount.log | parallel --will-cite -j 16 rsync -avzm --relative --stats --safe-links --size-only --human-readable {} /StoragePod/ > /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount_result.log
and it had the same performance!
    StoragePod 29.9T 144T 0 1.63K 0 130M
    StoragePod 29.9T 144T 0 1.62K 0 130M
    StoragePod 29.9T 144T 0 1.56K 0 129M
As an alternative I simply ran rsync on the root folders:
    rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/Marcello_zinc_bone /StoragePod/Marcello_zinc_bone
    rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/fibroblast_growth /StoragePod/fibroblast_growth
    rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/QDIC /StoragePod/QDIC
    rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/sexy_dps_cell /StoragePod/sexy_dps_cell
This actually boosted performance:
    StoragePod 30.1T 144T 13 3.66K 112K 343M
    StoragePod 30.1T 144T 24 5.11K 184K 469M
    StoragePod 30.1T 144T 25 4.30K 196K 373M
In conclusion, as @Sandip Bhattacharya brought up, write a small script to get the directories and parallel that. Alternatively, pass a file list to rsync. But don't create new instances for each file.
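A rough sketch of the "pass a file list to rsync" variant follows, also covering the question raised in the comments about log lines that are not files. The function names existing_files and split_file_list are hypothetical, and the filter simply assumes that every useful log line is a path relative to the given source directory; anything that does not name a regular file there is dropped.

```shell
# existing_files SRC_DIR   (reads candidate lines on stdin)
# Prints only the lines that name a regular file under SRC_DIR, which
# drops rsync status lines, blank lines and directory entries.
existing_files() {
    src=$1
    while IFS= read -r line; do
        [ -n "$line" ] && [ -f "$src/$line" ] && printf '%s\n' "$line"
    done
    return 0
}

# split_file_list LIST LINES PREFIX
# Cuts LIST into pieces of at most LINES lines named PREFIXaa, PREFIXab, ...
split_file_list() {
    split -l "$2" "$1" "$3"
}

# Possible use, following the split / --files-from comment in this thread:
#   existing_files /data < /tmp/transfer.log > /tmp/files.list
#   split_file_list /tmp/files.list 3000 /tmp/files.part.
#   ls /tmp/files.part.* | parallel -j 5 rsync -a --files-from={} /data/ REMOTE-HOST:/data/
```

This way each rsync instance handles a few thousand files instead of one, which is the point of the conclusion above. File names containing newlines are not handled by this sketch.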
Julien Palard, May 25, 2016 at 14:15
I personally use this simple one:
    ls -1 | parallel rsync -a {} /destination/directory/
which is only useful when you have more than a few non-near-empty directories; otherwise you'll end up with almost every rsync terminating and the last one doing all the job alone.
Ole Tange, Mar 13, 2015 at 7:25
A tested way to do the parallelized rsync is: http://www.gnu.org/software/parallel/man.html#EXAMPLE:-Parallelizing-rsync
rsync is a great tool, but sometimes it will not fill up the available bandwidth. This is often a problem when copying several big files over high speed connections.
The following will start one rsync per big file in src-dir to dest-dir on the server fooserver:
    cd src-dir; find . -type f -size +100000 | \
      parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; \
      rsync -s -Havessh {} fooserver:/dest-dir/{}
The directories created may end up with wrong permissions and smaller files are not being transferred. To fix those run rsync a final time:
    rsync -Havessh src-dir/ fooserver:/dest-dir/
If you are unable to push data, but need to pull them and the files are called digits.png (e.g. 000000.png) you might be able to do:
    seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/
Mandar Shinde, Mar 13, 2015 at 7:34
Any other alternative in order to avoid find?
Ole Tange, Mar 17, 2015 at 9:20
Limit the -maxdepth of find.
Mandar Shinde, Apr 10, 2015 at 3:47
If I use the --dry-run option in rsync, I would have a list of files that would be transferred. Can I provide that file list to parallel in order to parallelise the process?
Ole Tange, Apr 10, 2015 at 5:51
    cat files | parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; rsync -s -Havessh {} fooserver:/dest-dir/{}
Mandar Shinde, Apr 10, 2015 at 9:49
Can you please explain the mkdir -p /dest-dir/{//}\; part? Especially the {//} thing is a bit confusing.
For multi destination syncs, I am using
    parallel rsync -avi /path/to/source ::: host1: host2: host3:
Hint: All ssh connections are established with public keys in ~/.ssh/authorized_keys
Jun 02, 2018 | www.gnu.org
rsync is a great tool, but sometimes it will not fill up the available bandwidth. This is often a problem when copying several big files over high speed connections.
The following will start one rsync per big file in src-dir to dest-dir on the server fooserver:
    cd src-dir; find . -type f -size +100000 | \
      parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; \
      rsync -s -Havessh {} fooserver:/dest-dir/{}
The dirs created may end up with wrong permissions and smaller files are not being transferred. To fix those run rsync a final time:
    rsync -Havessh src-dir/ fooserver:/dest-dir/
If you are unable to push data, but need to pull them and the files are called digits.png (e.g. 000000.png) you might be able to do:
    seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/
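The pull example works because seq -w 0 99 prints the zero-padded values 00 through 99, and each job fetches only the files whose names end in that pair of digits before .png, which splits the files across 100 jobs. A small sketch of that partitioning, assuming file names made only of digits as in the text. bucket_for is a hypothetical helper, not part of rsync or parallel:

```shell
# Sketch: which of the 100 jobs (00..99) would fetch a given file.
# bucket_for is a hypothetical helper used only for illustration.
bucket_for() {
    stem=${1%.png}                          # strip the extension: 000123
    printf '%s\n' "${stem#"${stem%??}"}"    # keep the last two characters: 23
}

bucket_for 000123.png    # prints 23
bucket_for 000007.png    # prints 07
```

Every file lands in exactly one bucket, so no file is transferred twice. The split is only even if the final two digits are spread evenly across the file names.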
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: February, 19, 2020