Speed up rsync with Simultaneous/Concurrent File Transfers?


We need to transfer 15TB of data from one server to another as fast as we can. We're currently using rsync but we're only getting speeds of around 150 Mb/s, when our network is capable of 900+ Mb/s (tested with iperf). I've tested the disks, network, etc. and concluded that rsync transferring only one file at a time is what's causing the slowdown.

I found a script that runs a separate rsync for each folder in a directory tree (allowing you to limit it to x at a time), but I can't get it working; it still runs only one rsync at a time.

I found the script here (copied below).

Our directory tree is like this:

/main
   - /files
      - /1
         - 343
            - 123.wav
            - 76.wav
         - 772
            - 122.wav
         - 55
            - 555.wav
            - 324.wav
            - 1209.wav
         - 43
            - 999.wav
            - 111.wav
            - 222.wav
      - /2
         - 346
            - 9993.wav
         - 4242
            - 827.wav
      - /3
         - 2545
            - 76.wav
            - 199.wav
            - 183.wav
         - 23
            - 33.wav
            - 876.wav
         - 4256
            - 998.wav
            - 1665.wav
            - 332.wav
            - 112.wav
            - 5584.wav

So what I'd like to happen is to create an rsync for each of the directories in /main/files, up to a maximum of, say, 5 at a time. So in this case, 3 rsyncs would run, for /main/files/1, /main/files/2 and /main/files/3.

I tried with it like this, but it just runs 1 rsync at a time for the /main/files/2 folder:

#!/bin/bash

# Define source, target, maxdepth and cd to source
source="/main/files"
target="/main/filesTest"
depth=1
cd "${source}"

# Set the maximum number of concurrent rsync threads
maxthreads=5
# How long to wait before checking the number of rsync threads again
sleeptime=5

# Find all folders in the source directory within the maxdepth level
find . -maxdepth ${depth} -type d | while read dir
do
    # Make sure to ignore the parent folder
    if [ `echo "${dir}" | awk -F'/' '{print NF}'` -gt ${depth} ]
    then
        # Strip leading dot slash
        subfolder=$(echo "${dir}" | sed 's@^\./@@g')
        if [ ! -d "${target}/${subfolder}" ]
        then
            # Create destination folder and set ownership and permissions to match source
            mkdir -p "${target}/${subfolder}"
            chown --reference="${source}/${subfolder}" "${target}/${subfolder}"
            chmod --reference="${source}/${subfolder}" "${target}/${subfolder}"
        fi
        # Make sure the number of rsync threads running is below the threshold
        while [ `ps -ef | grep -c [r]sync` -gt ${maxthreads} ]
        do
            echo "Sleeping ${sleeptime} seconds"
            sleep ${sleeptime}
        done
        # Run rsync in background for the current subfolder and move on to the next one
        nohup rsync -a "${source}/${subfolder}/" "${target}/${subfolder}/" </dev/null >/dev/null 2>&1 &
    fi
done

# Find all files above the maxdepth level and rsync them as well
find . -maxdepth ${depth} -type f -print0 | rsync -a --files-from=- --from0 ./ "${target}/"

Updated answer (Jan 2020)

xargs is now the recommended tool to achieve parallel execution. It's pre-installed almost everywhere. For running multiple rsync tasks the command would be:

ls /srv/mail | xargs -n1 -P4 -I% rsync -Pa % myserver.com:/srv/mail/

This will list all folders in /srv/mail, pipe them to xargs, which will read them one by one and run 4 rsync processes at a time. The % character is replaced by the input argument in each command invocation.
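
If the folder names might contain spaces, the same idea works with find and null-delimited names instead of parsing ls. A minimal sketch, reusing the example paths from the command above:

# Sketch: same xargs pattern, but fed null-delimited directory paths by find,
# so it is safe for names with spaces. Paths are the example ones from above.
find /srv/mail -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 -P4 -I% rsync -Pa % myserver.com:/srv/mail/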

Original answer using parallel:

ls /srv/mail | parallel -v -j8 rsync -raz --progress {} myserver.com:/srv/mail/{}

Rsync is a tool for copying files between volumes on the same or separate servers, but it can still be slow in certain situations; tools such as Multi-Stream-rsync work around this by splitting the transfer into multiple buckets. As for rsync itself, you can't usefully run two simultaneous rsyncs copying the same files at the same time. The only thing you can do at the rsync level is to break your directory into multiple subdirectories and transfer them one by one in parallel.

rsync transfers files as fast as it can over the network. For example, try using it to copy one large file that doesn't exist at all on the destination. That speed is the maximum speed rsync can transfer data. Compare it with the speed of scp (for example). rsync is even slower at raw transfer when the destination file exists, because both sides have to have a two-way chat about what parts of the file are changed, but pays for itself by identifying data that doesn't need to be transferred.
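
To get a feel for that baseline, you can time a raw transfer of one large file that doesn't yet exist on the destination and compare rsync with scp. A rough sketch; the file name and host below are placeholders:

# Rough baseline test: transfer one large file that does not yet exist on the
# destination, once with rsync and once with scp, and compare the timings.
# "bigfile" and "otherserver" are placeholders.
time rsync -a --progress bigfile otherserver:/tmp/bigfile.rsync
time scp bigfile otherserver:/tmp/bigfile.scp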

A simpler way to run rsync in parallel would be to use parallel. The command below runs up to 5 rsyncs in parallel, each one copying one directory. Be aware that the bottleneck might not be your network but the speed of your CPUs and disks, in which case running things in parallel just makes them all slower, not faster.

run_rsync() {
    # e.g. copies /main/files/blah to /main/filesTest/blah
    rsync -av "$1" "/main/filesTest/${1#/main/files/}"
}
export -f run_rsync
parallel -j5 run_rsync ::: /main/files/*
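
(The export -f line is what makes the run_rsync function visible to the child shells that parallel spawns; without it, those shells would fail with "run_rsync: command not found".)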

But over all of those alternatives, I would prefer GNU Parallel, a utility used to execute jobs in parallel. Another option is Syncrify, which uses the rsync algorithm and is designed to use up to 5 threads when backing up a folder structure; more threads are possible, but I/O read/write becomes a bottleneck. Note that Syncrify does not use rsync's binary. Instead, it uses the rsync algorithm in a custom implementation that runs over HTTP(S).

You can use xargs, which supports running many processes at a time. For your case it would be:

ls -1 /main/files | xargs -I {} -P 5 -n 1 rsync -avh /main/files/{} /main/filesTest/

You can also parallelise rsync using GNU Parallel by feeding it a pre-built list of paths (for example the output of cat transfer.log) to run 5 rsyncs in parallel, as sketched below. Be careful, though: careless parallelisation can leave directories with the wrong permissions and smaller files untransferred.
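
A minimal sketch of that file-list approach, assuming transfer.log contains one path per line, relative to /main/files (the file name and -j5 are only examples; --relative makes rsync recreate the listed path structure under the destination):

# Sketch: run up to 5 rsyncs at once, one per path listed in transfer.log.
# transfer.log is assumed to hold paths relative to /main/files.
cd /main/files
parallel -j5 -a transfer.log rsync -a --relative {} /main/filesTest/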


I've developed a Python package called parallel_sync:

https://pythonhosted.org/parallel_sync/pages/examples.html

Here is a sample code how to use it:

from parallel_sync import rsync
creds = {'user': 'myusername', 'key':'~/.ssh/id_rsa', 'host':'192.168.16.31'}
rsync.upload('/tmp/local_dir', '/tmp/remote_dir', creds=creds)

parallelism by default is 10; you can increase it:

from parallel_sync import rsync
creds = {'user': 'myusername', 'key':'~/.ssh/id_rsa', 'host':'192.168.16.31'}
rsync.upload('/tmp/local_dir', '/tmp/remote_dir', creds=creds, parallelism=20)

However, note that sshd typically has MaxSessions set to 10 by default, so to increase the parallelism beyond 10 you'll have to modify your SSH server settings.
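
For reference, a sketch of raising that limit on the remote server (the value 20 is only an example; sshd honours the first MaxSessions line it finds, so edit any existing entry rather than appending a second one):

# Check the current setting, bump it to 20 (example value), then reload sshd.
grep -n MaxSessions /etc/ssh/sshd_config
sudo sed -i 's/^#\?MaxSessions.*/MaxSessions 20/' /etc/ssh/sshd_config
sudo systemctl reload sshd   # the service may be named "ssh" on Debian/Ubuntu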

You could also replace rsync with lftp - see my post on Superuser: /questions/75681/inverse-multiplexing-to-speed-up-file-transfer/305236#305236
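
For completeness, a minimal sketch of the lftp route applied to the layout in this question; the user, host and --parallel value are assumptions, and -R (reverse mirror) means local to remote:

# Sketch: mirror the local tree to the remote host with up to 5 parallel
# transfers. User, host and paths are assumptions based on the question.
lftp -u user sftp://myserver.com -e "mirror -R --parallel=5 /main/files /main/files; quit"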


parsyncfp is another option for transferring large, deep file trees: it wraps rsync and runs several rsync instances in parallel. Below a certain network speed, rsync itself can saturate the connection, so there's little reason to parallelise; above it, parsyncfp helps. If you need extra rsync options, you have to provide them ALL via '--rsyncopts', and '--startdir' (default: the current directory) sets where the transfer starts.

Multithread rsync transfers: I am using rsync, but I'm getting the painfully slow rsync blues - an average of about 50 MB/s when the transfers finish on large files. If I open new SSH sessions and run 10 rsync transfers simultaneously, the aggregate bandwidth goes up considerably. A single rsync doesn't take full advantage of the bandwidth on a 1Gb link, and rsync does not run parallel syncs across the wire on its own.

Comments
  • Note, if you customize your ls output through various means, such as the LISTFLAGS variable or DIR_COLORS file, you may need to use ls --indicator-style=none to prevent ls from appending symbols to the path name (such as * for executable files).
  • I found this worked much better if I used cd /sourcedir ; parallel -j8 -i rsync -aqH {} /destdir/{} -- *
  • @Manuel Riel what does the '{}' mean?
  • That's a placeholder for the filenames piped in from the preceding ls command. man parallel should have more details. The find command uses the same placeholder, I believe.
  • This is not an efficient solution, as shown here: unix.stackexchange.com/questions/189878/… This solution will create one rsync call per file in the listing
  • Doesn't seem I can get parallel on Ubuntu Server 12.04 with apt-get install parallel. Don't really want to start installing stuff manually just for this because it's very rarely going to be needed. I was just hoping for a quick script I could do it with.
  • @BT643: Use apt-get install moreutils to install parallel
  • @dragosrsupercool Thanks, will keep that in mind when I ever need to do anything like this in future :)
  • While it's true that copying a single file goes "as fast as possible", very often there seems to be some kind of cap on a single connection: simultaneous transfers don't appear to choke each other's bandwidth, so parallel transfers end up far more efficient and faster than a single transfer.
  • How to install parallel in Linux?
  • Please don't just post some tool or library as an answer. At least demonstrate how it solves the problem in the answer itself.
  • @i_m_mahii Stack Exchange should automatically keep a copy of linked pages.
  • parsync is awesome
  • Contrary to what some others may say, proposing a solution that is merely tools does help some of us. The "conform or go away!" crowd apparently doesn't actually want to help others. So thanks for your post, on behalf of all those who just discovered those two packages today from it, and those who realized that xargs and find (without those packages) could also do the trick. Post and let the voters do their bit, and ignore the bitter "get off my site" guys who seem to wander around here from time to time, "enforcing".
  • Since many of us who are actually reading this particular post know what we're looking for already, and since the OP provided a detailed question, proposing an advanced use case here is appropriate. I don't want some generic example (as I shouldn't be copying and pasting it for my application anyway) as to how to use these tools; I'm going to read the docs and figure it out myself. Trust but verify.