This article gives detailed instructions for using the msrsync utility to copy data to an Azure Blob storage container for use with Azure HPC Cache.
To learn more about moving data to Blob storage for your Azure HPC Cache, read Move data to Azure Blob storage.
The msrsync tool can be used to move data to a back-end storage target for the Azure HPC Cache. This tool is designed to optimize bandwidth usage by running multiple parallel rsync processes. It is available from GitHub at https://github.com/jbd/msrsync.
msrsync breaks up the source directory into separate “buckets” and then runs individual rsync processes on each bucket.
Preliminary testing using a four-core VM showed best efficiency when using 64 processes. Use the msrsync option -p to set the number of processes to 64.
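The bucket idea can be illustrated with standard tools. The following is a minimal sketch, not msrsync's actual implementation: the helper name make_buckets and the bucket. file prefix are hypothetical. It splits a directory's file list into lists of at most a given number of entries, which is the kind of unit a separate rsync process could work on.

```shell
#!/usr/bin/env bash
# Illustration only: this is NOT how msrsync is implemented internally.
# make_buckets and the "bucket." prefix are hypothetical names.

# make_buckets DIR ITEMS_PER_BUCKET OUT_DIR
# Writes file lists named bucket.aa, bucket.ab, ... into OUT_DIR.
# Each list holds at most ITEMS_PER_BUCKET paths, relative to DIR.
make_buckets() {
    local dir=$1 per_bucket=$2 out_dir=$3
    mkdir -p "$out_dir"
    # List files relative to DIR, sort for a stable order, then cut the
    # list into chunks of ITEMS_PER_BUCKET lines each.
    ( cd "$dir" && find . -type f | sort ) | split -l "$per_bucket" - "$out_dir/bucket."
}
```

Each resulting list could then be handed to its own rsync process, for example through rsync's --files-from option. msrsync handles this bookkeeping itself, so the sketch is only meant to make the relationship between the -f bucket size and the number of buckets concrete.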
Note that msrsync can only copy to and from local volumes. The source and destination must be accessible as local mounts on the workstation used to issue the command.
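Before starting a long copy, it can help to confirm that both paths are reachable from the workstation. The following is a minimal pre-flight sketch; the function name check_copy_paths is hypothetical, and it only tests that the paths are directories with the expected permissions, not how they are mounted.

```shell
#!/usr/bin/env bash
# Sketch only: check_copy_paths is a hypothetical helper, not part of msrsync.

# check_copy_paths SOURCE DESTINATION
# Returns 0 if SOURCE is a readable directory and DESTINATION is a
# writable directory on this machine; otherwise prints a message to
# stderr and returns 1.
check_copy_paths() {
    local src=$1 dst=$2
    if [ ! -d "$src" ] || [ ! -r "$src" ]; then
        echo "source is not a readable directory: $src" >&2
        return 1
    fi
    if [ ! -d "$dst" ] || [ ! -w "$dst" ]; then
        echo "destination is not a writable directory: $dst" >&2
        return 1
    fi
    return 0
}
```

For example, check_copy_paths /test/source-repository /mnt/hpccache/repository returns a nonzero status if either path is missing, which usually means a mount is not in place.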
Follow these instructions to use msrsync to populate Azure Blob storage with Azure HPC Cache:
1. Install msrsync and its prerequisites (rsync and Python 2.6 or later).

2. Determine the total number of files and directories to be copied.

   For example, use the utility prime.py with arguments prime.py --directory /path/to/some/directory (available by downloading https://github.com/Azure/Avere/blob/main/src/clientapps/dataingestor/prime.py).

   If not using prime.py, you can calculate the number of items with the GNU find tool as follows:

       find <path> -type f | wc -l    # (counts files)
       find <path> -type d | wc -l    # (counts directories)
       find <path> | wc -l            # (counts both)

3. Divide the number of items by 64 to determine the number of items per process. Use this number with the -f option to set the size of the buckets when you run the command.

4. Issue the msrsync command to copy files:

       msrsync -P --stats -p64 -f<ITEMS_DIV_64> --rsync "-ahv --inplace" <SOURCE_PATH> <DESTINATION_PATH>

   For example, this command is designed to move 11,000 files in 64 processes from /test/source-repository to /mnt/hpccache/repository:

       msrsync -P --stats -p64 -f170 --rsync "-ahv --inplace" /test/source-repository/ /mnt/hpccache/repository
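The counting and division steps above can be scripted. The following is a minimal sketch under stated assumptions: the helper names count_items, items_per_process, and msrsync_command are hypothetical, and rounding the quotient up is a choice made here, not something the article prescribes. The last helper only prints the command so that it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the round-up rule are assumptions,
# not part of msrsync.

# count_items PATH
# Prints the number of files and directories under PATH (PATH included),
# matching `find <path> | wc -l`.
count_items() {
    find "$1" | wc -l | tr -d ' '
}

# items_per_process COUNT PROCESSES
# Prints COUNT divided by PROCESSES, rounded up, never less than 1.
items_per_process() {
    local count=$1 procs=$2
    local size=$(( (count + procs - 1) / procs ))
    if [ "$size" -lt 1 ]; then
        size=1
    fi
    echo "$size"
}

# msrsync_command SOURCE DESTINATION [PROCESSES]
# Prints (does not run) an msrsync command line for the given paths.
# PROCESSES defaults to 64, the value suggested in this article.
msrsync_command() {
    local src=$1 dst=$2 procs=${3:-64}
    local bucket
    bucket=$(items_per_process "$(count_items "$src")" "$procs")
    echo "msrsync -P --stats -p${procs} -f${bucket} --rsync \"-ahv --inplace\" ${src} ${dst}"
}
```

For 11,000 items and 64 processes, items_per_process prints 172, slightly above the 170 used in the example command; either value keeps the bucket size close to the item count divided by 64. Running msrsync_command /test/source-repository/ /mnt/hpccache/repository prints a command line that can be checked and then pasted into the shell.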