Globus provides fast file transfers over the Internet.
If you work in High-Performance Computing, Next Generation Sequencing or Bioinformatics, you are likely faced with routinely transferring large files, or large numbers of files, around the Internet. And you’re probably well aware that Internet file transfers can be painfully slow and negatively impact productivity and workflows. Globus can help alleviate the Internet file transfer bottleneck.
In this barebones tutorial we’ll describe how to get started with Globus. For a more in-depth discussion of Globus capabilities, see the Globus website.
Globus provides two mechanisms for obtaining a Globus account:
- You may already have an organizational account that lets you use Globus. On the organizational login page (below), check the drop-down list to see if your organization appears there. If so, choose your organization and log in with the credentials (user ID and password) for that account. For more information about this method, see the CILogon website.
- If you don’t have an existing organizational account, you can create a Globus ID account on the Globus ID login page (also below). Just fill in the blanks to create your own account.
Organizational login page.
Globus ID login page.
Globus endpoints page.
To transfer files with Globus you have to identify a source endpoint and a destination endpoint.
The Globus Manage Endpoints page provides a convenient way to manage commonly used source and destination endpoints. You have to log in to your account to see this page. You can add new endpoints to the list with the “add Globus Connect Personal endpoint” and “add Globus Connect Server endpoint” functions. For each endpoint in the list, this page shows the endpoint name, the type of endpoint (e.g. Public Endpoint, Globus Connect Personal), the current status of the endpoint (e.g. ready, not active) and the status of your credentials for that endpoint (e.g. expiring soon, never expires). Clicking on the endpoint name gives more detailed information about it. You may also filter and search the endpoints list if it becomes too unwieldy.
Globus endpoint names are usually provided by the source and destination organizations. Unfortunately, there does not seem to be a central directory of endpoint names, so you’ll have to check the source and destination organizations’ websites, or contact them directly, to determine the correct endpoint names.
Globus file transfer page.
Once you’ve identified the appropriate source and destination endpoints, you can transfer files between them with ease.
The Globus file transfer page shows the status of file transfer jobs. On the left, it shows the source endpoint name and the absolute path to directories and/or files that you wish to transfer. On the right, it shows the destination endpoint name and the absolute path to a directory where you want to copy files. Note that you can copy individual files, sets of files, individual directories or sets of directories.
To copy files from source to destination, click the file and/or directory name(s) in the source frame, and then click the right arrow button. The file transfer starts immediately. In effect, this is as simple as a drag-and-drop operation. (Note that you can also go in the other direction, that is, copy files from destination to source, but this changes the naming conventions. It’s a perfectly legitimate operation though.)
To keep file transfers straight, you can add an optional text label to specific transfer jobs.
Globus also provides several file transfer options:
- “sync”: transfer only new or changed files
- delete files on the destination side that do not exist on the source side
- match file datetime stamps on the destination side to those on the source side
- verify checksums of both source and destination files to ensure file integrity after transfers are complete
- encrypt file transfers
Verifying file integrity after transfer is likely the most important feature in this set, particularly for large files that take a long time to transfer. The file integrity check adds about 5% more time to a typical transfer job.
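To make the idea of a checksum comparison concrete, here is a minimal Python sketch. It is not how Globus implements its integrity check; the function names `file_sha256` and `copies_match` and the choice of SHA-256 are assumptions made for illustration. The sketch simply hashes a file on each side and compares the digests, which is the general principle behind this kind of verification.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Illustrative helper (hypothetical name), not part of Globus.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, destination_path):
    """Return True when both files have the same SHA-256 digest."""
    return file_sha256(source_path) == file_sha256(destination_path)
```

If you ever need to spot-check a transferred file by hand, you could run something like `copies_match("reads.fastq", "/mnt/copy/reads.fastq")` with both paths visible from one machine (the paths here are placeholders).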
When a file transfer job finishes (whether it succeeds or fails), you’ll get an email message describing the transfer operation. You can also check the Activity page to get information about transfer jobs.
Globus activity page.
The Globus Activity page shows the status of file transfers in progress and the metrics of completed transfers. For each job, it shows a unique task ID, the source and destination endpoints, the condition of the transfer (e.g. success, failure), your Globus user ID, request and completion datetime stamps, file transfer settings, the number of files and/or directories transferred, the total number of bytes transferred, the effective bandwidth of the transfer and various status flags. The Activity page also includes an event log that presents diagnostic information in case a file transfer fails to complete.
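As a rough illustration of how an effective transfer rate relates to the other figures on the Activity page, the sketch below divides the bytes transferred by the elapsed time between the request and completion timestamps. The function name and the convention of 1 Gbyte = 10^9 bytes are assumptions for this example, and Globus may compute its reported bandwidth differently.

```python
from datetime import datetime


def effective_rate_gb_per_hour(bytes_transferred, requested, completed):
    """Return the average transfer rate in Gbyte/hr (1 Gbyte = 1e9 bytes).

    `requested` and `completed` are datetime objects; the name and units
    are illustrative assumptions, not a Globus API.
    """
    elapsed_hours = (completed - requested).total_seconds() / 3600
    if elapsed_hours <= 0:
        raise ValueError("completion time must be later than request time")
    return bytes_transferred / 1e9 / elapsed_hours


# Example: 300 Gbyte moved between 09:00 and 11:00 on the same day.
rate = effective_rate_gb_per_hour(
    300_000_000_000,
    datetime(2015, 1, 1, 9, 0),
    datetime(2015, 1, 1, 11, 0),
)
print(rate)  # 150.0
```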
The file transfer rates between source and destination sites depend on several factors including network bandwidth, firewalls and routers, disk I/O rates, etc. In general you’ll have to try some test file transfers to get a sense of the transfer rates between any two specific sites.
Here are some ad-hoc file transfer rates to give you an idea of what to expect from Globus:
| Source | Destination | Transfer Rate (Gbyte/hr) |
|---|---|---|
| CSU Cray, CO | CU Janus, CO | 650 |
| NCBI, MD | Stampede, TX | 500 |
| Stampede, TX | CSU Cray, CO | 150 |
| OK State U., OK | CSU Cray, CO | 70 |
| OK State U., OK | CSU CS Dept., CO | 20 |
| Stampede, TX | CSU CS Dept., CO | 20 |
| CSU CS Dept., CO | Nextgene server | 5 |
Globus file transfer rates.
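Rates like these can be turned into rough time estimates with simple arithmetic: divide the file size by the transfer rate. The short Python sketch below does that, and optionally adds the approximately 5% overhead for checksum verification mentioned earlier. The function name is hypothetical, and the result is only a back-of-the-envelope figure, since real rates vary from one transfer to the next.

```python
def estimated_hours(size_gb, rate_gb_per_hour, verify_checksum=False):
    """Estimate transfer time in hours for a file of `size_gb` Gbyte.

    Assumes a constant rate; `verify_checksum` adds the roughly 5%
    overhead quoted in this tutorial. Illustrative only.
    """
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    hours = size_gb / rate_gb_per_hour
    if verify_checksum:
        hours *= 1.05
    return hours


# A 2000 Gbyte (2 Tbyte) file at 500 Gbyte/hr versus 20 Gbyte/hr.
print(estimated_hours(2000, 500))  # 4.0
print(estimated_hours(2000, 20))   # 100.0
```

The same 2 Tbyte file that moves in about four hours between two well-connected sites would take about four days at 20 Gbyte/hr, which is why a quick test transfer is worth doing first.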
With Globus it’s possible to transfer multi-terabyte files in a few hours between some sites, a reasonable figure for many applications. Even transfers from large-scale high-performance systems to local workstations are acceptable for files in the tens to hundreds of gigabytes. Anyone who needs to move large files over the Internet should consider Globus.