Commit 32eee70

Data Transfer - too much editing and fiddling, publish already.
1 parent c819a4f commit 32eee70

1 file changed (+29 -65 lines changed)

docs/basics/datatransfer.md

Lines changed: 29 additions & 65 deletions
@@ -1,40 +1,23 @@
 # Data Transfer { #datatransfer }
 *Last update: April 14, 2025*

-## Protocols
+TACC supports two primary technologies for data transfer: SSH, which underlies the `scp`, `sftp`, and `rsync` utilities, and Globus, which is built on the GridFTP protocol. All TACC systems support SSH-based transfer, and most TACC systems support Globus-based transfer. When in doubt, start with SSH-based transfer: it requires the least setup and relies on the TACC authentication system. Globus uses its own authentication system and requires additional setup steps, [outlined below](#globus).

-TACC supports two primary technologies for data transfer:
-
-1. The Secure Shell Protocol (SSH)
-
-* The SSH protocol encompasses the `scp`, `rsync` and `sftp` command-line utilities.
-* use a GUI if you don't like the command-line, see Cyberduck, Putty, Moab
-* Windows Users
-
-2. Globus
-
-* define Globus, formerly GridFTP
-* best for transferring large data-sets
-* often used across institutions
+There are many SSH-compatible clients across all platforms, and almost any modern SSH client will interoperate with TACC systems. While we provide [examples using the Cyberduck application](#cyberduck), you are encouraged to use whichever transfer client is most familiar to you and works best on your platform; many SSH clients are designed around specific workflows.

-All TACC resources support SSH-based transfer, and most TACC resources support Globus-based transfer.
+For SSH-based transfers, you will need two pieces of information in addition to your TACC username and password: the HOSTNAME of the system you are transferring to, and the PATH you are attempting to access. If you are uploading data, it is especially important to select the correct path for the resource and project; otherwise your data may be lost or misplaced. The path may include a functional name such as `/scratch/` or a resource name such as `/corral/`.

+Globus-based transfers typically use an endpoint name (usually the name of the HPC or storage resource you are connecting to) rather than a hostname. You will need to know that endpoint name, and, as with SSH, you will always need the PATH you are addressing in order to transfer data successfully.

+All TACC resources support SSH-based transfer, and most TACC resources support Globus-based transfer.

 ## SSH

-There are many SSH-compatible clients across all platforms, and almost any modern SSH client will successfully interoperate with TACC systems. While we provide [examples using the Cyberduck application](#cyberduck), users are encouraged to select and utilize whichever transfer client is most familiar to them and most functional on your platform. Many SSH clients are organized to assist with specific workflows.
-
 You can use SSH transfer utilities through a graphical (GUI) client application or on the command line in a Terminal application.

-SSH clients generally fall into two categories:
-
 1. Graphical User Interface (GUI) tools, e.g. [Cyberduck](#cyberduck).
 1. Command-line (CLI) tools, e.g. `scp`, `sftp`, `rsync`.

-For SSH-based transfers, you will need two pieces of information in addition to your TACC username/password combination: 1) the HOSTNAME of the system you are transferring to, and the PATH that you are attempting to access. Especially if you are uploading data, it is very important that you select the correct path for the resource and project - otherwise your data will be at risk of being lost or misplaced. <!-- The path may include a functional name such as /scratch/ or a resource name such as /corral/ . -->
-
-

 ### Cyberduck { #cyberduck }

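The SSH paragraph above says a transfer needs your username plus a HOSTNAME and a PATH. As a minimal sketch (not TACC-provided tooling), the hypothetical `remote_spec` function below joins those pieces into the `USER@HOSTNAME:PATH` form that `scp` and `rsync` accept for a remote location. The username `bjones` and the Stampede3 hostname used in the usage note are the example values from this page, not real accounts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a TACC tool): join USER, HOSTNAME and PATH into
# the USER@HOSTNAME:PATH form that scp and rsync use for a remote location.
# An empty PATH leaves a bare trailing ":", which both tools interpret as
# the remote home directory.
remote_spec() {
  local user="$1" host="$2" path="${3-}"
  if [ -z "$user" ] || [ -z "$host" ]; then
    echo "usage: remote_spec USER HOSTNAME [PATH]" >&2
    return 1
  fi
  printf '%s@%s:%s\n' "$user" "$host" "$path"
}
```

For example, `scp mylaptopfile "$(remote_spec bjones stampede3.tacc.utexas.edu)"` expands to the same command as Example 1. Passing an explicit PATH as the third argument is the safer habit when uploading, because it makes the destination visible in the command.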
@@ -66,58 +49,38 @@ Consult Figure 2. below to ensure the information you have provided is correct.
 Once connected, you can navigate through your remote file hierarchy using the graphical user interface. You may also drag and drop files from your local computer into the Cyberduck window to transfer them to the system.


-### SSH Command-Line Examples { #ssh }
-
-Transfer files between TACC HPC resources and other Linux-based systems using either [`scp`](http://linux.com/learn/intro-to-linux/2017/2/how-securely-transfer-files-between-servers-scp) or [`rsync`](http://linux.com/learn/get-know-rsync). Both `scp` and `rsync` are available in the Mac Terminal app. Windows SSH clients typically include `scp`-based file transfer capabilities.
-
-The `scp` and `rsync` commands are standard UNIX data transfer mechanisms used to transfer moderate size files and data collections between systems. These applications use a single thread to transfer each file one at a time. The `scp` and `rsync` utilities are typically the best methods when transferring Gigabytes of data. For larger data transfers, parallel data transfer mechanisms, e.g., Globus, can often improve total throughput and reliability.
-
-!!! note
-    It is possible to use these command line tools if your local machine runs Windows, but you will need to use an SSH client (ex. [CyberDuck][DOWNLOADCYBERDUCK]).
-
-    To simplify the data transfer process, we recommend that Windows users follow the <a href="#datatransfer-cyberduck">How to Transfer Data with Cyberduck</a> guide as detailed below.
-
 ## Transfer Scenarios

 Let's examine the most common data transfer scenarios for TACC users. In the following text, the term "data" can refer to anything from a single file to multiple directories.

-1. [Transfer data between your laptop and TACC resources](#txf1)
-1. [Transfer data between TACC HPC resources](#txf2)
-1. [Transfer data between institutions](#txf3)
-1. [Transfer data between TACC HPC and storage resources](#txf4)
+1. [Between your laptop and TACC resources](#txf1)
+1. [Between TACC HPC resources](#txf2)
+1. [Between institutions](#txf3)
+1. [Between TACC HPC and storage resources](#txf4)


-### Transfer data between your laptop and a TACC resource { #txf1 }
+### 1. Transfer data between your laptop and a TACC resource { #txf1 }

 Copying data from your home computer or laptop to a TACC resource is called "pushing" or "uploading" data. Conversely, copying data from a TACC resource to your laptop is called "pulling" or "downloading" data. In the following examples, all transfers are initiated from your laptop, not from the TACC resource, because your laptop likely does not have a fixed IP address that the TACC resource could connect to.

-#### Example 1
-
-TACC account holder `bjones` uploads a local file, `mylaptopfile`, to his home directory on Stampede3.
+Example 1: TACC account holder `bjones` uploads a local file, `mylaptopfile`, to his home directory on Stampede3.

 ```cmd-line
 localhost$ scp mylaptopfile bjones@stampede3.tacc.utexas.edu:
 ```
-
 Note the "`:`" at the end of the line: a hostname followed by a colon with no path refers to your home directory on the remote system.

-#### Example 2
-
-TACC account holder `bjones` downloads a file located in his home directory on Stampede3, `myTACCfile`, to his laptop.
-
+Example 2: TACC account holder `bjones` downloads a file, `myTACCfile`, from his home directory on Stampede3 to the current directory (`.`) on his laptop.

 ```cmd-line
 localhost$ scp bjones@stampede3.tacc.utexas.edu:myTACCfile .
 ```

-<pre>
-<b>localhost$</b> scp bjones
-</pre>
-
-### Transfer files between TACC HPC resources { #txf2 }
+### 2. Transfer files between TACC HPC resources { #txf2 }

 Transfer files between TACC HPC resources, e.g. Stampede3 to Vista.
-If you have an allocation on more than one TACC HPC resource, and want to move a file from one home directory or another, make use of the shared $WORK file system.
+
+If you have an allocation on more than one TACC HPC resource and want to move a file from one home directory to another, use the shared `$WORK` file system.

 Example: copy `myfile` in my home directory on Stampede3 to my account on Vista.

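Examples 1 and 2 differ only in which side of the `scp` command is remote, and both are run from the laptop. The sketch below wraps the two directions; `tacc_push`, `tacc_pull`, `tacc_run`, `DRY_RUN`, `TACC_USER`, and `TACC_HOST` are hypothetical names chosen for this illustration, and the defaults simply reuse `bjones` and the Stampede3 login host from the examples.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers (not TACC tools) for the upload and download
# directions in Examples 1 and 2. Both are meant to run on the laptop.
# TACC_USER and TACC_HOST are assumptions that default to the example values.
TACC_USER="${TACC_USER:-bjones}"
TACC_HOST="${TACC_HOST:-stampede3.tacc.utexas.edu}"

# Run a command, or only print it when DRY_RUN=1.
tacc_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# tacc_push LOCAL_FILE [REMOTE_PATH]
# Upload ("push"); an empty REMOTE_PATH targets the remote home directory.
tacc_push() {
  tacc_run scp "$1" "${TACC_USER}@${TACC_HOST}:${2-}"
}

# tacc_pull REMOTE_FILE [LOCAL_DIR]
# Download ("pull"); REMOTE_FILE is relative to the remote home directory
# unless it starts with "/", and LOCAL_DIR defaults to the current directory.
tacc_pull() {
  tacc_run scp "${TACC_USER}@${TACC_HOST}:$1" "${2:-.}"
}
```

With `DRY_RUN=1`, `tacc_push mylaptopfile` prints `scp mylaptopfile bjones@stampede3.tacc.utexas.edu:`, the command from Example 1; unset `DRY_RUN` to perform the transfer.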
@@ -129,23 +92,28 @@ vista$ exit
 stampede3$
 ```

-### Transfer files between institutions { #txf3 }
+### 3. Transfer files between institutions { #txf3 }

-If you are a researcher with
-Transfer files between institutions e.g. from TACC to Cornell University.
+If you are a researcher with data located at multiple institutions, we recommend that you use Globus for large data set transfers to TACC. You will need to authenticate with your institution. [See how to set up your TACC account to use Globus](#globus).

-If you wish to transfer files between institutions, we recommend you use Globus for large data set transfers. You will need to authenticate with your institution. See how to set up your TACC account to use Globus.
+### 4. Back up or transfer files between TACC HPC and TACC storage resources { #txf4 }

-### Backup/Transfer files between TACC HPC and TACC storage resources { #txf4 }
+To back up files to TACC's Ranch archive, consult the [Ranch User Guide](https://docs.tacc.utexas.edu/hpc/ranch/). Consult the [Corral User Guide][TACCCORRALUG] for instructions on transferring between Lonestar6 and Corral.

-Transfer files between TACC HPC and storage resources e.g. from Lonestar6 to Corral, Stampede3 to Ranch.
+### SSH Command-Line Examples { #ssh }

-To backup files to TACC's Ranch archive, consult the [Ranch User Guide](https://docs.tacc.utexas.edu/hpc/corral/#transferring). Consult the [Corral User Guide][TACCCORRALUG] for instructions on transferring between Lonestar6 and Corral.
+Transfer files between TACC HPC resources and other Linux-based systems using either [`scp`](http://linux.com/learn/intro-to-linux/2017/2/how-securely-transfer-files-between-servers-scp) or [`rsync`](http://linux.com/learn/get-know-rsync). Both `scp` and `rsync` are available in the macOS Terminal app. Windows SSH clients typically include `scp`-based file transfer capabilities.

+The `scp` and `rsync` commands are standard UNIX tools for transferring moderately sized files and data collections between systems. Because each uses a single thread and transfers files one at a time, they are typically best suited to transfers on the order of gigabytes. For larger transfers, parallel mechanisms such as Globus can often improve total throughput and reliability.

-#### Advanced `scp`
+!!! note
+    It is possible to use these command-line tools if your local machine runs Windows, but you will need an SSH client (e.g. [Cyberduck][DOWNLOADCYBERDUCK]).
+
+    To simplify the data transfer process, we recommend that Windows users follow the <a href="#cyberduck">How to Transfer Data with Cyberduck</a> guide above.


+## Advanced `scp` Examples
+
 The Linux `scp` (secure copy) utility is a component of the OpenSSH suite. Assuming your Lonestar6 username is `bjones`, a simple `scp` transfer that copies a file named `myfile` from your local Linux system to Lonestar6 `$HOME` would look like this:

 ```cmd-line
@@ -199,7 +167,7 @@ Consult the `scp` man pages for more information:
 login1$ man scp
 ```

-#### Transferring Files with `rsync` { #transferring-rsync }
+## Transferring Files with `rsync` { #transferring-rsync }

 The `rsync` (remote synchronization) utility is another way to keep your data up to date. Unlike `scp`, which always copies entire files, `rsync` transfers only the parts of a file that have changed, so this selective method can be much more efficient than `scp` when a copy already exists at the destination. The following example demonstrates using `rsync` to transfer a file named `myfile.c` from its current location on Stampede to Frontera's `$DATA` directory.

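Because `rsync` skips data that is already present and unchanged at the destination, it suits repeated transfers of the same directory. The sketch below is hedged: `rsync_push` and `DRY_RUN` are hypothetical names, and the `-avz --partial` flag set is our assumption rather than a TACC recommendation (each flag is a standard `rsync` option).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not a TACC tool): copy SRC to DEST with rsync.
#   -a         archive mode: recurse and preserve permissions, times, symlinks
#   -v         list files as they are transferred
#   -z         compress data in transit
#   --partial  keep partially transferred files, so a re-run after an
#              interruption can reuse the data that already arrived
# A trailing "/" on SRC copies the directory's contents rather than the
# directory itself. Set DRY_RUN=1 to print the command instead of running it.
rsync_push() {
  local src="$1" dest="$2"
  set -- rsync -avz --partial "$src" "$dest"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Repeating the same `rsync_push` call should send little or nothing for files whose size and modification time are unchanged, since `-a` preserves timestamps and `rsync` skips such files by default.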
@@ -246,10 +214,6 @@ login1$ man rsync
 !!! Warning
     When running multiple instances of any of the commands listed above (`scp`, `sftp`, or `rsync`), limit your active transfers to no more than 2-3 processes at a time.

-<!--
-globus is a GUI and is excellent for transferring l
-
-TACC staff recommends that you start with SSH-based transfer as this requires the least setup and utilizes the TACC authentication system. Globus uses its own authentication system and will require additional setup steps, [outlined below](#globus). -->



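The warning above caps concurrent `scp`, `sftp`, or `rsync` processes at 2-3. One way to respect that when copying many files is `xargs -P`, which limits how many commands run at once. This is a sketch under assumptions: `limited_transfer` and `TRANSFER_CMD` are hypothetical names, and the hard cap of 3 mirrors the warning rather than any limit TACC enforces in software.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a TACC tool): copy files listed one per line on
# stdin, running at most MAX_JOBS transfers at the same time.
# Each file is sent with:  $TRANSFER_CMD FILE DEST
# TRANSFER_CMD must be a single command name; it defaults to scp, and
# "echo" can be used to preview the commands.
limited_transfer() {
  local max_jobs="$1" dest="$2"
  if [ "$max_jobs" -gt 3 ]; then
    echo "limited_transfer: capping $max_jobs concurrent transfers at 3" >&2
    max_jobs=3
  fi
  # -P sets the number of concurrent processes; -I {} runs one command per
  # input line, substituting that line for {}.
  xargs -P "$max_jobs" -I {} "${TRANSFER_CMD:-scp}" {} "$dest"
}
```

For example, `printf '%s\n' *.dat | limited_transfer 2 bjones@stampede3.tacc.utexas.edu:` would upload every `.dat` file in the current directory to the remote home directory, two at a time. Because the jobs run in parallel, their output may appear in any order.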