# Data Transfer { #datatransfer }

*Last update: April 14, 2025*

TACC supports two primary technologies for data transfer: SSH-based transfer (using tools such as `scp`, `sftp`, and `rsync`) and Globus (which grew out of the GridFTP protocol). All TACC systems support SSH-based transfer, and most TACC systems support Globus-based transfer. When in doubt, we recommend that you start with SSH-based transfer, as it requires the least setup and uses the TACC authentication system. Globus uses its own authentication system and requires additional setup steps, [outlined below](#globus).

There are many SSH-compatible clients across all platforms, and almost any modern SSH client will interoperate with TACC systems. While we provide [examples using the Cyberduck application](#cyberduck), we encourage you to use whichever transfer client is most familiar to you and works best on your platform. Many SSH clients are designed to support specific workflows.

For SSH-based transfers, you will need two pieces of information in addition to your TACC username and password: the HOSTNAME of the system you are transferring to, and the PATH you are attempting to access. Choosing the correct path for the resource and project is especially important when uploading data; otherwise, your data may be lost or misplaced. The path may include a functional name such as `/scratch/` or a resource name such as `/corral/`.
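
SSH-based tools combine these pieces into a single remote target of the form `username@hostname:path`. The short POSIX shell sketch below assembles that string. The `remote_target` function, the username `bjones`, and the hostname are illustrative assumptions, not TACC-provided tools or guaranteed login addresses, so check each system's user guide for its actual hostname.

```shell
#!/bin/sh
# Hypothetical helper: build the "username@hostname:path" target that
# scp, sftp and rsync expect. This is not a TACC-provided command.
remote_target() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: remote_target USERNAME HOSTNAME [PATH]" >&2
    return 1
  fi
  # An empty PATH after the colon refers to the remote home directory;
  # a PATH without a leading "/" is interpreted relative to that directory.
  printf '%s@%s:%s\n' "$1" "$2" "${3:-}"
}

# Assumed hostname for illustration; prints "bjones@stampede3.tacc.utexas.edu:"
remote_target bjones stampede3.tacc.utexas.edu
```

For example, `scp mylaptopfile "$(remote_target bjones stampede3.tacc.utexas.edu)"` would copy `mylaptopfile` into that account's remote home directory.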

Globus-based transfers use an endpoint name (usually the name of the HPC or storage resource you are connecting to) rather than a hostname. To transfer data successfully, you will need both the endpoint name and the PATH you are addressing.

## SSH

You can access SSH utilities either through a graphical client application or on the command line via a Terminal application:

1. Graphical User Interface (GUI) tools, e.g. [Cyberduck](#cyberduck).
1. Command-line (CLI) tools, e.g. `scp`, `sftp`, and `rsync`.

### Cyberduck { #cyberduck }

Consult Figure 2 below to confirm that the information you have provided is correct.

Once connected, you can navigate through your remote file hierarchy using the graphical user interface. You can also drag and drop files from your local computer into the Cyberduck window to transfer them to the system.

## Transfer Scenarios

Let's examine the most common data transfer scenarios for TACC users. Throughout this section, the term "data" can refer to anything from a single file to multiple directories.

1. [Between your laptop and TACC resources](#txf1)
1. [Between TACC HPC resources](#txf2)
1. [Between institutions](#txf3)
1. [Between TACC HPC and storage resources](#txf4)

### 1. Transfer data between your laptop and a TACC resource { #txf1 }

Moving data from your home computer or laptop to a TACC resource is called "pushing" or "uploading" the data. Conversely, copying data from a TACC resource to your laptop is called "pulling" or "downloading" the data. In the following examples, all transfers are initiated from your laptop rather than from the TACC resource, since your laptop likely does not have a fixed IP address.
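
With `scp`, the direction of a transfer is set purely by argument order: source first, destination second. The POSIX shell sketch below wraps both directions. The `push` and `pull` function names, the `XFER` dry-run switch, the username, and the `stampede3.tacc.utexas.edu` hostname are illustrative assumptions, not TACC-provided commands.

```shell
#!/bin/sh
# Hypothetical wrappers illustrating push vs. pull; run them on your laptop.
USERNAME=bjones                    # illustrative TACC username
HOST=stampede3.tacc.utexas.edu     # assumed login hostname; check the system's user guide
XFER="${XFER:-scp}"                # set XFER=echo to print the commands instead of running them

# Push (upload): local source first, remote destination second.
# With no second argument, the file lands in the remote home directory.
push() { "$XFER" "$1" "$USERNAME@$HOST:${2:-}"; }

# Pull (download): remote source first, local destination second.
# A relative remote path is resolved from the remote home directory;
# with no second argument, the file lands in the current local directory.
pull() { "$XFER" "$USERNAME@$HOST:$1" "${2:-.}"; }

# Example usage (file names are illustrative):
#   push mylaptopfile
#   pull results.txt ~/Downloads
```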

Example 1: TACC account holder `bjones` uploads a local file, `mylaptopfile`, to his home directory on Stampede3.

### 2. Transfer files between TACC HPC resources { #txf2 }

Transfer files between TACC HPC resources, e.g. from Stampede3 to Vista.

If you have allocations on more than one TACC HPC resource and want to move a file from one home directory to another, use the shared `$WORK` file system.

Example: copy `myfile` in my home directory on Stampede3 to my account on Vista.

stampede3$
```

### 3. Transfer files between institutions { #txf3 }

If you are a researcher with data located at multiple institutions, we recommend that you use Globus to transfer large data sets to TACC. You will need to authenticate with your institution. See [how to set up your TACC account to use Globus](#globus).

### 4. Back up or transfer files between TACC HPC and TACC storage resources { #txf4 }

To back up files to TACC's Ranch archive, consult the [Ranch User Guide](https://docs.tacc.utexas.edu/hpc/ranch/). Consult the [Corral User Guide][TACCCORRALUG] for instructions on transferring files between Lonestar6 and Corral.

## SSH Command-Line Examples { #ssh }

Transfer files between TACC HPC resources and other Linux-based systems using either [`scp`](http://linux.com/learn/intro-to-linux/2017/2/how-securely-transfer-files-between-servers-scp) or [`rsync`](http://linux.com/learn/get-know-rsync). Both `scp` and `rsync` are available in the Mac Terminal app. Windows SSH clients typically include `scp`-based file transfer capabilities.

The `scp` and `rsync` commands are standard UNIX data transfer utilities suited to moving moderately sized files and data collections between systems. Each transfers files one at a time over a single stream, and both are typically the best choice for transfers measured in gigabytes. For larger transfers, parallel transfer mechanisms such as Globus can often improve total throughput and reliability.

!!! note
    You can use these command-line tools from a local Windows machine through an SSH client; recent versions of Windows include an OpenSSH client that provides `scp` and `sftp`. If you prefer a graphical application, use a client such as [Cyberduck][DOWNLOADCYBERDUCK].

    To simplify the data transfer process, we recommend that Windows users follow the [Cyberduck](#cyberduck) instructions above.

## Advanced `scp` Examples

The Linux `scp` (secure copy) utility is a component of the OpenSSH suite. Assuming your Lonestar6 username is `bjones`, a simple `scp` transfer that copies a file named `myfile` from your local Linux system to Lonestar6 `$HOME` would look like this:

```cmd-line
login1$ man scp
```

## Transferring Files with `rsync` { #transferring-rsync }

The `rsync` (remote synchronization) utility is another way to keep your data up to date. Unlike `scp`, `rsync` skips files that are already up to date at the destination and, for files that have changed, transfers only the changed parts rather than the entire file. This selective transfer can be much more efficient than `scp`, especially when you repeatedly synchronize a collection in which only a few files change. The following example demonstrates usage of the `rsync` command for transferring a file named `myfile.c` from its current location on Stampede to Frontera's `$DATA` directory.
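
You can observe rsync's skip-unchanged behavior without a TACC account by synchronizing two local directories, as in the POSIX shell sketch below; the temporary directories are stand-ins for a local and a remote location. Note that for purely local copies `rsync` sends whole files by default (`--whole-file`), so the partial-file delta transfer only comes into play when one side is remote.

```shell
#!/bin/sh
# Local demonstration: rsync copies only files whose size or modification time changed.
# Two temporary directories stand in for a local and a remote location.
src=$(mktemp -d)
dst=$(mktemp -d)
trap 'rm -rf "$src" "$dst"' EXIT   # clean up when the script exits

echo 'int main(void) { return 0; }' > "$src/myfile.c"
echo 'notes' > "$src/README"

# First run: both files are new, so both are copied.
# -a preserves modification times and permissions; -i itemizes each change.
rsync -a -i "$src/" "$dst/"

# Change one file; its size now differs, so rsync's quick check flags it.
echo '/* edited */' >> "$src/myfile.c"

# Second run: only myfile.c is listed; README is skipped as unchanged.
rsync -a -i "$src/" "$dst/"
```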

!!! Warning
    When running multiple instances of any of the commands listed above (`scp`, `sftp`, or `rsync`), limit yourself to no more than 2-3 active transfer processes at a time.
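
One way to respect that limit when moving many files is to let `xargs -P` cap the number of simultaneous transfers. The POSIX shell sketch below assumes GNU or BSD `xargs` (both support `-0`, `-I`, and `-P`); the `push_files` function, the `MAX_JOBS` and `XFER` variables, and the destination are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: push several files while keeping at most MAX_JOBS transfers active.
MAX_JOBS=2                          # stay within the 2-3 concurrent-transfer limit
DEST="bjones@ls6.tacc.utexas.edu:"  # hypothetical target; empty path = remote home directory
XFER="${XFER:-scp}"                 # transfer command; set XFER=echo for a dry run

push_files() {
  # NUL-separated names keep file names with spaces intact; xargs launches
  # one "$XFER <file> $DEST" per name, never more than MAX_JOBS at once.
  printf '%s\0' "$@" | xargs -0 -P "$MAX_JOBS" -I{} "$XFER" {} "$DEST"
}

# Example usage (file names are illustrative):
#   push_files results-1.tar.gz results-2.tar.gz results-3.tar.gz
```

Each `scp` process is still a single stream, so this only overlaps a few transfers; for very large collections, Globus remains the better fit.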