SDRF-Proteomics Quick Start Tutorial

Table of Contents

1. What is SDRF-Proteomics?
- 1.1. The Core Concept
2. Understanding Templates
- 2.1. Core Templates (Organism-based)
- 2.2. Specialized Templates (Experiment-based)
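3. Understanding Column Types
- 3.1. characteristics[…] — Sample Metadata
- 3.2. comment[…] — Data File Metadata
- 3.3. factor value[…] — Experimental Variables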
4. Step 1: Choose Your Template
5. Step 2: Fill in Sample Information
6. Step 3: Fill in Data File Information
7. Step 4: Define Your Experimental Variables
- 7.1. What is a Factor Value?
8. Step 5: Validate Your File
- 8.1. Option 1: Command Line (Recommended)
- 8.2. Option 2: Validate Against a Template
9. Complete Example
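10. Common Scenarios
- 10.1. TMT/iTRAQ Multiplexed Samples
- 10.2. Fractionated Samples
- 10.3. Technical Replicates
- 10.4. Cell Line Experiments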
11. Common Mistakes to Avoid
12. Finding the Right Terms (Ontologies)
13. Getting Help
- 13.1. Examples
- 13.2. Questions
- 13.3. Full Documentation
14. Next Steps

This tutorial will guide you through creating your first SDRF file step by step. By the end, you’ll understand the format and have a working file for your experiment.

Estimated time: 10-15 minutes

1. What is SDRF-Proteomics?

SDRF (Sample and Data Relationship Format) is a simple tab-separated text file that you can open in Excel or any other spreadsheet program. It describes your proteomics experiment by connecting your biological samples to your mass spectrometry data files.

1.1. The Core Concept

Think of SDRF as a table where:

Each row = one sample-to-file relationship
Each column = one piece of information about that sample or file

That’s it! No programming, no complex formats — just a spreadsheet.

ℹ️	Why use SDRF? When you submit data to repositories like PRIDE, SDRF ensures your experiment is fully described and can be reanalyzed by others. It’s becoming a standard requirement for proteomics data submission. For complete details, see the full specification.

2. Understanding Templates

Templates are pre-made SDRF files with the right columns already set up for your experiment type. Instead of figuring out which columns you need, just pick a template and fill in your data.

2.1. Core Templates (Organism-based)

These define the basic biological information needed based on your organism:

Template	Description
human	Includes columns for age, sex, ancestry. View template
vertebrates	For mouse, rat, zebrafish, etc. View template
invertebrates	For insects, worms, etc. View template
plants	Includes plant-specific metadata. View template

2.2. Specialized Templates (Experiment-based)

Add extra columns for specific experimental workflows:

Template	Description
cell-lines	Adds cell line identifiers and Cellosaurus accessions. View template
dia-acquisition	DIA-specific parameters. View template
immunopeptidomics	MHC typing and related metadata. View template
crosslinking	XL-MS specific columns. View template
single-cell	Single-cell proteomics metadata. View template

💡	You can combine templates! Start with a core template (e.g., "human") and add columns from specialized templates as needed.

Download all templates from: GitHub Templates
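Combining a core template with a specialized one boils down to merging two column lists. The minimal Python sketch below illustrates that idea under stated assumptions: `merge_template_columns` is a hypothetical helper (not part of any SDRF tool), and the column lists are abbreviated stand-ins, not the real template contents. It keeps the core template's columns first and appends only columns not already present.

```python
def merge_template_columns(core, *specialized):
    """Hypothetical helper: core columns first, then new columns from
    specialized templates, keeping first-seen order and skipping repeats."""
    merged = list(core)
    seen = set(core)
    for columns in specialized:
        for column in columns:
            if column not in seen:
                merged.append(column)
                seen.add(column)
    return merged


# Hypothetical, abbreviated sample-level columns (not the real templates).
human = [
    "source name",
    "characteristics[organism]",
    "characteristics[age]",
    "characteristics[sex]",
]
cell_lines = [
    "source name",
    "characteristics[organism]",
    "characteristics[cell line]",
    "characteristics[cellosaurus accession]",
]

print(merge_template_columns(human, cell_lines))
```

This sketch does not enforce any column ordering the specification may require (for example, keeping sample columns ahead of data file columns), so check the merged result against the real templates.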

3. Understanding Column Types

SDRF columns follow a naming pattern that tells you what kind of information they contain:

3.1. characteristics[…] — Sample Metadata

Describe the biological sample:

characteristics[organism] — Species name
characteristics[disease] — Disease or "normal"
characteristics[organism part] — Tissue or organ

See Sample Metadata Guidelines for all available characteristics.

3.2. comment[…] — Data File Metadata

Describe the data file or MS run:

comment[data file] — Raw file name
comment[instrument] — Mass spectrometer
comment[label] — Labeling type

See MS-Proteomics Template for all available comments.

3.3. factor value[…] — Experimental Variables

Mark the experimental variable(s) you’re comparing:

factor value[disease] — Comparing disease states
factor value[compound] — Drug treatment study
factor value[time] — Time course experiment

❗	Column names are case-sensitive and spacing matters!

`characteristics[organism]`: correct
`Characteristics[organism]`: wrong (capital C)
`characteristics [organism]`: wrong (space before bracket)
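The header rules above are mechanical, so you can check them before running the validator. The following Python sketch assumes only the two rules stated here (lowercase, no whitespace before `[`) plus stray leading/trailing spaces; `column_name_problems` is a hypothetical helper and does not replace the official validator's column checks.

```python
import re


def column_name_problems(name):
    """Hypothetical helper: flag the header mistakes described above."""
    problems = []
    if name != name.lower():
        problems.append("uppercase letters")
    if re.search(r"\s+\[", name):
        problems.append("whitespace before '['")
    if name != name.strip():
        problems.append("leading or trailing whitespace")
    return problems


for header in [
    "characteristics[organism]",
    "Characteristics[organism]",
    "characteristics [organism]",
]:
    print(repr(header), column_name_problems(header) or "ok")
```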

4. Step 1: Choose Your Template

Find your sample type in the first column to choose a template:

Your Sample Type	Template	Link
Human samples	human	Download
Mouse, rat, zebrafish	vertebrates	Download
Insects, worms	invertebrates	Download
Plants	plants	Download
Cell lines	cell-lines	Download
Other / Not sure	ms-proteomics	Download

5. Step 2: Fill in Sample Information

Open your template in Excel, Google Sheets, or any spreadsheet software. For each sample, fill in:

Column	What to Write	Example	Notes
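`source name`	A unique identifier for your sample	patient_001	Use a different name for each sample; repeat it only on rows that belong to the same sample (e.g., fractions or technical replicates)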
`characteristics[organism]`	Species name (lowercase)	homo sapiens	Use scientific name from NCBI Taxonomy
`characteristics[organism part]`	Tissue or body part	liver	Use terms from UBERON
`characteristics[disease]`	Disease name, or "normal"	hepatocellular carcinoma	Use "normal" for healthy samples

💡	Don’t stress about finding exact ontology terms. Write the common name (e.g., "liver", "breast cancer") and the validator will check it for you. You can always refine later.

6. Step 3: Fill in Data File Information

For each row, also fill in information about the raw file:

Column	What to Write	Example	Notes
`assay name`	A name for this MS run	run_001	Often same as source name
`comment[label]`	Type of labeling	label free sample	Or TMT126, TMT127N, etc.
`comment[instrument]`	Mass spectrometer used	Q Exactive HF	From PSI-MS ontology
`comment[data file]`	Your raw file name	sample_001.raw	Exact filename including extension

ℹ️	One row = one sample-to-file relationship. In multiplexed experiments (TMT/iTRAQ), multiple samples share the same file, so you’ll have multiple rows pointing to the same raw file. In fractionated experiments, one sample spans multiple files, so you’ll have multiple rows for the same sample.

For more details, see SDRF File Format in the specification.
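To see what "one row = one sample-to-file relationship" means in practice, the Python sketch below reads a small, made-up TMT dataset where two samples share each raw file and each sample is split across two fractions. All names and values are hypothetical. With a real file you would pass `open("your_file.sdrf.tsv", newline="", encoding="utf-8")` to `csv.DictReader` instead of the in-memory string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical SDRF excerpt: 2 TMT channels x 2 fractions = 4 rows.
SDRF_TEXT = (
    "source name\tcomment[label]\tcomment[fraction identifier]\tcomment[data file]\n"
    "sample_A\tTMT126\t1\tplex1_F01.raw\n"
    "sample_B\tTMT127N\t1\tplex1_F01.raw\n"
    "sample_A\tTMT126\t2\tplex1_F02.raw\n"
    "sample_B\tTMT127N\t2\tplex1_F02.raw\n"
)

# Each row becomes a dict keyed by column name.
rows = list(csv.DictReader(io.StringIO(SDRF_TEXT), delimiter="\t"))

samples_per_file = defaultdict(list)  # multiplexing: many samples, one file
files_per_sample = defaultdict(list)  # fractionation: one sample, many files
for row in rows:
    samples_per_file[row["comment[data file]"]].append(row["source name"])
    files_per_sample[row["source name"]].append(row["comment[data file]"])

print(dict(samples_per_file))
print(dict(files_per_sample))
```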

7. Step 4: Define Your Experimental Variables

Factor values tell analysis tools what you’re comparing in your experiment. This is crucial for downstream analysis!

7.1. What is a Factor Value?

A factor value is the experimental variable you’re studying. If your experiment compares liver cancer with healthy liver tissue, then disease is your factor, and its values would be "hepatocellular carcinoma" and "normal".

Experiment Type	Factor Value Column	Example Values
Disease vs. healthy	`factor value[disease]`	cancer, normal
Drug treatment	`factor value[compound]`	aspirin, DMSO
Time course	`factor value[time]`	0 hour, 6 hour, 24 hour
Tissue comparison	`factor value[organism part]`	liver, kidney, heart
Multiple variables	Multiple factor columns	Both disease AND time

🔥	Factor values often duplicate information from characteristics columns — and that’s correct! The factor value explicitly marks which characteristic is the experimental variable.
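Because a factor value usually mirrors a characteristics column, a mismatch between the two often signals a typo. The Python sketch below checks this, assuming rows are dicts keyed by column name (for example, from `csv.DictReader`) and that each `factor value[x]` is meant to equal `characteristics[x]` when that column exists. `factor_mismatches` is a hypothetical helper; factors without a matching characteristics column are skipped.

```python
def factor_mismatches(rows):
    """Hypothetical helper: return (row number, factor, characteristic value,
    factor value) for rows where factor value[x] != characteristics[x]."""
    mismatches = []
    for i, row in enumerate(rows, start=1):
        for column, value in row.items():
            if column.startswith("factor value[") and column.endswith("]"):
                name = column[len("factor value["):-1]
                source = row.get(f"characteristics[{name}]")
                if source is not None and source != value:
                    mismatches.append((i, name, source, value))
    return mismatches


rows = [
    {"source name": "patient_001",
     "characteristics[disease]": "hepatocellular carcinoma",
     "factor value[disease]": "hepatocellular carcinoma"},
    {"source name": "control_001",
     "characteristics[disease]": "normal",
     "factor value[disease]": "healthy"},  # inconsistent on purpose
]
print(factor_mismatches(rows))
```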

8. Step 5: Validate Your File

Save your file as tab-separated text with a .sdrf.tsv extension (for example, your_file.sdrf.tsv), then validate it:

8.1. Option 1: Command Line (Recommended)

# Install the validator
pip install sdrf-pipelines

# Validate your file
parse_sdrf validate-sdrf --sdrf_file your_file.sdrf.tsv

8.2. Option 2: Validate Against a Template

# Validate against a specific template
parse_sdrf validate-sdrf --sdrf_file your_file.sdrf.tsv --template human

This checks that all required columns for your template are present.

Validation checks for:

Correct column names and formatting
Valid ontology terms (organism, disease, etc.)
Required columns present
No empty cells where values are required

For more validation options, see Tool Support.
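Some of these checks (required columns present, no empty cells) are simple enough to sketch locally, which can help you fix obvious problems before running `parse_sdrf`. The Python sketch below is not the official validator and checks no ontology terms. `precheck` is a hypothetical helper, and `REQUIRED` is an illustrative subset taken from this tutorial's complete example; the real requirements depend on your template.

```python
import csv
import io

# Illustrative subset only; the real list depends on the template.
REQUIRED = [
    "source name",
    "characteristics[organism]",
    "characteristics[disease]",
    "assay name",
    "comment[data file]",
]


def precheck(tsv_text, required=REQUIRED):
    """Hypothetical helper: report missing required columns and empty cells."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    errors = [f"missing column: {c}" for c in required if c not in header]
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in header:
            value = row.get(column)
            if value is None or not value.strip():
                errors.append(f"line {line_no}: empty cell in '{column}'")
    return errors


sample = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "patient_001\thomo sapiens\trun_001\t\n"
)
for error in precheck(sample):
    print(error)
```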

9. Complete Example

Here’s a minimal valid SDRF file for a human liver cancer study, including all required columns from the human template:

source name	characteristics[organism]	characteristics[organism part]	characteristics[disease]	characteristics[biological replicate]	characteristics[age]	characteristics[sex]	assay name	technology type	comment[proteomics data acquisition method]	comment[label]	comment[instrument]	comment[cleavage agent details]	comment[fraction identifier]	comment[technical replicate]	comment[data file]	factor value[disease]
patient_001	homo sapiens	liver	hepatocellular carcinoma	1	55Y	male	run_001	proteomic profiling by mass spectrometry	Data-dependent acquisition	label free sample	Q Exactive HF	NT=Trypsin;AC=MS:1001251	1	1	patient_001.raw	hepatocellular carcinoma
patient_002	homo sapiens	liver	hepatocellular carcinoma	2	62Y	female	run_002	proteomic profiling by mass spectrometry	Data-dependent acquisition	label free sample	Q Exactive HF	NT=Trypsin;AC=MS:1001251	1	1	patient_002.raw	hepatocellular carcinoma
control_001	homo sapiens	liver	normal	1	48Y	male	run_003	proteomic profiling by mass spectrometry	Data-dependent acquisition	label free sample	Q Exactive HF	NT=Trypsin;AC=MS:1001251	1	1	control_001.raw	normal
control_002	homo sapiens	liver	normal	2	51Y	female	run_004	proteomic profiling by mass spectrometry	Data-dependent acquisition	label free sample	Q Exactive HF	NT=Trypsin;AC=MS:1001251	1	1	control_002.raw	normal

Required columns in this example:

Sample metadata: source name, organism, organism part, disease, biological replicate, age, sex
Data file metadata: assay name, technology type, proteomics data acquisition method, label, instrument, cleavage agent details, fraction identifier, technical replicate, data file
Factor value: the experimental variable being compared (disease)
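The `comment[cleavage agent details]` cell in the example packs two pieces into one value, `NT=Trypsin;AC=MS:1001251`: the term name (NT) and its ontology accession (AC). The small Python sketch below splits such a `KEY=value;KEY=value` cell into a dict. `parse_term_cell` is a hypothetical helper that assumes values themselves contain no `;`.

```python
def parse_term_cell(cell):
    """Hypothetical helper: split 'KEY=value;KEY=value' into a dict."""
    parts = {}
    for chunk in cell.split(";"):
        key, sep, value = chunk.partition("=")
        if sep:  # ignore fragments without '='
            parts[key.strip()] = value.strip()
    return parts


print(parse_term_cell("NT=Trypsin;AC=MS:1001251"))
```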

What this example tells us:

2 biological replicates per condition (numbered 1-2 within each factor value group)
No fractionation (fraction identifier = 1 for all)
Single injection per sample (technical replicate = 1)
Label-free DDA proteomics with trypsin digestion on a Q Exactive HF
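The observations above can be read directly from the table. The Python sketch below derives them from the relevant columns of the four example rows, copied here as tuples. Treating "more than one distinct fraction identifier" as "fractionated" is a simplifying assumption of this sketch.

```python
from collections import defaultdict

# (factor value[disease], biological replicate, fraction identifier,
#  technical replicate), copied from the four example rows above.
EXAMPLE = [
    ("hepatocellular carcinoma", "1", "1", "1"),
    ("hepatocellular carcinoma", "2", "1", "1"),
    ("normal", "1", "1", "1"),
    ("normal", "2", "1", "1"),
]

# Collect distinct biological replicate numbers within each factor group.
bio_reps = defaultdict(set)
for disease, bio_rep, _fraction, _tech_rep in EXAMPLE:
    bio_reps[disease].add(bio_rep)

summary = {
    "biological replicates per condition": {k: len(v) for k, v in bio_reps.items()},
    "fractionated": len({row[2] for row in EXAMPLE}) > 1,
    "max technical replicates": max(int(row[3]) for row in EXAMPLE),
}
print(summary)
```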

10. Common Scenarios

10.1. TMT/iTRAQ Multiplexed Samples

For multiplexed experiments, multiple samples share the same raw file. Each sample gets its own row with a different label:

source name	comment[label]	comment[data file]
sample_A	TMT126	multiplex_1.raw
sample_B	TMT127N	multiplex_1.raw
sample_C	TMT127C	multiplex_1.raw

For complete TMT/iTRAQ documentation, see Isobaric Labelling in the specification.
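Since several samples share one raw file, each label should appear at most once per file; a repeated channel is a common copy-paste slip. The Python sketch below flags such repeats, assuming rows are dicts keyed by column name. `duplicate_labels` is a hypothetical helper, and the third row is a deliberately broken variant of the table above.

```python
from collections import defaultdict


def duplicate_labels(rows):
    """Hypothetical helper: (data file, label) pairs used on more than one row."""
    counts = defaultdict(int)
    for row in rows:
        counts[(row["comment[data file]"], row["comment[label]"])] += 1
    return sorted(key for key, count in counts.items() if count > 1)


rows = [
    {"source name": "sample_A", "comment[label]": "TMT126", "comment[data file]": "multiplex_1.raw"},
    {"source name": "sample_B", "comment[label]": "TMT127N", "comment[data file]": "multiplex_1.raw"},
    {"source name": "sample_C", "comment[label]": "TMT127N", "comment[data file]": "multiplex_1.raw"},  # slip
]
print(duplicate_labels(rows))
```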

10.2. Fractionated Samples

If you fractionated your sample before MS, give each fraction its own row and number it in the comment[fraction identifier] column:

source name	comment[fraction identifier]	comment[data file]
sample_001	1	sample_001_F01.raw
sample_001	2	sample_001_F02.raw
sample_001	3	sample_001_F03.raw

For more details, see Fractions in the specification.
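With many fractions, generating these rows is less error-prone than typing them. The Python sketch below builds one row per fraction. `fraction_rows` is a hypothetical helper, and the `<source>_F01.raw` naming simply follows the table above; your real file names must match whatever the instrument actually wrote.

```python
def fraction_rows(source_name, n_fractions):
    """Hypothetical helper: one row per fraction, numbered from 1."""
    return [
        {
            "source name": source_name,
            "comment[fraction identifier]": str(i),
            "comment[data file]": f"{source_name}_F{i:02d}.raw",
        }
        for i in range(1, n_fractions + 1)
    ]


for row in fraction_rows("sample_001", 3):
    print("\t".join(row.values()))
```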

10.3. Technical Replicates

Same sample run multiple times? Use the same source name with different assay names and data files:

source name	assay name	comment[technical replicate]	comment[data file]
sample_001	sample_001_rep1	1	sample_001_rep1.raw
sample_001	sample_001_rep2	2	sample_001_rep2.raw
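If you number technical replicates consecutively from 1, as in the table above, gaps usually mean a missing or mislabelled run. The Python sketch below reports such gaps per sample. `technical_replicate_gaps` is a hypothetical helper, and consecutive numbering is an assumption of this sketch rather than a stated rule. It collects distinct numbers, so fractionated runs that repeat the same replicate number are not flagged.

```python
from collections import defaultdict


def technical_replicate_gaps(rows):
    """Hypothetical helper: samples whose replicate numbers are not exactly 1..n."""
    reps = defaultdict(set)
    for row in rows:
        reps[row["source name"]].add(int(row["comment[technical replicate]"]))
    return {
        name: sorted(numbers)
        for name, numbers in reps.items()
        if sorted(numbers) != list(range(1, len(numbers) + 1))
    }


rows = [
    {"source name": "sample_001", "comment[technical replicate]": "1"},
    {"source name": "sample_001", "comment[technical replicate]": "2"},
    {"source name": "sample_002", "comment[technical replicate]": "1"},
    {"source name": "sample_002", "comment[technical replicate]": "3"},  # gap
]
print(technical_replicate_gaps(rows))
```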

10.4. Cell Line Experiments

For cell lines, include the cell line name and Cellosaurus accession:

source name	characteristics[cell line]	characteristics[cellosaurus accession]
hela_001	HeLa	CVCL_0030
hek_001	HEK293	CVCL_0045

Find accessions at Cellosaurus.

For the complete cell lines template, see Cell Lines Template.
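A quick format check catches typos in accessions before submission. The Python sketch below assumes Cellosaurus accessions take the form `CVCL_` followed by four uppercase letters or digits, as in the examples above. It checks format only, not whether the accession exists or matches the cell line name, and `looks_like_cellosaurus` is a hypothetical helper.

```python
import re

# Assumed format: "CVCL_" plus four uppercase letters or digits.
CELLOSAURUS_AC = re.compile(r"CVCL_[A-Z0-9]{4}")


def looks_like_cellosaurus(accession):
    """Hypothetical helper: format check only, no lookup."""
    return CELLOSAURUS_AC.fullmatch(accession.strip()) is not None


for ac in ["CVCL_0030", "CVCL_0045", "cvcl_0030", "CVCL-0030"]:
    print(ac, looks_like_cellosaurus(ac))
```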

11. Common Mistakes to Avoid

Mistake	Correct	Explanation
`Source Name` (capitalized)	`source name` (lowercase)	Column names must be lowercase
`characteristics [organism]` (space)	`characteristics[organism]` (no space)	No space before the bracket
`control` for healthy samples	`normal` for healthy samples	Use "normal" for healthy tissue/samples
Empty cells	`not available` or `not applicable`	Never leave cells empty
`sourcename` (no space)	`source name` (with space)	Two words separated by a space
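Two of these mistakes concern cell values rather than headers: empty cells and "control" in a disease column. The Python sketch below flags both, assuming rows are dicts keyed by column name. `value_warnings` is a hypothetical helper and covers only these two cases.

```python
def value_warnings(rows):
    """Hypothetical helper: flag empty cells and 'control' in disease columns."""
    warnings = []
    for i, row in enumerate(rows, start=1):
        for column, value in row.items():
            text = (value or "").strip()
            if not text:
                warnings.append(
                    f"row {i}: '{column}' is empty; use 'not available' or 'not applicable'"
                )
            elif column.endswith("[disease]") and text.lower() == "control":
                warnings.append(
                    f"row {i}: '{column}' says 'control'; use 'normal' for healthy samples"
                )
    return warnings


rows = [
    {"source name": "control_001",
     "characteristics[disease]": "control",
     "characteristics[age]": ""},
]
for warning in value_warnings(rows):
    print(warning)
```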

12. Finding the Right Terms (Ontologies)

SDRF uses ontology terms to ensure consistency across datasets. Here’s where to find them:

For This Field	Look Here	Examples
Organism names	NCBI Taxonomy	homo sapiens, mus musculus
Tissue/organ names	UBERON	liver, brain, blood
Disease names	MONDO	breast cancer, diabetes
Cell types	Cell Ontology	T cell, hepatocyte
Instruments & methods	PSI-MS	Q Exactive, Orbitrap
Cell lines	Cellosaurus	HeLa (CVCL_0030), HEK293

💡	Don’t worry about finding the exact ontology term initially. Just write the common name (e.g., "liver", "breast cancer") and the validator will check it for you.

13. Getting Help

13.1. Examples

Browse real SDRF files from published datasets in ProteomeXchange:

Annotated Projects on GitHub

13.2. Questions

Open an issue on GitHub to reach the bigbio team

13.3. Full Documentation

For advanced use cases and complete details:

14. Next Steps

Once you’re comfortable with the basics:

Explore templates for your specific experiment type: All Templates
Read the metadata guidelines for detailed field descriptions:
Learn about tool support for converting SDRF to analysis pipelines: Tool Support
See the full specification for advanced use cases: SDRF-Proteomics Specification
