
SMaHT Sample and File Nomenclature

Overview

The SMaHT sample and file names are the primary identifiers of biosamples from the Tissue Procurement Center (TPC) and files generated by the Data Analysis Center (DAC) of the SMaHT Network (“Network”).

SMaHT sample and file names contain unique, immutable identifiers as well as semi-human-readable codes that correspond to metadata. This document describes the naming schema and provides tables of the codes for each metadata type included in sample and file names. Metadata fields in sample and file names are delimited by a hyphen (“-”). In this document, “#” denotes a single digit and “A” denotes a single alphabetical letter.
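Because fields are hyphen-delimited and the placeholders have fixed meanings, a name can be checked against a schema template mechanically. The Python sketch below is illustrative only. The template `XY###-#A` is hypothetical and is not taken from the SMaHT schema, and the helper names are made up. The sketch assumes a placeholder letter may be upper- or lowercase, and it treats every character other than `#` and `A` as a literal, so it cannot express a template that contains a literal letter A.

```python
from __future__ import annotations

import re


def template_to_regex(template: str) -> re.Pattern:
    """Compile a field template into a regular expression.

    '#' matches exactly one digit and 'A' matches exactly one letter,
    following the placeholder convention above. Every other character
    (including the hyphen delimiter) is matched literally.
    """
    parts = []
    for ch in template:
        if ch == "#":
            parts.append("[0-9]")
        elif ch == "A":
            parts.append("[A-Za-z]")  # assumption: either letter case is allowed
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches_template(name: str, template: str) -> bool:
    """True only if the whole name fits the template, not just a prefix of it."""
    return template_to_regex(template).fullmatch(name) is not None


def split_fields(name: str) -> list[str]:
    """Split a name into its hyphen-delimited metadata fields."""
    return name.split("-")


# "XY###-#A" is a made-up template for illustration only.
print(matches_template("XY012-3B", "XY###-#A"))  # True
print(matches_template("XY01-3B", "XY###-#A"))   # False: only two digits
print(split_fields("XY012-3B"))                  # ['XY012', '3B']
```

Using `fullmatch` rather than `match` matters here: a name with extra trailing fields would otherwise be accepted because its prefix fits the template.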

Schema Documentation

Version | Release date | Filename
2.1 (latest) | 01/02/2026 | SMaHT Sample and File Nomenclature v2.1.pdf

Part 1: Sample Schema and Protocol ID Tables

Naming Schema

[Figure: Nomenclature Part 1]

Table 1. Benchmarking cell line codes.

Kit/Sample ID | Cell line description
COLO829T | COLO829 tumor cell line
COLO829BL | COLO829BL normal lymphoblast cell line
COLO829BLT50 | COLO829 1:50 admixture
HAPMAP6 | Cell admixture of six HapMap cell lines
LBLA2 | LB-LA2 fibroblast cell line
LBIPSC1 | iPSC line from clone #1 derived from the LB-LA2 fibroblast cell line
LBIPSC2 | iPSC line from clone #2 derived from the LB-LA2 fibroblast cell line
LBIPSC4 | iPSC line from clone #4 derived from the LB-LA2 fibroblast cell line
LBIPSC52 | iPSC line from clone #52 derived from the LB-LA2 fibroblast cell line
LBIPSC60 | iPSC line from clone #60 derived from the LB-LA2 fibroblast cell line
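Some Kit/Sample IDs in Table 1 encode extra information, such as the clone number of each LB-LA2-derived iPSC line. A minimal Python sketch, assuming the `LBIPSC<clone>` pattern seen in the table holds for all such IDs (the helper `ipsc_clone` is hypothetical):

```python
from __future__ import annotations

import re

# Kit/Sample IDs from Table 1.
CELL_LINE_IDS = frozenset({
    "COLO829T", "COLO829BL", "COLO829BLT50", "HAPMAP6", "LBLA2",
    "LBIPSC1", "LBIPSC2", "LBIPSC4", "LBIPSC52", "LBIPSC60",
})

# Pattern inferred from the table: "LBIPSC" followed by the clone number.
_IPSC_ID = re.compile(r"LBIPSC([0-9]+)")


def ipsc_clone(kit_id: str) -> int | None:
    """Return the iPSC clone number for an LB-LA2-derived line, else None."""
    if kit_id not in CELL_LINE_IDS:
        raise ValueError(f"unknown benchmarking cell line ID: {kit_id!r}")
    match = _IPSC_ID.fullmatch(kit_id)
    return int(match.group(1)) if match else None


print(ipsc_clone("LBIPSC52"))  # 52
print(ipsc_clone("LBLA2"))     # None: the parental fibroblast line
```

Validating against the known set first keeps a typo such as `LBIPSC3` from being reported as a clone that does not appear in Table 1.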

Table 2A. Protocol IDs for SMaHT benchmarking tissues.

Protocol ID | Tissue Name for Container | Preservation | Notes
1A | Liver | Snap Frozen | Homogenate and non-homogenate samples
1B | unassigned | N/A
1C | Liver | Fixed
1D | Lung | Snap Frozen | Homogenate and non-homogenate samples
1E | unassigned | N/A
1F | Lung | Fixed
1G | Colon | Snap Frozen | Homogenate and non-homogenate samples
1H | unassigned | N/A
1I | Colon | Fixed
1J | Skin | Snap Frozen | Tissue specimen (~10 cm)
1K | Skin | Snap Frozen | Tissue core (~1 cm) taken from the intact tissue
1L | Skin | Fixed
1M/N/O/P | unassigned | N/A
1Q | Brain, Frontal Lobe | Snap Frozen | Homogenate and non-homogenate samples

Table 2B. Protocol IDs for SMaHT production tissues.

Protocol ID | Tissue Name for Container | Preservation
3A | Blood, Whole | Snap Frozen
3B | Buccal Swab | Fresh
3C | Esophagus | Snap Frozen
3D | Esophagus | Fixed
3E | Colon, Ascending | Snap Frozen
3F | Colon, Ascending | Fixed
3G | Colon, Descending | Snap Frozen
3H | Colon, Descending | Fixed
3I | Liver Sample | Snap Frozen
3J | Liver Sample | Fixed
3K | Adrenal Gland, Left | Snap Frozen
3L | Adrenal Gland, Left | Fixed
3M | Adrenal Gland, Right | Snap Frozen
3N | Adrenal Gland, Right | Fixed
3O | Aorta, Abdominal | Snap Frozen
3P | Aorta, Abdominal | Fixed
3Q | Lung | Snap Frozen
3R | Lung | Fixed
3S | Heart, LV | Snap Frozen
3T | Heart, LV | Fixed
3U | Testis, Left | Snap Frozen
3V | Testis, Left | Fixed
3W | Testis, Right | Snap Frozen
3X | Testis, Right | Fixed
3Y | Ovary, Left | Snap Frozen
3Z | Ovary, Left | Fixed
3AA | Ovary, Right | Snap Frozen
3AB | Ovary, Right | Fixed
3AC* | Dermal Fibroblast | Cultured Cells
3AD | Skin, Calf | Snap Frozen
3AE | Skin, Calf | Fixed
3AF | Skin, Abdomen | Snap Frozen
3AG | Skin, Abdomen | Fixed
3AH | Muscle | Snap Frozen
3AI | Muscle | Fixed
3AJ | Brain | Fresh
3AK | Frontal Lobe, Brain, Left hemisphere | Snap Frozen
3AL | Temporal Lobe, Brain, Left hemisphere | Snap Frozen
3AM | Cerebellum, Brain, Left hemisphere | Snap Frozen
3AN | Hippocampus, Brain, Left hemisphere | Snap Frozen
3AO | Hippocampus, Brain, Right hemisphere | Snap Frozen
3AP | Frontal Lobe, Brain, Left hemisphere | Fixed
3AQ | Temporal Lobe, Brain, Left hemisphere | Fixed
3AR | Cerebellum, Brain, Left hemisphere | Fixed
3AS | Hippocampus, Brain, Left hemisphere | Fixed
3AT | Hippocampus, Brain, Right hemisphere | Fixed
* 3AC: Fibroblasts are isolated from fresh calf skin.
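Protocol IDs in Tables 2A and 2B combine a leading digit (1 for benchmarking tissues, 3 for production tissues) with a letter suffix that continues past Z as AA, AB, and so on. The hypothetical Python sketch below splits an ID into these two parts. It assumes the suffix follows spreadsheet-style ordering and covers only the prefixes 1 and 3 that appear in the tables.

```python
from __future__ import annotations

import re

# Leading digit -> table, as observed in Tables 2A and 2B (only 1 and 3 appear).
PROTOCOL_GROUPS = {"1": "benchmarking tissue", "3": "production tissue"}
_PROTOCOL_ID = re.compile(r"([0-9])([A-Z]+)")


def suffix_position(letters: str) -> int:
    """Spreadsheet-style letter position: A=1, ..., Z=26, AA=27, ..., AT=46."""
    position = 0
    for ch in letters:
        position = position * 26 + (ord(ch) - ord("A") + 1)
    return position


def parse_protocol_id(protocol_id: str) -> tuple[str, int]:
    """Split a protocol ID such as '3AC' into (group, suffix position)."""
    match = _PROTOCOL_ID.fullmatch(protocol_id)
    if match is None or match.group(1) not in PROTOCOL_GROUPS:
        raise ValueError(f"not a protocol ID from Table 2A/2B: {protocol_id!r}")
    return PROTOCOL_GROUPS[match.group(1)], suffix_position(match.group(2))


print(parse_protocol_id("1Q"))   # ('benchmarking tissue', 17)
print(parse_protocol_id("3AC"))  # ('production tissue', 29)
```

As a consistency check, Table 2B runs from 3A to 3AT, which is 46 entries and matches `suffix_position("AT") == 46`. The sketch does not check whether an ID is actually assigned: for example, `3AU` parses even though it is not in the table. Range entries such as `1M/N/O/P` are not handled.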

Part 2: Base Schema, Platform, and Assay Codes

[Figure: Nomenclature Part 2]

Table 3A. Sequencing platform codes.

SMaHT code | Sequencing platform
A | Illumina NovaSeq X, Illumina NovaSeq X Plus
B | PacBio Revio HiFi
C | Illumina NovaSeq 6000
D | ONT PromethION 24
E | ONT PromethION 2 Solo
F | ONT MinION Mk1B
G | Illumina HiSeq X
H [deprecated] | Illumina NovaSeq X Plus
I | BGI DNBSEQ-G400
J | Element AVITI
K | Illumina NextSeq 2000
L | PacBio Sequel IIe
M | Ultima Genomics UG 100
(codes will be set as data are generated on additional sequencing platforms and submitted to the DAC) | PacBio Onso
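When platform codes are read from file names, the deprecated code H needs special handling. The Python sketch below rejects H by default and accepts it only when explicitly allowed. This policy is an assumption, not part of the specification: it treats A as the code for new NovaSeq X Plus data and H as a code found only in older files. The function name is hypothetical.

```python
# Active codes from Table 3A. PacBio Onso has no code assigned yet.
PLATFORMS = {
    "A": "Illumina NovaSeq X, Illumina NovaSeq X Plus",
    "B": "PacBio Revio HiFi",
    "C": "Illumina NovaSeq 6000",
    "D": "ONT PromethION 24",
    "E": "ONT PromethION 2 Solo",
    "F": "ONT MinION Mk1B",
    "G": "Illumina HiSeq X",
    "I": "BGI DNBSEQ-G400",
    "J": "Element AVITI",
    "K": "Illumina NextSeq 2000",
    "L": "PacBio Sequel IIe",
    "M": "Ultima Genomics UG 100",
}
DEPRECATED_PLATFORMS = {"H": "Illumina NovaSeq X Plus"}


def platform_name(code: str, allow_deprecated: bool = False) -> str:
    """Look up a sequencing platform code.

    Deprecated codes raise ValueError unless allow_deprecated is True
    (assumed policy, e.g. for reading older files).
    """
    if code in PLATFORMS:
        return PLATFORMS[code]
    if code in DEPRECATED_PLATFORMS:
        if allow_deprecated:
            return DEPRECATED_PLATFORMS[code]
        raise ValueError(f"platform code {code!r} is deprecated")
    raise ValueError(f"unknown platform code: {code!r}")


print(platform_name("B"))                         # PacBio Revio HiFi
print(platform_name("H", allow_deprecated=True))  # Illumina NovaSeq X Plus
```

Keeping deprecated codes in a separate mapping makes the distinction explicit. Callers must opt in to legacy names instead of getting them silently.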

Table 3B. Experimental assay codes.

Code | Assay Name | Description
000 | (Null or not applicable)
[001-100: DNA-based assays]
001 | WGS | DNA, PCR-free, Bulk, Whole genome sequencing (WGS)
002 | PCR WGS | DNA PCR, Bulk, WGS
003 | Ultra-Long WGS | DNA, PCR-free, Bulk, Ultra-Long WGS
004 | Fiber-seq | DNA, PCR-free, Bulk, Fiber-seq
005 | Hi-C | DNA, Bulk, Hi-C
006 | Bulk NTSeq | DNA, Bulk, NTSeq
007 | CODEC | DNA, Bulk, Duplex-seq, CODEC
008 | Bot-seq | DNA, Bulk, Duplex-seq, Bot-seq
009 | NanoSeq | DNA, Bulk, Duplex-seq, NanoSeq
010 | scNanoSeq | DNA, Single-cell, Duplex-seq, scNanoSeq
011 | DLP+ | DNA, Single-cell, DLP+
012 | Microbulk MALBAC WGS | DNA, Microbulk, MALBAC-amplified WGS
013 | Single-cell MALBAC WGS | DNA, Single-cell, MALBAC-amplified WGS
014 | Microbulk PTA WGS | DNA, Microbulk, PTA-amplified WGS
015 | Single-cell PTA WGS | DNA, Single-cell, PTA-amplified WGS
016 | scDip-C | DNA, Single-cell, scDip-C
017 | CompDuplex-seq | DNA, Bulk, Duplex-seq, CompDuplex-seq
018 | scCompDuplex-seq | DNA, Single-cell, Duplex-seq, scCompDuplex-seq
019 | Strand-seq | DNA, Bulk, Strand-seq
020 | scStrand-seq | DNA, Single-cell, scStrand-seq
021 | HiDEF-seq | DNA, Bulk, Duplex-seq, HiDEF-seq
022 | HAT-seq | DNA, Bulk, HAT-seq
023 | Microbulk HAT-seq | DNA, Microbulk, PTA-amplified HAT-seq
024 | scHAT-seq | DNA, Single-cell, PTA-amplified HAT-seq
025 | VISTA-seq | DNA, Bulk, Duplex-seq, VISTA-seq
026 | Microbulk VISTA-seq | DNA, Microbulk, Duplex-seq, VISTA-seq
027 | scVISTA-seq | DNA, Single-cell, Duplex-seq, VISTA-seq
028 | TEnCATS | DNA, Bulk, TEnCATS
029 | L1-ONT | DNA, Bulk, L1-ONT
030 | ppmSeq | DNA, Bulk, Duplex-seq, ppmSeq
[101-200: RNA-based assays]
101 | RNA-seq | RNA, Bulk, RNA-seq
102 | Kinnex | RNA, Bulk, Kinnex
103 | snRNA-seq | RNA, Single-cell, snRNA-seq
104 | STORM-Seq | RNA, Single-cell, STORM-seq
105 | Tranquil-Seq | RNA, Single-cell, Tranquil-seq
[201-300: Chromatin-based assays]
201 | ATAC-seq | Chromatin, Bulk, ATAC-seq
202 | CUT&Tag | Chromatin, Bulk, CUT&Tag
203 | varCUT&Tag | Chromatin, Bulk, varCUT&Tag
204 | sc-varCUT&Tag | Chromatin, Single-cell, sc-varCUT&Tag
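Table 3B reserves a numeric range for each assay family, so the family of an assay code can be read from its number alone. A Python sketch of that lookup (the function name is hypothetical):

```python
import re

# Code ranges from Table 3B; 000 is the null / not-applicable code.
ASSAY_RANGES = [
    (1, 100, "DNA"),
    (101, 200, "RNA"),
    (201, 300, "chromatin"),
]


def assay_category(code: str) -> str:
    """Classify a three-digit experimental assay code by its numeric range."""
    if re.fullmatch(r"[0-9]{3}", code) is None:
        raise ValueError(f"assay codes have exactly three digits: {code!r}")
    number = int(code)
    if number == 0:
        return "null or not applicable"
    for low, high, category in ASSAY_RANGES:
        if low <= number <= high:
            return category
    raise ValueError(f"assay code {code!r} is outside the ranges in Table 3B")


print(assay_category("007"))  # DNA        (007 is CODEC)
print(assay_category("103"))  # RNA        (103 is snRNA-seq)
print(assay_category("202"))  # chromatin  (202 is CUT&Tag)
```

Because the sketch classifies by range rather than by an explicit list, it still reports reserved but not-yet-assigned codes as belonging to their family. For example, 031 comes back as DNA. A stricter validator would also check that the code is listed in the table.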

Table 4. SMaHT data generation center codes.

Code | Category | Institute | Contact PI
bcm | GCC | Baylor College of Medicine | Richard Gibbs
broad | GCC | Broad Institute | Kristin Ardlie
nygc | GCC | New York Genome Center | Soren Germer
uwsc | GCC | University of Washington & Seattle Children’s Hospital | Jimmy Bennett
washu | GCC | Washington University in St. Louis | Ting Wang
bcm1 | TTD | Baylor College of Medicine | Chuck Zong
bcm2 | TTD | Baylor College of Medicine | Fritz Sedlazeck
bch1 | TTD | Boston Children’s Hospital | Christopher Walsh
bch2 | TTD | Boston Children’s Hospital | Sangita Choudhury
broad1 | TTD | Broad Institute | Fei Chen
cwru | TTD | Case Western Reserve University | Fulai Jin
dfci | TTD | Dana-Farber Cancer Institute | Kathleen Burns
mayo | TTD | Mayo Clinic | Alexej Abyzov
nyu | TTD | New York University | Gilad Evrony
stfd | TTD | Stanford University | Alexander Urban
umass | TTD | University of Massachusetts | Thomas Fazzio
umich | TTD | University of Michigan | Ryan Mills
uutah | TTD | University of Utah | Gabor Marth
wcnygc | TTD | Weill Cornell Medicine & New York Genome Center | Dan Landau
dac | DAC | Harvard Medical School | Peter Park
tpc | TPC | National Disease Research Interchange (NDRI) | Thomas Bell
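Center codes should be matched exactly rather than by prefix. `bcm` (GCC) is a prefix of `bcm1` and `bcm2` (TTD), `broad` is a prefix of `broad1`, and `nygc` is a substring of `wcnygc`. The Python lookup below is built from Table 4 and assumes codes are case-sensitive lowercase strings as listed. The function name is hypothetical.

```python
from collections import Counter

# Center codes and categories from Table 4.
CENTER_CATEGORIES = {
    "bcm": "GCC", "broad": "GCC", "nygc": "GCC", "uwsc": "GCC", "washu": "GCC",
    "bcm1": "TTD", "bcm2": "TTD", "bch1": "TTD", "bch2": "TTD",
    "broad1": "TTD", "cwru": "TTD", "dfci": "TTD", "mayo": "TTD",
    "nyu": "TTD", "stfd": "TTD", "umass": "TTD", "umich": "TTD",
    "uutah": "TTD", "wcnygc": "TTD",
    "dac": "DAC", "tpc": "TPC",
}


def center_category(code: str) -> str:
    """Return the category of a center code (exact, case-sensitive match)."""
    try:
        return CENTER_CATEGORIES[code]
    except KeyError:
        raise ValueError(f"unknown center code: {code!r}") from None


print(center_category("bcm"))   # GCC
print(center_category("bcm1"))  # TTD
print(Counter(CENTER_CATEGORIES.values()))
```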

Part 3: File Name Breakdown

[Figure: Nomenclature Part 3]

Table 5. Genome version (A) and variant type (B) tables.

(A)
Reference Genome | Code
GRCh38 without ALT contigs | GRCh38
GRCh38 with ALT contigs | GRCh38_ALT
T2T CHM13 | CHM13
Donor-specific genome assembly | DSA

(B)
Data Type | Code
Reference conversion | [Source]To[Target]
Donor-specific genome assembly haplotype | hapX, hapY, hapX1, hapX2
Gene expression level | gene
Transcript isoform expression level or isoform information | isoform
Junction annotations | junction
Full-length, non-concatemer (FLNC) Kinnex reads | flnc
Aligned consensus Duplex-Seq BAM | consensus
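The reference-conversion code in Table 5B is a template rather than a fixed string. The Python sketch below builds and parses such codes. It makes two assumptions: [Source] and [Target] are filled with the genome codes from Table 5A, and the two must differ. The helper names are hypothetical.

```python
from __future__ import annotations

# Reference genome codes from Table 5A.
GENOME_CODES = ("GRCh38", "GRCh38_ALT", "CHM13", "DSA")


def conversion_code(source: str, target: str) -> str:
    """Build a reference-conversion code following the [Source]To[Target] pattern."""
    for genome in (source, target):
        if genome not in GENOME_CODES:
            raise ValueError(f"unknown genome code: {genome!r}")
    if source == target:
        raise ValueError("source and target genomes must differ")
    return f"{source}To{target}"


def parse_conversion_code(code: str) -> tuple[str, str]:
    """Recover (source, target) by trying every ordered pair of known genome codes."""
    for source in GENOME_CODES:
        for target in GENOME_CODES:
            if source != target and code == f"{source}To{target}":
                return source, target
    raise ValueError(f"not a reference-conversion code: {code!r}")


print(conversion_code("GRCh38", "CHM13"))          # GRCh38ToCHM13
print(parse_conversion_code("GRCh38_ALTToCHM13"))  # ('GRCh38_ALT', 'CHM13')
```

Parsing tries every known pair instead of splitting on the substring `To`. The result therefore does not depend on where `To` happens to appear within a genome code.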

Example Files with the SMaHT Nomenclature

[Figure: Nomenclature_ExampleFiles]