SMaHT Sample and File Nomenclature
Overview
The SMaHT sample and file names are the primary identifiers of biosamples from the Tissue Procurement Center (TPC) and files generated by the Data Analysis Center (DAC) of the SMaHT Network (“Network”).
SMaHT sample and file names contain identifiers that are unique and immutable, as well as semi-human-readable codes that correspond to metadata. This document describes the naming schema and the code tables for each metadata type included in sample and file names. The metadata fields in sample and file names are delimited by a hyphen (“-”). In this document, “#” indicates a single-digit integer and “A” indicates an alphabetical letter.
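The conventions above (hyphen-delimited fields, “#” for a digit, “A” for a letter) can be sketched in Python as follows. This is a minimal illustration, not the DAC's validation logic: the helper names `split_fields`, `template_to_regex`, and `matches_template` are hypothetical, the example name is made up, and accepting both upper- and lower-case letters for “A” is an assumption.

```python
import re


def split_fields(name: str) -> list:
    """Split a name into its hyphen-delimited metadata fields."""
    fields = name.split("-")
    if any(field == "" for field in fields):
        raise ValueError(f"empty field in name: {name!r}")
    return fields


def template_to_regex(template: str) -> "re.Pattern":
    """Turn a template such as 'A##' into a compiled regular expression.

    '#' stands for one digit and 'A' for one letter (letter case is an
    assumption); every other character is matched literally.
    """
    parts = []
    for ch in template:
        if ch == "#":
            parts.append("[0-9]")
        elif ch == "A":
            parts.append("[A-Za-z]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches_template(template: str, value: str) -> bool:
    """Return True if the whole value fits the template."""
    return template_to_regex(template).fullmatch(value) is not None


# Made-up name for illustration only; not a real SMaHT identifier.
print(split_fields("XX-01-B"))
print(matches_template("A##", "B12"))
```

Matching with `fullmatch` rather than `match` keeps a value with trailing characters from being accepted by a shorter template.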
Schema Documentation
| Version | Release date | Filename |
|---|---|---|
| 2.1 (latest) | 01/02/2026 | SMaHT Sample and File Nomenclature v2.1.pdf |
Part 1: Sample Schema and Protocol ID Tables
Naming Schema

Table 1. Benchmarking cell line codes.
| Kit/Sample ID | Cell line description |
|---|---|
| COLO829T | COLO829 tumor cell line |
| COLO829BL | COLO829BL normal lymphoblast cell line |
| COLO829BLT50 | COLO829 1:50 admixture |
| HAPMAP6 | Cell admixture of six HapMap cell lines |
| LBLA2 | LB-LA2 fibroblast cell line |
| LBIPSC1 | iPSC line from clone #1 derived from the LB-LA2 fibroblast cell line |
| LBIPSC2 | iPSC line from clone #2 derived from the LB-LA2 fibroblast cell line |
| LBIPSC4 | iPSC line from clone #4 derived from the LB-LA2 fibroblast cell line |
| LBIPSC52 | iPSC line from clone #52 derived from the LB-LA2 fibroblast cell line |
| LBIPSC60 | iPSC line from clone #60 derived from the LB-LA2 fibroblast cell line |
Table 2A. Protocol IDs for SMaHT benchmarking tissues.
| Protocol ID | Tissue Name for Container | Preservation | Notes |
|---|---|---|---|
| 1A | Liver | Snap Frozen | Homogenate and non-homogenate samples |
| 1B | unassigned | N/A | |
| 1C | Liver | Fixed | |
| 1D | Lung | Snap Frozen | Homogenate and non-homogenate samples |
| 1E | unassigned | N/A | |
| 1F | Lung | Fixed | |
| 1G | Colon | Snap Frozen | Homogenate and non-homogenate samples |
| 1H | unassigned | N/A | |
| 1I | Colon | Fixed | |
| 1J | Skin | Snap Frozen | Tissue specimen (~10 cm) |
| 1K | Skin | Snap Frozen | Tissue core from the intact tissue was made (~1 cm) |
| 1L | Skin | Fixed | |
| 1M/N/O/P | unassigned | N/A | |
| 1Q | Brain, Frontal Lobe | Snap Frozen | Homogenate and non-homogenate samples |
Table 2B. Protocol IDs for SMaHT production tissues.
| Protocol ID | Tissue Name for Container | Preservation |
|---|---|---|
| 3A | Blood, Whole | Snap Frozen |
| 3B | Buccal Swab | Fresh |
| 3C | Esophagus | Snap Frozen |
| 3D | Esophagus | Fixed |
| 3E | Colon, Ascending | Snap Frozen |
| 3F | Colon, Ascending | Fixed |
| 3G | Colon, Descending | Snap Frozen |
| 3H | Colon, Descending | Fixed |
| 3I | Liver Sample | Snap Frozen |
| 3J | Liver Sample | Fixed |
| 3K | Adrenal Gland, Left | Snap Frozen |
| 3L | Adrenal Gland, Left | Fixed |
| 3M | Adrenal Gland, Right | Snap Frozen |
| 3N | Adrenal Gland, Right | Fixed |
| 3O | Aorta, Abdominal | Snap Frozen |
| 3P | Aorta, Abdominal | Fixed |
| 3Q | Lung | Snap Frozen |
| 3R | Lung | Fixed |
| 3S | Heart, LV | Snap Frozen |
| 3T | Heart, LV | Fixed |
| 3U | Testis, Left | Snap Frozen |
| 3V | Testis, Left | Fixed |
| 3W | Testis, Right | Snap Frozen |
| 3X | Testis, Right | Fixed |
| 3Y | Ovary, Left | Snap Frozen |
| 3Z | Ovary, Left | Fixed |
| 3AA | Ovary, Right | Snap Frozen |
| 3AB | Ovary, Right | Fixed |
| 3AC* | Dermal Fibroblast | Cultured Cells |
| 3AD | Skin, Calf | Snap Frozen |
| 3AE | Skin, Calf | Fixed |
| 3AF | Skin, Abdomen | Snap Frozen |
| 3AG | Skin, Abdomen | Fixed |
| 3AH | Muscle | Snap Frozen |
| 3AI | Muscle | Fixed |
| 3AJ | Brain | Fresh |
| 3AK | Frontal Lobe, Brain, Left hemisphere | Snap Frozen |
| 3AL | Temporal Lobe, Brain, Left hemisphere | Snap Frozen |
| 3AM | Cerebellum, Brain, Left hemisphere | Snap Frozen |
| 3AN | Hippocampus, Brain, Left hemisphere | Snap Frozen |
| 3AO | Hippocampus, Brain, Right hemisphere | Snap Frozen |
| 3AP | Frontal Lobe, Brain, Left hemisphere | Fixed |
| 3AQ | Temporal Lobe, Brain, Left hemisphere | Fixed |
| 3AR | Cerebellum, Brain, Left hemisphere | Fixed |
| 3AS | Hippocampus, Brain, Left hemisphere | Fixed |
| 3AT | Hippocampus, Brain, Right hemisphere | Fixed |
* 3AC = Fibroblasts are isolated from fresh calf skin.
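In Tables 2A and 2B, every benchmarking protocol ID starts with the digit 1 and every production protocol ID starts with the digit 3, followed by one or two capital letters. The sketch below separates those two parts. Treating the leading digit as a study indicator is an inference from the tables rather than a stated rule, and `parse_protocol_id` is a hypothetical helper name.

```python
import re

# Inferred from Tables 2A and 2B; not stated as a rule in the text.
STUDY_BY_PREFIX = {"1": "benchmarking", "3": "production"}

# One digit followed by one or two capital letters, e.g. "1A" or "3AC".
_PROTOCOL_RE = re.compile(r"([0-9])([A-Z]{1,2})")


def parse_protocol_id(protocol_id: str) -> tuple:
    """Return (study, letter suffix) for a protocol ID such as '3AC'."""
    match = _PROTOCOL_RE.fullmatch(protocol_id)
    if match is None:
        raise ValueError(f"not a protocol ID: {protocol_id!r}")
    prefix, letters = match.groups()
    if prefix not in STUDY_BY_PREFIX:
        raise ValueError(f"unknown protocol prefix: {prefix!r}")
    return STUDY_BY_PREFIX[prefix], letters


print(parse_protocol_id("1A"))
print(parse_protocol_id("3AC"))
```

The combined row `1M/N/O/P` in Table 2A would need to be expanded into `1M`, `1N`, `1O`, and `1P` before parsing.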
Part 2: Base Schema, Platform, and Assay Codes

Table 3A. Sequencing platform codes.
| SMaHT code | Sequencing platform |
|---|---|
| A | Illumina NovaSeq X, Illumina NovaSeq X Plus |
| B | PacBio Revio HiFi |
| C | Illumina NovaSeq 6000 |
| D | ONT PromethION 24 |
| E | ONT PromethION 2 Solo |
| F | ONT MinION Mk1B |
| G | Illumina HiSeq X |
| H [deprecated] | Illumina NovaSeq X Plus |
| I | BGI DNBSEQ-G400 |
| J | Element AVITI |
| K | Illumina NextSeq 2000 |
| L | PacBio Sequel IIe |
| M | Ultima Genomics UG 100 |
| Unassigned (codes are set as data generated on additional sequencing platforms are submitted to the DAC) | PacBio Onso |
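Table 3A can be represented as a simple lookup, sketched below. The dictionary copies the codes from the table; the names `PLATFORMS`, `DEPRECATED`, and `lookup_platform` are hypothetical, and flagging `H` through a separate set is just one way to carry the deprecation note. PacBio Onso is omitted because it has no code yet.

```python
# Codes and platforms copied from Table 3A.
PLATFORMS = {
    "A": ["Illumina NovaSeq X", "Illumina NovaSeq X Plus"],
    "B": ["PacBio Revio HiFi"],
    "C": ["Illumina NovaSeq 6000"],
    "D": ["ONT PromethION 24"],
    "E": ["ONT PromethION 2 Solo"],
    "F": ["ONT MinION Mk1B"],
    "G": ["Illumina HiSeq X"],
    "H": ["Illumina NovaSeq X Plus"],
    "I": ["BGI DNBSEQ-G400"],
    "J": ["Element AVITI"],
    "K": ["Illumina NextSeq 2000"],
    "L": ["PacBio Sequel IIe"],
    "M": ["Ultima Genomics UG 100"],
}

# Codes marked [deprecated] in the table.
DEPRECATED = {"H"}


def lookup_platform(code: str) -> tuple:
    """Return (list of platforms, is_deprecated) for a platform code."""
    if code not in PLATFORMS:
        raise KeyError(f"unknown platform code: {code!r}")
    return PLATFORMS[code], code in DEPRECATED


print(lookup_platform("B"))
print(lookup_platform("H"))
```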
Table 3B. Experimental assay codes.
| Code | Assay Name | Description |
|---|---|---|
| 000 | (Null or not-applicable) | |
| [001-100: DNA-based assays] | | |
| 001 | WGS | DNA, PCR-free, Bulk, Whole genome sequencing (WGS) |
| 002 | PCR WGS | DNA, PCR, Bulk, WGS |
| 003 | Ultra-Long WGS | DNA, PCR-free, Bulk, Ultra-Long WGS |
| 004 | Fiber-seq | DNA, PCR-free, Bulk, Fiber-seq |
| 005 | Hi-C | DNA, Bulk, Hi-C |
| 006 | Bulk NTSeq | DNA, Bulk, NTSeq |
| 007 | CODEC | DNA, Bulk, Duplex-seq, CODEC |
| 008 | Bot-seq | DNA, Bulk, Duplex-seq, Bot-seq |
| 009 | NanoSeq | DNA, Bulk, Duplex-seq, NanoSeq |
| 010 | scNanoSeq | DNA, Single-cell, Duplex-seq, scNanoSeq |
| 011 | DLP+ | DNA, Single-cell, DLP+ |
| 012 | Microbulk MALBAC WGS | DNA, Microbulk, MALBAC-amplified WGS |
| 013 | Single-cell MALBAC WGS | DNA, Single-cell, MALBAC-amplified WGS |
| 014 | Microbulk PTA WGS | DNA, Microbulk, PTA-amplified WGS |
| 015 | Single-cell PTA WGS | DNA, Single-cell, PTA-amplified WGS |
| 016 | scDip-C | DNA, Single-cell, scDip-C |
| 017 | CompDuplex-seq | DNA, Bulk, Duplex-seq, CompDuplex-seq |
| 018 | scCompDuplex-seq | DNA, Single-cell, Duplex-seq, scCompDuplex-seq |
| 019 | Strand-seq | DNA, Bulk, Strand-seq |
| 020 | scStrand-seq | DNA, Single-cell, scStrand-seq |
| 021 | HiDEF-seq | DNA, Bulk, Duplex-seq, HiDEF-seq |
| 022 | HAT-seq | DNA, Bulk, HAT-seq |
| 023 | Microbulk HAT-seq | DNA, Microbulk, PTA-amplified HAT-seq |
| 024 | scHAT-seq | DNA, Single-cell, PTA-amplified, HAT-seq |
| 025 | VISTA-seq | DNA, Bulk, Duplex-seq, VISTA-seq |
| 026 | Microbulk VISTA-seq | DNA, Microbulk, Duplex-seq, VISTA-seq |
| 027 | scVISTA-seq | DNA, Single-cell, Duplex-seq, VISTA-seq |
| 028 | TEnCATS | DNA, Bulk, TEnCATS |
| 029 | L1-ONT | DNA, Bulk, L1-ONT |
| 030 | ppmSeq | DNA, Bulk, Duplex-seq, ppmSeq |
| [101-200: RNA-based assays] | | |
| 101 | RNA-seq | RNA, Bulk, RNA-seq |
| 102 | Kinnex | RNA, Bulk, Kinnex |
| 103 | snRNA-seq | RNA, Single-cell, snRNA-seq |
| 104 | STORM-Seq | RNA, Single-cell, STORM-seq |
| 105 | Tranquil-Seq | RNA, Single-cell, Tranquil-seq |
| [201-300: Chromatin-based assays] | | |
| 201 | ATAC-seq | Chromatin, Bulk, ATAC-seq |
| 202 | CUT&Tag | Chromatin, Bulk, CUT&Tag |
| 203 | varCUT&Tag | Chromatin, Bulk, varCUT&Tag |
| 204 | sc-varCUT&Tag | Chromatin, Single-cell, sc-varCUT&Tag |
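The numeric ranges in Table 3B (001-100 for DNA, 101-200 for RNA, 201-300 for chromatin, and 000 for null) make it possible to classify an assay code without the full table. A minimal sketch, with `assay_category` as a hypothetical helper name and the requirement of exactly three digits assumed from how the codes are printed:

```python
def assay_category(code: str) -> str:
    """Classify a three-digit assay code by the ranges in Table 3B."""
    # Assumption: codes are always written as exactly three ASCII digits.
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"assay code must be three digits: {code!r}")
    number = int(code)
    if number == 0:
        return "Null or not-applicable"
    if 1 <= number <= 100:
        return "DNA"
    if 101 <= number <= 200:
        return "RNA"
    if 201 <= number <= 300:
        return "Chromatin"
    raise ValueError(f"assay code outside the defined ranges: {code!r}")


for code in ("000", "001", "102", "204"):
    print(code, assay_category(code))
```

A code that falls inside a range but is not yet listed in the table (for example `050`) still classifies as DNA here; checking that a code is actually assigned would need the full table.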
Table 4. SMaHT data generation center codes.
| Code | Category | Institute | Contact PI |
|---|---|---|---|
| bcm | GCC | Baylor College of Medicine | Richard Gibbs |
| broad | GCC | Broad Institute | Kristin Ardlie |
| nygc | GCC | New York Genome Center | Soren Germer |
| uwsc | GCC | University of Washington & Seattle Children’s Hospital | Jimmy Bennett |
| washu | GCC | Washington University in St. Louis | Ting Wang |
| bcm1 | TTD | Baylor College of Medicine | Chuck Zong |
| bcm2 | TTD | Baylor College of Medicine | Fritz Sedlazeck |
| bch1 | TTD | Boston Children’s Hospital | Christopher Walsh |
| bch2 | TTD | Boston Children’s Hospital | Sangita Choudhury |
| broad1 | TTD | Broad Institute | Fei Chen |
| cwru | TTD | Case Western Reserve University | Fulai Jin |
| dfci | TTD | Dana-Farber Cancer Institute | Kathleen Burns |
| mayo | TTD | Mayo Clinic | Alexej Abyzov |
| nyu | TTD | New York University | Gilad Evrony |
| stfd | TTD | Stanford University | Alexander Urban |
| umass | TTD | University of Massachusetts | Thomas Fazzio |
| umich | TTD | University of Michigan | Ryan Mills |
| uutah | TTD | University of Utah | Gabor Marth |
| wcnygc | TTD | Weill Cornell Medicine & New York Genome Center | Dan Landau |
| dac | DAC | Harvard Medical School | Peter Park |
| tpc | TPC | National Disease Research Interchange (NDRI) | Thomas Bell |
Part 3: File Name Breakdown

Table 5. Genome version (A) and variant type (B) tables.
| Reference Genome | Code |
|---|---|
| GRCh38 without ALT contigs | GRCh38 |
| GRCh38 with ALT contigs | GRCh38_ALT |
| T2T CHM13 | CHM13 |
| Donor-specific genome assembly | DSA |

| Data Type | Code |
|---|---|
| Reference conversion | [Source]To[Target] |
| Donor-specific genome assembly haplotype | hapX, hapY, hapX1, hapX2 |
| Gene expression level | gene |
| Transcript isoform expression level or isoform information | isoform |
| Junction annotations | junction |
| Full-length, non-concatemer (FLNC) Kinnex reads | flnc |
| Aligned consensus Duplex-Seq BAM | consensus |
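The reference-conversion pattern `[Source]To[Target]` can be built and parsed as sketched below. This assumes that the source and target are reference genome codes from the first table (GRCh38, GRCh38_ALT, CHM13, DSA); the text does not confirm which values are allowed, and both function names are hypothetical.

```python
# Reference genome codes from Table 5 (A).
GENOME_CODES = {"GRCh38", "GRCh38_ALT", "CHM13", "DSA"}


def conversion_code(source: str, target: str) -> str:
    """Build a '[Source]To[Target]' code from two genome codes."""
    for code in (source, target):
        if code not in GENOME_CODES:
            raise ValueError(f"unknown genome code: {code!r}")
    return f"{source}To{target}"


def parse_conversion_code(code: str) -> tuple:
    """Split a '[Source]To[Target]' code back into (source, target)."""
    # None of the genome codes contains "To", so the first "To" is the separator.
    source, separator, target = code.partition("To")
    if not separator or source not in GENOME_CODES or target not in GENOME_CODES:
        raise ValueError(f"not a reference conversion code: {code!r}")
    return source, target


print(conversion_code("GRCh38", "CHM13"))
print(parse_conversion_code("GRCh38_ALTToCHM13"))
```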
Example Files with the SMaHT Nomenclature
