{"title": "Gene and Transcript Annotations", "status": "open", "content": [{"body": "## Overview\n\nGenome annotations are critical for understanding the functional elements within a genome, including genes, transcripts, regulatory regions, and other important features. Accurate genome annotations are essential for tasks such as variant annotation, gene expression analysis, and understanding the biological significance of genetic variants.\n\nThe pipelines incorporate several genome annotation resources compatible with the GRCh38 Genome Build.\n\n### Resources\n\n1. **GENCODE:** A comprehensive resource providing detailed annotations of gene features and other significant elements in the human genome.\n\n---\n\n## GENCODE\n\nThe GENCODE project<sup><sub>1</sub></sup> provides comprehensive annotation of gene features for the human genome, including coding and non-coding genes, pseudogenes, and other significant genomic elements.\n\nThe specific version in use is GENCODE Release 47 (GRCh38.p14), which aligns with the Genome Reference Consortium Human Build 38 (GRCh38) and is accessible for download [here](https://www.gencodegenes.org/human/release_47.html).\n\n### Collapsing GENCODE Annotation\n\n###### Download comprehensive gene annotation\n\n<pre class=\"code-block copy-wrapper\">\nwget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/gencode.v47.annotation.gtf.gz\ngunzip gencode.v47.annotation.gtf.gz\n</pre>\n\n###### Collapse gene annotation\n\n<pre class=\"code-block copy-wrapper\">\npython3 collapse_annotation.py \\\n    --collapse_only gencode.v47.annotation.gtf \\\n    gencode.v47.genes.gtf\n</pre>\n\nSource code for the ``collapse_annotation.py``<sup><sub>2</sub></sup> script is available [here](https://github.com/smaht-dac/rnaseq-pipelines/blob/main/preprocessing/collapse_annotation.py).\n\n<sub><b>1</b>: *Frankish A, et al.* GENCODE: reference annotation for the human and mouse genomes in 2023. 
*Nucleic Acids Res., Volume 51, Issue D1, 6 January 2023, Pages D942\u2013D949.* doi: 10.1093/nar/gkac1071; <b>2</b>: *Original author: Francois Aguet*</sub>", "title": "Gene and Transcript Annotations", "status": "open", "options": {"filetype": "md", "collapsible": false, "default_open": true, "convert_ext_links": true, "initial_header_level": 2}, "consortia": [{"@type": ["Consortium", "Item"], "status": "open", "@id": "/consortia/358aed10-9b9d-4e26-ab84-4bd162da182b/", "uuid": "358aed10-9b9d-4e26-ab84-4bd162da182b", "display_title": "SMaHT", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "identifier": "genome_annotations", "date_created": "2026-01-09T20:07:19.554596+00:00", "section_type": "Page Section", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2026-04-14T21:31:06.381681+00:00"}, "schema_version": "1", "submission_centers": [{"@id": "/submission-centers/9626d82e-8110-4213-ac75-0a50adf890ff/", "uuid": "9626d82e-8110-4213-ac75-0a50adf890ff", "display_title": "HMS DAC", "status": "open", "@type": ["SubmissionCenter", "Item"], "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "@id": "/static-sections/1dc93422-bed2-45c5-8f98-ee57414d4c4a/", "@type": ["StaticSection", "UserContent", "Item"], "uuid": "1dc93422-bed2-45c5-8f98-ee57414d4c4a", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}, "display_title": "Gene and Transcript Annotations", "content_as_html": "<div><h2>Overview</h2>\n<p>Genome annotations are critical for understanding the functional elements within a genome, including genes, transcripts, regulatory regions, and other important features. 
Accurate genome annotations are essential for tasks such as variant annotation, gene expression analysis, and understanding the biological significance of genetic variants.</p>\n<p>The pipelines incorporate several genome annotation resources compatible with the GRCh38 Genome Build.</p>\n<h3>Resources</h3>\n<ol>\n<li><strong>GENCODE:</strong> A comprehensive resource providing detailed annotations of gene features and other significant elements in the human genome.</li>\n</ol>\n<hr />\n<h2>GENCODE</h2>\n<p>The GENCODE project<sup><sub>1</sub></sup> provides comprehensive annotation of gene features for the human genome, including coding and non-coding genes, pseudogenes, and other significant genomic elements.</p>\n<p>The specific version in use is GENCODE Release 47 (GRCh38.p14), which aligns with the Genome Reference Consortium Human Build 38 (GRCh38) and is accessible for download <a href=\"https://www.gencodegenes.org/human/release_47.html\" target=\"_blank\" rel=\"noopener noreferrer\">here</a>.</p>\n<h3>Collapsing GENCODE Annotation</h3>\n<h6>Download comprehensive gene annotation</h6>\n<pre class=\"code-block copy-wrapper\">\nwget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/gencode.v47.annotation.gtf.gz\ngunzip gencode.v47.annotation.gtf.gz\n</pre>\n\n<h6>Collapse gene annotation</h6>\n<pre class=\"code-block copy-wrapper\">\npython3 collapse_annotation.py \\\n    --collapse_only gencode.v47.annotation.gtf \\\n    gencode.v47.genes.gtf\n</pre>\n\n<p>Source code for the <code>collapse_annotation.py</code><sup><sub>2</sub></sup> script is available <a href=\"https://github.com/smaht-dac/rnaseq-pipelines/blob/main/preprocessing/collapse_annotation.py\" target=\"_blank\" rel=\"noopener noreferrer\">here</a>.</p>\n<p><sub><b>1</b>: <em>Frankish A, et al.</em> GENCODE: reference annotation for the human and mouse genomes in 2023. 
<em>Nucleic Acids Res., Volume 51, Issue D1, 6 January 2023, Pages D942\u2013D949.</em> doi: 10.1093/nar/gkac1071; <b>2</b>: <em>Original author: Francois Aguet</em></sub></p></div>", "content": "## Overview\n\nGenome annotations are critical for understanding the functional elements within a genome, including genes, transcripts, regulatory regions, and other important features. Accurate genome annotations are essential for tasks such as variant annotation, gene expression analysis, and understanding the biological significance of genetic variants.\n\nThe pipelines incorporate several genome annotation resources compatible with the GRCh38 Genome Build.\n\n### Resources\n\n1. **GENCODE:** A comprehensive resource providing detailed annotations of gene features and other significant elements in the human genome.\n\n---\n\n## GENCODE\n\nThe GENCODE project<sup><sub>1</sub></sup> provides comprehensive annotation of gene features for the human genome, including coding and non-coding genes, pseudogenes, and other significant genomic elements.\n\nThe specific version in use is GENCODE Release 47 (GRCh38.p14), which aligns with the Genome Reference Consortium Human Build 38 (GRCh38) and is accessible for download [here](https://www.gencodegenes.org/human/release_47.html).\n\n### Collapsing GENCODE Annotation\n\n###### Download comprehensive gene annotation\n\n<pre class=\"code-block copy-wrapper\">\nwget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/gencode.v47.annotation.gtf.gz\ngunzip gencode.v47.annotation.gtf.gz\n</pre>\n\n###### Collapse gene annotation\n\n<pre class=\"code-block copy-wrapper\">\npython3 collapse_annotation.py \\\n    --collapse_only gencode.v47.annotation.gtf \\\n    gencode.v47.genes.gtf\n</pre>\n\nSource code for the ``collapse_annotation.py``<sup><sub>2</sub></sup> script is available [here](https://github.com/smaht-dac/rnaseq-pipelines/blob/main/preprocessing/collapse_annotation.py).\n\n<sub><b>1</b>: *Frankish A, et al.* GENCODE: reference annotation for the 
human and mouse genomes in 2023. *Nucleic Acids Res., Volume 51, Issue D1, 6 January 2023, Pages D942\u2013D949.* doi: 10.1093/nar/gkac1071; <b>2</b>: *Original author: Francois Aguet*</sub>", "filetype": "md"}], "consortia": [{"uuid": "358aed10-9b9d-4e26-ab84-4bd162da182b", "@id": "/consortia/358aed10-9b9d-4e26-ab84-4bd162da182b/", "status": "open", "@type": ["Consortium", "Item"], "display_title": "SMaHT", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "identifier": "docs/additional-resources/pipeline-docs/genome_annotations", "date_created": "2026-01-09T20:07:39.426175+00:00", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2026-04-14T21:31:34.743971+00:00"}, "schema_version": "1", "table-of-contents": {"enabled": true, "skip-depth": 1, "header-depth": 2, "include-top-link": false}, "submission_centers": [{"@id": "/submission-centers/9626d82e-8110-4213-ac75-0a50adf890ff/", "display_title": "HMS DAC", "uuid": "9626d82e-8110-4213-ac75-0a50adf890ff", "@type": ["SubmissionCenter", "Item"], "status": "open", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "@id": "/docs/additional-resources/pipeline-docs/genome_annotations", "@type": ["DocsAdditional-resourcesPipeline-docsGenome_annotationsPage", "DocsAdditional-resourcesPipeline-docsPage", "DocsAdditional-resourcesPage", "DocsPage", "StaticPage", "Portal"], "uuid": "01f81cd2-5423-4c26-afd2-5e31f6eef6b6", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}, "display_title": "Gene and Transcript Annotations", "@context": "/docs/additional-resources/pipeline-docs/genome_annotations", "is_leaf": true, "toc": {"enabled": true, "skip-depth": 1, "header-depth": 2, "include-top-link": false}, "next": {"identifier": "docs/additional-resources/pipeline-docs/variant_catalogs", "title": "Variant Databases", "status": "open", "content": [{"@id": 
"/static-sections/2112c809-bca7-46cc-b700-a5280cb2cdee/", "status": "open", "uuid": "2112c809-bca7-46cc-b700-a5280cb2cdee", "@type": ["StaticSection", "UserContent", "Item"], "display_title": "Variant Databases", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "consortia": [{"status": "open", "@id": "/consortia/358aed10-9b9d-4e26-ab84-4bd162da182b/", "@type": ["Consortium", "Item"], "uuid": "358aed10-9b9d-4e26-ab84-4bd162da182b", "display_title": "SMaHT", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "date_created": "2026-01-09T20:07:39.604158+00:00", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2026-04-14T21:31:35.270518+00:00"}, "schema_version": "1", "table-of-contents": {"enabled": true, "skip-depth": 1, "header-depth": 2, "include-top-link": false}, "submission_centers": [{"display_title": "HMS DAC", "@type": ["SubmissionCenter", "Item"], "@id": "/submission-centers/9626d82e-8110-4213-ac75-0a50adf890ff/", "status": "open", "uuid": "9626d82e-8110-4213-ac75-0a50adf890ff", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "@id": "/docs/additional-resources/pipeline-docs/variant_catalogs", "@type": ["DocsAdditional-resourcesPipeline-docsVariant_catalogsPage", "DocsAdditional-resourcesPipeline-docsPage", "DocsAdditional-resourcesPage", "DocsPage", "StaticPage", "Portal"], "uuid": "f873a9df-a511-4e02-af8c-d6e050980b4e", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}, "display_title": "Variant Databases", "is_leaf": true, "sibling_length": 11, "sibling_position": 8}, "previous": {"identifier": "docs/additional-resources/pipeline-docs/genome_builds", "title": "Genome Builds", "status": "open", "content": [{"@id": "/static-sections/455af80f-2dbd-40c7-b7b0-451783dc87ef/", "status": "open", "uuid": "455af80f-2dbd-40c7-b7b0-451783dc87ef", "@type": 
["StaticSection", "UserContent", "Item"], "display_title": "Genome Builds", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "consortia": [{"status": "open", "@id": "/consortia/358aed10-9b9d-4e26-ab84-4bd162da182b/", "@type": ["Consortium", "Item"], "uuid": "358aed10-9b9d-4e26-ab84-4bd162da182b", "display_title": "SMaHT", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "date_created": "2026-01-09T20:07:39.260318+00:00", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2026-04-14T21:31:34.169953+00:00"}, "schema_version": "1", "table-of-contents": {"enabled": true, "skip-depth": 1, "header-depth": 2, "include-top-link": false}, "submission_centers": [{"display_title": "HMS DAC", "@type": ["SubmissionCenter", "Item"], "@id": "/submission-centers/9626d82e-8110-4213-ac75-0a50adf890ff/", "status": "open", "uuid": "9626d82e-8110-4213-ac75-0a50adf890ff", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "@id": "/docs/additional-resources/pipeline-docs/genome_builds", "@type": ["DocsAdditional-resourcesPipeline-docsGenome_buildsPage", "DocsAdditional-resourcesPipeline-docsPage", "DocsAdditional-resourcesPage", "DocsPage", "StaticPage", "Portal"], "uuid": "1ac937fa-8057-44da-ae9d-869d1cc56a1b", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}, "display_title": "Genome Builds", "is_leaf": true, "sibling_length": 11, "sibling_position": 6}, "parent": {"identifier": "docs/additional-resources/pipeline-docs", "parent": {"identifier": "docs/additional-resources", "parent": {"identifier": "docs", "parent": {"identifier": "", "@id": "/", "display_title": "Home", "@type": ["DirectoryPage", "StaticPage", "Portal"]}, "@id": "/docs", "uuid": "089319c4-3ce9-4ec1-bd0b-5451a48bd99e", "display_title": "Documentation", "@type": ["DocsPage", "DirectoryPage", "StaticPage", "Portal"], 
"sibling_length": 5, "sibling_position": 3}, "title": "Analysis & Additional Resources", "status": "open", "consortia": [{"status": "open", "uuid": "358aed10-9b9d-4e26-ab84-4bd162da182b", "@type": ["Consortium", "Item"], "@id": "/consortia/358aed10-9b9d-4e26-ab84-4bd162da182b/", "display_title": "SMaHT", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "date_created": "2024-03-01T19:21:24.278212+00:00", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2026-04-14T21:31:37.074563+00:00"}, "schema_version": "1", "table-of-contents": {"enabled": true, "skip-depth": 1, "header-depth": 4, "include-top-link": false}, "submission_centers": [{"status": "open", "@id": "/submission-centers/9626d82e-8110-4213-ac75-0a50adf890ff/", "@type": ["SubmissionCenter", "Item"], "uuid": "9626d82e-8110-4213-ac75-0a50adf890ff", "display_title": "HMS DAC", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "@id": "/docs/additional-resources", "@type": ["DocsAdditional-resourcesPage", "DocsPage", "DirectoryPage", "StaticPage", "Portal"], "uuid": "1ada4fca-af4b-4304-947d-59e2918ab728", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}, "display_title": "Analysis & Additional Resources", "sibling_length": 3, "sibling_position": 2}, "title": "Analysis Pipelines", "status": "open", "content": [{"@id": "/static-sections/b78b2ebb-d01c-4635-8a67-76a98ab81772/", "uuid": "b78b2ebb-d01c-4635-8a67-76a98ab81772", "display_title": "Analysis Pipelines", "status": "open", "@type": ["StaticSection", "UserContent", "Item"], "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "redirect": {"code": 307, "enabled": false}, "consortia": [{"display_title": "SMaHT", "status": "open", "@id": "/consortia/358aed10-9b9d-4e26-ab84-4bd162da182b/", "@type": ["Consortium", "Item"], "uuid": 
"358aed10-9b9d-4e26-ab84-4bd162da182b", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "date_created": "2026-01-09T20:07:37.883644+00:00", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2026-04-14T21:31:32.772495+00:00"}, "schema_version": "1", "submission_centers": [{"status": "open", "@type": ["SubmissionCenter", "Item"], "uuid": "9626d82e-8110-4213-ac75-0a50adf890ff", "@id": "/submission-centers/9626d82e-8110-4213-ac75-0a50adf890ff/", "display_title": "HMS DAC", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}], "@id": "/docs/additional-resources/pipeline-docs", "@type": ["DocsAdditional-resourcesPipeline-docsPage", "DocsAdditional-resourcesPage", "DocsPage", "DirectoryPage", "StaticPage", "Portal"], "uuid": "6e144832-6abc-47e2-bea5-f720598cf61a", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}, "display_title": "Analysis Pipelines", "sibling_length": 5, "sibling_position": 0}, "sibling_length": 11, "sibling_position": 7}