{ "cells": [ { "cell_type": "markdown", "id": "23c813fb", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "# Part 4 — Core API Deep-Dive (Human Example)\n", "\n", "*Last updated:* 2026-01-08\n", "\n", "This notebook is a **hands-on tutorial** for the public IDTrack API, using **human** as the example organism.\n", "\n", "**Learning objectives**\n", "- Create the `idtrack.API` façade and understand what it wraps.\n", "- Convert single identifiers (time travel + optional external outputs).\n", "- Convert batches and summarize outcomes (1→0 / 1→1 / 1→n).\n", "- Request explanation payloads for audit trails.\n", "- Learn advanced knobs (external bridging, ambiguity strategy, assembly awareness).\n", "- Learn introspection helpers (available databases, assemblies, releases, active ranges).\n", "\n", "> **Prerequisite:** Running `03_initialization_graph.ipynb` (Part 3) first is recommended so that the graph loads from cache.\n" ] }, { "cell_type": "markdown", "id": "aa04831d", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "## 4.1 — The API Façade\n", "\n", "`idtrack.API` is the user-facing entry point. 
It handles:\n", "- organism resolution (human/mouse/pig names and synonyms)\n", "- building or loading a graph snapshot (the reproducible snapshot boundary)\n", "- conversion helpers like `convert_identifier(...)` and `convert_identifier_multiple(...)`\n", "\n", "In this notebook we build (or load) the **human** snapshot, then use it for the rest of the examples.\n", "\n", "> **Expected result:** after the setup cell runs, `api.track` exists and conversions become available.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "7e64623d", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2026-01-17 14:53:38 INFO:verify_organism: Ensembl Rest API query to get the organism names and associated releases.\n", "2026-01-17 14:54:03 INFO:database_manager: Using assembly-specific release range for homo_sapiens assembly 38: releases 76-115 (from config [76, None])\n", "2026-01-17 14:54:57 INFO:graph_maker: The graph is being read: /ictstr01/home/icb/kemal.inecik/work/codes/idtrack/docs/_notebooks/idtrack_cache/graph_homo_sapiens_min48_max115_narrow.pickle\n", "2026-01-17 14:56:40 INFO:the_graph: Cached properties being calculated: available_genome_assemblies\n", "2026-01-17 14:56:40 INFO:the_graph: Cached properties being calculated: combined_edges\n", "2026-01-17 14:58:16 INFO:the_graph: Cached properties being calculated: combined_edges_genes\n", "2026-01-17 15:00:04 INFO:the_graph: Cached properties being calculated: combined_edges_assembly_specific_genes\n", "2026-01-17 15:00:09 INFO:the_graph: Cached properties being calculated: lower_chars_graph\n", "2026-01-17 15:00:11 INFO:the_graph: Cached properties being calculated: get_active_ranges_of_id\n", "2026-01-17 15:01:14 INFO:the_graph: Cached properties being calculated: available_external_databases\n", "2026-01-17 15:01:19 INFO:the_graph: Cached properties being calculated: available_external_databases_assembly\n", 
"2026-01-17 15:01:23 INFO:the_graph: Cached properties being calculated: node_trios\n", "2026-01-17 15:02:16 INFO:the_graph: Cached properties being calculated: hyperconnective_nodes\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Ready: homo_sapiens snapshot 115\n" ] } ], "source": [ "from __future__ import annotations\n", "\n", "import os\n", "from pathlib import Path\n", "\n", "import idtrack\n", "\n", "LOCAL_REPOSITORY = Path(os.environ.get('IDTRACK_LOCAL_REPO', './idtrack_cache')).resolve()\n", "LOCAL_REPOSITORY.mkdir(parents=True, exist_ok=True)\n", "\n", "api = idtrack.API(local_repository=str(LOCAL_REPOSITORY))\n", "api.configure_logger()\n", "\n", "organism, latest_release = api.resolve_organism('human')\n", "SNAPSHOT_RELEASE = latest_release\n", "\n", "api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)\n", "print('Ready:', organism, 'snapshot', SNAPSHOT_RELEASE)\n" ] }, { "cell_type": "markdown", "id": "adc0ede4", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "## 4.2 — Single Identifier Conversion\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "dbedd707", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "data": { "text/plain": [ "{'target_id': ['ENSG00000141510.20'],\n", " 'last_node': [('ENSG00000141510.20', 'ENSG00000141510.20')],\n", " 'final_database': 'ensembl_gene',\n", " 'graph_id': 'TP53',\n", " 'query_id': 'TP53',\n", " 'no_corresponding': False,\n", " 'no_conversion': False,\n", " 'no_target': False}" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "api.convert_identifier('TP53', to_release=SNAPSHOT_RELEASE)\n" ] }, { "cell_type": "markdown", "id": "567b689b", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "If you see `no_corresponding=True`, it means the input could not be matched.\n", "Try a different spelling/casing, or 
use an Ensembl ID directly.\n" ] }, { "cell_type": "markdown", "id": "d832c626", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "### 4.2.1 Example: time travel (convert to an older release)\n", "\n", "Why this matters: published datasets often use older releases.\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "f251b006", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "data": { "text/plain": [ "{'target_id': ['ENSG00000141510.18'],\n", " 'last_node': [('ENSG00000141510.18', 'ENSG00000141510.18')],\n", " 'final_database': 'ensembl_gene',\n", " 'graph_id': 'TP53',\n", " 'query_id': 'TP53',\n", " 'no_corresponding': False,\n", " 'no_conversion': False,\n", " 'no_target': False}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Choose an older release to demonstrate time travel\n", "older_release = SNAPSHOT_RELEASE - 10\n", "api.convert_identifier('TP53', to_release=older_release)\n" ] }, { "cell_type": "markdown", "id": "1f4beed4", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "### 4.2.2 Convert into an external database (HGNC)\n", "\n", "To convert into a specific external database, pass `final_database=...`.\n", "Database names are the same names you see in your external YAML.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "c9c3dd54", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "data": { "text/plain": [ "{'target_id': ['TP53'],\n", " 'last_node': [('ENSG00000141510.20', 'TP53')],\n", " 'final_database': 'HGNC Symbol',\n", " 'graph_id': 'ENSG00000141510',\n", " 'query_id': 'ENSG00000141510',\n", " 'no_corresponding': False,\n", " 'no_conversion': False,\n", " 'no_target': False}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "api.convert_identifier('ENSG00000141510', to_release=SNAPSHOT_RELEASE, 
final_database='HGNC Symbol')\n" ] }, { "cell_type": "markdown", "id": "b50aa467", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "### 4.2.3 How do I know which external databases are available?\n", "\n", "Use the graph itself to list databases currently represented.\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "67598312", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "data": { "text/plain": [ "['CCDS',\n", " 'Clone_based_ensembl_gene',\n", " 'Clone_based_vega_gene',\n", " 'EntrezGene',\n", " 'HGNC Symbol',\n", " 'Havana gene',\n", " 'Havana transcript',\n", " 'Havana translation',\n", " 'NCBI gene',\n", " 'NCBI gene (formerly Entrezgene)',\n", " 'RFAM',\n", " 'RefSeq_mRNA',\n", " 'RefSeq_mRNA_predicted',\n", " 'RefSeq_ncRNA',\n", " 'RefSeq_ncRNA_predicted',\n", " 'RefSeq_peptide',\n", " 'RefSeq_peptide_predicted',\n", " 'UniProtKB Gene Name',\n", " 'Uniprot/SPTREMBL',\n", " 'Uniprot/SWISSPROT',\n", " 'Vega gene',\n", " 'Vega_gene',\n", " 'synonym_id::EntrezGene',\n", " 'synonym_id::HGNC Symbol',\n", " 'synonym_id::NCBI gene',\n", " 'synonym_id::NCBI gene (formerly Entrezgene)',\n", " 'synonym_id::UniProtKB Gene Name']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g = api.track.graph\n", "sorted(g.available_external_databases)[:50]\n" ] }, { "cell_type": "markdown", "id": "10f77c9b", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "### 4.2.4 Understanding the result dictionary\n", "\n", "Key fields:\n", "- `query_id`: exactly what you typed\n", "- `graph_id`: what IDTrack matched internally (normalization step)\n", "- `target_id`: list of outputs (can be 0, 1, or many)\n", "- `no_corresponding`: input didn’t match any node\n", "- `no_conversion`: input matched, but no path to target release / database\n", "- `no_target`: reached an Ensembl target, but requested external DB had no synonym\n", "\n", 
"Important: `target_id` is a list because ambiguity is real and common.\n" ] }, { "cell_type": "markdown", "id": "ac47627b", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "### 4.2.5 Ambiguity control: strategy='best' vs strategy='all'\n", "\n", "- `strategy='best'` (default): returns a single best target when possible\n", "- `strategy='all'`: returns *all* candidates IDTrack found\n", "\n", "Use `'all'` when you are doing QC or want to inspect ambiguous mappings.\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "d0a3f563", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "data": { "text/plain": [ "{'target_id': ['ENSG00000141510.20'],\n", " 'last_node': [('ENSG00000141510.20', 'ENSG00000141510.20')],\n", " 'final_database': 'ensembl_gene',\n", " 'graph_id': 'TP53',\n", " 'query_id': 'TP53',\n", " 'no_corresponding': False,\n", " 'no_conversion': False,\n", " 'no_target': False}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "api.convert_identifier('TP53', to_release=SNAPSHOT_RELEASE, strategy='all')\n" ] }, { "cell_type": "markdown", "id": "8ab242d3", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "## 4.3 — Batch Conversion\n", "\n", "Most workflows start from a list of identifiers (genes in a count matrix, markers, hits, etc.).\n", "IDTrack provides two helpers:\n", "- `convert_identifier_multiple(...)`\n", "- `classify_multiple_conversion(...)` to summarize outcomes\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "ca00f64f", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|█████████████████████████████████████████████| 6/6 [00:00<00:00, 73.13it/s, ID:NOT_A_REAL_GENE]\n" ] }, { "data": { "text/plain": [ "[{'target_id': ['TP53'],\n", " 'last_node': [('ENSG00000141510.20', 'TP53')],\n", " 
'final_database': 'HGNC Symbol',\n", " 'graph_id': 'TP53',\n", " 'query_id': 'TP53',\n", " 'no_corresponding': False,\n", " 'no_conversion': False,\n", " 'no_target': False},\n", " {'target_id': ['BRCA1'],\n", " 'last_node': [('ENSG00000012048.27', 'BRCA1')],\n", " 'final_database': 'HGNC Symbol',\n", " 'graph_id': 'BRCA1',\n", " 'query_id': 'BRCA1',\n", " 'no_corresponding': False,\n", " 'no_conversion': False,\n", " 'no_target': False}]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "genes = ['TP53', 'BRCA1', 'BRCA2', 'BRAF', 'KRAS', 'NOT_A_REAL_GENE']\n", "results = api.convert_identifier_multiple(genes, to_release=SNAPSHOT_RELEASE, final_database='HGNC Symbol')\n", "results[:2] # show first two\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "49300b27", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "data": { "text/plain": [ "{'changed_only_1_to_n': 0,\n", " 'changed_only_1_to_1': 0,\n", " 'alternative_target_1_to_1': 0,\n", " 'alternative_target_1_to_n': 0,\n", " 'matching_1_to_0': 1,\n", " 'matching_1_to_1': 5,\n", " 'matching_1_to_n': 0,\n", " 'input_identifiers': 6}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summary = api.classify_multiple_conversion(results)\n", "# Each bin is a list of per-gene dictionaries\n", "{k: len(v) for k, v in summary.items()}\n" ] }, { "cell_type": "markdown", "id": "24116800", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "If you want a human-readable report, you can print the summary bins:\n" ] }, { "cell_type": "code", "execution_count": 9, "id": "b9931ed6", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2026-01-17 15:02:27 INFO:api: \n", "IDTrack conversion summary:\n", " Total processed: 6\n", " 1→0: 1 (16.7%)\n", " 1→1: 5 (83.3%)\n", " Changed only: 0 
(0.0%)\n", " Alternative targets: 0 (0.0%)\n", " Rest: 5 (100.0%)\n", " 1→n: 0 (0.0%)\n", " Changed only: 0 (0.0%)\n", " Alternative targets: 0 (0.0%)\n", " Diagnostics:\n", " no_corresponding: 1\n", " no_conversion: 0\n", " no_target: 0\n" ] } ], "source": [ "api.print_binned_conversion(summary)\n" ] }, { "cell_type": "markdown", "id": "739b6db6", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "## 4.4 — Explainability & Auditability\n", "\n", "When you set `explain=True`, the result includes a `the_path` field describing the graph edges followed.\n", "This is very useful for advanced QC and debugging.\n" ] }, { "cell_type": "code", "execution_count": 10, "id": "4a51a1c4", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "data": { "text/plain": [ "['target_id',\n", " 'last_node',\n", " 'final_database',\n", " 'graph_id',\n", " 'query_id',\n", " 'no_corresponding',\n", " 'no_conversion',\n", " 'no_target',\n", " 'the_path']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "explained = api.convert_identifier('TP53', to_release=SNAPSHOT_RELEASE, final_database='HGNC Symbol', explain=True)\n", "list(explained.keys())\n" ] }, { "cell_type": "code", "execution_count": 11, "id": "102844e8", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "data": { "text/plain": [ "[('TP53', 'ENSG00000141510.20')]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The path dictionary keys are (target_id, ensembl_gene_id) pairs\n", "list(explained['the_path'].keys())[:3]\n" ] }, { "cell_type": "markdown", "id": "aa85d082", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "`the_path` is intentionally detailed. 
For most users, the summary flags and `target_id` are enough.\n" ] }, { "cell_type": "markdown", "id": "af01ac49", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "## 4.5 — Advanced Conversion Options\n", "\n", "`api.convert_identifier(...)` is a convenience wrapper around `api.track.convert(...)`.\n", "\n", "Use the high-level API most of the time.\n", "But if you need full control (search settings, whether external bridging is allowed, deeper path diagnostics), you can call `Track.convert` directly.\n", "\n", "### 4.5.1 Best vs all (selection strategy)\n", "\n", "- `strategy='best'` returns a single globally best target.\n", "- `strategy='all'` returns all scored targets (useful for ambiguity-aware pipelines).\n", "\n", "### 4.5.2 Controlling external bridging\n", "\n", "External bridging helps reconnect broken Ensembl histories using external IDs, but it can also increase search space.\n", "Power users can toggle it via `go_external` on `Track.convert`.\n", "\n", "### 4.5.3 Hyperconnected nodes\n", "\n", "Some external identifiers connect to *many* entities (e.g. generic accessions). IDTrack detects these and limits their use to keep searches fast.\n", "\n", "### 4.5.4 Assembly-aware conversions\n", "\n", "In IDTrack, genome assemblies are part of the graph. 
This is crucial when you integrate datasets that were annotated with different references\n", "(for example, a GRCh37-based GTF and a GRCh38-based GTF).\n", "\n", "When you build a snapshot you choose a **primary assembly** (default: the newest/highest-priority assembly for that organism).\n", "The snapshot can still include other assemblies that Ensembl exposes within the snapshot window, and the path-finder can traverse between\n", "assemblies when it improves connectivity.\n", "\n", "Practical consequences:\n", "- You can feed identifiers originating from older builds and still harmonize them into one target space (your snapshot release + primary assembly).\n", "- External databases can be assembly-scoped; keeping assembly blocks enabled in your external YAML increases the set of bridges available for mapping.\n", "\n", "If you truly need outputs anchored to a different primary assembly (for example, a GRCh37-only downstream reference), rebuild with\n", "`genome_assembly=37`. Note that the cached graph filename does not include the assembly; use a separate local repository if you want to keep\n", "multiple primary-assembly snapshots side-by-side.\n", "\n", "Example (advanced):\n", "```python\n", "api.track.convert(\n", "    from_id='TP53',\n", "    from_release=None,\n", "    to_release=SNAPSHOT_RELEASE,\n", "    final_database='HGNC Symbol',\n", "    go_external=True,\n", "    return_path=True,\n", ")\n", "```\n" ] }, { "cell_type": "code", "execution_count": 12, "id": "53eeb29f", "metadata": { "deletable": true, "editable": true, "frozen": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hyperconnected external nodes: 1611\n", "Top 10 by out-degree:\n", " 1911 - Metazoa_SRP\n", " 1911 - RF00017\n", " 1619 - RF00026\n", " 1619 - U6\n", " 1090 - RF00019\n", " 1090 - Y_RNA\n", " 633 - 5S_rRNA\n", " 633 - RF00001\n", " 490 - RF01210\n", " 490 - snoU13\n", "go_external=False -> OK\n", "go_external=True -> OK\n" ] } ], "source": [ "# Advanced demo: 
inspect hyperconnected nodes and compare `go_external` behavior.\n", "# Safe: does not modify your cache; it only runs conversions.\n", "\n", "# 1) Hyperconnected nodes (performance/ambiguity concept)\n", "g = api.track.graph\n", "hc = getattr(g, 'hyperconnective_nodes', {})\n", "print('Hyperconnected external nodes:', len(hc))\n", "if hc:\n", "    top = sorted(hc.items(), key=lambda kv: kv[1], reverse=True)[:10]\n", "    print('Top 10 by out-degree:')\n", "    for node, deg in top:\n", "        print(' ', deg, '-', node)\n", "\n", "# 2) External bridging toggle (often matters when backbone history is disconnected)\n", "# For many well-behaved genes, both calls will succeed; the point is the *option* exists.\n", "res_no_external = api.track.convert(\n", "    from_id='TP53',\n", "    from_release=None,\n", "    to_release=SNAPSHOT_RELEASE,\n", "    final_database=None,\n", "    go_external=False,\n", "    prioritize_to_one_filter=True,\n", "    return_path=False,\n", ")\n", "res_with_external = api.track.convert(\n", "    from_id='TP53',\n", "    from_release=None,\n", "    to_release=SNAPSHOT_RELEASE,\n", "    final_database=None,\n", "    go_external=True,\n", "    prioritize_to_one_filter=True,\n", "    return_path=False,\n", ")\n", "\n", "print('go_external=False ->', 'OK' if res_no_external else None)\n", "print('go_external=True ->', 'OK' if res_with_external else None)\n" ] }, { "cell_type": "markdown", "id": "e7b55b58", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "## 4.6 — Introspection & Discovery\n", "\n", "These helpers answer practical questions like:\n", "- *Which external databases are available in my current graph?*\n", "- *Which genome assemblies are represented?*\n", "- *What Ensembl release range does my snapshot cover?*\n", "- *When was a given identifier active across releases?*\n", "\n", "The next cell demonstrates the most useful introspection calls.\n" ] }, { "cell_type": "code", "execution_count": 13, "id": "ed563412", "metadata": { "deletable": true, 
"editable": true, "frozen": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2026-01-17 15:02:28 INFO:the_graph: Cached properties being calculated (for tests): external_database_connection_form\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Assemblies in this graph: [36, 37, 38]\n", "External DBs enabled (count): 27\n", "External DBs (first 25): ['CCDS', 'Clone_based_ensembl_gene', 'Clone_based_vega_gene', 'EntrezGene', 'HGNC Symbol', 'Havana gene', 'Havana transcript', 'Havana translation', 'NCBI gene', 'NCBI gene (formerly Entrezgene)', 'RFAM', 'RefSeq_mRNA', 'RefSeq_mRNA_predicted', 'RefSeq_ncRNA', 'RefSeq_ncRNA_predicted', 'RefSeq_peptide', 'RefSeq_peptide_predicted', 'UniProtKB Gene Name', 'Uniprot/SPTREMBL', 'Uniprot/SWISSPROT', 'Vega gene', 'Vega_gene', 'synonym_id::EntrezGene', 'synonym_id::HGNC Symbol', 'synonym_id::NCBI gene']\n", "\n", "External DB connection forms (sample):\n", " CCDS → transcript\n", " Clone_based_ensembl_gene → gene\n", " Clone_based_vega_gene → gene\n", " EntrezGene → gene\n", " HGNC Symbol → gene\n", " Havana gene → gene\n", " Havana transcript → transcript\n", " Havana translation → translation\n", " NCBI gene → gene\n", " NCBI gene (formerly Entrezgene) → gene\n", "\n", "Ensembl releases in snapshot window: (76, 115)\n", "\n", "Active ranges (main assembly) for ENSG00000141510 : [[48, 115]]\n", "All-assemblies active range failed -> ValueError(\"Cannot get active ranges for 'ENSG00000141510': node type 'base_ensembl_gene' is not a gene type. 
Expected 'ensembl_gene' or an assembly-specific gene type.\")\n" ] } ], "source": [ "# Introspection demo\n", "\n", "print('Assemblies in this graph:', sorted(api.list_genome_assemblies()))\n", "\n", "ext_dbs = sorted(api.list_external_databases())\n", "print('External DBs enabled (count):', len(ext_dbs))\n", "print('External DBs (first 25):', ext_dbs[:25])\n", "\n", "forms = api.external_database_forms()\n", "print()\n", "print('External DB connection forms (sample):')\n", "for name in ext_dbs[:10]:\n", "    print(' ', name, '→', forms.get(name))\n", "\n", "rels = api.list_ensembl_releases()\n", "print()\n", "print('Ensembl releases in snapshot window:', (min(rels), max(rels)) if rels else None)\n", "\n", "# Active ranges: when was an ID \"alive\" across releases?\n", "# (Useful for provenance documentation.)\n", "g = api.track.graph\n", "example_gene = 'ENSG00000141510' # TP53\n", "if example_gene in g.nodes:\n", "    print()\n", "    print('Active ranges (main assembly) for', example_gene, ':', g.get_active_ranges_of_id.get(example_gene))\n", "    try:\n", "        print('Active ranges (all assemblies) for', example_gene, ':', g.get_active_ranges_of_id_ensembl_all_inclusive(example_gene))\n", "    except Exception as e:\n", "        print('All-assemblies active range failed ->', repr(e))\n", "else:\n", "    print('Example gene not found in graph (unexpected).')\n" ] }, { "cell_type": "markdown", "id": "4e306c2e", "metadata": { "deletable": true, "editable": true, "frozen": false }, "source": [ "## 4.7 — Practical advice (the kind that saves you a week)\n", "\n", "1. Always record your **snapshot boundary** (release) in your analysis notes.\n", "2. If you share results, share the **external YAML** too.\n", "3. When mapping is ambiguous, do not hide it — decide how your pipeline should handle 1→n mappings.\n", "4. 
For scRNA-seq harmonization, prefer stable namespaces (Ensembl IDs) before switching to symbols.\n", "\n", "> **Tip:** If you need troubleshooting checklists and diagnostics helpers, see `07_advanced_topics.ipynb` (Part 7.3).\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.12" } }, "nbformat": 4, "nbformat_minor": 5 }