==========
Quickstart
==========

Get started with IDTrack in minutes. This page provides minimal, copy-pasteable code examples to help you perform your first identifier conversions.

.. contents::
   :local:
   :depth: 2
   :backlinks: none

.. tip::

   For the complete learning path with detailed explanations, see :doc:`tutorials` (Parts 0-7).

.. note::

   IDTrack snapshots are both **release-aware** and **assembly-aware**. By default, a snapshot graph keeps the organism's configured assemblies (e.g. human GRCh38 + GRCh37), so you can harmonize mixed-build inputs into one target space (your snapshot release + chosen output/primary assembly).

The Three-Step Workflow
-----------------------

IDTrack follows a simple three-step workflow for all identifier conversions:

.. list-table::
   :header-rows: 1
   :widths: 10 30 60

   * - Step
     - Action
     - Description
   * - 1
     - **Initialize**
     - Create an API instance with a local cache directory
   * - 2
     - **Build**
     - Build a graph snapshot for your target organism and snapshot release (multi-assembly by default)
   * - 3
     - **Convert**
     - Map identifiers using the precomputed graph

The first graph build downloads external databases and takes several minutes. Subsequent runs load the cached graph in seconds.

Example 1: Convert a Human Gene Symbol
--------------------------------------

Convert a gene symbol to its Ensembl ID at the latest Ensembl release:

.. code-block:: python

   import os
   from pathlib import Path

   import idtrack

   # Set up your local cache directory
   # This stores downloaded databases and built graphs for reuse
   local_repo = Path(os.environ.get("IDTRACK_LOCAL_REPO", "./idtrack_cache")).resolve()
   local_repo.mkdir(parents=True, exist_ok=True)

   # Step 1: Initialize the API
   api = idtrack.API(local_repository=str(local_repo))

   # Resolve the organism and get the latest release
   organism, latest_release = api.resolve_organism("human")
   print(f"Using {organism} at Ensembl release {latest_release}")

   # Step 2: Build the graph (cached after first run)
   api.build_graph(
       organism_name=organism,
       snapshot_release=latest_release,
       calculate_caches=True
   )

   # Step 3: Convert an identifier
   result = api.convert_identifier("TP53", to_release=latest_release)
   target = result["target_id"][0] if result["target_id"] else None
   print(f"TP53 -> {target}")

**Expected output:**

.. code-block:: text

   Using homo_sapiens at Ensembl release 114
   TP53 -> ENSG00000141510

.. note::

   The first graph build downloads external databases and constructs the mapping graph. This can take several minutes depending on your internet connection. Subsequent runs load the pre-built graph from your local cache in seconds.

Example 2: Batch Conversion
---------------------------

Convert multiple identifiers at once for better performance:

.. code-block:: python

   import idtrack

   api = idtrack.API(local_repository="./idtrack_cache")
   organism, release = api.resolve_organism("human")
   api.build_graph(organism_name=organism, snapshot_release=release, calculate_caches=True)

   # Convert a batch of gene symbols
   genes = ["TP53", "BRCA1", "EGFR", "KRAS", "MYC"]
   results = api.convert_identifier_multiple(genes, to_release=release, verbose=False)

   # Print results
   print("Gene Symbol -> Ensembl ID")
   print("-" * 30)
   for gene, result in zip(genes, results):
       target = result["target_id"][0] if result["target_id"] else None
       print(f"{gene:<10} -> {target}")

**Expected output:**

.. code-block:: text

   Gene Symbol -> Ensembl ID
   ------------------------------
   TP53       -> ENSG00000141510
   BRCA1      -> ENSG00000012048
   EGFR       -> ENSG00000146648
   KRAS       -> ENSG00000133703
   MYC        -> ENSG00000136997

Example 3: Working with Mouse
-----------------------------

IDTrack supports multiple organisms. Here's how to work with mouse:

.. code-block:: python

   import idtrack

   api = idtrack.API(local_repository="./idtrack_cache")

   # Resolve mouse organism
   organism, release = api.resolve_organism("mouse")
   print(f"Using {organism} at Ensembl release {release}")

   # Build the mouse graph
   api.build_graph(organism_name=organism, snapshot_release=release, calculate_caches=True)

   # Convert mouse gene symbols
   mouse_genes = ["Trp53", "Brca1", "Egfr"]
   results = api.convert_identifier_multiple(mouse_genes, to_release=release, verbose=False)

   for gene, result in zip(mouse_genes, results):
       target = result["target_id"][0] if result["target_id"] else None
       print(f"{gene} -> {target}")

Example 4: Understanding Conversion Outcomes
--------------------------------------------

IDTrack is transparent about mapping ambiguity. A conversion has one of three outcomes:

.. list-table::
   :header-rows: 1
   :widths: 15 25 60

   * - Outcome
     - Meaning
     - Interpretation
   * - **1 -> 1**
     - Unique mapping
     - Ideal case: one identifier maps to exactly one target
   * - **1 -> 0**
     - No mapping found
     - Identifier may be retired, unknown, or from a different namespace
   * - **1 -> n**
     - Multiple mappings
     - Ambiguous: multiple valid targets exist (honestly reported)

By default, :meth:`idtrack.API.convert_identifier` uses ``strategy="best"`` (it returns a single best target). To *surface ambiguity*, use ``strategy="all"`` and inspect ``result["target_id"]``.

Get detailed conversion information with the ``explain`` parameter:

.. code-block:: python

   import idtrack

   api = idtrack.API(local_repository="./idtrack_cache")
   organism, release = api.resolve_organism("human")
   api.build_graph(organism_name=organism, snapshot_release=release, calculate_caches=True)

   # Get detailed conversion information
   result = api.convert_identifier(
       "TP53",
       to_release=release,
       strategy="best",
       explain=True  # Returns detailed mapping information
   )

   print("Targets:", result["target_id"])
   print("Matched to graph node:", result["graph_id"])
   print("No corresponding node:", result["no_corresponding"])
   print("No conversion possible:", result["no_conversion"])

The ``explain=True`` option provides details about the mapping path, intermediate nodes, and confidence of the conversion.

Example 5: Cross-Release Conversion
-----------------------------------

Convert identifiers between different Ensembl releases to handle legacy data:

.. code-block:: python

   import idtrack

   api = idtrack.API(local_repository="./idtrack_cache")
   organism, latest_release = api.resolve_organism("human")

   # Build graph for the latest release
   api.build_graph(
       organism_name=organism,
       snapshot_release=latest_release,
       calculate_caches=True
   )

   # Convert an identifier from an older release to the latest
   # This handles retired IDs, merged genes, and nomenclature changes
   old_ensembl_id = "ENSG00000141510"  # TP53
   result = api.convert_identifier(
       old_ensembl_id,
       from_release=100,  # Identifier came from Ensembl release 100
       to_release=latest_release
   )

   target = result["target_id"][0] if result["target_id"] else None
   print(f"Release 100 -> Release {latest_release}: {target}")

Environment Variable Configuration
----------------------------------

For convenience, set the ``IDTRACK_LOCAL_REPO`` environment variable to avoid hard-coding the cache directory in every script.

.. tab-set::

   .. tab-item:: Bash/Zsh

      .. code-block:: bash

         # Add to your ~/.bashrc or ~/.zshrc
         export IDTRACK_LOCAL_REPO="$HOME/.idtrack"

   .. tab-item:: Fish

      .. code-block:: fish

         # Persist across sessions
         set -Ux IDTRACK_LOCAL_REPO $HOME/.idtrack

   .. tab-item:: Windows PowerShell

      .. code-block:: powershell

         # Set user environment variable
         [Environment]::SetEnvironmentVariable("IDTRACK_LOCAL_REPO", "$env:USERPROFILE\.idtrack", "User")

Then in your Python code:

.. code-block:: python

   import os

   import idtrack

   # Use IDTRACK_LOCAL_REPO if set; otherwise fall back to ./idtrack_cache
   local_repo = os.environ.get("IDTRACK_LOCAL_REPO", "./idtrack_cache")
   api = idtrack.API(local_repository=local_repo)

Supported Organisms
-------------------

IDTrack currently supports the following organisms:

.. list-table::
   :header-rows: 1
   :widths: 20 30 50

   * - Organism
     - Identifier
     - External Database Support
   * - Human
     - ``"human"`` or ``"homo_sapiens"``
     - Full Ensembl + HGNC, NCBI, UniProt
   * - Mouse
     - ``"mouse"`` or ``"mus_musculus"``
     - Full Ensembl + MGI, NCBI, UniProt
   * - Pig
     - ``"pig"`` or ``"sus_scrofa"``
     - Full Ensembl support

.. tip::

   Want to add support for another organism? See :doc:`_notebooks/02_prepare_new_external_yaml` for instructions on configuring external databases.

Common Patterns
---------------

Reusing Built Graphs
~~~~~~~~~~~~~~~~~~~~

Once a graph is built, it is cached locally. You do not need to check for an existing graph yourself; calling ``build_graph`` again loads the graph from the cache when the cache is valid:

.. code-block:: python

   import idtrack

   api = idtrack.API(local_repository="./idtrack_cache")
   organism, release = api.resolve_organism("human")

   # The graph is automatically loaded from cache if it exists
   # Building is skipped if the cache is valid
   api.build_graph(
       organism_name=organism,
       snapshot_release=release,
       calculate_caches=True
   )

Working with AnnData Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

IDTrack integrates with AnnData for single-cell analysis workflows:

.. code-block:: python

   import anndata as ad
   import idtrack

   # Load your AnnData object
   adata = ad.read_h5ad("your_data.h5ad")

   # Initialize the API and build the graph before harmonizing features
   api = idtrack.API(local_repository="./idtrack_cache")
   organism, release = api.resolve_organism("human")
   api.build_graph(organism_name=organism, snapshot_release=release, calculate_caches=True)

   # See the harmonization tutorial for complete workflows

For complete AnnData harmonization workflows, see :doc:`_notebooks/05_tutorial_harmonization`.

What's Next?
------------

Now that you've completed the quickstart, explore these resources for deeper understanding:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Resource
     - Description
   * - :doc:`_notebooks/00_idtrack_overview`
     - Understand the mental model (time axis, space axis, snapshot boundary)
   * - :doc:`_notebooks/01_installation_guide`
     - Detailed environment setup and configuration
   * - :doc:`_notebooks/03_initialization_graph`
     - Graph building, caching, and management
   * - :doc:`_notebooks/04_api_deep_dive_human`
     - Complete API reference with advanced examples
   * - :doc:`_notebooks/05_tutorial_harmonization`
     - Real-world dataset harmonization workflows
   * - :doc:`_notebooks/06_tutorial_humanization_mouse_pig_to_human`
     - Cross-species "humanization" workflows
   * - :doc:`tutorials`
     - Complete learning path (Parts 0-7)