Quickstart
Get started with IDTrack in minutes. This page provides minimal, copy-pasteable code examples to help you perform your first identifier conversions.
Tip
For the complete learning path with detailed explanations, see Tutorials (Parts 0-7).
Note
IDTrack snapshots are both release-aware and assembly-aware. By default, a snapshot graph keeps the organism’s configured assemblies (e.g. human GRCh38 + GRCh37), so you can harmonize mixed-build inputs into one target space (your snapshot release + chosen output/primary assembly).
The Three-Step Workflow
IDTrack follows a simple three-step workflow for all identifier conversions:
| Step | Action | Description |
|---|---|---|
| 1 | Initialize | Create an API instance with a local cache directory |
| 2 | Build | Build a graph snapshot for your target organism and snapshot release (multi-assembly by default) |
| 3 | Convert | Map identifiers using the precomputed graph |
The first graph build downloads external databases and takes several minutes. Subsequent runs load the cached graph in seconds.
Example 1: Convert a Human Gene Symbol
Convert a gene symbol to its Ensembl ID at the latest Ensembl release:
import os
from pathlib import Path
import idtrack
# Step 1 (Initialize): set up a local cache directory and create the API
# The cache stores downloaded databases and built graphs for reuse
local_repo = Path(os.environ.get("IDTRACK_LOCAL_REPO", "./idtrack_cache")).resolve()
local_repo.mkdir(parents=True, exist_ok=True)
api = idtrack.API(local_repository=str(local_repo))
# Resolve the organism name and its latest Ensembl release
organism, latest_release = api.resolve_organism("human")
print(f"Using {organism} at Ensembl release {latest_release}")
# Step 2 (Build): build the graph (cached after the first run)
api.build_graph(
    organism_name=organism,
    snapshot_release=latest_release,
    calculate_caches=True,
)
# Step 3 (Convert): map an identifier to the target release
result = api.convert_identifier("TP53", to_release=latest_release)
target = result["target_id"][0] if result["target_id"] else None
print(f"TP53 -> {target}")
Expected output:
Using homo_sapiens at Ensembl release 114
TP53 -> ENSG00000141510
Note
The first graph build downloads external databases and constructs the mapping graph. This can take several minutes depending on your internet connection. Subsequent runs load the pre-built graph from your local cache in seconds.
Example 2: Batch Conversion
Convert multiple identifiers at once for better performance:
import idtrack
api = idtrack.API(local_repository="./idtrack_cache")
organism, release = api.resolve_organism("human")
api.build_graph(organism_name=organism, snapshot_release=release, calculate_caches=True)
# Convert a batch of gene symbols
genes = ["TP53", "BRCA1", "EGFR", "KRAS", "MYC"]
results = api.convert_identifier_multiple(genes, to_release=release, verbose=False)
# Print results
print("Gene Symbol -> Ensembl ID")
print("-" * 30)
for gene, result in zip(genes, results):
    target = result["target_id"][0] if result["target_id"] else None
    print(f"{gene:<10} -> {target}")
Expected output:
Gene Symbol -> Ensembl ID
------------------------------
TP53       -> ENSG00000141510
BRCA1      -> ENSG00000012048
EGFR       -> ENSG00000146648
KRAS       -> ENSG00000133703
MYC        -> ENSG00000136997
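For larger batches it can help to separate mapped from unmapped inputs instead of printing them. The sketch below is not part of IDTrack: `summarize_batch` is a hypothetical helper, and it assumes each result is a dictionary whose `"target_id"` entry is a (possibly empty) list, returned in the same order as the inputs, as the loop above implies. The stand-in results are hand-written so the block runs without a built graph.

```python
from typing import Dict, List, Optional, Tuple


def summarize_batch(
    queries: List[str], results: List[dict]
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Pair each query with its first target and collect unmapped queries."""
    mapping: Dict[str, Optional[str]] = {}
    unmapped: List[str] = []
    for query, result in zip(queries, results):
        # Assumed shape: "target_id" holds a list of targets (empty if none).
        targets = result.get("target_id") or []
        mapping[query] = targets[0] if targets else None
        if not targets:
            unmapped.append(query)
    return mapping, unmapped


# Hand-written stand-ins; in real use these would be the return value of
# api.convert_identifier_multiple(genes, to_release=release, verbose=False).
example_queries = ["TP53", "NOT_A_GENE"]
example_results = [{"target_id": ["ENSG00000141510"]}, {"target_id": []}]

mapping, unmapped = summarize_batch(example_queries, example_results)
print("Mapped:", mapping)
print("Unmapped:", unmapped)
```

Keeping the unmapped list separate makes it easy to retry those identifiers with different settings or to report them to the user.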
Example 3: Working with Mouse
IDTrack supports multiple organisms. Here’s how to work with mouse:
import idtrack
api = idtrack.API(local_repository="./idtrack_cache")
# Resolve mouse organism
organism, release = api.resolve_organism("mouse")
print(f"Using {organism} at Ensembl release {release}")
# Build the mouse graph
api.build_graph(organism_name=organism, snapshot_release=release, calculate_caches=True)
# Convert mouse gene symbols
mouse_genes = ["Trp53", "Brca1", "Egfr"]
results = api.convert_identifier_multiple(mouse_genes, to_release=release, verbose=False)
for gene, result in zip(mouse_genes, results):
    target = result["target_id"][0] if result["target_id"] else None
    print(f"{gene} -> {target}")
Example 4: Understanding Conversion Outcomes
IDTrack is transparent about mapping ambiguity. Conversions can result in three outcomes:
| Outcome | Meaning | Interpretation |
|---|---|---|
| 1 -> 1 | Unique mapping | Ideal case: one identifier maps to exactly one target |
| 1 -> 0 | No mapping found | Identifier may be retired, unknown, or from a different namespace |
| 1 -> n | Multiple mappings | Ambiguous: multiple valid targets exist (honestly reported) |
By default, idtrack.API.convert_identifier() uses strategy="best" (it returns a single best target).
To surface ambiguity, use strategy="all" and inspect result["target_id"].
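The three outcomes can be told apart by counting the entries in `result["target_id"]`. The following is a minimal sketch, assuming that `"target_id"` is a list; `classify_outcome` is a hypothetical helper rather than an IDTrack function, and the multi-target identifiers are placeholders. Note that a `1 -> n` outcome would only be visible for results obtained with `strategy="all"`.

```python
def classify_outcome(result: dict) -> str:
    """Label a conversion result as '1 -> 0', '1 -> 1', or '1 -> n'."""
    # Assumed shape: "target_id" is a list of target identifiers.
    n_targets = len(result.get("target_id") or [])
    if n_targets == 0:
        return "1 -> 0"
    if n_targets == 1:
        return "1 -> 1"
    return "1 -> n"


# Hand-written stand-ins for results returned by api.convert_identifier(...).
print(classify_outcome({"target_id": ["ENSG00000141510"]}))
print(classify_outcome({"target_id": []}))
print(classify_outcome({"target_id": ["TARGET_A", "TARGET_B"]}))  # placeholder IDs
```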
Get detailed conversion information with the explain parameter:
import idtrack
api = idtrack.API(local_repository="./idtrack_cache")
organism, release = api.resolve_organism("human")
api.build_graph(organism_name=organism, snapshot_release=release, calculate_caches=True)
# Get detailed conversion information
result = api.convert_identifier(
    "TP53",
    to_release=release,
    strategy="best",
    explain=True,  # Returns detailed mapping information
)
print("Targets:", result["target_id"])
print("Matched to graph node:", result["graph_id"])
print("No corresponding node:", result["no_corresponding"])
print("No conversion possible:", result["no_conversion"])
The explain=True option provides details about the mapping path, intermediate nodes,
and confidence of the conversion.
Example 5: Cross-Release Conversion
Convert identifiers between different Ensembl releases to handle legacy data:
import idtrack
api = idtrack.API(local_repository="./idtrack_cache")
organism, latest_release = api.resolve_organism("human")
# Build graph for the latest release
api.build_graph(
    organism_name=organism,
    snapshot_release=latest_release,
    calculate_caches=True,
)
# Convert an identifier from an older release to the latest
# This handles retired IDs, merged genes, and nomenclature changes
old_ensembl_id = "ENSG00000141510" # TP53
result = api.convert_identifier(
    old_ensembl_id,
    from_release=100,  # Identifier came from Ensembl release 100
    to_release=latest_release,
)
target = result["target_id"][0] if result["target_id"] else None
print(f"Release 100 -> Release {latest_release}: {target}")
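When migrating legacy data, it is often useful to know whether an identifier survived unchanged, was replaced, or could not be mapped. The sketch below illustrates one way to derive that from a result dictionary. `describe_change` is a hypothetical helper, not an IDTrack function, and it assumes that `"target_id"` is a list and that identifiers can be compared as exact strings (no version suffixes).

```python
def describe_change(source_id: str, result: dict) -> str:
    """Return 'unmapped', 'unchanged', or 'changed' for a cross-release result."""
    # Assumed shape: "target_id" is a list of target identifiers.
    targets = result.get("target_id") or []
    if not targets:
        return "unmapped"
    if targets == [source_id]:
        return "unchanged"
    return "changed"


# Hand-written stand-ins for api.convert_identifier(..., from_release=100, ...).
print(describe_change("ENSG00000141510", {"target_id": ["ENSG00000141510"]}))
print(describe_change("OLD_ID", {"target_id": ["NEW_ID"]}))  # placeholder IDs
print(describe_change("OLD_ID", {"target_id": []}))
```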
Environment Variable Configuration
For convenience, set the IDTRACK_LOCAL_REPO environment variable so that you do not have to hard-code the cache directory in every script.
# Bash/Zsh: add to your ~/.bashrc or ~/.zshrc
export IDTRACK_LOCAL_REPO="$HOME/.idtrack"
# Fish: persist across sessions (universal variable)
set -Ux IDTRACK_LOCAL_REPO $HOME/.idtrack
# PowerShell: set a user environment variable
[Environment]::SetEnvironmentVariable("IDTRACK_LOCAL_REPO", "$env:USERPROFILE\.idtrack", "User")
Then in your Python code:
import os
import idtrack
# Read IDTRACK_LOCAL_REPO if set; otherwise fall back to a local directory
local_repo = os.environ.get("IDTRACK_LOCAL_REPO", "./idtrack_cache")
api = idtrack.API(local_repository=local_repo)
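If several scripts share this lookup, it can be wrapped in a small function that also normalizes the path. This is a sketch using only the standard library; `resolve_cache_dir` and its `create` flag are hypothetical names introduced here, not part of IDTrack.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(
    env: Optional[Mapping[str, str]] = None,
    default: str = "./idtrack_cache",
    create: bool = True,
) -> Path:
    """Return the absolute cache directory, preferring IDTRACK_LOCAL_REPO."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the default location.
    raw = env.get("IDTRACK_LOCAL_REPO") or default
    path = Path(raw).expanduser().resolve()
    if create:
        path.mkdir(parents=True, exist_ok=True)
    return path


# create=False only computes the path and leaves the filesystem untouched.
print(resolve_cache_dir(create=False))
```

The returned path can then be passed on as `idtrack.API(local_repository=str(path))`, matching Example 1.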
Supported Organisms
IDTrack currently supports the following organisms:
| Organism | Identifier | External Database Support |
|---|---|---|
| Human | | Full Ensembl + HGNC, NCBI, UniProt |
| Mouse | | Full Ensembl + MGI, NCBI, UniProt |
| Pig | | Full Ensembl support |
Tip
Want to add support for another organism? See Part 2 — External Database Configuration for instructions on configuring external databases.
Common Patterns
Reusing Built Graphs
Once a graph is built, it is cached locally. You do not need to check for it yourself: calling build_graph again loads the cached graph instead of rebuilding it:
api = idtrack.API(local_repository="./idtrack_cache")
organism, release = api.resolve_organism("human")
# The graph is automatically loaded from cache if it exists
# Building is skipped if the cache is valid
api.build_graph(
    organism_name=organism,
    snapshot_release=release,
    calculate_caches=True,
)
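To confirm that a later run really hits the cache, you can time the call. The helper below is a generic standard-library sketch; `timed` is a hypothetical name, and the demonstration uses `sum` as a stand-in so the block runs without IDTrack installed.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    value = fn(*args, **kwargs)
    return value, time.perf_counter() - start


# Stand-in call; with IDTrack one would instead wrap the build, for example:
# _, seconds = timed(api.build_graph, organism_name=organism,
#                    snapshot_release=release, calculate_caches=True)
value, seconds = timed(sum, [1, 2, 3])
print(f"Result {value} computed in {seconds:.6f} s")
```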
Working with AnnData Objects
IDTrack integrates with AnnData for single-cell analysis workflows:
import anndata as ad
import idtrack
# Load your AnnData object
adata = ad.read_h5ad("your_data.h5ad")
# Initialize the API and build the graph used for feature harmonization
api = idtrack.API(local_repository="./idtrack_cache")
organism, release = api.resolve_organism("human")
api.build_graph(organism_name=organism, snapshot_release=release, calculate_caches=True)
# See the harmonization tutorial (Part 5) for complete workflows
For complete AnnData harmonization workflows, see Part 5 — Real-World Experiments: Harmonization.
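As a rough illustration of the renaming step only, the sketch below applies a symbol-to-ID mapping to a list of feature names and flags targets that several inputs collapse onto. It is not IDTrack's harmonizer: `rename_features` is a hypothetical helper, and the mapping is hand-written rather than produced by the library. With AnnData, the input list would typically be `list(adata.var_names)`.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def rename_features(
    names: List[str], mapping: Dict[str, Optional[str]]
) -> Tuple[List[str], List[str]]:
    """Return renamed features and the targets shared by several inputs."""
    # Keep the original name when no target is known for it.
    renamed = [mapping.get(name) or name for name in names]
    counts = Counter(renamed)
    collisions = sorted(name for name, count in counts.items() if count > 1)
    return renamed, collisions


# Hand-written illustrative mapping (assumption, not library output).
feature_names = ["TP53", "P53", "UNKNOWN1"]
symbol_to_id = {"TP53": "ENSG00000141510", "P53": "ENSG00000141510", "UNKNOWN1": None}

renamed, collisions = rename_features(feature_names, symbol_to_id)
print(renamed)
print(collisions)
```

Collisions matter because assigning the renamed list back to `adata.var_names` would produce duplicate feature names, so they should be merged or resolved first.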
What’s Next?
Now that you’ve completed the quickstart, explore these resources for deeper understanding:
| Resource | Description |
|---|---|
| | Understand the mental model (time axis, space axis, snapshot boundary) |
| | Detailed environment setup and configuration |
| | Graph building, caching, and management |
| | Complete API reference with advanced examples |
| | Real-world dataset harmonization workflows |
| | Cross-species “humanization” workflows |
| | Complete learning path (Parts 0-7) |