IDTrack
Cross-Temporal and Cross-Database Biological Identifier Mapping
Modern biology constantly mixes identifiers from different years, databases, and genome builds. The result is a familiar set of problems: IDs disappear, symbols change, references disagree, and “the same gene” isn’t always represented the same way across datasets.
IDTrack is built for that reality. It provides a time-aware, audit-friendly way to translate and harmonize biological identifiers across Ensembl releases and across external namespaces (HGNC, UniProt, RefSeq, Entrez, …), while keeping ambiguity explicit instead of silently forcing a single answer.
What makes IDTrack different
Time-aware mapping: treat Ensembl releases as a “time axis” and travel forward/backward through identifier history.
Assembly-aware mapping: harmonize identifiers across genome builds (e.g. GRCh37 ↔ GRCh38) and respect external databases that are assembly-scoped.
Snapshot boundary for reproducibility: build a release-bounded graph snapshot so results are stable and repeatable.
Explicit external database opt-in: choose which external namespaces participate via a small, editable YAML contract.
Transparency over coercion: conversions are naturally classified as 1→0 (no match), 1→1 (clean), or 1→n (ambiguous).
Scale-ready workflows: caching and snapshot reuse make repeated conversions and multi-dataset harmonization practical.
Who is it for?
Wet-lab researchers who need a reliable, step-by-step path from “my gene list is old” to “my analysis is reproducible”.
Bioinformaticians who want release-pinned, auditable conversions in notebooks, pipelines, and integration workflows.
Atlas builders / integrators who need to harmonize gene identifiers across many cohorts (different Ensembl releases, symbols, and external IDs), keep an explicit audit trail of what mapped/failed/was ambiguous, and ship a release-pinned, reproducible feature space for downstream integration and publication.
Common use cases
Dataset harmonization before integration (single-cell, bulk, atlas-scale collections).
Legacy data rescue (old Ensembl releases, mixed symbols/IDs, retired identifiers).
Publication-grade reproducibility (pin a snapshot boundary + share the exact external configuration).
Cross-database interoperability when collaborators use different identifier conventions.
Start here
Tutorials — the full tutorial suite (Part 0 → Part 7); primary learning resource
Quickstart — a minimal quickstart