Appendix — Self-tests & Sanity Checks (Human, Mouse, Pig)
Last updated: 2026-01-08
Use this notebook when you want to confirm your setup is correct after building graphs.
It checks:
external YAML files exist
graphs can be built/loaded
a few representative conversions work
(optional) deeper integrity checks via TrackTests
Tip: If you hit an error, cross-reference Part 7.3 (07_advanced_topics.ipynb) for a diagnostic checklist.
1. Setup
from __future__ import annotations
import os
from pathlib import Path
import idtrack
from idtrack import DB
LOCAL_REPOSITORY = Path(os.environ.get('IDTRACK_LOCAL_REPO', './idtrack_cache')).resolve()
LOCAL_REPOSITORY.mkdir(parents=True, exist_ok=True)
api = idtrack.API(local_repository=str(LOCAL_REPOSITORY))
api.configure_logger()
print('Local repository:', LOCAL_REPOSITORY)
2. Helper: pick a representative gene ID from a built graph
We avoid hard-coding gene IDs for mouse/pig by selecting an example base Ensembl gene ID directly from the graph (these look like ENSG..., ENSMUSG..., ENSSSCG...).
def pick_example_base_gene_id(graph) -> str:
    for node, attrs in graph.nodes(data=True):
        if attrs.get(DB.node_type_str) == DB.nts_base_ensembl['gene']:
            return str(node)
    raise RuntimeError('No base_ensembl_gene node found in the graph.')
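To see what the helper returns without building a real graph, the sketch below runs the same selection logic on a tiny stand-in. Everything named here is hypothetical: `NODE_TYPE_KEY` and `BASE_GENE_TYPE` stand in for `DB.node_type_str` and `DB.nts_base_ensembl['gene']` (whose actual values are not shown in this notebook), `ToyGraph` only mimics the `nodes(data=True)` call, `guess_organism_from_prefix` is an illustrative extra, and the IDs are placeholders rather than real genes.

```python
from __future__ import annotations

import re
from typing import Any, Iterator

# Hypothetical stand-ins for DB.node_type_str and DB.nts_base_ensembl['gene'].
NODE_TYPE_KEY = "node_type"
BASE_GENE_TYPE = "base_ensembl_gene"


class ToyGraph:
    """Minimal stand-in exposing the nodes(data=True) call the helper relies on."""

    def __init__(self, nodes: dict[str, dict[str, Any]]) -> None:
        self._nodes = nodes

    def nodes(self, data: bool = False) -> Iterator[Any]:
        for name, attrs in self._nodes.items():
            yield (name, attrs) if data else name


def pick_example_base_gene_id(graph: ToyGraph) -> str:
    """Return the first node whose type attribute marks it as a base Ensembl gene."""
    for node, attrs in graph.nodes(data=True):
        if attrs.get(NODE_TYPE_KEY) == BASE_GENE_TYPE:
            return str(node)
    raise RuntimeError("No base_ensembl_gene node found in the graph.")


# Gene ID prefixes mentioned in the text above.
_PREFIX_TO_ORGANISM = {"ENSG": "human", "ENSMUSG": "mouse", "ENSSSCG": "pig"}
_GENE_ID_PATTERN = re.compile(r"^(ENSG|ENSMUSG|ENSSSCG)\d+")


def guess_organism_from_prefix(gene_id: str) -> str | None:
    """Map an Ensembl gene ID to an organism label by its prefix, or None."""
    match = _GENE_ID_PATTERN.match(gene_id)
    return _PREFIX_TO_ORGANISM[match.group(1)] if match else None


# Placeholder nodes: one non-gene node first, then one base gene node.
toy = ToyGraph(
    {
        "SOME_SYMBOL": {NODE_TYPE_KEY: "external"},  # hypothetical node type
        "ENSMUSG00000000001": {NODE_TYPE_KEY: BASE_GENE_TYPE},
    }
)
example = pick_example_base_gene_id(toy)
print(example, guess_organism_from_prefix(example))
```

A prefix check like this is a cheap way to notice that `api.track.graph` still holds the previous organism's graph before you run a smoke conversion.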
3. The checklist (repeat per organism)
For each organism we will:
confirm the YAML exists (or explain defaults)
build/load the graph snapshot
run a small conversion smoke test
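The first step can be checked for all three organisms in one pass. The following is a minimal sketch that uses only the standard library; the file names are the ones used in sections 3A to 3C, while `EXTERNAL_YAML_NAMES` and `yaml_presence` are hypothetical names introduced for illustration. The demonstration runs against a throwaway directory so it does not touch your real repository.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# File names as used in sections 3A-3C of this notebook.
EXTERNAL_YAML_NAMES = {
    "human": "homo_sapiens_externals_modified.yml",
    "mouse": "mus_musculus_externals_modified.yml",
    "pig": "sus_scrofa_externals_modified.yml",
}


def yaml_presence(local_repository: Path) -> dict[str, bool]:
    """Report, per organism, whether its externals YAML exists in the repository."""
    return {
        label: (local_repository / name).is_file()
        for label, name in EXTERNAL_YAML_NAMES.items()
    }


# Demonstration: a temporary directory that contains only the mouse YAML.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / EXTERNAL_YAML_NAMES["mouse"]).touch()
    status = yaml_presence(repo)

for label, present in status.items():
    print(f"{label:<6} {'OK' if present else 'missing'}")
```

In the notebook itself you would call `yaml_presence(LOCAL_REPOSITORY)` instead of using a temporary directory.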
3A. Human
# 3A.1 YAML presence
human_yaml = LOCAL_REPOSITORY / 'homo_sapiens_externals_modified.yml'
print('OK' if human_yaml.exists() else 'NOTE: missing (IDTrack can fall back to packaged default for human)', human_yaml)
# 3A.2 Build/load graph
organism, latest_release = api.resolve_organism('human')
SNAPSHOT_RELEASE = latest_release
api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)
g = api.track.graph
print('Organism:', g.graph['organism'])
print('Release:', g.graph['ensembl_release'])
print('Assembly:', g.graph['genome_assembly'])
# 3A.3 Smoke conversion
example_id = pick_example_base_gene_id(api.track.graph)
result = api.convert_identifier(example_id, from_release=SNAPSHOT_RELEASE, to_release=SNAPSHOT_RELEASE)
result
Interpretation tips:
no_corresponding=True means the input ID didn’t match anything in the graph.
no_conversion=True means it matched, but IDTrack could not find a path to the target release.
target_id is a list because ambiguity is possible (1→n).
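These three rules can be folded into a small triage function. The sketch below assumes the result behaves like a mapping with the keys `no_corresponding`, `no_conversion`, and `target_id`; the exact return type of `convert_identifier` is not shown in this notebook, so treat that as an assumption. `describe_conversion` is a hypothetical helper, and the example results are hand-written placeholders, not real IDTrack output.

```python
from __future__ import annotations

from typing import Any, Mapping


def describe_conversion(result: Mapping[str, Any]) -> str:
    """Turn the three fields discussed above into a one-line verdict."""
    if result.get("no_corresponding"):
        return "unmatched: the input ID was not found in the graph"
    if result.get("no_conversion"):
        return "no path: the ID matched, but no path to the target release was found"
    targets = list(result.get("target_id") or [])
    if not targets:
        return "empty: no flags set, but no targets reported"
    if len(targets) == 1:
        return f"unique: {targets[0]}"
    return f"ambiguous (1-to-{len(targets)}): {', '.join(map(str, targets))}"


# Hand-written placeholder results (assumed shape, not real output).
examples = [
    {"no_corresponding": True, "no_conversion": False, "target_id": []},
    {"no_corresponding": False, "no_conversion": True, "target_id": []},
    {"no_corresponding": False, "no_conversion": False, "target_id": ["ID_A"]},
    {"no_corresponding": False, "no_conversion": False, "target_id": ["ID_A", "ID_B"]},
]
labels = [describe_conversion(r) for r in examples]
for label in labels:
    print(label)
```

Checking `no_corresponding` before `no_conversion` mirrors the order in which the two failures can occur: an ID has to match the graph before a path search is attempted.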
3B. Mouse
mouse_yaml = LOCAL_REPOSITORY / 'mus_musculus_externals_modified.yml'
HAS_MOUSE_YAML = mouse_yaml.exists()
print('OK' if HAS_MOUSE_YAML else 'MISSING (create this in Part 2 for mouse)', mouse_yaml)
if HAS_MOUSE_YAML:
    organism, latest_release = api.resolve_organism('mus musculus')
    SNAPSHOT_RELEASE = latest_release
    api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)
    g = api.track.graph
    print('Organism:', g.graph['organism'])
    print('Release:', g.graph['ensembl_release'])
    print('Assembly:', g.graph['genome_assembly'])
else:
    print('Skipping mouse tests: missing mus_musculus_externals_modified.yml')
if HAS_MOUSE_YAML:
    example_id = pick_example_base_gene_id(api.track.graph)
    result = api.convert_identifier(example_id, from_release=SNAPSHOT_RELEASE, to_release=SNAPSHOT_RELEASE)
    # A bare expression inside an if-block is not auto-displayed by Jupyter, so print it.
    print(result)
else:
    print('Skipping mouse smoke conversion (mouse graph not built).')
3C. Pig
pig_yaml = LOCAL_REPOSITORY / 'sus_scrofa_externals_modified.yml'
HAS_PIG_YAML = pig_yaml.exists()
print('OK' if HAS_PIG_YAML else 'MISSING (create this in Part 2 for pig)', pig_yaml)
if HAS_PIG_YAML:
    organism, latest_release = api.resolve_organism('sus scrofa')
    SNAPSHOT_RELEASE = latest_release
    api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)
    g = api.track.graph
    print('Organism:', g.graph['organism'])
    print('Release:', g.graph['ensembl_release'])
    print('Assembly:', g.graph['genome_assembly'])
else:
    print('Skipping pig tests: missing sus_scrofa_externals_modified.yml')
if HAS_PIG_YAML:
    example_id = pick_example_base_gene_id(api.track.graph)
    result = api.convert_identifier(example_id, from_release=SNAPSHOT_RELEASE, to_release=SNAPSHOT_RELEASE)
    # A bare expression inside an if-block is not auto-displayed by Jupyter, so print it.
    print(result)
else:
    print('Skipping pig smoke conversion (pig graph not built).')
4. Optional: deeper integrity checks (advanced)
IDTrack exposes a developer-focused test harness called TrackTests.
If you are a power user and want extra confidence, you can build the graph in test mode and run invariants. These tests can be slow on large graphs.
# Example: build TrackTests for human and run one invariant
# (This is optional; skip if you just want to use IDTrack.)
organism, latest_release = api.resolve_organism('human')
api.build_graph(organism_name=organism, snapshot_release=latest_release, return_test=True, calculate_caches=False)
tests = api.track  # TrackTests instance
# A relatively small check compared to full history stress-tests:
tests.is_base_is_range_correct(verbose=False)
5. If something fails
Common failure modes and fixes:
Missing YAML (mouse/pig): run 02_prepare_new_external_yaml.ipynb and create the _modified.yml.
Permission errors: change IDTRACK_LOCAL_REPO to a writable path.
Network / MySQL issues: REST/FTP access is required; direct MySQL access is optional (IDTrack falls back to HTTPS/FTP dumps).
Disk full: graphs can be large; move your local repository to a larger disk.
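The permission and disk-space failures can be caught before a long build starts. The following is a rough preflight sketch using only the standard library; `preflight` is a hypothetical helper, and the 5 GB default for `min_free_gb` is an arbitrary placeholder, since this notebook does not state how large the graphs actually are. The demonstration uses a temporary directory with the threshold set to zero.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def preflight(local_repository: Path, min_free_gb: float = 5.0) -> dict[str, object]:
    """Check that the repository directory is writable and has free space."""
    try:
        local_repository.mkdir(parents=True, exist_ok=True)
        # Creating (and auto-deleting) a temporary file proves write access.
        with tempfile.NamedTemporaryFile(dir=local_repository):
            pass
        writable = True
    except OSError:
        writable = False

    free_gb = None
    if local_repository.exists():
        free_gb = shutil.disk_usage(local_repository).free / 1024**3

    return {
        "path": str(local_repository),
        "writable": writable,
        "free_gb": None if free_gb is None else round(free_gb, 1),
        "enough_space": free_gb is not None and free_gb >= min_free_gb,
    }


# Demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    report = preflight(Path(tmp) / "idtrack_cache", min_free_gb=0.0)

print(report)
```

In the notebook you would pass `LOCAL_REPOSITORY` from the setup cell and pick a threshold that fits the organisms you plan to build.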