Appendix — Self-tests & Sanity Checks (Human, Mouse, Pig)

Last updated: 2026-01-08

Use this notebook to confirm that your setup works after you have built graphs.

It checks:

  • external YAML files exist

  • graphs can be built/loaded

  • a few representative conversions work

  • (optional) deeper integrity checks via TrackTests

Tip: If you hit an error, cross-reference Part 7.3 (07_advanced_topics.ipynb) for a diagnostic checklist.

1. Setup

 
from __future__ import annotations

import os
from pathlib import Path

import idtrack
from idtrack import DB

LOCAL_REPOSITORY = Path(os.environ.get('IDTRACK_LOCAL_REPO', './idtrack_cache')).resolve()
LOCAL_REPOSITORY.mkdir(parents=True, exist_ok=True)

api = idtrack.API(local_repository=str(LOCAL_REPOSITORY))
api.configure_logger()

print('Local repository:', LOCAL_REPOSITORY)

2. Helper: pick a representative gene ID from a built graph

We avoid hard-coding gene IDs for mouse/pig by selecting an example base Ensembl gene ID directly from the graph (these look like ENSG..., ENSMUSG..., ENSSSCG...).

 
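def pick_example_base_gene_id(graph) -> str:
    """Return the first base Ensembl gene ID found in the graph.

    Any such node is a valid smoke-test input; nodes are scanned in iteration order.
    """
    for node, attrs in graph.nodes(data=True):
        # Keep only nodes typed as base Ensembl gene IDs (e.g. ENSG..., ENSMUSG..., ENSSSCG...).
        if attrs.get(DB.node_type_str) == DB.nts_base_ensembl['gene']:
            return str(node)
    raise RuntimeError('No base_ensembl_gene node found in the graph.')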
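
As an extra sanity check, you can confirm that the picked ID carries the prefix expected for the organism. The sketch below is a hypothetical helper (`looks_like_base_gene_id` and `EXPECTED_GENE_PREFIX` are not part of IDTrack). It assumes that `g.graph['organism']` uses snake_case names such as `homo_sapiens` (matching the YAML file names in this notebook), and that base gene IDs follow Ensembl's usual layout of a species prefix followed by 11 digits, with no version suffix.

```python
import re

# Hypothetical mapping (not part of IDTrack): organism name -> Ensembl gene-ID prefix.
# Assumes g.graph['organism'] uses snake_case names such as 'homo_sapiens'.
EXPECTED_GENE_PREFIX = {
    'homo_sapiens': 'ENSG',
    'mus_musculus': 'ENSMUSG',
    'sus_scrofa': 'ENSSSCG',
}


def looks_like_base_gene_id(gene_id: str, organism: str) -> bool:
    """Return True if gene_id looks like an unversioned Ensembl gene ID for organism.

    Assumes the usual stable-ID layout: species prefix followed by 11 digits,
    with no '.version' suffix.
    """
    prefix = EXPECTED_GENE_PREFIX.get(organism)
    if prefix is None:
        raise KeyError(f'No expected prefix registered for {organism!r}')
    # Whole-string match: prefix, then exactly 11 digits, nothing after.
    return re.fullmatch(re.escape(prefix) + r'\d{11}', gene_id) is not None
```

For example, `looks_like_base_gene_id(example_id, g.graph['organism'])` could follow each smoke conversion below.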

3. The checklist (repeat per organism)

For each organism we will:

  1. confirm the YAML exists (or explain defaults)

  2. build/load the graph snapshot

  3. run a small conversion smoke test
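
If you prefer to run these three steps as one unit per organism, a small runner can record which step passed or failed without crashing the notebook. The sketch below is a hypothetical helper (`run_checklist` is not part of IDTrack); each step is a zero-argument callable, so you could wrap the calls used in this notebook, such as `api.build_graph(...)`, in a `lambda`.

```python
from typing import Callable, Dict, List, Tuple

# A named step: (label, zero-argument callable). A step "fails" if it raises.
Step = Tuple[str, Callable[[], object]]


def run_checklist(steps: List[Step], stop_on_failure: bool = True) -> Dict[str, str]:
    """Run named checklist steps in order; report 'ok', 'failed: ...', or 'skipped'.

    With stop_on_failure=True, steps after the first failure are marked 'skipped'
    (e.g. there is no point converting IDs if the graph did not build).
    """
    report: Dict[str, str] = {}
    failed = False
    for name, step in steps:
        if failed and stop_on_failure:
            report[name] = 'skipped'
            continue
        try:
            step()
            report[name] = 'ok'
        except Exception as exc:  # record the error instead of aborting the whole run
            report[name] = f'failed: {type(exc).__name__}: {exc}'
            failed = True
    return report
```

For mouse, the YAML step could simply be `('yaml', mouse_yaml.stat)`, because `Path.stat()` raises `FileNotFoundError` when the file is missing.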

3A. Human

 
# 3A.1 YAML presence
human_yaml = LOCAL_REPOSITORY / 'homo_sapiens_externals_modified.yml'
print('OK' if human_yaml.exists() else 'NOTE: missing (IDTrack can fall back to the packaged default for human)', human_yaml)
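
To see up front which organisms are ready, you can check all three YAML files at once. This is a hypothetical helper (`missing_external_yamls` is not part of IDTrack) that only assumes the `<organism>_externals_modified.yml` naming used in this notebook.

```python
from pathlib import Path
from typing import Iterable, List

# File-name pattern used in this notebook: <organism>_externals_modified.yml
ORGANISMS = ('homo_sapiens', 'mus_musculus', 'sus_scrofa')


def missing_external_yamls(repo: Path, organisms: Iterable[str] = ORGANISMS) -> List[Path]:
    """Return the expected external YAML paths under repo that do not exist yet."""
    expected = [Path(repo) / f'{name}_externals_modified.yml' for name in organisms]
    return [path for path in expected if not path.is_file()]
```

Calling `missing_external_yamls(LOCAL_REPOSITORY)` lists what still needs to be created; remember that a missing human YAML is not fatal, while mouse and pig need theirs.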

 
# 3A.2 Build/load graph
organism, latest_release = api.resolve_organism('human')
SNAPSHOT_RELEASE = latest_release
api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)

g = api.track.graph
print('Organism:', g.graph['organism'])
print('Release:', g.graph['ensembl_release'])
print('Assembly:', g.graph['genome_assembly'])
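
Instead of eyeballing the printed metadata, you can check it programmatically. The sketch below relies only on the three `g.graph` keys printed in the cell above; `check_graph_metadata` is a hypothetical helper, and the `int()` comparison assumes `ensembl_release` is stored as an integer or a numeric string.

```python
from typing import Any, List, Mapping

REQUIRED_KEYS = ('organism', 'ensembl_release', 'genome_assembly')


def check_graph_metadata(meta: Mapping[str, Any], expected_release: int) -> List[str]:
    """Return a list of problems in a graph's metadata dict (an empty list means OK)."""
    problems = [f'missing key: {key}' for key in REQUIRED_KEYS if key not in meta]
    if 'ensembl_release' in meta:
        try:
            release = int(meta['ensembl_release'])
        except (TypeError, ValueError):
            problems.append(f"non-numeric ensembl_release: {meta['ensembl_release']!r}")
        else:
            # The snapshot you asked for should be the one the graph reports.
            if release != int(expected_release):
                problems.append(f'release mismatch: graph has {release}, expected {expected_release}')
    return problems
```

For example: `print(check_graph_metadata(g.graph, SNAPSHOT_RELEASE) or 'metadata OK')`.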

 
# 3A.3 Smoke conversion
example_id = pick_example_base_gene_id(api.track.graph)
result = api.convert_identifier(example_id, from_release=SNAPSHOT_RELEASE, to_release=SNAPSHOT_RELEASE)
result

Interpretation tips:

  • no_corresponding=True means the input ID didn’t match anything in the graph.

  • no_conversion=True means it matched, but IDTrack could not find a path to the target release.

  • target_id is a list because one input ID can map to several targets (1→n).
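
The three fields above can be folded into a one-word verdict, which is handy when you loop over many IDs. This sketch assumes the result behaves like a mapping with the `no_corresponding`, `no_conversion`, and `target_id` keys described above; if `convert_identifier` returns a different shape in your IDTrack version, adapt the lookups. `summarize_conversion` is a hypothetical helper.

```python
from typing import Any, Mapping


def summarize_conversion(result: Mapping[str, Any]) -> str:
    """Classify a conversion result using the fields described in the tips above.

    Assumes result is dict-like with 'no_corresponding', 'no_conversion', and
    'target_id' (a list, possibly with several entries for 1->n matches).
    """
    if result.get('no_corresponding'):
        return 'unmatched'    # input ID did not match anything in the graph
    if result.get('no_conversion'):
        return 'no_path'      # matched, but no path to the target release
    targets = result.get('target_id') or []
    if len(targets) == 1:
        return 'one_to_one'
    if len(targets) > 1:
        return 'one_to_many'  # ambiguous: several candidate targets
    return 'no_target'        # no target IDs reported
```

Counting verdicts (for example with `collections.Counter`) over a batch of IDs gives a quick overview of how many were ambiguous or unmatched.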

3B. Mouse

 
mouse_yaml = LOCAL_REPOSITORY / 'mus_musculus_externals_modified.yml'
HAS_MOUSE_YAML = mouse_yaml.exists()
print('OK' if HAS_MOUSE_YAML else 'MISSING (create this in Part 2 for mouse)', mouse_yaml)

 
if HAS_MOUSE_YAML:
    organism, latest_release = api.resolve_organism('mus musculus')
    SNAPSHOT_RELEASE = latest_release
    api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)

    g = api.track.graph
    print('Organism:', g.graph['organism'])
    print('Release:', g.graph['ensembl_release'])
    print('Assembly:', g.graph['genome_assembly'])
else:
    print('Skipping mouse tests: missing mus_musculus_externals_modified.yml')

 
if HAS_MOUSE_YAML:
    example_id = pick_example_base_gene_id(api.track.graph)
    result = api.convert_identifier(example_id, from_release=SNAPSHOT_RELEASE, to_release=SNAPSHOT_RELEASE)
    # A bare `result` inside an if-block is not auto-displayed by Jupyter, so print it explicitly.
    print(result)
else:
    print('Skipping mouse smoke conversion (mouse graph not built).')

3C. Pig

 
pig_yaml = LOCAL_REPOSITORY / 'sus_scrofa_externals_modified.yml'
HAS_PIG_YAML = pig_yaml.exists()
print('OK' if HAS_PIG_YAML else 'MISSING (create this in Part 2 for pig)', pig_yaml)

 
if HAS_PIG_YAML:
    organism, latest_release = api.resolve_organism('sus scrofa')
    SNAPSHOT_RELEASE = latest_release
    api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)

    g = api.track.graph
    print('Organism:', g.graph['organism'])
    print('Release:', g.graph['ensembl_release'])
    print('Assembly:', g.graph['genome_assembly'])
else:
    print('Skipping pig tests: missing sus_scrofa_externals_modified.yml')

 
if HAS_PIG_YAML:
    example_id = pick_example_base_gene_id(api.track.graph)
    result = api.convert_identifier(example_id, from_release=SNAPSHOT_RELEASE, to_release=SNAPSHOT_RELEASE)
    # A bare `result` inside an if-block is not auto-displayed by Jupyter, so print it explicitly.
    print(result)
else:
    print('Skipping pig smoke conversion (pig graph not built).')

4. Optional: deeper integrity checks (advanced)

IDTrack exposes a developer-focused test harness called TrackTests.

If you are a power user and want extra confidence, you can build the graph in test mode and run invariant checks on it. These checks can be slow on large graphs.

 
# Example: build TrackTests for human and run one invariant
# (This is optional; skip if you just want to use IDTrack.)

organism, latest_release = api.resolve_organism('human')
api.build_graph(organism_name=organism, snapshot_release=latest_release, return_test=True, calculate_caches=False)

tests = api.track  # TrackTests instance
# A relatively small check compared to full history stress-tests:
tests.is_base_is_range_correct(verbose=False)
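
Because these checks can be slow on large graphs, it can help to time each one you run. The wrapper below is a hypothetical helper built on `time.perf_counter`; pass it the bound method and its arguments, e.g. `timed(tests.is_base_is_range_correct, verbose=False)`.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs); return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    value = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    # Print a one-line summary so long-running invariants are easy to spot.
    print(f'{getattr(fn, "__name__", repr(fn))}: {elapsed:.1f} s')
    return value, elapsed
```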

5. If something fails

Common failure modes and fixes:

  1. Missing YAML (mouse/pig): run 02_prepare_new_external_yaml.ipynb (Part 2) to create the organism's *_externals_modified.yml file.

  2. Permission errors: change IDTRACK_LOCAL_REPO to a writable path.

  3. Network / MySQL issues: REST/FTP access is required; direct MySQL access is optional (IDTrack falls back to HTTPS/FTP dumps).

  4. Disk full: graphs can be large; move your local repository to a larger disk.
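
Failure modes 2 and 4 can be checked before you start a long build. This hypothetical pre-flight helper uses only the standard library (`os.access` and `shutil.disk_usage`). The free-space threshold is your own estimate: IDTrack does not prescribe a value, so the parameter has no default.

```python
import os
import shutil
from pathlib import Path
from typing import List


def preflight_local_repo(repo: Path, min_free_gb: float) -> List[str]:
    """Return problems likely to break a graph build (an empty list means OK).

    Checks that repo is an existing directory, is writable, and has at least
    min_free_gb gigabytes (10**9 bytes) of free space.
    """
    repo = Path(repo)
    if not repo.is_dir():
        return [f'not a directory: {repo}']
    problems: List[str] = []
    if not os.access(repo, os.W_OK):
        problems.append(f'not writable: {repo} (set IDTRACK_LOCAL_REPO to a writable path)')
    free_gb = shutil.disk_usage(repo).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f'only {free_gb:.1f} GB free in {repo}; need about {min_free_gb} GB')
    return problems
```

For example, `preflight_local_repo(LOCAL_REPOSITORY, min_free_gb=...)` with a threshold you choose, run right after the setup cell.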