{ "cells": [ { "cell_type": "markdown", "id": "0b2e93d0", "metadata": {}, "source": [ "# Appendix — Self-tests & Sanity Checks (Human, Mouse, Pig)\n", "\n", "*Last updated:* 2026-01-08\n", "\n", "Use this notebook when you want to **confirm your setup is correct** after building graphs.\n", "\n", "It checks:\n", "- external YAML files exist\n", "- graphs can be built/loaded\n", "- a few representative conversions work\n", "- (optional) deeper integrity checks via `TrackTests`\n", "\n", "> **Tip:** If you hit an error, cross-reference Part 7.3 (`07_advanced_topics.ipynb`) for a diagnostic checklist.\n" ] }, { "cell_type": "markdown", "id": "ff6381f8", "metadata": {}, "source": [ "## 1. Setup\n" ] }, { "cell_type": "code", "execution_count": null, "id": "44b0728e", "metadata": {}, "outputs": [], "source": [ "from __future__ import annotations\n", "\n", "import os\n", "from pathlib import Path\n", "\n", "import idtrack\n", "from idtrack import DB\n", "\n", "LOCAL_REPOSITORY = Path(os.environ.get('IDTRACK_LOCAL_REPO', './idtrack_cache')).resolve()\n", "LOCAL_REPOSITORY.mkdir(parents=True, exist_ok=True)\n", "\n", "api = idtrack.API(local_repository=str(LOCAL_REPOSITORY))\n", "api.configure_logger()\n", "\n", "print('Local repository:', LOCAL_REPOSITORY)\n" ] }, { "cell_type": "markdown", "id": "895cfa46", "metadata": {}, "source": [ "## 2. 
Helper: pick a representative gene ID from a built graph\n", "\n", "We avoid hard-coding gene IDs for mouse/pig by selecting an example **base Ensembl gene ID** directly from\n", "the graph (these look like `ENSG...`, `ENSMUSG...`, `ENSSSCG...`).\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5832b4e0", "metadata": {}, "outputs": [], "source": [ "def pick_example_base_gene_id(graph) -> str:\n", "    # Return the first node whose node type marks it as a base Ensembl gene.\n", "    for node, attrs in graph.nodes(data=True):\n", "        if attrs.get(DB.node_type_str) == DB.nts_base_ensembl['gene']:\n", "            return str(node)\n", "    raise RuntimeError('No base_ensembl_gene node found in the graph.')\n" ] }, { "cell_type": "markdown", "id": "127aa451", "metadata": {}, "source": [ "## 3. The checklist (repeat per organism)\n", "\n", "For each organism we will:\n", "1. confirm the YAML exists (or explain defaults)\n", "2. build/load the graph snapshot\n", "3. run a small conversion smoke test\n" ] }, { "cell_type": "markdown", "id": "614a53f0", "metadata": {}, "source": [ "### 3A. 
Human" ] }, { "cell_type": "code", "execution_count": null, "id": "c9dd8b9e", "metadata": {}, "outputs": [], "source": [ "# 3A.1 YAML presence\n", "human_yaml = LOCAL_REPOSITORY / 'homo_sapiens_externals_modified.yml'\n", "print('OK' if human_yaml.exists() else 'NOTE: missing (IDTrack can fall back to packaged default for human)', human_yaml)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9f0aba9f", "metadata": {}, "outputs": [], "source": [ "# 3A.2 Build/load graph\n", "organism, latest_release = api.resolve_organism('human')\n", "SNAPSHOT_RELEASE = latest_release\n", "api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)\n", "\n", "g = api.track.graph\n", "print('Organism:', g.graph['organism'])\n", "print('Release:', g.graph['ensembl_release'])\n", "print('Assembly:', g.graph['genome_assembly'])\n" ] }, { "cell_type": "code", "execution_count": null, "id": "2fc9415e", "metadata": {}, "outputs": [], "source": [ "# 3A.3 Smoke conversion\n", "example_id = pick_example_base_gene_id(api.track.graph)\n", "result = api.convert_identifier(example_id, from_release=SNAPSHOT_RELEASE, to_release=SNAPSHOT_RELEASE)\n", "result\n" ] }, { "cell_type": "markdown", "id": "98985144", "metadata": {}, "source": [ "Interpretation tips:\n", "- `no_corresponding=True` means the input ID didn't match anything in the graph.\n", "- `no_conversion=True` means it matched, but IDTrack could not find a path to the target release.\n", "- `target_id` is a list because ambiguity is possible (1→n).\n" ] }, { "cell_type": "markdown", "id": "c181b8dd", "metadata": {}, "source": [ "### 3B. 
Mouse" ] }, { "cell_type": "code", "execution_count": null, "id": "8170c8e4", "metadata": {}, "outputs": [], "source": [ "mouse_yaml = LOCAL_REPOSITORY / 'mus_musculus_externals_modified.yml'\n", "HAS_MOUSE_YAML = mouse_yaml.exists()\n", "print('OK' if HAS_MOUSE_YAML else 'MISSING (create this in Part 2 for mouse)', mouse_yaml)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "64f881bc", "metadata": {}, "outputs": [], "source": [ "if HAS_MOUSE_YAML:\n", "    organism, latest_release = api.resolve_organism('mus musculus')\n", "    SNAPSHOT_RELEASE = latest_release\n", "    api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)\n", "\n", "    g = api.track.graph\n", "    print('Organism:', g.graph['organism'])\n", "    print('Release:', g.graph['ensembl_release'])\n", "    print('Assembly:', g.graph['genome_assembly'])\n", "else:\n", "    print('Skipping mouse tests: missing mus_musculus_externals_modified.yml')\n" ] }, { "cell_type": "code", "execution_count": null, "id": "de227285", "metadata": {}, "outputs": [], "source": [ "if HAS_MOUSE_YAML:\n", "    example_id = pick_example_base_gene_id(api.track.graph)\n", "    result = api.convert_identifier(example_id, from_release=SNAPSHOT_RELEASE, to_release=SNAPSHOT_RELEASE)\n", "    # A bare expression inside an if-block is not auto-displayed, so print it explicitly.\n", "    print(result)\n", "else:\n", "    print('Skipping mouse smoke conversion (mouse graph not built).')\n" ] }, { "cell_type": "markdown", "id": "5a802348", "metadata": {}, "source": [ "### 3C. 
Pig" ] }, { "cell_type": "code", "execution_count": null, "id": "5a9a45c1", "metadata": {}, "outputs": [], "source": [ "pig_yaml = LOCAL_REPOSITORY / 'sus_scrofa_externals_modified.yml'\n", "HAS_PIG_YAML = pig_yaml.exists()\n", "print('OK' if HAS_PIG_YAML else 'MISSING (create this in Part 2 for pig)', pig_yaml)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d8793ffc", "metadata": {}, "outputs": [], "source": [ "if HAS_PIG_YAML:\n", "    organism, latest_release = api.resolve_organism('sus scrofa')\n", "    SNAPSHOT_RELEASE = latest_release\n", "    api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)\n", "\n", "    g = api.track.graph\n", "    print('Organism:', g.graph['organism'])\n", "    print('Release:', g.graph['ensembl_release'])\n", "    print('Assembly:', g.graph['genome_assembly'])\n", "else:\n", "    print('Skipping pig tests: missing sus_scrofa_externals_modified.yml')\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a54d6e38", "metadata": {}, "outputs": [], "source": [ "if HAS_PIG_YAML:\n", "    example_id = pick_example_base_gene_id(api.track.graph)\n", "    result = api.convert_identifier(example_id, from_release=SNAPSHOT_RELEASE, to_release=SNAPSHOT_RELEASE)\n", "    # A bare expression inside an if-block is not auto-displayed, so print it explicitly.\n", "    print(result)\n", "else:\n", "    print('Skipping pig smoke conversion (pig graph not built).')\n" ] }, { "cell_type": "markdown", "id": "9ec4e0e9", "metadata": {}, "source": [ "## 4. 
Optional: deeper integrity checks (advanced)\n", "\n", "IDTrack exposes a developer-focused test harness called `TrackTests`.\n", "\n", "If you are a power user and want extra confidence, you can build the graph in test mode and run invariants.\n", "These tests can be slow on large graphs.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a46bcb55", "metadata": {}, "outputs": [], "source": [ "# Example: build TrackTests for human and run one invariant\n", "# (This is optional; skip if you just want to use IDTrack.)\n", "\n", "organism, latest_release = api.resolve_organism('human')\n", "api.build_graph(organism_name=organism, snapshot_release=latest_release, return_test=True, calculate_caches=False)\n", "\n", "tests = api.track # TrackTests instance\n", "# A relatively small check compared to full history stress-tests:\n", "tests.is_base_is_range_correct(verbose=False)\n" ] }, { "cell_type": "markdown", "id": "329a3f6c", "metadata": {}, "source": [ "## 5. If something fails\n", "\n", "Common failure modes and fixes:\n", "\n", "1. **Missing YAML** (mouse/pig): run `02_prepare_new_external_yaml.ipynb` and create the `_modified.yml`.\n", "2. **Permission errors**: change `IDTRACK_LOCAL_REPO` to a writable path.\n", "3. **Network / MySQL issues**: REST/FTP access is required; direct MySQL access is optional (IDTrack falls back to HTTPS/FTP dumps).\n", "4. **Disk full**: graphs can be large; move your local repository to a larger disk.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }