{ "cells": [ { "cell_type": "markdown", "id": "80c4f8bf", "metadata": {}, "source": [ "# Part 3 — Graph Initialization & Management\n", "\n", "*Last updated:* 2026-01-08\n", "\n", "This notebook shows how to **build and cache** an IDTrack graph snapshot for:\n", "- `homo_sapiens` (human)\n", "- `mus_musculus` (mouse)\n", "- `sus_scrofa` (pig)\n", "\n", "A graph build is the most expensive step. The good news:\n", "- you usually do it **once per organism + snapshot boundary + external YAML configuration**\n", "- the snapshot can be **multi-assembly** (human has overlapping GRCh38/GRCh37 releases; mouse and pig hand off cleanly from one assembly to the next by release, and legacy builds are still supported)\n", "- then you reuse the cached graph for fast conversions\n", "\n", "**Learning objectives**\n", "- Build (or load) a graph snapshot for each organism.\n", "- Verify that the snapshot exists on disk.\n", "- Learn practical graph-management habits (reload vs rebuild, cache hygiene).\n", "\n", "> **Prerequisite:** run `02_prepare_new_external_yaml.ipynb` first (especially important for mouse and pig).\n" ] }, { "cell_type": "markdown", "id": "c557f533", "metadata": {}, "source": [ "## 3.0 — What you should expect (time / disk)\n", "\n", "Graph building can take:\n", "- minutes to hours (depends on organism, enabled externals, and cache status)\n", "- multiple GB of disk for cached tables + the graph pickle\n", "\n", "Plan for this as you would for downloading a reference genome plus its annotation.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "7d8b4430", "metadata": {}, "outputs": [], "source": [ "# Load notebook utilities (collapsible output magic for tutorials)\n", "%load_ext _notebook_utils" ] }, { "cell_type": "code", "execution_count": 2, "id": "8312a066", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Local repository: /Users/kemalinecik/git_nosync/master_idtrack/idtrack/docs/_notebooks/idtrack_cache\n" ] } ], "source": [ "# 1) Setup\n", "from __future__ 
import annotations\n", "\n", "import os\n", "from pathlib import Path\n", "\n", "import idtrack\n", "\n", "LOCAL_REPOSITORY = Path(os.environ.get('IDTRACK_LOCAL_REPO', './idtrack_cache')).resolve()\n", "LOCAL_REPOSITORY.mkdir(parents=True, exist_ok=True)\n", "\n", "api = idtrack.API(local_repository=str(LOCAL_REPOSITORY))\n", "api.configure_logger()\n", "\n", "print('Local repository:', LOCAL_REPOSITORY)\n" ] }, { "cell_type": "markdown", "id": "ebfc85d3", "metadata": {}, "source": [ "## 3.0.1 — Sanity check: do your external YAML files exist?\n", "\n", "Human has a packaged default, but for mouse and pig you should have local `*_externals_modified.yml` files.\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "a039edd6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "OK homo_sapiens_externals_modified.yml\n", "MISSING (create in Part 2 for mouse) mus_musculus_externals_modified.yml\n", "MISSING (create in Part 2 for pig) sus_scrofa_externals_modified.yml\n" ] } ], "source": [ "# External YAML presence (created in Part 2)\n", "human_yaml = LOCAL_REPOSITORY / 'homo_sapiens_externals_modified.yml'\n", "mouse_yaml = LOCAL_REPOSITORY / 'mus_musculus_externals_modified.yml'\n", "pig_yaml = LOCAL_REPOSITORY / 'sus_scrofa_externals_modified.yml'\n", "\n", "HAS_HUMAN_YAML = human_yaml.exists()\n", "HAS_MOUSE_YAML = mouse_yaml.exists()\n", "HAS_PIG_YAML = pig_yaml.exists()\n", "\n", "print((\"OK\" if HAS_HUMAN_YAML else \"NOTE: missing (human can fall back to packaged default)\").ljust(55), human_yaml.name)\n", "print((\"OK\" if HAS_MOUSE_YAML else \"MISSING (create in Part 2 for mouse)\").ljust(55), mouse_yaml.name)\n", "print((\"OK\" if HAS_PIG_YAML else \"MISSING (create in Part 2 for pig)\").ljust(55), pig_yaml.name)\n" ] }, { "cell_type": "markdown", "id": "c8848d4e", "metadata": {}, "source": [ "If a file is missing:\n", "- go back to `02_prepare_new_external_yaml.ipynb`\n", "- generate the template and create the 
`_modified.yml` file\n" ] }, { "cell_type": "markdown", "id": "fe65092a", "metadata": {}, "source": [ "## 3.1–3.3 — Build graph snapshots (one per organism)\n", "\n", "The canonical pattern is:\n", "\n", "1. resolve organism name\n", "2. pick snapshot release\n", "3. (optional) choose a primary genome assembly for output (defaults to the newest/highest-priority assembly for that organism)\n", "4. `api.build_graph(...)`\n", "5. inspect + reuse\n", "\n", "We do this for each organism below. If you only need one organism, run only that section.\n" ] }, { "cell_type": "markdown", "id": "81d96570", "metadata": {}, "source": [ "### 3.1 — Human graph initialization (multi-assembly)\n", "\n", "By default, the human snapshot is built with GRCh38 as the primary assembly (assembly code `38`), while also including GRCh37 (`37`) and\n", "older archives when they exist within the snapshot window.\n", "\n", "This is what enables atlas-building workflows where different datasets were annotated with different genome builds, but you want one unified\n", "identifier space.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "17ea5d79", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2026-01-10 16:42:27 INFO:verify_organism: Ensembl Rest API query to get the organism names and associated releases.\n" ] }, { "data": { "text/plain": [ "('homo_sapiens', 115)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "organism, latest_release = api.resolve_organism('human')\n", "SNAPSHOT_RELEASE = latest_release # pin to a specific release if needed\n", "organism, SNAPSHOT_RELEASE\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "a3c031c8", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "ename": "KeyboardInterrupt", "evalue": "", "output_type": "error", "traceback": [ 
"\u001b[31m---------------------------------------------------------------------------\u001b[39m", "\u001b[31mKeyboardInterrupt\u001b[39m Traceback (most recent call last)", "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[5]\u001b[39m\u001b[32m, line 5\u001b[39m\n\u001b[32m 1\u001b[39m \u001b[38;5;66;03m# Included for tutorial purposes only.\u001b[39;00m\n\u001b[32m 2\u001b[39m \n\u001b[32m 3\u001b[39m \u001b[38;5;66;03m# Build (or load) the graph snapshot\u001b[39;00m\n\u001b[32m 4\u001b[39m \u001b[38;5;66;03m# - calculate_caches=True speeds up later queries (slower build, faster use).\u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m5\u001b[39m \u001b[43mapi\u001b[49m\u001b[43m.\u001b[49m\u001b[43mbuild_graph\u001b[49m\u001b[43m(\u001b[49m\u001b[43morganism_name\u001b[49m\u001b[43m=\u001b[49m\u001b[43morganism\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43msnapshot_release\u001b[49m\u001b[43m=\u001b[49m\u001b[43mSNAPSHOT_RELEASE\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcalculate_caches\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m)\u001b[49m\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_api.py:213\u001b[39m, in \u001b[36mAPI.build_graph\u001b[39m\u001b[34m(self, organism_name, snapshot_release, genome_assembly, return_test, calculate_caches)\u001b[39m\n\u001b[32m 211\u001b[39m \u001b[38;5;28mself\u001b[39m.track = TrackTests(dm)\n\u001b[32m 212\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m213\u001b[39m \u001b[38;5;28mself\u001b[39m.track = \u001b[43mTrack\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdm\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 215\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m calculate_caches \u001b[38;5;129;01mand\u001b[39;00m return_test:\n\u001b[32m 216\u001b[39m \u001b[38;5;28mself\u001b[39m.calculate_graph_caches(for_test=\u001b[38;5;28;01mTrue\u001b[39;00m)\n", "\u001b[36mFile 
\u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_track.py:83\u001b[39m, in \u001b[36mTrack.__init__\u001b[39m\u001b[34m(self, db_manager, **kwargs)\u001b[39m\n\u001b[32m 80\u001b[39m graph_creator = GraphMaker(\u001b[38;5;28mself\u001b[39m.db_manager)\n\u001b[32m 82\u001b[39m \u001b[38;5;66;03m# Calculate/Load the graph\u001b[39;00m\n\u001b[32m---> \u001b[39m\u001b[32m83\u001b[39m \u001b[38;5;28mself\u001b[39m.graph = \u001b[43mgraph_creator\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget_graph\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 84\u001b[39m \u001b[38;5;28mself\u001b[39m.version_info = \u001b[38;5;28mself\u001b[39m.graph.graph[\u001b[33m\"\u001b[39m\u001b[33mversion_info\u001b[39m\u001b[33m\"\u001b[39m]\n\u001b[32m 85\u001b[39m \u001b[38;5;28mself\u001b[39m._external_entrance_placeholder = {\u001b[38;5;28;01mFalse\u001b[39;00m: -\u001b[32m1\u001b[39m, \u001b[38;5;28;01mTrue\u001b[39;00m: \u001b[32m10001\u001b[39m}\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_graph_maker.py:1086\u001b[39m, in \u001b[36mGraphMaker.get_graph\u001b[39m\u001b[34m(self, narrow, create_even_if_exist, save_after_calculation, overwrite_even_if_exist, form_list, narrow_external)\u001b[39m\n\u001b[32m 1084\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m os.access(file_path, os.R_OK) \u001b[38;5;129;01mor\u001b[39;00m create_even_if_exist:\n\u001b[32m 1085\u001b[39m \u001b[38;5;28mself\u001b[39m.log.info(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mThe graph is being constructed: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mfile_path\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m)\n\u001b[32m-> \u001b[39m\u001b[32m1086\u001b[39m g = 
\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mconstruct_graph\u001b[49m\u001b[43m(\u001b[49m\u001b[43mnarrow\u001b[49m\u001b[43m=\u001b[49m\u001b[43mnarrow\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mform_list\u001b[49m\u001b[43m=\u001b[49m\u001b[43mform_list\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mnarrow_external\u001b[49m\u001b[43m=\u001b[49m\u001b[43mnarrow_external\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 1087\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m: \u001b[38;5;66;03m# Otherwise, just read the file that is already in the directory.\u001b[39;00m\n\u001b[32m 1088\u001b[39m \u001b[38;5;28mself\u001b[39m.log.info(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mThe graph is being read: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mfile_path\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m)\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_graph_maker.py:224\u001b[39m, in \u001b[36mGraphMaker.construct_graph\u001b[39m\u001b[34m(self, narrow, form_list, narrow_external)\u001b[39m\n\u001b[32m 219\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m ens_rel \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28msorted\u001b[39m(\u001b[38;5;28mself\u001b[39m.db_manager.available_releases):\n\u001b[32m 220\u001b[39m \u001b[38;5;66;03m# the order is important in adding new nodes into the core graph.\u001b[39;00m\n\u001b[32m 221\u001b[39m \u001b[38;5;66;03m# it is important to capture correct ens_release in min_ens_release dictionary\u001b[39;00m\n\u001b[32m 223\u001b[39m db_manager = dbman_s[f].change_release(ens_rel)\n\u001b[32m--> \u001b[39m\u001b[32m224\u001b[39m rc = \u001b[43mdb_manager\u001b[49m\u001b[43m.\u001b[49m\u001b[43mcreate_external_all\u001b[49m\u001b[43m(\u001b[49m\u001b[43mreturn_mode\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mall\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43mnarrow_external\u001b[49m\u001b[43m=\u001b[49m\u001b[43mnarrow_external\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 226\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m _ind, entry \u001b[38;5;129;01min\u001b[39;00m rc.iterrows():\n\u001b[32m 227\u001b[39m \u001b[38;5;66;03m# Note that the `rc` dataframe have higher priority assembly entries at the top.\u001b[39;00m\n\u001b[32m 229\u001b[39m e1, e2 = entry[\u001b[33m\"\u001b[39m\u001b[33mgraph_id\u001b[39m\u001b[33m\"\u001b[39m], entry[\u001b[33m\"\u001b[39m\u001b[33mid_db\u001b[39m\u001b[33m\"\u001b[39m]\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:2279\u001b[39m, in \u001b[36mDatabaseManager.create_external_all\u001b[39m\u001b[34m(self, return_mode, narrow_external)\u001b[39m\n\u001b[32m 2277\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m:\n\u001b[32m 2278\u001b[39m \u001b[38;5;28;01mcontinue\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m2279\u001b[39m df_temp = \u001b[43mdm\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget_db\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdf_indicator\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 2280\u001b[39m df_temp[\u001b[33m\"\u001b[39m\u001b[33massembly\u001b[39m\u001b[33m\"\u001b[39m] = i\n\u001b[32m 2281\u001b[39m df = pd.concat([df, df_temp])\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:2484\u001b[39m, in \u001b[36mDatabaseManager.get_db\u001b[39m\u001b[34m(self, df_indicator, create_even_if_exist, save_after_calculation, overwrite_even_if_exist)\u001b[39m\n\u001b[32m 2481\u001b[39m df = \u001b[38;5;28mself\u001b[39m.create_external_db(filter_mode=\u001b[33m\"\u001b[39m\u001b[33mall\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m 2483\u001b[39m \u001b[38;5;28;01melif\u001b[39;00m main_ind == \u001b[33m\"\u001b[39m\u001b[33mexternal\u001b[39m\u001b[33m\"\u001b[39m \u001b[38;5;129;01mand\u001b[39;00m param1_ind 
\u001b[38;5;129;01min\u001b[39;00m [\u001b[33m\"\u001b[39m\u001b[33mrelevant\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mdatabase\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mrelevant-database\u001b[39m\u001b[33m\"\u001b[39m]:\n\u001b[32m-> \u001b[39m\u001b[32m2484\u001b[39m df = \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mcreate_external_db\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfilter_mode\u001b[49m\u001b[43m=\u001b[49m\u001b[43mparam1_ind\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 2486\u001b[39m \u001b[38;5;28;01melif\u001b[39;00m main_ind == \u001b[33m\"\u001b[39m\u001b[33midsraw\u001b[39m\u001b[33m\"\u001b[39m:\n\u001b[32m 2487\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m param1_ind \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m.available_form_of_interests:\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:1990\u001b[39m, in \u001b[36mDatabaseManager.create_external_db\u001b[39m\u001b[34m(self, filter_mode)\u001b[39m\n\u001b[32m 1986\u001b[39m a = \u001b[38;5;28mself\u001b[39m.get_db(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33midsraw_\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m.form\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m, save_after_calculation=\u001b[38;5;28mself\u001b[39m.store_raw_always)\n\u001b[32m 1987\u001b[39m ox = \u001b[38;5;28mself\u001b[39m.get_table(\n\u001b[32m 1988\u001b[39m \u001b[33m\"\u001b[39m\u001b[33mobject_xref\u001b[39m\u001b[33m\"\u001b[39m, usecols=[\u001b[33m\"\u001b[39m\u001b[33mensembl_id\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mensembl_object_type\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mxref_id\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mobject_xref_id\u001b[39m\u001b[33m\"\u001b[39m], **m\n\u001b[32m 1989\u001b[39m )\n\u001b[32m-> 
\u001b[39m\u001b[32m1990\u001b[39m x = \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mget_table\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mxref\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43musecols\u001b[49m\u001b[43m=\u001b[49m\u001b[43m[\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mxref_id\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mexternal_db_id\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mdbprimary_acc\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mdisplay_label\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mm\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 1991\u001b[39m ed = \u001b[38;5;28mself\u001b[39m.get_table(\u001b[33m\"\u001b[39m\u001b[33mexternal_db\u001b[39m\u001b[33m\"\u001b[39m, usecols=[\u001b[33m\"\u001b[39m\u001b[33mexternal_db_id\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mdb_name\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mdb_display_name\u001b[39m\u001b[33m\"\u001b[39m], **m)\n\u001b[32m 1992\u001b[39m ix = \u001b[38;5;28mself\u001b[39m.get_table(\u001b[33m\"\u001b[39m\u001b[33midentity_xref\u001b[39m\u001b[33m\"\u001b[39m, usecols=[\u001b[33m\"\u001b[39m\u001b[33mensembl_identity\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mxref_identity\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mobject_xref_id\u001b[39m\u001b[33m\"\u001b[39m], **m)\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:1104\u001b[39m, in \u001b[36mDatabaseManager.get_table\u001b[39m\u001b[34m(self, table_key, 
usecols, create_even_if_exist, save_after_calculation, overwrite_even_if_exist)\u001b[39m\n\u001b[32m 1102\u001b[39m df = \u001b[38;5;28mself\u001b[39m.download_table(table_key, usecols)\n\u001b[32m 1103\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m: \u001b[38;5;66;03m# Otherwise, just read the file that is already in the directory.\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m1104\u001b[39m df = \u001b[43mhs\u001b[49m\u001b[43m.\u001b[49m\u001b[43mread_exported\u001b[49m\u001b[43m(\u001b[49m\u001b[43mhierarchy\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mfile_path\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 1106\u001b[39m \u001b[38;5;66;03m# If prompt, save the dataframe in requested format.\u001b[39;00m\n\u001b[32m 1107\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m save_after_calculation:\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:771\u001b[39m, in \u001b[36mread_exported\u001b[39m\u001b[34m(hierarchy, file_path)\u001b[39m\n\u001b[32m 768\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m check_h5_key(file_path, hierarchy):\n\u001b[32m 769\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mKey \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mhierarchy\u001b[38;5;132;01m!r}\u001b[39;00m\u001b[33m not found in HDF5 file \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mfile_path\u001b[38;5;132;01m!r}\u001b[39;00m\u001b[33m.\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m--> \u001b[39m\u001b[32m771\u001b[39m df = \u001b[43mread_hdf\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m=\u001b[49m\u001b[43mfile_path\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mkey\u001b[49m\u001b[43m=\u001b[49m\u001b[43mhierarchy\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43mmode\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mr\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[32m 772\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m df\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:100\u001b[39m, in \u001b[36mread_hdf\u001b[39m\u001b[34m(path, key, mode)\u001b[39m\n\u001b[32m 97\u001b[39m index, index_names = _load_index_data(grp)\n\u001b[32m 99\u001b[39m \u001b[38;5;66;03m# Load column data\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m100\u001b[39m df = \u001b[43m_load_column_data\u001b[49m\u001b[43m(\u001b[49m\u001b[43mgrp\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcolumns\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtypes\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 102\u001b[39m \u001b[38;5;66;03m# Set index and metadata\u001b[39;00m\n\u001b[32m 103\u001b[39m df.index = index\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:497\u001b[39m, in \u001b[36m_load_column_data\u001b[39m\u001b[34m(grp, columns, dtypes)\u001b[39m\n\u001b[32m 495\u001b[39m raw = data_grp[col_key][()]\n\u001b[32m 496\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(raw[\u001b[32m0\u001b[39m], \u001b[38;5;28mbytes\u001b[39m):\n\u001b[32m--> \u001b[39m\u001b[32m497\u001b[39m raw = \u001b[43m[\u001b[49m\u001b[43mx\u001b[49m\u001b[43m.\u001b[49m\u001b[43mdecode\u001b[49m\u001b[43m(\u001b[49m\u001b[43mDB\u001b[49m\u001b[43m.\u001b[49m\u001b[43mUTF8\u001b[49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mfor\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mx\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;129;43;01min\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mraw\u001b[49m\u001b[43m]\u001b[49m\n\u001b[32m 498\u001b[39m restored = [pd.NA \u001b[38;5;28;01mif\u001b[39;00m x == DB.placeholder_na \u001b[38;5;28;01melse\u001b[39;00m 
x \u001b[38;5;28;01mfor\u001b[39;00m x \u001b[38;5;129;01min\u001b[39;00m raw]\n\u001b[32m 499\u001b[39m data_dict[col] = restored\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:497\u001b[39m, in \u001b[36m\u001b[39m\u001b[34m(.0)\u001b[39m\n\u001b[32m 495\u001b[39m raw = data_grp[col_key][()]\n\u001b[32m 496\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(raw[\u001b[32m0\u001b[39m], \u001b[38;5;28mbytes\u001b[39m):\n\u001b[32m--> \u001b[39m\u001b[32m497\u001b[39m raw = [x.decode(DB.UTF8) \u001b[38;5;28;01mfor\u001b[39;00m x \u001b[38;5;129;01min\u001b[39;00m raw]\n\u001b[32m 498\u001b[39m restored = [pd.NA \u001b[38;5;28;01mif\u001b[39;00m x == DB.placeholder_na \u001b[38;5;28;01melse\u001b[39;00m x \u001b[38;5;28;01mfor\u001b[39;00m x \u001b[38;5;129;01min\u001b[39;00m raw]\n\u001b[32m 499\u001b[39m data_dict[col] = restored\n", "\u001b[31mKeyboardInterrupt\u001b[39m: " ] }, { "data": { "text/html": [ "
\n", "
2026-01-10 16:43:06 INFO:graph_maker: The graph is being constructed: /Users/kemalinecik/git_nosync/master_idtrack/idtrack/docs/_notebooks/idtrack_cache/graph_homo_sapiens_min76_max115_narrow.pickle\n",
       "2026-01-10 16:43:06 INFO:graph_maker: Graph is being created: gene\n",
       "2026-01-10 16:43:08 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_versioninfo_gene`\n",
       "2026-01-10 16:43:57 INFO:database_manager: Raw table for `stable_id_event` on ensembl release `115` was downloaded.\n",
       "2026-01-10 16:43:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_mysql_stable_id_event`\n",
       "2026-01-10 16:45:07 INFO:database_manager: Raw table for `mapping_session` on ensembl release `115` was downloaded.\n",
       "2026-01-10 16:45:07 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_mysql_mapping_session`\n",
       "2026-01-10 16:45:10 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_idhistory_narrow_gene`\n",
       "2026-01-10 16:45:14 WARNING:graph_maker: Edge weights ignored due to duplicate entries: 2.\n",
       "2026-01-10 16:45:14 INFO:graph_maker: Edges between across different IDs and self loops are being added.\n",
       "2026-01-10 16:45:16 INFO:graph_maker: Edges between the same IDs are being added.\n",
       "2026-01-10 16:45:34 WARNING:graph_maker: Retired ID come alive again: 3.\n",
       "2026-01-10 16:45:34 INFO:graph_maker: Edges showing the retirement of IDs are being added.\n",
       "2026-01-10 16:45:40 INFO:graph_maker: Problematic nodes in Ensembl ID history are being removed.\n",
       "2026-01-10 16:45:46 WARNING:graph_maker: Nodes are deleted due to Ensembl ID history mistake: 3.\n",
       "2026-01-10 16:45:46 INFO:graph_maker: Self-loops for latest release entries are being added.\n",
       "2026-01-10 16:45:49 INFO:graph_maker: Node attributes are being added.\n",
       "2026-01-10 16:45:50 INFO:graph_maker: Graph is being created: transcript\n",
       "2026-01-10 16:45:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_versioninfo_transcript`\n",
       "2026-01-10 16:46:10 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_idhistory_narrow_transcript`\n",
       "2026-01-10 16:46:27 INFO:graph_maker: Edges between across different IDs and self loops are being added.\n",
       "2026-01-10 16:46:33 INFO:graph_maker: Edges between the same IDs are being added.\n",
       "2026-01-10 16:48:00 WARNING:graph_maker: Retired ID come alive again: 5.\n",
       "2026-01-10 16:48:00 INFO:graph_maker: Edges showing the retirement of IDs are being added.\n",
       "2026-01-10 16:48:25 INFO:graph_maker: Problematic nodes in Ensembl ID history are being removed.\n",
       "2026-01-10 16:48:47 INFO:graph_maker: Self-loops for latest release entries are being added.\n",
       "2026-01-10 16:49:02 INFO:graph_maker: Node attributes are being added.\n",
       "2026-01-10 16:49:11 INFO:graph_maker: Graph is being created: translation\n",
       "2026-01-10 16:49:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_versioninfo_translation`\n",
       "2026-01-10 16:49:23 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_idhistory_narrow_translation`\n",
       "2026-01-10 16:49:26 WARNING:graph_maker: Edge weights ignored due to duplicate entries: 4.\n",
       "2026-01-10 16:49:26 INFO:graph_maker: Edges between across different IDs and self loops are being added.\n",
       "2026-01-10 16:49:27 INFO:graph_maker: Edges between the same IDs are being added.\n",
       "2026-01-10 16:49:54 WARNING:graph_maker: Retired ID come alive again: 4.\n",
       "2026-01-10 16:49:54 INFO:graph_maker: Edges showing the retirement of IDs are being added.\n",
       "2026-01-10 16:50:04 INFO:graph_maker: Problematic nodes in Ensembl ID history are being removed.\n",
       "2026-01-10 16:50:14 WARNING:graph_maker: Nodes are deleted due to Ensembl ID history mistake: 1.\n",
       "2026-01-10 16:50:14 INFO:graph_maker: Self-loops for latest release entries are being added.\n",
       "2026-01-10 16:50:21 INFO:graph_maker: Node attributes are being added.\n",
       "2026-01-10 16:50:24 WARNING:graph_maker: Intersecting Ensembl nodes: Nodes in 'transcript' will be replaced by 'translation': 'ENST00000515292.1'.\n",
       "2026-01-10 16:50:28 INFO:graph_maker: Establishing connection between different forms.\n",
       "2026-01-10 16:50:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens76_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:50:32 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens76_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:50:35 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens76_common_relationcurrent`\n",
       "2026-01-10 16:50:49 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens77_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:50:52 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens77_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:50:55 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens77_common_relationcurrent`\n",
       "2026-01-10 16:51:05 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens78_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:51:09 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens78_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:51:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens78_common_relationcurrent`\n",
       "2026-01-10 16:51:20 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens79_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:51:23 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens79_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:51:25 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens79_common_relationcurrent`\n",
       "2026-01-10 16:51:36 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens80_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:51:40 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens80_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:51:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens80_common_relationcurrent`\n",
       "2026-01-10 16:51:51 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens81_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:51:55 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens81_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:51:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens81_common_relationcurrent`\n",
       "2026-01-10 16:52:09 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens82_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:52:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens82_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:52:15 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens82_common_relationcurrent`\n",
       "2026-01-10 16:52:24 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens83_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:52:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens83_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:52:30 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens83_common_relationcurrent`\n",
       "2026-01-10 16:52:39 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens84_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:52:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens84_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:52:45 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens84_common_relationcurrent`\n",
       "2026-01-10 16:52:55 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens85_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:52:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens85_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:53:01 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens85_common_relationcurrent`\n",
       "2026-01-10 16:53:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens86_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:53:15 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens86_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:53:17 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens86_common_relationcurrent`\n",
       "2026-01-10 16:53:26 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens87_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:53:30 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens87_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:53:32 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens87_common_relationcurrent`\n",
       "2026-01-10 16:53:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens88_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:53:45 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens88_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:53:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens88_common_relationcurrent`\n",
       "2026-01-10 16:53:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens89_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:54:01 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens89_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:54:03 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens89_common_relationcurrent`\n",
       "2026-01-10 16:54:12 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens90_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:54:16 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens90_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:54:18 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens90_common_relationcurrent`\n",
       "2026-01-10 16:54:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens91_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:54:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens91_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:54:33 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens91_common_relationcurrent`\n",
       "2026-01-10 16:54:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens92_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:54:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens92_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:54:49 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens92_common_relationcurrent`\n",
       "2026-01-10 16:54:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens93_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:55:02 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens93_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:55:05 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens93_common_relationcurrent`\n",
       "2026-01-10 16:55:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens94_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:55:18 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens94_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:55:20 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens94_common_relationcurrent`\n",
       "2026-01-10 16:55:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens95_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:55:34 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens95_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:55:37 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens95_common_relationcurrent`\n",
       "2026-01-10 16:55:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens96_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:55:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens96_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:55:53 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens96_common_relationcurrent`\n",
       "2026-01-10 16:56:03 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens97_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:56:06 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens97_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:56:08 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens97_common_relationcurrent`\n",
       "2026-01-10 16:56:21 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens98_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:56:24 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens98_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:56:27 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens98_common_relationcurrent`\n",
       "2026-01-10 16:56:36 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens99_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:56:39 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens99_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:56:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens99_common_relationcurrent`\n",
       "2026-01-10 16:56:51 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens100_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:56:54 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens100_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:56:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens100_common_relationcurrent`\n",
       "2026-01-10 16:57:08 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens101_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:57:10 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens101_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:57:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens101_common_relationcurrent`\n",
       "2026-01-10 16:57:23 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens102_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:57:27 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens102_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:57:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens102_common_relationcurrent`\n",
       "2026-01-10 16:57:40 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens103_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:57:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens103_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:57:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens103_common_relationcurrent`\n",
       "2026-01-10 16:57:58 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens104_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:58:01 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens104_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:58:04 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens104_common_relationcurrent`\n",
       "2026-01-10 16:58:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens105_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:58:18 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens105_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:58:20 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens105_common_relationcurrent`\n",
       "2026-01-10 16:58:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens106_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:58:35 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens106_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:58:38 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens106_common_relationcurrent`\n",
       "2026-01-10 16:58:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens107_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:58:53 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens107_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:58:56 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens107_common_relationcurrent`\n",
       "2026-01-10 16:59:08 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens108_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:59:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens108_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:59:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens108_common_relationcurrent`\n",
       "2026-01-10 16:59:25 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens109_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:59:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens109_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:59:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens109_common_relationcurrent`\n",
       "2026-01-10 16:59:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens110_processed_idsraw_transcript_gene`\n",
       "2026-01-10 16:59:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens110_processed_idsraw_translation_gene`\n",
       "2026-01-10 16:59:49 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens110_common_relationcurrent`\n",
       "2026-01-10 17:00:01 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens111_processed_idsraw_transcript_gene`\n",
       "2026-01-10 17:00:05 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens111_processed_idsraw_translation_gene`\n",
       "2026-01-10 17:00:07 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens111_common_relationcurrent`\n",
       "2026-01-10 17:00:19 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens112_processed_idsraw_transcript_gene`\n",
       "2026-01-10 17:00:23 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens112_processed_idsraw_translation_gene`\n",
       "2026-01-10 17:00:25 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens112_common_relationcurrent`\n",
       "2026-01-10 17:00:38 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens113_processed_idsraw_transcript_gene`\n",
       "2026-01-10 17:00:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens113_processed_idsraw_translation_gene`\n",
       "2026-01-10 17:00:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens113_common_relationcurrent`\n",
       "2026-01-10 17:01:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens114_processed_idsraw_transcript_gene`\n",
       "2026-01-10 17:01:17 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens114_processed_idsraw_translation_gene`\n",
       "2026-01-10 17:01:20 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens114_common_relationcurrent`\n",
       "2026-01-10 17:01:39 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_idsraw_transcript_gene`\n",
       "2026-01-10 17:01:48 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_idsraw_translation_gene`\n",
       "2026-01-10 17:01:53 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_common_relationcurrent`\n",
       "2026-01-10 17:02:18 INFO:graph_maker: Edges between external IDs to Ensembl IDs is being added for 'gene'.\n",
       "2026-01-10 17:02:33 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens76_processed_external_relevant_gene`\n",
       "2026-01-10 17:03:40 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\\\').\n",
       "2026-01-10 17:03:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens76_processed_external_relevant_gene`\n",
       "2026-01-10 17:04:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens77_processed_external_relevant_gene`\n",
       "2026-01-10 17:04:59 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\\\').\n",
       "2026-01-10 17:05:04 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens77_processed_external_relevant_gene`\n",
       "2026-01-10 17:05:52 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens78_processed_external_relevant_gene`\n",
       "2026-01-10 17:06:08 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\\\').\n",
       "2026-01-10 17:06:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens78_processed_external_relevant_gene`\n",
       "2026-01-10 17:06:51 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens79_processed_external_relevant_gene`\n",
       "2026-01-10 17:07:12 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens79_processed_external_relevant_gene`\n",
       "2026-01-10 17:07:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens80_processed_external_relevant_gene`\n",
       "2026-01-10 17:08:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens80_processed_external_relevant_gene`\n",
       "2026-01-10 17:08:48 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens81_processed_external_relevant_gene`\n",
       "2026-01-10 17:09:08 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens81_processed_external_relevant_gene`\n",
       "2026-01-10 17:09:46 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens82_processed_external_relevant_gene`\n",
       "2026-01-10 17:10:06 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens82_processed_external_relevant_gene`\n",
       "2026-01-10 17:10:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens83_processed_external_relevant_gene`\n",
       "2026-01-10 17:11:04 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens83_processed_external_relevant_gene`\n",
       "2026-01-10 17:11:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens84_processed_external_relevant_gene`\n",
       "2026-01-10 17:12:02 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens84_processed_external_relevant_gene`\n",
       "2026-01-10 17:12:40 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens85_processed_external_relevant_gene`\n",
       "2026-01-10 17:13:00 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens85_processed_external_relevant_gene`\n",
       "2026-01-10 17:13:40 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens86_processed_external_relevant_gene`\n",
       "2026-01-10 17:14:00 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens86_processed_external_relevant_gene`\n",
       "2026-01-10 17:14:39 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens87_processed_external_relevant_gene`\n",
       "2026-01-10 17:14:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens87_processed_external_relevant_gene`\n",
       "2026-01-10 17:15:38 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens88_processed_external_relevant_gene`\n",
       "2026-01-10 17:15:56 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens88_processed_external_relevant_gene`\n",
       "2026-01-10 17:16:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens89_processed_external_relevant_gene`\n",
       "2026-01-10 17:16:45 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens89_processed_external_relevant_gene`\n",
       "2026-01-10 17:17:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens90_processed_external_relevant_gene`\n",
       "2026-01-10 17:17:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens90_processed_external_relevant_gene`\n",
       "2026-01-10 17:17:56 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens91_processed_external_relevant_gene`\n",
       "2026-01-10 17:18:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens91_processed_external_relevant_gene`\n",
       "2026-01-10 17:18:39 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens92_processed_external_relevant_gene`\n",
       "2026-01-10 17:18:49 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\\\').\n",
       "2026-01-10 17:18:53 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens92_processed_external_relevant_gene`\n",
       "2026-01-10 17:19:21 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens93_processed_external_relevant_gene`\n",
       "2026-01-10 17:19:36 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens93_processed_external_relevant_gene`\n",
       "2026-01-10 17:20:03 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens94_processed_external_relevant_gene`\n",
       "2026-01-10 17:20:18 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens94_processed_external_relevant_gene`\n",
       "2026-01-10 17:20:46 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens95_processed_external_relevant_gene`\n",
       "2026-01-10 17:21:01 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens95_processed_external_relevant_gene`\n",
       "2026-01-10 17:21:30 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens96_processed_external_relevant_gene`\n",
       "2026-01-10 17:21:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens96_processed_external_relevant_gene`\n",
       "2026-01-10 17:22:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens97_processed_external_relevant_gene`\n",
       "2026-01-10 17:22:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens97_processed_external_relevant_gene`\n",
       "2026-01-10 17:22:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens98_processed_external_relevant_gene`\n",
       "2026-01-10 17:23:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens98_processed_external_relevant_gene`\n",
       "2026-01-10 17:23:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens99_processed_external_relevant_gene`\n",
       "2026-01-10 17:23:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens99_processed_external_relevant_gene`\n",
       "2026-01-10 17:24:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens100_processed_external_relevant_gene`\n",
       "2026-01-10 17:24:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens100_processed_external_relevant_gene`\n",
       "2026-01-10 17:25:12 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens101_processed_external_relevant_gene`\n",
       "2026-01-10 17:25:26 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens101_processed_external_relevant_gene`\n",
       "2026-01-10 17:25:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens102_processed_external_relevant_gene`\n",
       "2026-01-10 17:26:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens102_processed_external_relevant_gene`\n",
       "2026-01-10 17:26:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens103_processed_external_relevant_gene`\n",
       "2026-01-10 17:26:58 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens103_processed_external_relevant_gene`\n",
       "2026-01-10 17:27:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens104_processed_external_relevant_gene`\n",
       "2026-01-10 17:27:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens104_processed_external_relevant_gene`\n",
       "2026-01-10 17:28:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens105_processed_external_relevant_gene`\n",
       "2026-01-10 17:28:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens105_processed_external_relevant_gene`\n",
       "2026-01-10 17:29:32 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens106_processed_external_relevant_gene`\n",
       "2026-01-10 17:29:46 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens106_processed_external_relevant_gene`\n",
       "2026-01-10 17:30:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens107_processed_external_relevant_gene`\n",
       "2026-01-10 17:30:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens107_processed_external_relevant_gene`\n",
       "2026-01-10 17:30:58 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens108_processed_external_relevant_gene`\n",
       "2026-01-10 17:31:12 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens108_processed_external_relevant_gene`\n",
       "2026-01-10 17:31:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens109_processed_external_relevant_gene`\n",
       "2026-01-10 17:31:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens109_processed_external_relevant_gene`\n",
       "2026-01-10 17:32:26 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens110_processed_external_relevant_gene`\n",
       "2026-01-10 17:32:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens110_processed_external_relevant_gene`\n",
       "2026-01-10 17:33:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens111_processed_external_relevant_gene`\n",
       "2026-01-10 17:33:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens111_processed_external_relevant_gene`\n",
       "2026-01-10 17:33:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens112_processed_external_relevant_gene`\n",
       "2026-01-10 17:34:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens112_processed_external_relevant_gene`\n",
       "2026-01-10 17:34:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens113_processed_external_relevant_gene`\n",
       "2026-01-10 17:34:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens113_processed_external_relevant_gene`\n",
       "2026-01-10 17:35:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens114_processed_external_relevant_gene`\n",
       "2026-01-10 17:35:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens114_processed_external_relevant_gene`\n",
       "2026-01-10 17:36:15 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_external_relevant_gene`\n",
       "2026-01-10 17:36:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens115_processed_external_relevant_gene`\n",
       "2026-01-10 17:36:50 WARNING:graph_maker: New nodes added as assembly nodes: 28738\n",
       "2026-01-10 17:36:50 INFO:graph_maker: Edges between external IDs to Ensembl IDs is being added for 'transcript'.\n",
       "2026-01-10 17:37:07 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens76_processed_external_relevant_transcript`\n",
       "2026-01-10 17:37:20 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\\\').\n",
       "2026-01-10 17:37:32 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens76_processed_external_relevant_transcript`\n",
       "2026-01-10 17:38:12 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens77_processed_external_relevant_transcript`\n",
       "2026-01-10 17:38:25 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\\\').\n",
       "2026-01-10 17:38:34 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens77_processed_external_relevant_transcript`\n",
       "2026-01-10 17:39:30 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens78_processed_external_relevant_transcript`\n",
       "2026-01-10 17:39:43 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\\\').\n",
       "2026-01-10 17:39:52 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens78_processed_external_relevant_transcript`\n",
       "2026-01-10 17:40:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens79_processed_external_relevant_transcript`\n",
       "2026-01-10 17:40:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens79_processed_external_relevant_transcript`\n",
       "2026-01-10 17:41:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens80_processed_external_relevant_transcript`\n",
       "2026-01-10 17:41:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens80_processed_external_relevant_transcript`\n",
       "2026-01-10 17:42:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens81_processed_external_relevant_transcript`\n",
       "2026-01-10 17:42:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens81_processed_external_relevant_transcript`\n",
       "2026-01-10 17:43:30 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens82_processed_external_relevant_transcript`\n",
       "2026-01-10 17:43:53 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens82_processed_external_relevant_transcript`\n",
       "2026-01-10 17:44:32 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens83_processed_external_relevant_transcript`\n",
       "2026-01-10 17:44:55 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens83_processed_external_relevant_transcript`\n",
       "2026-01-10 17:45:35 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens84_processed_external_relevant_transcript`\n",
       "2026-01-10 17:46:00 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens84_processed_external_relevant_transcript`\n",
       "Traceback (most recent call last):\n",
       "  File "/Users/kemalinecik/tools/apps/mamba/envs/idtrack_dev_env/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3670, in run_code\n",
       "    exec(code_obj, self.user_global_ns, self.user_ns)\n",
       "  File "/var/folders/y7/7c17s0l57szdjc1cdc9dmnpm0000gn/T/ipykernel_5077/295636214.py", line 5, in <module>\n",
       "    api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_api.py", line 213, in build_graph\n",
       "    self.track = Track(dm)\n",
       "                 ^^^^^^^^^\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_track.py", line 83, in __init__\n",
       "    self.graph = graph_creator.get_graph(**kwargs)\n",
       "                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_graph_maker.py", line 1086, in get_graph\n",
       "    g = self.construct_graph(narrow=narrow, form_list=form_list, narrow_external=narrow_external)\n",
       "        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_graph_maker.py", line 224, in construct_graph\n",
       "    rc = db_manager.create_external_all(return_mode="all", narrow_external=narrow_external)\n",
       "         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py", line 2279, in create_external_all\n",
       "    df_temp = dm.get_db(df_indicator)\n",
       "              ^^^^^^^^^^^^^^^^^^^^^^^\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py", line 2484, in get_db\n",
       "    df = self.create_external_db(filter_mode=param1_ind)\n",
       "         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py", line 1990, in create_external_db\n",
       "    x = self.get_table("xref", usecols=["xref_id", "external_db_id", "dbprimary_acc", "display_label"], **m)\n",
       "        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py", line 1104, in get_table\n",
       "    df = hs.read_exported(hierarchy, file_path)\n",
       "         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py", line 771, in read_exported\n",
       "    df = read_hdf(path=file_path, key=hierarchy, mode="r")\n",
       "         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py", line 100, in read_hdf\n",
       "    df = _load_column_data(grp, columns, dtypes)\n",
       "         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py", line 497, in _load_column_data\n",
       "    raw = [x.decode(DB.UTF8) for x in raw]\n",
       "          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
       "  File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py", line 497, in <listcomp>\n",
       "    raw = [x.decode(DB.UTF8) for x in raw]\n",
       "           ^^^^^^^^^^^^^^^^^\n",
       "KeyboardInterrupt
\n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "ename": "KeyboardInterrupt", "evalue": "", "output_type": "error", "traceback": [ "\u001b[31m---------------------------------------------------------------------------\u001b[39m", "\u001b[31mKeyboardInterrupt\u001b[39m Traceback (most recent call last)", "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[5]\u001b[39m\u001b[32m, line 1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m \u001b[43mget_ipython\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m.\u001b[49m\u001b[43mrun_cell_magic\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mcollapse\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mClick to show build logs\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43m# Included for tutorial purposes only.\u001b[39;49m\u001b[38;5;130;43;01m\\n\u001b[39;49;00m\u001b[38;5;130;43;01m\\n\u001b[39;49;00m\u001b[33;43m# Build (or load) the graph snapshot\u001b[39;49m\u001b[38;5;130;43;01m\\n\u001b[39;49;00m\u001b[33;43m# - calculate_caches=True speeds up later queries (slower build, faster use).\u001b[39;49m\u001b[38;5;130;43;01m\\n\u001b[39;49;00m\u001b[33;43mapi.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)\u001b[39;49m\u001b[38;5;130;43;01m\\n\u001b[39;49;00m\u001b[33;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n", "\u001b[36mFile \u001b[39m\u001b[32m~/tools/apps/mamba/envs/idtrack_dev_env/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2547\u001b[39m, in \u001b[36mInteractiveShell.run_cell_magic\u001b[39m\u001b[34m(self, magic_name, line, cell)\u001b[39m\n\u001b[32m 2545\u001b[39m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28mself\u001b[39m.builtin_trap:\n\u001b[32m 2546\u001b[39m args = (magic_arg_s, cell)\n\u001b[32m-> 
\u001b[39m\u001b[32m2547\u001b[39m result = \u001b[43mfn\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 2549\u001b[39m \u001b[38;5;66;03m# The code below prevents the output from being displayed\u001b[39;00m\n\u001b[32m 2550\u001b[39m \u001b[38;5;66;03m# when using magics with decorator @output_can_be_silenced\u001b[39;00m\n\u001b[32m 2551\u001b[39m \u001b[38;5;66;03m# when the last Python token in the expression is a ';'.\u001b[39;00m\n\u001b[32m 2552\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mgetattr\u001b[39m(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, \u001b[38;5;28;01mFalse\u001b[39;00m):\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/docs/_notebooks/_notebook_utils.py:376\u001b[39m, in \u001b[36mCollapseMagics.collapse\u001b[39m\u001b[34m(self, line, cell)\u001b[39m\n\u001b[32m 374\u001b[39m err = \u001b[38;5;28mgetattr\u001b[39m(exec_result, \u001b[33m\"\u001b[39m\u001b[33merror_before_exec\u001b[39m\u001b[33m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m) \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mgetattr\u001b[39m(exec_result, \u001b[33m\"\u001b[39m\u001b[33merror_in_exec\u001b[39m\u001b[33m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m)\n\u001b[32m 375\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m err \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m376\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m err\n", " \u001b[31m[... 
skipping hidden 1 frame]\u001b[39m\n", "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[5]\u001b[39m\u001b[32m, line 5\u001b[39m\n\u001b[32m 1\u001b[39m \u001b[38;5;66;03m# Included for tutorial purposes only.\u001b[39;00m\n\u001b[32m 2\u001b[39m \n\u001b[32m 3\u001b[39m \u001b[38;5;66;03m# Build (or load) the graph snapshot\u001b[39;00m\n\u001b[32m 4\u001b[39m \u001b[38;5;66;03m# - calculate_caches=True speeds up later queries (slower build, faster use).\u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m5\u001b[39m \u001b[43mapi\u001b[49m\u001b[43m.\u001b[49m\u001b[43mbuild_graph\u001b[49m\u001b[43m(\u001b[49m\u001b[43morganism_name\u001b[49m\u001b[43m=\u001b[49m\u001b[43morganism\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43msnapshot_release\u001b[49m\u001b[43m=\u001b[49m\u001b[43mSNAPSHOT_RELEASE\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcalculate_caches\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m)\u001b[49m\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_api.py:213\u001b[39m, in \u001b[36mAPI.build_graph\u001b[39m\u001b[34m(self, organism_name, snapshot_release, genome_assembly, return_test, calculate_caches)\u001b[39m\n\u001b[32m 211\u001b[39m \u001b[38;5;28mself\u001b[39m.track = TrackTests(dm)\n\u001b[32m 212\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m213\u001b[39m \u001b[38;5;28mself\u001b[39m.track = \u001b[43mTrack\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdm\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 215\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m calculate_caches \u001b[38;5;129;01mand\u001b[39;00m return_test:\n\u001b[32m 216\u001b[39m \u001b[38;5;28mself\u001b[39m.calculate_graph_caches(for_test=\u001b[38;5;28;01mTrue\u001b[39;00m)\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_track.py:83\u001b[39m, in \u001b[36mTrack.__init__\u001b[39m\u001b[34m(self, 
db_manager, **kwargs)\u001b[39m\n\u001b[32m 80\u001b[39m graph_creator = GraphMaker(\u001b[38;5;28mself\u001b[39m.db_manager)\n\u001b[32m 82\u001b[39m \u001b[38;5;66;03m# Calculate/Load the graph\u001b[39;00m\n\u001b[32m---> \u001b[39m\u001b[32m83\u001b[39m \u001b[38;5;28mself\u001b[39m.graph = \u001b[43mgraph_creator\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget_graph\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 84\u001b[39m \u001b[38;5;28mself\u001b[39m.version_info = \u001b[38;5;28mself\u001b[39m.graph.graph[\u001b[33m\"\u001b[39m\u001b[33mversion_info\u001b[39m\u001b[33m\"\u001b[39m]\n\u001b[32m 85\u001b[39m \u001b[38;5;28mself\u001b[39m._external_entrance_placeholder = {\u001b[38;5;28;01mFalse\u001b[39;00m: -\u001b[32m1\u001b[39m, \u001b[38;5;28;01mTrue\u001b[39;00m: \u001b[32m10001\u001b[39m}\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_graph_maker.py:1086\u001b[39m, in \u001b[36mGraphMaker.get_graph\u001b[39m\u001b[34m(self, narrow, create_even_if_exist, save_after_calculation, overwrite_even_if_exist, form_list, narrow_external)\u001b[39m\n\u001b[32m 1084\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m os.access(file_path, os.R_OK) \u001b[38;5;129;01mor\u001b[39;00m create_even_if_exist:\n\u001b[32m 1085\u001b[39m \u001b[38;5;28mself\u001b[39m.log.info(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mThe graph is being constructed: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mfile_path\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m)\n\u001b[32m-> \u001b[39m\u001b[32m1086\u001b[39m g = \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mconstruct_graph\u001b[49m\u001b[43m(\u001b[49m\u001b[43mnarrow\u001b[49m\u001b[43m=\u001b[49m\u001b[43mnarrow\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43mform_list\u001b[49m\u001b[43m=\u001b[49m\u001b[43mform_list\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mnarrow_external\u001b[49m\u001b[43m=\u001b[49m\u001b[43mnarrow_external\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 1087\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m: \u001b[38;5;66;03m# Otherwise, just read the file that is already in the directory.\u001b[39;00m\n\u001b[32m 1088\u001b[39m \u001b[38;5;28mself\u001b[39m.log.info(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mThe graph is being read: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mfile_path\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m)\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_graph_maker.py:224\u001b[39m, in \u001b[36mGraphMaker.construct_graph\u001b[39m\u001b[34m(self, narrow, form_list, narrow_external)\u001b[39m\n\u001b[32m 219\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m ens_rel \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28msorted\u001b[39m(\u001b[38;5;28mself\u001b[39m.db_manager.available_releases):\n\u001b[32m 220\u001b[39m \u001b[38;5;66;03m# the order is important in adding new nodes into the core graph.\u001b[39;00m\n\u001b[32m 221\u001b[39m \u001b[38;5;66;03m# it is important to capture correct ens_release in min_ens_release dictionary\u001b[39;00m\n\u001b[32m 223\u001b[39m db_manager = dbman_s[f].change_release(ens_rel)\n\u001b[32m--> \u001b[39m\u001b[32m224\u001b[39m rc = \u001b[43mdb_manager\u001b[49m\u001b[43m.\u001b[49m\u001b[43mcreate_external_all\u001b[49m\u001b[43m(\u001b[49m\u001b[43mreturn_mode\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mall\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mnarrow_external\u001b[49m\u001b[43m=\u001b[49m\u001b[43mnarrow_external\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 226\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m _ind, entry \u001b[38;5;129;01min\u001b[39;00m 
rc.iterrows():\n\u001b[32m 227\u001b[39m \u001b[38;5;66;03m# Note that the `rc` dataframe have higher priority assembly entries at the top.\u001b[39;00m\n\u001b[32m 229\u001b[39m e1, e2 = entry[\u001b[33m\"\u001b[39m\u001b[33mgraph_id\u001b[39m\u001b[33m\"\u001b[39m], entry[\u001b[33m\"\u001b[39m\u001b[33mid_db\u001b[39m\u001b[33m\"\u001b[39m]\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:2279\u001b[39m, in \u001b[36mDatabaseManager.create_external_all\u001b[39m\u001b[34m(self, return_mode, narrow_external)\u001b[39m\n\u001b[32m 2277\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m:\n\u001b[32m 2278\u001b[39m \u001b[38;5;28;01mcontinue\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m2279\u001b[39m df_temp = \u001b[43mdm\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget_db\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdf_indicator\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 2280\u001b[39m df_temp[\u001b[33m\"\u001b[39m\u001b[33massembly\u001b[39m\u001b[33m\"\u001b[39m] = i\n\u001b[32m 2281\u001b[39m df = pd.concat([df, df_temp])\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:2484\u001b[39m, in \u001b[36mDatabaseManager.get_db\u001b[39m\u001b[34m(self, df_indicator, create_even_if_exist, save_after_calculation, overwrite_even_if_exist)\u001b[39m\n\u001b[32m 2481\u001b[39m df = \u001b[38;5;28mself\u001b[39m.create_external_db(filter_mode=\u001b[33m\"\u001b[39m\u001b[33mall\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m 2483\u001b[39m \u001b[38;5;28;01melif\u001b[39;00m main_ind == \u001b[33m\"\u001b[39m\u001b[33mexternal\u001b[39m\u001b[33m\"\u001b[39m \u001b[38;5;129;01mand\u001b[39;00m param1_ind \u001b[38;5;129;01min\u001b[39;00m [\u001b[33m\"\u001b[39m\u001b[33mrelevant\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mdatabase\u001b[39m\u001b[33m\"\u001b[39m, 
\u001b[33m\"\u001b[39m\u001b[33mrelevant-database\u001b[39m\u001b[33m\"\u001b[39m]:\n\u001b[32m-> \u001b[39m\u001b[32m2484\u001b[39m df = \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mcreate_external_db\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfilter_mode\u001b[49m\u001b[43m=\u001b[49m\u001b[43mparam1_ind\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 2486\u001b[39m \u001b[38;5;28;01melif\u001b[39;00m main_ind == \u001b[33m\"\u001b[39m\u001b[33midsraw\u001b[39m\u001b[33m\"\u001b[39m:\n\u001b[32m 2487\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m param1_ind \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m.available_form_of_interests:\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:1990\u001b[39m, in \u001b[36mDatabaseManager.create_external_db\u001b[39m\u001b[34m(self, filter_mode)\u001b[39m\n\u001b[32m 1986\u001b[39m a = \u001b[38;5;28mself\u001b[39m.get_db(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33midsraw_\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m.form\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m, save_after_calculation=\u001b[38;5;28mself\u001b[39m.store_raw_always)\n\u001b[32m 1987\u001b[39m ox = \u001b[38;5;28mself\u001b[39m.get_table(\n\u001b[32m 1988\u001b[39m \u001b[33m\"\u001b[39m\u001b[33mobject_xref\u001b[39m\u001b[33m\"\u001b[39m, usecols=[\u001b[33m\"\u001b[39m\u001b[33mensembl_id\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mensembl_object_type\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mxref_id\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mobject_xref_id\u001b[39m\u001b[33m\"\u001b[39m], **m\n\u001b[32m 1989\u001b[39m )\n\u001b[32m-> \u001b[39m\u001b[32m1990\u001b[39m x = 
\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mget_table\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mxref\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43musecols\u001b[49m\u001b[43m=\u001b[49m\u001b[43m[\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mxref_id\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mexternal_db_id\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mdbprimary_acc\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mdisplay_label\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mm\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 1991\u001b[39m ed = \u001b[38;5;28mself\u001b[39m.get_table(\u001b[33m\"\u001b[39m\u001b[33mexternal_db\u001b[39m\u001b[33m\"\u001b[39m, usecols=[\u001b[33m\"\u001b[39m\u001b[33mexternal_db_id\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mdb_name\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mdb_display_name\u001b[39m\u001b[33m\"\u001b[39m], **m)\n\u001b[32m 1992\u001b[39m ix = \u001b[38;5;28mself\u001b[39m.get_table(\u001b[33m\"\u001b[39m\u001b[33midentity_xref\u001b[39m\u001b[33m\"\u001b[39m, usecols=[\u001b[33m\"\u001b[39m\u001b[33mensembl_identity\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mxref_identity\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mobject_xref_id\u001b[39m\u001b[33m\"\u001b[39m], **m)\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:1104\u001b[39m, in \u001b[36mDatabaseManager.get_table\u001b[39m\u001b[34m(self, table_key, usecols, create_even_if_exist, 
save_after_calculation, overwrite_even_if_exist)\u001b[39m\n\u001b[32m 1102\u001b[39m df = \u001b[38;5;28mself\u001b[39m.download_table(table_key, usecols)\n\u001b[32m 1103\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m: \u001b[38;5;66;03m# Otherwise, just read the file that is already in the directory.\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m1104\u001b[39m df = \u001b[43mhs\u001b[49m\u001b[43m.\u001b[49m\u001b[43mread_exported\u001b[49m\u001b[43m(\u001b[49m\u001b[43mhierarchy\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mfile_path\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 1106\u001b[39m \u001b[38;5;66;03m# If prompt, save the dataframe in requested format.\u001b[39;00m\n\u001b[32m 1107\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m save_after_calculation:\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:771\u001b[39m, in \u001b[36mread_exported\u001b[39m\u001b[34m(hierarchy, file_path)\u001b[39m\n\u001b[32m 768\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m check_h5_key(file_path, hierarchy):\n\u001b[32m 769\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mKey \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mhierarchy\u001b[38;5;132;01m!r}\u001b[39;00m\u001b[33m not found in HDF5 file \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mfile_path\u001b[38;5;132;01m!r}\u001b[39;00m\u001b[33m.\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m--> \u001b[39m\u001b[32m771\u001b[39m df = \u001b[43mread_hdf\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m=\u001b[49m\u001b[43mfile_path\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mkey\u001b[49m\u001b[43m=\u001b[49m\u001b[43mhierarchy\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43mmode\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mr\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[32m 772\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m df\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:100\u001b[39m, in \u001b[36mread_hdf\u001b[39m\u001b[34m(path, key, mode)\u001b[39m\n\u001b[32m 97\u001b[39m index, index_names = _load_index_data(grp)\n\u001b[32m 99\u001b[39m \u001b[38;5;66;03m# Load column data\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m100\u001b[39m df = \u001b[43m_load_column_data\u001b[49m\u001b[43m(\u001b[49m\u001b[43mgrp\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcolumns\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtypes\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 102\u001b[39m \u001b[38;5;66;03m# Set index and metadata\u001b[39;00m\n\u001b[32m 103\u001b[39m df.index = index\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:497\u001b[39m, in \u001b[36m_load_column_data\u001b[39m\u001b[34m(grp, columns, dtypes)\u001b[39m\n\u001b[32m 495\u001b[39m raw = data_grp[col_key][()]\n\u001b[32m 496\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(raw[\u001b[32m0\u001b[39m], \u001b[38;5;28mbytes\u001b[39m):\n\u001b[32m--> \u001b[39m\u001b[32m497\u001b[39m raw = \u001b[43m[\u001b[49m\u001b[43mx\u001b[49m\u001b[43m.\u001b[49m\u001b[43mdecode\u001b[49m\u001b[43m(\u001b[49m\u001b[43mDB\u001b[49m\u001b[43m.\u001b[49m\u001b[43mUTF8\u001b[49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mfor\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mx\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;129;43;01min\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mraw\u001b[49m\u001b[43m]\u001b[49m\n\u001b[32m 498\u001b[39m restored = [pd.NA \u001b[38;5;28;01mif\u001b[39;00m x == DB.placeholder_na \u001b[38;5;28;01melse\u001b[39;00m 
x \u001b[38;5;28;01mfor\u001b[39;00m x \u001b[38;5;129;01min\u001b[39;00m raw]\n\u001b[32m 499\u001b[39m data_dict[col] = restored\n", "\u001b[36mFile \u001b[39m\u001b[32m~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:497\u001b[39m, in \u001b[36m\u001b[39m\u001b[34m(.0)\u001b[39m\n\u001b[32m 495\u001b[39m raw = data_grp[col_key][()]\n\u001b[32m 496\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(raw[\u001b[32m0\u001b[39m], \u001b[38;5;28mbytes\u001b[39m):\n\u001b[32m--> \u001b[39m\u001b[32m497\u001b[39m raw = [x.decode(DB.UTF8) \u001b[38;5;28;01mfor\u001b[39;00m x \u001b[38;5;129;01min\u001b[39;00m raw]\n\u001b[32m 498\u001b[39m restored = [pd.NA \u001b[38;5;28;01mif\u001b[39;00m x == DB.placeholder_na \u001b[38;5;28;01melse\u001b[39;00m x \u001b[38;5;28;01mfor\u001b[39;00m x \u001b[38;5;129;01min\u001b[39;00m raw]\n\u001b[32m 499\u001b[39m data_dict[col] = restored\n", "\u001b[31mKeyboardInterrupt\u001b[39m: " ] } ], "source": [ "%%collapse Click to show build logs\n", "# Included for tutorial purposes only.\n", "\n", "# Build (or load) the graph snapshot\n", "# - calculate_caches=True speeds up later queries (slower build, faster use).\n", "api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "defa075f", "metadata": {}, "outputs": [], "source": [ "# Quick inspection\n", "g = api.track.graph\n", "print('Organism:', g.graph.get('organism'))\n", "print('Snapshot release:', g.graph.get('ensembl_release'))\n", "print('Main assembly:', g.graph.get('genome_assembly'))\n", "print('Assemblies in this graph:', sorted(api.list_genome_assemblies()))\n", "print('Nodes:', g.number_of_nodes())\n", "print('Edges:', g.number_of_edges())\n", "\n", "aed = sorted(getattr(g, 'available_external_databases', []))\n", "print('External DBs enabled (count):', len(aed))\n", "print('External DBs (first 20):', aed[:20])\n" ] }, { "cell_type": 
"code", "execution_count": null, "id": "ab4d624a", "metadata": {}, "outputs": [], "source": [ "# Where is the graph file stored?\n", "sorted(LOCAL_REPOSITORY.glob('graph_homo_sapiens*.pickle'))[-5:]\n" ] }, { "cell_type": "markdown", "id": "cd4a8418", "metadata": {}, "source": [ "### 3.2 — Mouse graph initialization (clean handoff)\n", "\n", "Mouse is a clean-handoff species (one maintained assembly per release: GRCm37 → GRCm38 → GRCm39). Older assemblies mainly matter for\n", "legacy datasets and archive releases; you typically do not have overlapping assemblies within the same release.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a65d177c", "metadata": {}, "outputs": [], "source": [ "organism, latest_release = api.resolve_organism('mus musculus')\n", "SNAPSHOT_RELEASE = latest_release\n", "organism, SNAPSHOT_RELEASE\n" ] }, { "cell_type": "code", "execution_count": null, "id": "1636a3c4", "metadata": {}, "outputs": [], "source": [ "# Build (or load) the mouse graph snapshot\n", "if HAS_MOUSE_YAML:\n", " api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)\n", "else:\n", " print('Skipping mouse build: mus_musculus_externals_modified.yml is missing (run Part 2 first).')\n" ] }, { "cell_type": "code", "execution_count": null, "id": "41ff577b", "metadata": {}, "outputs": [], "source": [ "if HAS_MOUSE_YAML:\n", " g = api.track.graph\n", " print('Organism:', g.graph.get('organism'))\n", " print('Snapshot release:', g.graph.get('ensembl_release'))\n", " print('Main assembly:', g.graph.get('genome_assembly'))\n", " print('Assemblies in this graph:', sorted(api.list_genome_assemblies()))\n", " print('Nodes:', g.number_of_nodes())\n", " print('Edges:', g.number_of_edges())\n", "\n", " aed = sorted(getattr(g, 'available_external_databases', []))\n", " print('External DBs enabled (count):', len(aed))\n", " print('External DBs (first 20):', aed[:20])\n", "else:\n", " print('Mouse graph not built (missing 
YAML).')\n" ] }, { "cell_type": "code", "execution_count": null, "id": "4372ac04", "metadata": {}, "outputs": [], "source": [ "sorted(LOCAL_REPOSITORY.glob('graph_mus_musculus*.pickle'))[-5:]\n" ] }, { "cell_type": "markdown", "id": "b854f51e", "metadata": {}, "source": [ "### 3.3 — Pig graph initialization (clean handoff)\n", "\n", "Pig is a clean-handoff species (one maintained assembly per release: Sscrofa9.2 → Sscrofa10.2 → Sscrofa11.1). Older assemblies mainly\n", "matter for legacy datasets and archive releases; you typically do not have overlapping assemblies within the same release.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "3f8bd787", "metadata": {}, "outputs": [], "source": [ "organism, latest_release = api.resolve_organism('sus scrofa')\n", "SNAPSHOT_RELEASE = latest_release\n", "organism, SNAPSHOT_RELEASE\n" ] }, { "cell_type": "code", "execution_count": null, "id": "56dd4c1e", "metadata": {}, "outputs": [], "source": [ "# Build (or load) the pig graph snapshot\n", "if HAS_PIG_YAML:\n", "    api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)\n", "else:\n", "    print('Skipping pig build: sus_scrofa_externals_modified.yml is missing (run Part 2 first).')\n" ] }, { "cell_type": "code", "execution_count": null, "id": "41615ba1", "metadata": {}, "outputs": [], "source": [ "if HAS_PIG_YAML:\n", "    g = api.track.graph\n", "    print('Organism:', g.graph.get('organism'))\n", "    print('Snapshot release:', g.graph.get('ensembl_release'))\n", "    print('Main assembly:', g.graph.get('genome_assembly'))\n", "    print('Assemblies in this graph:', sorted(api.list_genome_assemblies()))\n", "    print('Nodes:', g.number_of_nodes())\n", "    print('Edges:', g.number_of_edges())\n", "\n", "    aed = sorted(getattr(g, 'available_external_databases', []))\n", "    print('External DBs enabled (count):', len(aed))\n", "    print('External DBs (first 20):', aed[:20])\n", "else:\n", "    print('Pig graph not built (missing YAML).')\n" ] }, { 
"cell_type": "code", "execution_count": null, "id": "ec692c7e", "metadata": {}, "outputs": [], "source": [ "sorted(LOCAL_REPOSITORY.glob('graph_sus_scrofa*.pickle'))[-5:]\n" ] }, { "cell_type": "markdown", "id": "52d4dbd4", "metadata": {}, "source": [ "## 3.4 — Graph management (all species)\n", "\n", "### Reloading vs rebuilding\n", "\n", "- `api.build_graph(...)` is safe to call repeatedly.\n", " - If the snapshot already exists on disk, IDTrack will **load** it.\n", " - If it does not exist yet, IDTrack will **build** it (slow, first-time only).\n", "\n", "### Cache hygiene\n", "\n", "Your local repository can accumulate:\n", "- downloaded tables\n", "- graph snapshot pickle files\n", "- intermediate files used during builds\n", "\n", "> **Tip:** Treat your local repository as *project infrastructure*. Keep it stable so you get the benefits of caching.\n", "\n", "### Switching organisms / assemblies\n", "\n", "A graph snapshot is specific to:\n", "- organism\n", "- snapshot boundary (max release)\n", "- external YAML contents\n", "- the chosen **primary assembly** (the default output coordinate system)\n", "\n", "Even though the snapshot can include multiple assemblies, changing the primary assembly changes the snapshot and requires a rebuild.\n", "\n", "> **Tip:** The cached graph filename does not include the assembly. 
If you want to keep two different primary assemblies side-by-side, use\n", "> separate local repositories (or copy the graph pickle file).\n", "\n", "### Performance tips\n", "\n", "- Use `calculate_caches=True` during builds when you plan to do many conversions afterward.\n", "- Keep your external YAML allowlist small to reduce ambiguity and search space.\n", "\n", "> **Warning:** Do not delete caches unless you understand the consequences: removing them can force a full rebuild.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9ee6e582", "metadata": {}, "outputs": [], "source": [ "# Helper: list what IDTrack has cached in your local repository.\n", "# Safe: this does NOT delete anything.\n", "\n", "from pathlib import Path\n", "\n", "# Wrap in Path so .glob() works even if LOCAL_REPOSITORY is a plain string.\n", "cache = Path(LOCAL_REPOSITORY)\n", "\n", "print('Local repository:', cache)\n", "\n", "patterns = [\n", "    'graph_*.pickle',\n", "    '*_externals_modified.yml',\n", "    '*_externals_template.yml',\n", "]\n", "\n", "for pat in patterns:\n", "    hits = sorted(cache.glob(pat))\n", "    print()\n", "    print(f'{pat} ({len(hits)}):')\n", "    # Show at most the first 10 matches per pattern.\n", "    for p in hits[:10]:\n", "        print(' ', p.name)\n", "    if len(hits) > 10:\n", "        print(' ...')\n" ] } ], "metadata": { "kernelspec": { "display_name": "idtrack_dev_env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.12" } }, "nbformat": 4, "nbformat_minor": 5 }