Part 3 — Graph Initialization & Management
Last updated: 2026-01-08
This notebook shows how to build and cache an IDTrack graph snapshot for:
homo_sapiens (human)
mus_musculus (mouse)
sus_scrofa (pig)
A graph build is the most expensive step. The good news:
you usually do it once per organism + snapshot boundary + external YAML configuration
the snapshot can be multi-assembly (human has overlapping GRCh38/GRCh37; mouse/pig are clean-handoff by release but legacy builds are supported)
then you reuse the cached graph for fast conversions
Learning objectives
Build (or load) a graph snapshot for each organism.
Verify that the snapshot exists on disk.
Learn practical graph-management habits (reload vs rebuild, cache hygiene).
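The reload-versus-rebuild habit mentioned above can be sketched as a quick check for an existing graph pickle before starting a build. This is a minimal sketch, not part of IDTrack: the helper names `find_graph_pickles` and `should_rebuild` are hypothetical, and the `graph_<organism>_*.pickle` naming is an assumption inferred from a cached file name such as `graph_homo_sapiens_min76_max115_narrow.pickle`.

```python
from pathlib import Path


def find_graph_pickles(local_repository: Path, organism: str) -> list[Path]:
    """Return cached graph pickles for one organism, most recently modified first."""
    # Assumed naming convention: graph_<organism>_<details>.pickle
    candidates = local_repository.glob(f"graph_{organism}_*.pickle")
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)


def should_rebuild(local_repository: Path, organism: str) -> bool:
    """True when no cached graph pickle is found, so a full build should be expected."""
    return not find_graph_pickles(local_repository, organism)
```

If `should_rebuild(LOCAL_REPOSITORY, 'homo_sapiens')` returns `False`, a later build call will most likely load the cached snapshot rather than construct it again.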
Prerequisite: run 02_prepare_new_external_yaml.ipynb first (especially important for mouse and pig).
3.0 — What you should expect (time / disk)
Graph building can take:
minutes to hours (depends on organism, enabled externals, and cache status)
multiple GB of disk for cached tables + the graph pickle
Plan for this like you would plan for downloading a reference genome + annotation.
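To make the disk planning concrete, the sketch below reports how much space the cache already uses and how much is free on the same volume. It relies only on the Python standard library. `cache_report` is a hypothetical helper name, not an IDTrack function.

```python
import shutil
from pathlib import Path


def cache_report(local_repository: Path) -> dict:
    """Summarise the cache directory: file count, size on disk, and free space (GiB)."""
    gib = 1024 ** 3
    # Walk the cache recursively and keep regular files only.
    files = [p for p in local_repository.rglob("*") if p.is_file()]
    used_bytes = sum(p.stat().st_size for p in files)
    # Free space on the volume that holds the cache directory.
    free_bytes = shutil.disk_usage(local_repository).free
    return {
        "n_files": len(files),
        "cache_gib": used_bytes / gib,
        "free_gib": free_bytes / gib,
    }
```

Calling `cache_report(LOCAL_REPOSITORY)` before a build gives a rough idea of whether several more GB will fit. How much headroom you need is a judgment call that depends on the organism and the enabled externals.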
# Load notebook utilities (collapsible output magic for tutorials)
%load_ext _notebook_utils
# 1) Setup
from __future__ import annotations
import os
from pathlib import Path
import idtrack
LOCAL_REPOSITORY = Path(os.environ.get('IDTRACK_LOCAL_REPO', './idtrack_cache')).resolve()
LOCAL_REPOSITORY.mkdir(parents=True, exist_ok=True)
api = idtrack.API(local_repository=str(LOCAL_REPOSITORY))
api.configure_logger()
print('Local repository:', LOCAL_REPOSITORY)
Local repository: /Users/kemalinecik/git_nosync/master_idtrack/idtrack/docs/_notebooks/idtrack_cache
3.0.1 — Sanity check: do your external YAML files exist?
Human has a packaged default, but for mouse and pig you should have local *_externals_modified.yml files.
# External YAML presence (created in Part 2)
human_yaml = LOCAL_REPOSITORY / 'homo_sapiens_externals_modified.yml'
mouse_yaml = LOCAL_REPOSITORY / 'mus_musculus_externals_modified.yml'
pig_yaml = LOCAL_REPOSITORY / 'sus_scrofa_externals_modified.yml'
HAS_HUMAN_YAML = human_yaml.exists()
HAS_MOUSE_YAML = mouse_yaml.exists()
HAS_PIG_YAML = pig_yaml.exists()
print(("OK" if HAS_HUMAN_YAML else "NOTE: missing (human can fall back to packaged default)").ljust(55), human_yaml.name)
print(("OK" if HAS_MOUSE_YAML else "MISSING (create in Part 2 for mouse)").ljust(55), mouse_yaml.name)
print(("OK" if HAS_PIG_YAML else "MISSING (create in Part 2 for pig)").ljust(55), pig_yaml.name)
OK homo_sapiens_externals_modified.yml
MISSING (create in Part 2 for mouse) mus_musculus_externals_modified.yml
MISSING (create in Part 2 for pig) sus_scrofa_externals_modified.yml
If a file is missing:
go back to 02_prepare_new_external_yaml.ipynb
generate the template and create the _modified.yml file
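The same check can be wrapped in a small function that lists only the organisms that genuinely block a build. This is a sketch under two assumptions taken from the text above: files are named `<organism>_externals_modified.yml`, and only human has a packaged default to fall back on. `missing_external_yaml` is a hypothetical helper name.

```python
from pathlib import Path


def missing_external_yaml(local_repository, organisms, has_packaged_default=("homo_sapiens",)):
    """Return the organisms that lack a local YAML file and have no packaged fallback."""
    missing = []
    for organism in organisms:
        yaml_path = Path(local_repository) / f"{organism}_externals_modified.yml"
        # Organisms with a packaged default (assumed: human only) are not blockers.
        if not yaml_path.exists() and organism not in has_packaged_default:
            missing.append(organism)
    return missing
```

An empty list from `missing_external_yaml(LOCAL_REPOSITORY, ['homo_sapiens', 'mus_musculus', 'sus_scrofa'])` suggests it is safe to continue. Otherwise, return to Part 2 for the listed organisms.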
3.1–3.3 — Build graph snapshots (one per organism)
The canonical pattern is:
resolve organism name
pick snapshot release
(optional) choose a primary genome assembly for output (defaults to the newest/highest-priority assembly for that organism)
api.build_graph(...)
inspect + reuse
We do this for each organism below. If you only need one organism, run only that section.
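The canonical pattern above can be sketched as one reusable function. It uses only the two calls this notebook relies on, `api.resolve_organism` and `api.build_graph`. The wrapper name `build_snapshot` is hypothetical, and `genome_assembly` is an assumed keyword name for the optional primary assembly, so it is forwarded only when you set it.

```python
from typing import Any, Optional, Tuple


def build_snapshot(
    api: Any,
    organism_alias: str,
    snapshot_release: Optional[int] = None,
    genome_assembly: Optional[int] = None,
    calculate_caches: bool = True,
) -> Tuple[str, int]:
    """Resolve the organism, pin a release, then build (or load) the graph snapshot."""
    # Step 1: resolve an alias such as 'human' to its formal name and latest release.
    organism, latest_release = api.resolve_organism(organism_alias)
    # Step 2: use the latest release unless a specific one was pinned.
    release = latest_release if snapshot_release is None else snapshot_release
    kwargs = dict(
        organism_name=organism,
        snapshot_release=release,
        calculate_caches=calculate_caches,
    )
    # Step 3 (optional): forward a primary assembly only when requested.
    # The keyword name 'genome_assembly' is an assumption about the API.
    if genome_assembly is not None:
        kwargs["genome_assembly"] = genome_assembly
    # Step 4: build the graph, or load it if a cached snapshot exists.
    api.build_graph(**kwargs)
    return organism, release
```

For example, `build_snapshot(api, 'human')` would follow the defaults, while passing `snapshot_release` pins a specific release for reproducibility.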
3.1 — Human graph initialization (multi-assembly)
By default, the human snapshot is built with GRCh38 as the primary assembly (assembly code 38), while also including GRCh37 (37) and older archives when they exist within the snapshot window.
This is what enables atlas-building workflows where different datasets were annotated with different genome builds, but you want one unified identifier space.
organism, latest_release = api.resolve_organism('human')
SNAPSHOT_RELEASE = latest_release # pin to a specific release if needed
organism, SNAPSHOT_RELEASE
2026-01-10 16:42:27 INFO:verify_organism: Ensembl Rest API query to get the organism names and associated releases.
('homo_sapiens', 115)
%%collapse Click to show build logs
# Included for tutorial purposes only.
# Build (or load) the graph snapshot
# - calculate_caches=True speeds up later queries (slower build, faster use).
api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)
Click to show build logs
2026-01-10 16:43:06 INFO:graph_maker: The graph is being constructed: /Users/kemalinecik/git_nosync/master_idtrack/idtrack/docs/_notebooks/idtrack_cache/graph_homo_sapiens_min76_max115_narrow.pickle
2026-01-10 16:43:06 INFO:graph_maker: Graph is being created: gene
2026-01-10 16:43:08 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_versioninfo_gene`
2026-01-10 16:43:57 INFO:database_manager: Raw table for `stable_id_event` on ensembl release `115` was downloaded.
2026-01-10 16:43:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_mysql_stable_id_event`
2026-01-10 16:45:07 INFO:database_manager: Raw table for `mapping_session` on ensembl release `115` was downloaded.
2026-01-10 16:45:07 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_mysql_mapping_session`
2026-01-10 16:45:10 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_idhistory_narrow_gene`
2026-01-10 16:45:14 WARNING:graph_maker: Edge weights ignored due to duplicate entries: 2.
2026-01-10 16:45:14 INFO:graph_maker: Edges between across different IDs and self loops are being added.
2026-01-10 16:45:16 INFO:graph_maker: Edges between the same IDs are being added.
2026-01-10 16:45:34 WARNING:graph_maker: Retired ID come alive again: 3.
2026-01-10 16:45:34 INFO:graph_maker: Edges showing the retirement of IDs are being added.
2026-01-10 16:45:40 INFO:graph_maker: Problematic nodes in Ensembl ID history are being removed.
2026-01-10 16:45:46 WARNING:graph_maker: Nodes are deleted due to Ensembl ID history mistake: 3.
2026-01-10 16:45:46 INFO:graph_maker: Self-loops for latest release entries are being added.
2026-01-10 16:45:49 INFO:graph_maker: Node attributes are being added.
2026-01-10 16:45:50 INFO:graph_maker: Graph is being created: transcript
2026-01-10 16:45:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_versioninfo_transcript`
2026-01-10 16:46:10 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_idhistory_narrow_transcript`
2026-01-10 16:46:27 INFO:graph_maker: Edges between across different IDs and self loops are being added.
2026-01-10 16:46:33 INFO:graph_maker: Edges between the same IDs are being added.
2026-01-10 16:48:00 WARNING:graph_maker: Retired ID come alive again: 5.
2026-01-10 16:48:00 INFO:graph_maker: Edges showing the retirement of IDs are being added.
2026-01-10 16:48:25 INFO:graph_maker: Problematic nodes in Ensembl ID history are being removed.
2026-01-10 16:48:47 INFO:graph_maker: Self-loops for latest release entries are being added.
2026-01-10 16:49:02 INFO:graph_maker: Node attributes are being added.
2026-01-10 16:49:11 INFO:graph_maker: Graph is being created: translation
2026-01-10 16:49:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_versioninfo_translation`
2026-01-10 16:49:23 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_idhistory_narrow_translation`
2026-01-10 16:49:26 WARNING:graph_maker: Edge weights ignored due to duplicate entries: 4.
2026-01-10 16:49:26 INFO:graph_maker: Edges between across different IDs and self loops are being added.
2026-01-10 16:49:27 INFO:graph_maker: Edges between the same IDs are being added.
2026-01-10 16:49:54 WARNING:graph_maker: Retired ID come alive again: 4.
2026-01-10 16:49:54 INFO:graph_maker: Edges showing the retirement of IDs are being added.
2026-01-10 16:50:04 INFO:graph_maker: Problematic nodes in Ensembl ID history are being removed.
2026-01-10 16:50:14 WARNING:graph_maker: Nodes are deleted due to Ensembl ID history mistake: 1.
2026-01-10 16:50:14 INFO:graph_maker: Self-loops for latest release entries are being added.
2026-01-10 16:50:21 INFO:graph_maker: Node attributes are being added.
2026-01-10 16:50:24 WARNING:graph_maker: Intersecting Ensembl nodes: Nodes in 'transcript' will be replaced by 'translation': 'ENST00000515292.1'.
2026-01-10 16:50:28 INFO:graph_maker: Establishing connection between different forms.
2026-01-10 16:50:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens76_processed_idsraw_transcript_gene`
2026-01-10 16:50:32 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens76_processed_idsraw_translation_gene`
2026-01-10 16:50:35 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens76_common_relationcurrent`
2026-01-10 16:50:49 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens77_processed_idsraw_transcript_gene`
2026-01-10 16:50:52 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens77_processed_idsraw_translation_gene`
2026-01-10 16:50:55 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens77_common_relationcurrent`
2026-01-10 16:51:05 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens78_processed_idsraw_transcript_gene`
2026-01-10 16:51:09 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens78_processed_idsraw_translation_gene`
2026-01-10 16:51:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens78_common_relationcurrent`
2026-01-10 16:51:20 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens79_processed_idsraw_transcript_gene`
2026-01-10 16:51:23 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens79_processed_idsraw_translation_gene`
2026-01-10 16:51:25 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens79_common_relationcurrent`
2026-01-10 16:51:36 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens80_processed_idsraw_transcript_gene`
2026-01-10 16:51:40 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens80_processed_idsraw_translation_gene`
2026-01-10 16:51:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens80_common_relationcurrent`
2026-01-10 16:51:51 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens81_processed_idsraw_transcript_gene`
2026-01-10 16:51:55 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens81_processed_idsraw_translation_gene`
2026-01-10 16:51:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens81_common_relationcurrent`
2026-01-10 16:52:09 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens82_processed_idsraw_transcript_gene`
2026-01-10 16:52:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens82_processed_idsraw_translation_gene`
2026-01-10 16:52:15 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens82_common_relationcurrent`
2026-01-10 16:52:24 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens83_processed_idsraw_transcript_gene`
2026-01-10 16:52:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens83_processed_idsraw_translation_gene`
2026-01-10 16:52:30 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens83_common_relationcurrent`
2026-01-10 16:52:39 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens84_processed_idsraw_transcript_gene`
2026-01-10 16:52:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens84_processed_idsraw_translation_gene`
2026-01-10 16:52:45 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens84_common_relationcurrent`
2026-01-10 16:52:55 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens85_processed_idsraw_transcript_gene`
2026-01-10 16:52:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens85_processed_idsraw_translation_gene`
2026-01-10 16:53:01 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens85_common_relationcurrent`
2026-01-10 16:53:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens86_processed_idsraw_transcript_gene`
2026-01-10 16:53:15 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens86_processed_idsraw_translation_gene`
2026-01-10 16:53:17 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens86_common_relationcurrent`
2026-01-10 16:53:26 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens87_processed_idsraw_transcript_gene`
2026-01-10 16:53:30 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens87_processed_idsraw_translation_gene`
2026-01-10 16:53:32 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens87_common_relationcurrent`
2026-01-10 16:53:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens88_processed_idsraw_transcript_gene`
2026-01-10 16:53:45 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens88_processed_idsraw_translation_gene`
2026-01-10 16:53:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens88_common_relationcurrent`
2026-01-10 16:53:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens89_processed_idsraw_transcript_gene`
2026-01-10 16:54:01 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens89_processed_idsraw_translation_gene`
2026-01-10 16:54:03 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens89_common_relationcurrent`
2026-01-10 16:54:12 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens90_processed_idsraw_transcript_gene`
2026-01-10 16:54:16 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens90_processed_idsraw_translation_gene`
2026-01-10 16:54:18 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens90_common_relationcurrent`
2026-01-10 16:54:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens91_processed_idsraw_transcript_gene`
2026-01-10 16:54:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens91_processed_idsraw_translation_gene`
2026-01-10 16:54:33 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens91_common_relationcurrent`
2026-01-10 16:54:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens92_processed_idsraw_transcript_gene`
2026-01-10 16:54:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens92_processed_idsraw_translation_gene`
2026-01-10 16:54:49 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens92_common_relationcurrent`
2026-01-10 16:54:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens93_processed_idsraw_transcript_gene`
2026-01-10 16:55:02 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens93_processed_idsraw_translation_gene`
2026-01-10 16:55:05 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens93_common_relationcurrent`
2026-01-10 16:55:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens94_processed_idsraw_transcript_gene`
2026-01-10 16:55:18 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens94_processed_idsraw_translation_gene`
2026-01-10 16:55:20 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens94_common_relationcurrent`
2026-01-10 16:55:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens95_processed_idsraw_transcript_gene`
2026-01-10 16:55:34 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens95_processed_idsraw_translation_gene`
2026-01-10 16:55:37 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens95_common_relationcurrent`
2026-01-10 16:55:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens96_processed_idsraw_transcript_gene`
2026-01-10 16:55:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens96_processed_idsraw_translation_gene`
2026-01-10 16:55:53 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens96_common_relationcurrent`
2026-01-10 16:56:03 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens97_processed_idsraw_transcript_gene`
2026-01-10 16:56:06 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens97_processed_idsraw_translation_gene`
2026-01-10 16:56:08 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens97_common_relationcurrent`
2026-01-10 16:56:21 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens98_processed_idsraw_transcript_gene`
2026-01-10 16:56:24 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens98_processed_idsraw_translation_gene`
2026-01-10 16:56:27 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens98_common_relationcurrent`
2026-01-10 16:56:36 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens99_processed_idsraw_transcript_gene`
2026-01-10 16:56:39 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens99_processed_idsraw_translation_gene`
2026-01-10 16:56:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens99_common_relationcurrent`
2026-01-10 16:56:51 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens100_processed_idsraw_transcript_gene`
2026-01-10 16:56:54 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens100_processed_idsraw_translation_gene`
2026-01-10 16:56:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens100_common_relationcurrent`
2026-01-10 16:57:08 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens101_processed_idsraw_transcript_gene`
2026-01-10 16:57:10 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens101_processed_idsraw_translation_gene`
2026-01-10 16:57:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens101_common_relationcurrent`
2026-01-10 16:57:23 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens102_processed_idsraw_transcript_gene`
2026-01-10 16:57:27 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens102_processed_idsraw_translation_gene`
2026-01-10 16:57:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens102_common_relationcurrent`
2026-01-10 16:57:40 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens103_processed_idsraw_transcript_gene`
2026-01-10 16:57:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens103_processed_idsraw_translation_gene`
2026-01-10 16:57:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens103_common_relationcurrent`
2026-01-10 16:57:58 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens104_processed_idsraw_transcript_gene`
2026-01-10 16:58:01 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens104_processed_idsraw_translation_gene`
2026-01-10 16:58:04 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens104_common_relationcurrent`
2026-01-10 16:58:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens105_processed_idsraw_transcript_gene`
2026-01-10 16:58:18 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens105_processed_idsraw_translation_gene`
2026-01-10 16:58:20 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens105_common_relationcurrent`
2026-01-10 16:58:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens106_processed_idsraw_transcript_gene`
2026-01-10 16:58:35 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens106_processed_idsraw_translation_gene`
2026-01-10 16:58:38 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens106_common_relationcurrent`
2026-01-10 16:58:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens107_processed_idsraw_transcript_gene`
2026-01-10 16:58:53 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens107_processed_idsraw_translation_gene`
2026-01-10 16:58:56 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens107_common_relationcurrent`
2026-01-10 16:59:08 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens108_processed_idsraw_transcript_gene`
2026-01-10 16:59:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens108_processed_idsraw_translation_gene`
2026-01-10 16:59:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens108_common_relationcurrent`
2026-01-10 16:59:25 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens109_processed_idsraw_transcript_gene`
2026-01-10 16:59:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens109_processed_idsraw_translation_gene`
2026-01-10 16:59:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens109_common_relationcurrent`
2026-01-10 16:59:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens110_processed_idsraw_transcript_gene`
2026-01-10 16:59:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens110_processed_idsraw_translation_gene`
2026-01-10 16:59:49 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens110_common_relationcurrent`
2026-01-10 17:00:01 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens111_processed_idsraw_transcript_gene`
2026-01-10 17:00:05 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens111_processed_idsraw_translation_gene`
2026-01-10 17:00:07 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens111_common_relationcurrent`
2026-01-10 17:00:19 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens112_processed_idsraw_transcript_gene`
2026-01-10 17:00:23 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens112_processed_idsraw_translation_gene`
2026-01-10 17:00:25 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens112_common_relationcurrent`
2026-01-10 17:00:38 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens113_processed_idsraw_transcript_gene`
2026-01-10 17:00:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens113_processed_idsraw_translation_gene`
2026-01-10 17:00:47 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens113_common_relationcurrent`
2026-01-10 17:01:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens114_processed_idsraw_transcript_gene`
2026-01-10 17:01:17 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens114_processed_idsraw_translation_gene`
2026-01-10 17:01:20 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens114_common_relationcurrent`
2026-01-10 17:01:39 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_idsraw_transcript_gene`
2026-01-10 17:01:48 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_idsraw_translation_gene`
2026-01-10 17:01:53 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_common_relationcurrent`
2026-01-10 17:02:18 INFO:graph_maker: Edges between external IDs to Ensembl IDs is being added for 'gene'.
2026-01-10 17:02:33 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens76_processed_external_relevant_gene`
2026-01-10 17:03:40 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\').
2026-01-10 17:03:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens76_processed_external_relevant_gene`
2026-01-10 17:04:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens77_processed_external_relevant_gene`
2026-01-10 17:04:59 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\').
2026-01-10 17:05:04 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens77_processed_external_relevant_gene`
2026-01-10 17:05:52 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens78_processed_external_relevant_gene`
2026-01-10 17:06:08 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\').
2026-01-10 17:06:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens78_processed_external_relevant_gene`
2026-01-10 17:06:51 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens79_processed_external_relevant_gene`
2026-01-10 17:07:12 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens79_processed_external_relevant_gene`
2026-01-10 17:07:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens80_processed_external_relevant_gene`
2026-01-10 17:08:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens80_processed_external_relevant_gene`
2026-01-10 17:08:48 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens81_processed_external_relevant_gene`
2026-01-10 17:09:08 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens81_processed_external_relevant_gene`
2026-01-10 17:09:46 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens82_processed_external_relevant_gene`
2026-01-10 17:10:06 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens82_processed_external_relevant_gene`
2026-01-10 17:10:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens83_processed_external_relevant_gene`
2026-01-10 17:11:04 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens83_processed_external_relevant_gene`
2026-01-10 17:11:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens84_processed_external_relevant_gene`
2026-01-10 17:12:02 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens84_processed_external_relevant_gene`
2026-01-10 17:12:40 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens85_processed_external_relevant_gene`
2026-01-10 17:13:00 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens85_processed_external_relevant_gene`
2026-01-10 17:13:40 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens86_processed_external_relevant_gene`
2026-01-10 17:14:00 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens86_processed_external_relevant_gene`
2026-01-10 17:14:39 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens87_processed_external_relevant_gene`
2026-01-10 17:14:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens87_processed_external_relevant_gene`
2026-01-10 17:15:38 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens88_processed_external_relevant_gene`
2026-01-10 17:15:56 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens88_processed_external_relevant_gene`
2026-01-10 17:16:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens89_processed_external_relevant_gene`
2026-01-10 17:16:45 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens89_processed_external_relevant_gene`
2026-01-10 17:17:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens90_processed_external_relevant_gene`
2026-01-10 17:17:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens90_processed_external_relevant_gene`
2026-01-10 17:17:56 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens91_processed_external_relevant_gene`
2026-01-10 17:18:11 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens91_processed_external_relevant_gene`
2026-01-10 17:18:39 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens92_processed_external_relevant_gene`
2026-01-10 17:18:49 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\').
2026-01-10 17:18:53 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens92_processed_external_relevant_gene`
2026-01-10 17:19:21 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens93_processed_external_relevant_gene`
2026-01-10 17:19:36 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens93_processed_external_relevant_gene`
2026-01-10 17:20:03 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens94_processed_external_relevant_gene`
2026-01-10 17:20:18 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens94_processed_external_relevant_gene`
2026-01-10 17:20:46 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens95_processed_external_relevant_gene`
2026-01-10 17:21:01 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens95_processed_external_relevant_gene`
2026-01-10 17:21:30 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens96_processed_external_relevant_gene`
2026-01-10 17:21:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens96_processed_external_relevant_gene`
2026-01-10 17:22:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens97_processed_external_relevant_gene`
2026-01-10 17:22:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens97_processed_external_relevant_gene`
2026-01-10 17:22:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens98_processed_external_relevant_gene`
2026-01-10 17:23:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens98_processed_external_relevant_gene`
2026-01-10 17:23:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens99_processed_external_relevant_gene`
2026-01-10 17:23:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens99_processed_external_relevant_gene`
2026-01-10 17:24:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens100_processed_external_relevant_gene`
2026-01-10 17:24:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens100_processed_external_relevant_gene`
2026-01-10 17:25:12 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens101_processed_external_relevant_gene`
2026-01-10 17:25:26 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens101_processed_external_relevant_gene`
2026-01-10 17:25:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens102_processed_external_relevant_gene`
2026-01-10 17:26:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens102_processed_external_relevant_gene`
2026-01-10 17:26:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens103_processed_external_relevant_gene`
2026-01-10 17:26:58 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens103_processed_external_relevant_gene`
2026-01-10 17:27:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens104_processed_external_relevant_gene`
2026-01-10 17:27:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens104_processed_external_relevant_gene`
2026-01-10 17:28:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens105_processed_external_relevant_gene`
2026-01-10 17:28:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens105_processed_external_relevant_gene`
2026-01-10 17:29:32 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens106_processed_external_relevant_gene`
2026-01-10 17:29:46 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens106_processed_external_relevant_gene`
2026-01-10 17:30:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens107_processed_external_relevant_gene`
2026-01-10 17:30:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens107_processed_external_relevant_gene`
2026-01-10 17:30:58 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens108_processed_external_relevant_gene`
2026-01-10 17:31:12 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens108_processed_external_relevant_gene`
2026-01-10 17:31:42 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens109_processed_external_relevant_gene`
2026-01-10 17:31:57 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens109_processed_external_relevant_gene`
2026-01-10 17:32:26 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens110_processed_external_relevant_gene`
2026-01-10 17:32:43 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens110_processed_external_relevant_gene`
2026-01-10 17:33:13 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens111_processed_external_relevant_gene`
2026-01-10 17:33:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens111_processed_external_relevant_gene`
2026-01-10 17:33:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens112_processed_external_relevant_gene`
2026-01-10 17:34:14 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens112_processed_external_relevant_gene`
2026-01-10 17:34:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens113_processed_external_relevant_gene`
2026-01-10 17:34:59 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens113_processed_external_relevant_gene`
2026-01-10 17:35:29 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens114_processed_external_relevant_gene`
2026-01-10 17:35:44 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens114_processed_external_relevant_gene`
2026-01-10 17:36:15 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens115_processed_external_relevant_gene`
2026-01-10 17:36:31 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens115_processed_external_relevant_gene`
2026-01-10 17:36:50 WARNING:graph_maker: New nodes added as assembly nodes: 28738
2026-01-10 17:36:50 INFO:graph_maker: Edges between external IDs to Ensembl IDs is being added for 'transcript'.
2026-01-10 17:37:07 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens76_processed_external_relevant_transcript`
2026-01-10 17:37:20 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\').
2026-01-10 17:37:32 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens76_processed_external_relevant_transcript`
2026-01-10 17:38:12 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens77_processed_external_relevant_transcript`
2026-01-10 17:38:25 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\').
2026-01-10 17:38:34 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens77_processed_external_relevant_transcript`
2026-01-10 17:39:30 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens78_processed_external_relevant_transcript`
2026-01-10 17:39:43 WARNING:database_manager: Dropping 7 malformed rows from `external_db` where `external_db_id` is not numeric (e.g. '\\').
2026-01-10 17:39:52 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens78_processed_external_relevant_transcript`
2026-01-10 17:40:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens79_processed_external_relevant_transcript`
2026-01-10 17:40:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens79_processed_external_relevant_transcript`
2026-01-10 17:41:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens80_processed_external_relevant_transcript`
2026-01-10 17:41:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens80_processed_external_relevant_transcript`
2026-01-10 17:42:28 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens81_processed_external_relevant_transcript`
2026-01-10 17:42:50 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens81_processed_external_relevant_transcript`
2026-01-10 17:43:30 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens82_processed_external_relevant_transcript`
2026-01-10 17:43:53 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens82_processed_external_relevant_transcript`
2026-01-10 17:44:32 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens83_processed_external_relevant_transcript`
2026-01-10 17:44:55 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens83_processed_external_relevant_transcript`
2026-01-10 17:45:35 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-38.h5` with key `ens84_processed_external_relevant_transcript`
2026-01-10 17:46:00 INFO:database_manager: Exporting to the following file `homo_sapiens_assembly-37.h5` with key `ens84_processed_external_relevant_transcript`
Traceback (most recent call last):
File "/Users/kemalinecik/tools/apps/mamba/envs/idtrack_dev_env/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3670, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/var/folders/y7/7c17s0l57szdjc1cdc9dmnpm0000gn/T/ipykernel_5077/295636214.py", line 5, in <module>
api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_api.py", line 213, in build_graph
self.track = Track(dm)
^^^^^^^^^
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_track.py", line 83, in __init__
self.graph = graph_creator.get_graph(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_graph_maker.py", line 1086, in get_graph
g = self.construct_graph(narrow=narrow, form_list=form_list, narrow_external=narrow_external)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_graph_maker.py", line 224, in construct_graph
rc = db_manager.create_external_all(return_mode="all", narrow_external=narrow_external)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py", line 2279, in create_external_all
df_temp = dm.get_db(df_indicator)
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py", line 2484, in get_db
df = self.create_external_db(filter_mode=param1_ind)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py", line 1990, in create_external_db
x = self.get_table("xref", usecols=["xref_id", "external_db_id", "dbprimary_acc", "display_label"], **m)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py", line 1104, in get_table
df = hs.read_exported(hierarchy, file_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py", line 771, in read_exported
df = read_hdf(path=file_path, key=hierarchy, mode="r")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py", line 100, in read_hdf
df = _load_column_data(grp, columns, dtypes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py", line 497, in _load_column_data
raw = [x.decode(DB.UTF8) for x in raw]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py", line 497, in <listcomp>
raw = [x.decode(DB.UTF8) for x in raw]
^^^^^^^^^^^^^^^^^
KeyboardInterrupt
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
Cell In[5], line 1
----> 1 get_ipython().run_cell_magic('collapse', 'Click to show build logs', '# Included for tutorial purposes only.\n\n# Build (or load) the graph snapshot\n# - calculate_caches=True speeds up later queries (slower build, faster use).\napi.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)\n')
File ~/tools/apps/mamba/envs/idtrack_dev_env/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2547, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
2545 with self.builtin_trap:
2546 args = (magic_arg_s, cell)
-> 2547 result = fn(*args, **kwargs)
2549 # The code below prevents the output from being displayed
2550 # when using magics with decorator @output_can_be_silenced
2551 # when the last Python token in the expression is a ';'.
2552 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):
File ~/git_nosync/master_idtrack/idtrack/docs/_notebooks/_notebook_utils.py:376, in CollapseMagics.collapse(self, line, cell)
374 err = getattr(exec_result, "error_before_exec", None) or getattr(exec_result, "error_in_exec", None)
375 if err is not None:
--> 376 raise err
[... skipping hidden 1 frame]
Cell In[5], line 5
1 # Included for tutorial purposes only.
2
3 # Build (or load) the graph snapshot
4 # - calculate_caches=True speeds up later queries (slower build, faster use).
----> 5 api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)
File ~/git_nosync/master_idtrack/idtrack/idtrack/_api.py:213, in API.build_graph(self, organism_name, snapshot_release, genome_assembly, return_test, calculate_caches)
211 self.track = TrackTests(dm)
212 else:
--> 213 self.track = Track(dm)
215 if calculate_caches and return_test:
216 self.calculate_graph_caches(for_test=True)
File ~/git_nosync/master_idtrack/idtrack/idtrack/_track.py:83, in Track.__init__(self, db_manager, **kwargs)
80 graph_creator = GraphMaker(self.db_manager)
82 # Calculate/Load the graph
---> 83 self.graph = graph_creator.get_graph(**kwargs)
84 self.version_info = self.graph.graph["version_info"]
85 self._external_entrance_placeholder = {False: -1, True: 10001}
File ~/git_nosync/master_idtrack/idtrack/idtrack/_graph_maker.py:1086, in GraphMaker.get_graph(self, narrow, create_even_if_exist, save_after_calculation, overwrite_even_if_exist, form_list, narrow_external)
1084 if not os.access(file_path, os.R_OK) or create_even_if_exist:
1085 self.log.info(f"The graph is being constructed: {file_path}")
-> 1086 g = self.construct_graph(narrow=narrow, form_list=form_list, narrow_external=narrow_external)
1087 else: # Otherwise, just read the file that is already in the directory.
1088 self.log.info(f"The graph is being read: {file_path}")
File ~/git_nosync/master_idtrack/idtrack/idtrack/_graph_maker.py:224, in GraphMaker.construct_graph(self, narrow, form_list, narrow_external)
219 for ens_rel in sorted(self.db_manager.available_releases):
220 # the order is important in adding new nodes into the core graph.
221 # it is important to capture correct ens_release in min_ens_release dictionary
223 db_manager = dbman_s[f].change_release(ens_rel)
--> 224 rc = db_manager.create_external_all(return_mode="all", narrow_external=narrow_external)
226 for _ind, entry in rc.iterrows():
227 # Note that the `rc` dataframe have higher priority assembly entries at the top.
229 e1, e2 = entry["graph_id"], entry["id_db"]
File ~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:2279, in DatabaseManager.create_external_all(self, return_mode, narrow_external)
2277 except ValueError:
2278 continue
-> 2279 df_temp = dm.get_db(df_indicator)
2280 df_temp["assembly"] = i
2281 df = pd.concat([df, df_temp])
File ~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:2484, in DatabaseManager.get_db(self, df_indicator, create_even_if_exist, save_after_calculation, overwrite_even_if_exist)
2481 df = self.create_external_db(filter_mode="all")
2483 elif main_ind == "external" and param1_ind in ["relevant", "database", "relevant-database"]:
-> 2484 df = self.create_external_db(filter_mode=param1_ind)
2486 elif main_ind == "idsraw":
2487 if param1_ind not in self.available_form_of_interests:
File ~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:1990, in DatabaseManager.create_external_db(self, filter_mode)
1986 a = self.get_db(f"idsraw_{self.form}", save_after_calculation=self.store_raw_always)
1987 ox = self.get_table(
1988 "object_xref", usecols=["ensembl_id", "ensembl_object_type", "xref_id", "object_xref_id"], **m
1989 )
-> 1990 x = self.get_table("xref", usecols=["xref_id", "external_db_id", "dbprimary_acc", "display_label"], **m)
1991 ed = self.get_table("external_db", usecols=["external_db_id", "db_name", "db_display_name"], **m)
1992 ix = self.get_table("identity_xref", usecols=["ensembl_identity", "xref_identity", "object_xref_id"], **m)
File ~/git_nosync/master_idtrack/idtrack/idtrack/_database_manager.py:1104, in DatabaseManager.get_table(self, table_key, usecols, create_even_if_exist, save_after_calculation, overwrite_even_if_exist)
1102 df = self.download_table(table_key, usecols)
1103 else: # Otherwise, just read the file that is already in the directory.
-> 1104 df = hs.read_exported(hierarchy, file_path)
1106 # If prompt, save the dataframe in requested format.
1107 if save_after_calculation:
File ~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:771, in read_exported(hierarchy, file_path)
768 if not check_h5_key(file_path, hierarchy):
769 raise KeyError(f"Key {hierarchy!r} not found in HDF5 file {file_path!r}.")
--> 771 df = read_hdf(path=file_path, key=hierarchy, mode="r")
772 return df
File ~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:100, in read_hdf(path, key, mode)
97 index, index_names = _load_index_data(grp)
99 # Load column data
--> 100 df = _load_column_data(grp, columns, dtypes)
102 # Set index and metadata
103 df.index = index
File ~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:497, in _load_column_data(grp, columns, dtypes)
495 raw = data_grp[col_key][()]
496 if isinstance(raw[0], bytes):
--> 497 raw = [x.decode(DB.UTF8) for x in raw]
498 restored = [pd.NA if x == DB.placeholder_na else x for x in raw]
499 data_dict[col] = restored
File ~/git_nosync/master_idtrack/idtrack/idtrack/_utils_hdf5.py:497, in <listcomp>(.0)
495 raw = data_grp[col_key][()]
496 if isinstance(raw[0], bytes):
--> 497 raw = [x.decode(DB.UTF8) for x in raw]
498 restored = [pd.NA if x == DB.placeholder_na else x for x in raw]
499 data_dict[col] = restored
KeyboardInterrupt:
# Quick inspection
g = api.track.graph
print('Organism:', g.graph.get('organism'))
print('Snapshot release:', g.graph.get('ensembl_release'))
print('Main assembly:', g.graph.get('genome_assembly'))
print('Assemblies in this graph:', sorted(api.list_genome_assemblies()))
print('Nodes:', g.number_of_nodes())
print('Edges:', g.number_of_edges())
aed = sorted(getattr(g, 'available_external_databases', []))
print('External DBs enabled (count):', len(aed))
print('External DBs (first 20):', aed[:20])
# Where is the graph file stored?
sorted(LOCAL_REPOSITORY.glob('graph_homo_sapiens*.pickle'))[-5:]
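The same inspection is repeated for mouse and pig later in this notebook, so it can help to collect the fields into one place. The following is a minimal sketch, assuming only what the cell above already uses (a `g.graph` dict, `number_of_nodes()`, `number_of_edges()`, and an optional `available_external_databases` attribute). The name `summarize_graph` is hypothetical and is not part of IDTrack.

```python
from typing import Any


def summarize_graph(g: Any, max_dbs: int = 20) -> dict:
    """Collect the fields printed in the inspection cell into one dict.

    `g` is expected to behave like the graph object at `api.track.graph`.
    """
    # The attribute may be absent, so fall back to an empty list (as the cell above does).
    dbs = sorted(getattr(g, "available_external_databases", []))
    return {
        "organism": g.graph.get("organism"),
        "snapshot_release": g.graph.get("ensembl_release"),
        "main_assembly": g.graph.get("genome_assembly"),
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "external_db_count": len(dbs),
        "external_dbs_head": dbs[:max_dbs],
    }
```

After a successful build, `summarize_graph(api.track.graph)` would return the same values that the cell above prints, in a form that is easy to compare across organisms.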
3.2 — Mouse graph initialization (clean handoff)
Mouse is a clean-handoff species (one maintained assembly per release: GRCm37 → GRCm38 → GRCm39). Older assemblies mainly matter for legacy datasets and archive releases; you typically do not have overlapping assemblies within the same release.
organism, latest_release = api.resolve_organism('mus musculus')
SNAPSHOT_RELEASE = latest_release
organism, SNAPSHOT_RELEASE
# Build (or load) the mouse graph snapshot
if HAS_MOUSE_YAML:
    api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)
else:
    print('Skipping mouse build: mus_musculus_externals_modified.yml is missing (run Part 2 first).')
if HAS_MOUSE_YAML:
    g = api.track.graph
    print('Organism:', g.graph.get('organism'))
    print('Snapshot release:', g.graph.get('ensembl_release'))
    print('Main assembly:', g.graph.get('genome_assembly'))
    print('Assemblies in this graph:', sorted(api.list_genome_assemblies()))
    print('Nodes:', g.number_of_nodes())
    print('Edges:', g.number_of_edges())
    aed = sorted(getattr(g, 'available_external_databases', []))
    print('External DBs enabled (count):', len(aed))
    print('External DBs (first 20):', aed[:20])
else:
    print('Mouse graph not built (missing YAML).')
sorted(LOCAL_REPOSITORY.glob('graph_mus_musculus*.pickle'))[-5:]
3.3 — Pig graph initialization (clean handoff)
Pig is a clean-handoff species (one maintained assembly per release: Sscrofa9.2 → Sscrofa10.2 → Sscrofa11.1). Older assemblies mainly matter for legacy datasets and archive releases; you typically do not have overlapping assemblies within the same release.
organism, latest_release = api.resolve_organism('sus scrofa')
SNAPSHOT_RELEASE = latest_release
organism, SNAPSHOT_RELEASE
# Build (or load) the pig graph snapshot
if HAS_PIG_YAML:
    api.build_graph(organism_name=organism, snapshot_release=SNAPSHOT_RELEASE, calculate_caches=True)
else:
    print('Skipping pig build: sus_scrofa_externals_modified.yml is missing (run Part 2 first).')
if HAS_PIG_YAML:
    g = api.track.graph
    print('Organism:', g.graph.get('organism'))
    print('Snapshot release:', g.graph.get('ensembl_release'))
    print('Main assembly:', g.graph.get('genome_assembly'))
    print('Assemblies in this graph:', sorted(api.list_genome_assemblies()))
    print('Nodes:', g.number_of_nodes())
    print('Edges:', g.number_of_edges())
    aed = sorted(getattr(g, 'available_external_databases', []))
    print('External DBs enabled (count):', len(aed))
    print('External DBs (first 20):', aed[:20])
else:
    print('Pig graph not built (missing YAML).')
sorted(LOCAL_REPOSITORY.glob('graph_sus_scrofa*.pickle'))[-5:]
3.4 — Graph management (all species)
Reloading vs rebuilding
api.build_graph(...) is safe to call repeatedly.
If the snapshot already exists on disk, IDTrack will load it.
If it does not exist yet, IDTrack will build it (slow, first-time only).
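Before calling `api.build_graph(...)`, it can be useful to know whether you are about to wait for a load or for a full build. The sketch below only looks for graph pickle files using the `graph_<organism>*.pickle` pattern that this notebook uses elsewhere. It is a heuristic: the helper names are hypothetical, and the presence of a pickle for an organism does not prove that it matches the snapshot release or YAML you are about to request.

```python
from pathlib import Path


def cached_graphs(local_repository: Path, organism: str) -> list[Path]:
    """Return graph pickles for `organism`, oldest modification time first."""
    hits = Path(local_repository).glob(f"graph_{organism}*.pickle")
    return sorted(hits, key=lambda p: p.stat().st_mtime)


def will_probably_rebuild(local_repository: Path, organism: str) -> bool:
    """Heuristic only: True when no graph pickle for `organism` is present."""
    return not cached_graphs(local_repository, organism)
```

For example, `will_probably_rebuild(LOCAL_REPOSITORY, 'homo_sapiens')` returning `True` suggests planning for the slow, first-time path.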
Cache hygiene
Your local repository can accumulate:
downloaded tables
graph snapshot pickle files
intermediate files used during builds
Tip: Treat your local repository as project infrastructure. Keep it stable so you get the benefits of caching.
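To see where the disk space actually goes, you can total file sizes by extension. This is a minimal sketch using only the standard library; the function names are hypothetical, and it makes no assumption about IDTrack's file layout beyond "files live somewhere under the local repository".

```python
from collections import defaultdict
from pathlib import Path


def disk_usage_by_suffix(local_repository: Path) -> dict[str, int]:
    """Sum file sizes (bytes) per file extension, recursing into subfolders."""
    totals: dict[str, int] = defaultdict(int)
    for p in Path(local_repository).rglob("*"):
        if p.is_file():
            totals[p.suffix or "(no suffix)"] += p.stat().st_size
    return dict(totals)


def format_report(totals: dict[str, int]) -> str:
    """Render the totals largest-first, in MiB."""
    rows = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(f"{suffix:>12}  {size / 2**20:10.1f} MiB" for suffix, size in rows)
```

Calling `print(format_report(disk_usage_by_suffix(LOCAL_REPOSITORY)))` is read-only, so it is safe to run at any time.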
Switching organisms / assemblies
A graph snapshot is specific to:
organism
snapshot boundary (max release)
external YAML contents
the chosen primary assembly (the default output coordinate system)
Even though the snapshot can include multiple assemblies, changing the primary assembly changes the snapshot and requires a rebuild.
Tip: The cached graph filename does not include the assembly. If you want to keep two different primary assemblies side-by-side, use separate local repositories (or copy the graph pickle file).
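One way to follow the "separate local repositories" suggestion is to derive a dedicated directory per primary assembly. The sketch below is an illustration only: the helper name and the folder naming scheme are hypothetical conventions, not IDTrack rules. The commented usage assumes that `build_graph` accepts a `genome_assembly` argument (it appears in the method signature shown in the traceback earlier in this notebook); the value `37` is an assumed example.

```python
from pathlib import Path


def repository_for_assembly(base: Path, organism: str, primary_assembly: int) -> Path:
    """Return (and create) a cache directory dedicated to one primary assembly.

    The folder naming scheme is an illustrative convention, not an IDTrack rule.
    """
    repo = Path(base) / f"{organism}_primary-{primary_assembly}"
    repo.mkdir(parents=True, exist_ok=True)
    return repo


# Intended use (not executed here; requires IDTrack and network access):
#   repo37 = repository_for_assembly(Path('./idtrack_cache_by_assembly'), 'homo_sapiens', 37)
#   api37 = idtrack.API(local_repository=str(repo37))
#   api37.build_graph(organism_name='homo_sapiens', snapshot_release=SNAPSHOT_RELEASE,
#                     genome_assembly=37, calculate_caches=True)
```

The trade-off is disk space: each repository keeps its own cached files. Since this notebook looks for the `*_externals_modified.yml` files inside the local repository, they would presumably need to be copied into each new directory as well.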
Performance tips
Use calculate_caches=True during builds when you plan to do many conversions afterward.
Keep your external YAML allowlist small to reduce ambiguity and search space.
Warning: Do not delete caches unless you understand the consequences: removing them may force a full rebuild.
# Helper: list what IDTrack has cached in your local repository.
# Safe: this does NOT delete anything.
from pathlib import Path
cache = LOCAL_REPOSITORY
print('Local repository:', cache)
patterns = [
    'graph_*.pickle',
    '*_externals_modified.yml',
    '*_externals_template.yml',
]
for pat in patterns:
    hits = sorted(cache.glob(pat))
    print()
    print(f'{pat} ({len(hits)}):')
    for p in hits[:10]:
        print(' ', p.name)
    if len(hits) > 10:
        print(' ...')