Part 1 — Environment Setup & Installation

Last updated: 2026-01-08

This tutorial gets you from “I have Python” to “IDTrack imports and has a working local cache directory”.

Learning objectives

Install IDTrack in an isolated environment (conda or pip).
Verify your installation with a short, copy/pasteable checklist.
Understand and set IDTRACK_LOCAL_REPO (your on-disk cache + configuration folder).
Know what “success” looks like before you start building graphs.

Warning: The first real graph build (Part 3) can take time and disk space. This notebook only verifies that your environment is ready.

1.1 — Installation Guide

IDTrack is a Python package. The easiest way to avoid dependency conflicts is to use a fresh environment.

You have two common workflows:

Conda/Mamba environment (recommended if you already use conda)
pip + venv (recommended if you prefer plain Python tooling)

Either option is fine. Pick the one that matches how your lab usually manages Python.

Step 1 — Create an isolated environment

Option A: conda/mamba

Create and activate a clean environment (example uses Python 3.11):

mamba create -n idtrack python=3.11 -y
mamba activate idtrack

Tip: If you are on Apple Silicon and see HDF5/h5py issues later, installing hdf5 via conda often fixes it:

mamba install -n idtrack hdf5 -y

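Option B: venv

Create and activate a virtual environment, then upgrade pip (macOS/Linux shell shown; on Windows, activate with .venv\Scripts\activate instead):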

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
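
Before installing anything, it can help to confirm that the interpreter you are about to use really belongs to an isolated environment. The sketch below is one way to check. detect_environment is a hypothetical helper name (not part of IDTrack), and it relies on two general conventions rather than anything IDTrack-specific: a venv interpreter reports a sys.prefix that differs from sys.base_prefix, and conda/mamba export CONDA_PREFIX and CONDA_DEFAULT_ENV when an environment is activated.

```python
import os
import sys


def detect_environment(prefix=None, base_prefix=None, environ=None):
    """Return a short label for the kind of environment Python is running in.

    Hypothetical helper for illustration. The arguments default to the live
    interpreter values and exist only to make the function easy to test.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ

    # A venv interpreter reports a prefix that differs from the base install.
    if prefix != base_prefix:
        return 'venv'

    # conda/mamba export these variables when an environment is activated.
    if environ.get('CONDA_PREFIX'):
        return 'conda:' + environ.get('CONDA_DEFAULT_ENV', 'unknown')

    return 'none detected'


print('Environment:', detect_environment())
```

With the mamba commands from Option A, a label such as conda:idtrack would be expected. A result of conda:base or none detected suggests the intended environment was not activated in this session.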

Step 2 — Install IDTrack

If you are installing from PyPI:

pip install idtrack

If you are working from a cloned repository (developer install):

pip install -e .

Expected result: import idtrack works in Python, and idtrack.__version__ prints a version string.

Step 3 — Quick environment report (safe to run)

This cell prints your Python version and tries to import IDTrack.

Expected result: If installation succeeded, you will see an IDTrack version. If it failed, you will see a helpful error message (and the notebook continues).

import platform
import sys
from pathlib import Path

print('Python:', sys.version.split()[0])
print('Executable:', sys.executable)
print('Platform:', platform.platform())

try:
    import idtrack

    print('idtrack version:', getattr(idtrack, '__version__', 'unknown'))
    print('idtrack package path:', Path(idtrack.__file__).resolve())
    IDTRACK_OK = True
except Exception as e:
    print('idtrack import failed ->', repr(e))
    print('Fix: confirm you activated the intended environment, then re-run: pip install idtrack')
    IDTRACK_OK = False

Python: 3.11.12
Executable: /Users/kemalinecik/tools/apps/mamba/envs/idtrack_dev_env/bin/python
Platform: macOS-15.7.2-arm64-arm-64bit
idtrack version: 0.0.5
idtrack package path: /Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/__init__.py

Step 4 — Choose your local repository directory (`IDTRACK_LOCAL_REPO`)

IDTrack stores cached downloads, graph snapshots, and your external YAML files in one place: your local repository directory.

You can set it in your shell:

export IDTRACK_LOCAL_REPO=/path/to/idtrack_cache

Many of the tutorial notebooks fall back to ./idtrack_cache (relative to the notebook's working directory) when IDTRACK_LOCAL_REPO is not set.

Tip: In a real project, put this folder somewhere stable (not a temporary directory) so you reuse caches across sessions.
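
In a notebook, the variable can also be set from Python instead of the shell. The sketch below assumes that IDTrack (or the tutorial code) reads IDTRACK_LOCAL_REPO from the process environment after this cell has run, which this tutorial does not spell out. set_local_repo is a hypothetical helper, and ./idtrack_cache is only a placeholder path matching the notebook fallback.

```python
import os
import tempfile
from pathlib import Path


def set_local_repo(path, environ=None):
    """Store an absolute IDTRACK_LOCAL_REPO path in the given environment mapping."""
    environ = os.environ if environ is None else environ
    resolved = Path(path).expanduser().resolve()

    # Warn when the folder lives under the system temp directory,
    # because caches placed there may not survive between sessions.
    temp_root = Path(tempfile.gettempdir()).resolve()
    if resolved == temp_root or temp_root in resolved.parents:
        print('Warning:', resolved, 'is inside the temp directory', temp_root)

    environ['IDTRACK_LOCAL_REPO'] = str(resolved)
    return resolved


# Placeholder path; replace it with a stable location of your choice.
print('IDTRACK_LOCAL_REPO =', set_local_repo('./idtrack_cache'))
```

The helper does not create the directory. A value set this way lasts only for the current Python process, so the shell export remains the better choice for anything you want to persist across sessions.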

Step 5 — Verify local repository read/write (safe to run)

This cell creates the directory (if needed) and writes a tiny test file.

Expected result: You should see OK and the resolved path.

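import os
from pathlib import Path

# Fall back to ./idtrack_cache when IDTRACK_LOCAL_REPO is not set.
local_repo = Path(os.environ.get('IDTRACK_LOCAL_REPO', './idtrack_cache')).resolve()

# Create the folder, write a tiny file, read it back, then remove it.
test_file = local_repo / '_idtrack_write_test.txt'
try:
    local_repo.mkdir(parents=True, exist_ok=True)
    test_file.write_text('ok', encoding='utf-8')
    write_ok = test_file.read_text(encoding='utf-8') == 'ok'
    test_file.unlink()
except OSError as e:
    # Report permission or filesystem problems instead of stopping the notebook.
    print('Write test error ->', repr(e))
    write_ok = False

print('Local repository:', local_repo)
print('Write test:', 'OK' if write_ok else 'FAILED')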

Local repository: /Users/kemalinecik/git_nosync/master_idtrack/idtrack/docs/_notebooks/idtrack_cache
Write test: OK
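
Because the first graph build can use a noticeable amount of disk space, it may be worth checking how much room is free on the volume that holds the local repository. The sketch below uses shutil.disk_usage from the standard library. free_gib and nearest_existing are hypothetical helper names, and no minimum size is implied because this tutorial does not state one.

```python
import os
import shutil
from pathlib import Path


def nearest_existing(path):
    """Walk up from path until a directory or file that exists is found."""
    path = Path(path).resolve()
    while not path.exists():
        path = path.parent
    return path


def free_gib(path):
    """Return free space, in GiB, on the volume containing an existing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 1024**3


# Same fallback as the tutorial cell: ./idtrack_cache when the variable is unset.
local_repo = Path(os.environ.get('IDTRACK_LOCAL_REPO', './idtrack_cache'))
probe = nearest_existing(local_repo)
print(f'Free space near {probe}: {free_gib(probe):.1f} GiB')
```

Probing the nearest existing parent means the check has no side effects: it works even before the cache folder has been created.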

Step 6 — Network sanity checks (optional, but recommended)

First-time graph builds need network access to Ensembl services. This cell performs non-destructive checks:

Can we reach the Ensembl REST API?
(If IDTrack is installed) Can we open a TCP connection to the Ensembl MySQL host that IDTrack is configured to use, on ports 3306, 5306, and 3337?

Note: Some institutions block outbound MySQL ports. IDTrack can still work via the HTTPS/FTP MySQL dumps when MySQL is unreachable (slower but functional).

import socket

# Check 1: Ensembl REST API over HTTPS (requires the requests package).
try:
    import requests
except ImportError as e:
    requests = None
    print('requests not available ->', repr(e))

if requests is not None:
    try:
        r = requests.get('https://rest.ensembl.org/info/ping', headers={'Content-Type': 'application/json'}, timeout=15)
        print('Ensembl REST:', r.status_code, r.text.strip()[:80])
    except Exception as e:
        print('Ensembl REST check failed ->', repr(e))

# Check 2: TCP reachability of the MySQL host IDTrack expects.
# IDTRACK_OK is set by the environment report cell in Step 3.
if IDTRACK_OK:
    from idtrack._db import DB

    host = DB.mysql_host
    ports = [3306, 5306, 3337]
    for port in ports:
        try:
            # Only opens and closes a socket; no query is sent.
            with socket.create_connection((host, port), timeout=5):
                print(f'Ensembl MySQL: OK ({host}:{port})')
        except OSError as e:
            print(f'Ensembl MySQL: not reachable ({host}:{port}) -> {e.__class__.__name__}')
else:
    print('Skipping MySQL check (IDTrack not imported).')

Ensembl REST: 200 {"ping":1}
Ensembl MySQL: OK (ensembldb.ensembl.org:3306)
Ensembl MySQL: OK (ensembldb.ensembl.org:5306)
Ensembl MySQL: OK (ensembldb.ensembl.org:3337)

What’s next?

Part 0 (concepts): 00_idtrack_overview.ipynb
Part 2 (external database configuration): 02_prepare_new_external_yaml.ipynb
Part 3 (graph builds): 03_initialization_graph.ipynb

Expected milestone after Part 3: you have a cached graph snapshot on disk, and conversions become fast and reproducible.

Troubleshooting (common issues)

`ImportError: … h5py …` (often on macOS): install hdf5 via conda, then reinstall h5py.
REST works but MySQL ports fail: this is common; IDTrack will fall back to HTTPS/FTP dumps. If you want live-MySQL speed, you may need outbound ports 3306/5306 (and 3337 only for the human GRCh37 archive).
Permission errors in the cache directory: choose a different IDTRACK_LOCAL_REPO you can write to.
Slow first run: normal. The first build populates caches; later runs reuse them.
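
The MySQL item above can be condensed into a small decision helper that takes the per-port results from the network check. This is only a sketch of the rule stated in this section (3306/5306 for live MySQL, 3337 only for the human GRCh37 archive, HTTPS/FTP dumps otherwise). summarize_mysql_access is a hypothetical name, treating either of 3306/5306 as sufficient is an assumption, and this is not how IDTrack itself decides which source to use.

```python
def summarize_mysql_access(reachable):
    """Turn {port: bool} results from the network check into short advice lines."""
    notes = []

    # Assumption: one reachable port out of 3306/5306 is enough for live MySQL.
    main_ports = [p for p in (3306, 5306) if reachable.get(p)]
    if main_ports:
        notes.append('Live MySQL reachable on port(s): ' + ', '.join(map(str, main_ports)))
    else:
        notes.append('Ports 3306/5306 blocked: expect the slower HTTPS/FTP dump fallback')

    # Port 3337 is only relevant for the human GRCh37 archive.
    if not reachable.get(3337):
        notes.append('Port 3337 blocked: only matters for the human GRCh37 archive')

    return notes


# Example input: both main ports blocked, archive port open.
for line in summarize_mysql_access({3306: False, 5306: False, 3337: True}):
    print(line)
```

Feeding it the results printed by the network cell gives a quick reading of whether a firewall request to your institution is worth making.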