{ "cells": [ { "cell_type": "markdown", "id": "6c0f0a63", "metadata": {}, "source": [ "# Part 1 — Environment Setup & Installation\n", "\n", "*Last updated:* 2026-01-08\n", "\n", "This tutorial gets you from **\"I have Python\"** to **\"IDTrack imports and has a working local cache directory\"**.\n", "\n", "**Learning objectives**\n", "- Install IDTrack in an isolated environment (conda or pip).\n", "- Verify your installation with a short, copy/pasteable checklist.\n", "- Understand and set `IDTRACK_LOCAL_REPO` (your on-disk cache + configuration folder).\n", "- Know what \"success\" looks like before you start building graphs.\n", "\n", "> **Warning:** The first real graph build (Part 3) can take time and disk space. This notebook only verifies that your environment is ready.\n" ] }, { "cell_type": "markdown", "id": "c8b9fd7c", "metadata": {}, "source": [ "## 1.1 — Installation Guide\n", "\n", "IDTrack is a Python package. The easiest way to avoid dependency conflicts is to use a **fresh environment**.\n", "\n", "You have two common workflows:\n", "\n", "1. **Conda/Mamba environment (recommended if you already use conda)**\n", "2. **pip + venv (recommended if you prefer plain Python tooling)**\n", "\n", "Either option is fine. 
Pick the one that matches how your lab usually manages Python.\n" ] }, { "cell_type": "markdown", "id": "be0db868", "metadata": {}, "source": [ "### Step 1 — Create an isolated environment\n", "\n", "**Option A: conda/mamba**\n", "\n", "Create and activate a clean environment (example uses Python 3.11):\n", "\n", "```bash\n", "mamba create -n idtrack python=3.11 -y\n", "mamba activate idtrack\n", "```\n", "\n", "> **Tip:** If you are on Apple Silicon and see HDF5/h5py issues later, installing `hdf5` via conda often fixes it:\n", ">\n", "> `mamba install -n idtrack hdf5 -y`\n", "\n", "**Option B: venv**\n", "\n", "```bash\n", "python -m venv .venv\n", "source .venv/bin/activate\n", "python -m pip install --upgrade pip\n", "```\n" ] }, { "cell_type": "markdown", "id": "80e11761", "metadata": {}, "source": [ "### Step 2 — Install IDTrack\n", "\n", "If you are installing from PyPI:\n", "\n", "```bash\n", "pip install idtrack\n", "```\n", "\n", "If you are working from a cloned repository (developer install):\n", "\n", "```bash\n", "pip install -e .\n", "```\n", "\n", "> **Expected result:** `import idtrack` works in Python, and `idtrack.__version__` prints a version string.\n" ] }, { "cell_type": "markdown", "id": "93b5845f", "metadata": {}, "source": [ "### Step 3 — Quick environment report (safe to run)\n", "\n", "This cell prints your Python version and tries to import IDTrack.\n", "\n", "> **Expected result:** If installation succeeded, you will see an IDTrack version. 
If it failed, you will see a helpful error message (and the notebook continues).\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "433d01fd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python: 3.11.12\n", "Executable: /Users/kemalinecik/tools/apps/mamba/envs/idtrack_dev_env/bin/python\n", "Platform: macOS-15.7.2-arm64-arm-64bit\n", "idtrack version: 0.0.5\n", "idtrack package path: /Users/kemalinecik/git_nosync/master_idtrack/idtrack/idtrack/__init__.py\n" ] } ], "source": [ "import platform\n", "import sys\n", "from pathlib import Path\n", "\n", "print('Python:', sys.version.split()[0])\n", "print('Executable:', sys.executable)\n", "print('Platform:', platform.platform())\n", "\n", "try:\n", "    import idtrack\n", "\n", "    print('idtrack version:', getattr(idtrack, '__version__', 'unknown'))\n", "    print('idtrack package path:', Path(idtrack.__file__).resolve())\n", "    IDTRACK_OK = True\n", "except Exception as e:\n", "    print('idtrack import failed ->', repr(e))\n", "    print('Fix: confirm you activated the intended environment, then re-run: pip install idtrack')\n", "    IDTRACK_OK = False\n" ] }, { "cell_type": "markdown", "id": "05195db8", "metadata": {}, "source": [ "### Step 4 — Choose your local repository directory (`IDTRACK_LOCAL_REPO`)\n", "\n", "IDTrack stores **cached downloads**, **graph snapshots**, and your **external YAML** files in one place: your **local repository** directory.\n", "\n", "You can set it in your shell:\n", "\n", "```bash\n", "export IDTRACK_LOCAL_REPO=/path/to/idtrack_cache\n", "```\n", "\n", "In notebooks, many tutorials fall back to `./idtrack_cache` if `IDTRACK_LOCAL_REPO` is not set.\n", "\n", "> **Tip:** In a real project, put this folder somewhere stable (not a temporary directory) so you reuse caches across sessions.\n" ] }, { "cell_type": "markdown", "id": "584966ca", "metadata": {}, "source": [ "### Step 5 — Verify local repository read/write (safe to run)\n", "\n", "This cell 
creates the directory (if needed), writes a tiny test file, and reads it back.\n", "\n", "> **Expected result:** You should see `OK` and the resolved path.\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "e7b74a5d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Local repository: /Users/kemalinecik/git_nosync/master_idtrack/idtrack/docs/_notebooks/idtrack_cache\n", "Write test: OK\n" ] } ], "source": [ "import os\n", "from pathlib import Path\n", "\n", "local_repo = Path(os.environ.get('IDTRACK_LOCAL_REPO', './idtrack_cache')).resolve()\n", "local_repo.mkdir(parents=True, exist_ok=True)\n", "\n", "test_file = local_repo / '_idtrack_write_test.txt'\n", "test_file.write_text('ok', encoding='utf-8')\n", "\n", "print('Local repository:', local_repo)\n", "print('Write test:', 'OK' if test_file.read_text(encoding='utf-8') == 'ok' else 'FAILED')\n" ] }, { "cell_type": "markdown", "id": "b05b266f", "metadata": {}, "source": [ "### Step 6 — Network sanity checks (optional, but recommended)\n", "\n", "First-time graph builds need network access to Ensembl services. This cell performs **non-destructive** checks:\n", "\n", "- Can we reach the Ensembl REST API?\n", "- (If IDTrack is installed) what MySQL host does IDTrack expect?\n", "\n", "> **Note:** Some institutions block outbound MySQL ports. 
IDTrack can still work via the HTTPS/FTP MySQL dumps when MySQL is unreachable (slower but functional).\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "81345c8f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ensembl REST: 200 {\"ping\":1}\n", "Ensembl MySQL: OK (ensembldb.ensembl.org:3306)\n", "Ensembl MySQL: OK (ensembldb.ensembl.org:5306)\n", "Ensembl MySQL: OK (ensembldb.ensembl.org:3337)\n" ] } ], "source": [ "import socket\n", "\n", "try:\n", "    import requests\n", "\n", "    try:\n", "        r = requests.get('https://rest.ensembl.org/info/ping', headers={'Content-Type': 'application/json'}, timeout=15)\n", "        print('Ensembl REST:', r.status_code, r.text.strip()[:80])\n", "    except Exception as e:\n", "        print('Ensembl REST check failed ->', repr(e))\n", "except Exception as e:\n", "    print('requests not available ->', repr(e))\n", "\n", "if IDTRACK_OK:\n", "    from idtrack._db import DB\n", "\n", "    host = DB.mysql_host\n", "    ports = [3306, 5306, 3337]\n", "    for port in ports:\n", "        try:\n", "            with socket.create_connection((host, port), timeout=5):\n", "                print(f'Ensembl MySQL: OK ({host}:{port})')\n", "        except OSError as e:\n", "            print(f'Ensembl MySQL: not reachable ({host}:{port}) -> {e.__class__.__name__}')\n", "else:\n", "    print('Skipping MySQL check (IDTrack not imported).')\n" ] }, { "cell_type": "markdown", "id": "6696a762", "metadata": {}, "source": [ "### What’s next?\n", "\n", "- **Part 0** (concepts): `00_idtrack_overview.ipynb`\n", "- **Part 2** (external database configuration): `02_prepare_new_external_yaml.ipynb`\n", "- **Part 3** (graph builds): `03_initialization_graph.ipynb`\n", "\n", "> **Expected milestone after Part 3:** you have a cached graph snapshot on disk, and conversions become fast and reproducible.\n" ] }, { "cell_type": "markdown", "id": "657f0432", "metadata": {}, "source": [ "### Troubleshooting (common issues)\n", "\n", "- **`ImportError: ... 
h5py ...` (often on macOS):** install `hdf5` via conda, then reinstall `h5py`.\n", "- **REST works but MySQL ports fail:** this is common; IDTrack will fall back to HTTPS/FTP dumps. If you want live-MySQL speed, you may need outbound ports `3306/5306` (and `3337` only for the human GRCh37 archive).\n", "- **Permission errors in the cache directory:** choose a different `IDTRACK_LOCAL_REPO` you can write to.\n", "- **Slow first run:** normal. The first build populates caches; later runs reuse them.\n" ] } ], "metadata": { "kernelspec": { "display_name": "idtrack_dev_env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.12" } }, "nbformat": 4, "nbformat_minor": 5 }