Part 6 — Cross-Species Workflows: Humanization

Last updated: 2026-01-08

This tutorial shows a practical humanization workflow: mapping mouse/pig genes into a human gene space so you can run human-centric downstream analyses (pathways, marker lists, integration, annotation).

Learning objectives

  • Understand what humanization is (and when it is appropriate).

  • Run a step-by-step mouse → human and pig → human mapping.

  • Validate results and handle 1→n orthology ambiguity explicitly.

  • Prepare outputs in a tidy, analysis-friendly format for comparative workflows.

Warning: Orthology is not always one-to-one. This notebook focuses on making ambiguity visible and manageable.

6.1 — Humanization workflow (what it is, and when to use it)

This notebook shows a reproducible, step-by-step pipeline:

  1. Within-species cleanup (IDTrack)

    • Map mouse/pig identifiers to consistent Ensembl gene IDs within that species.

  2. Cross-species mapping (orthologs)

    • Map mouse/pig Ensembl gene IDs to human ortholog Ensembl gene IDs.

  3. Human naming (optional, IDTrack)

    • Convert human Ensembl gene IDs into HGNC symbols (or other human namespaces).

This notebook does not decide the ‘correct’ ortholog for you in complex families — it shows how to surface the candidates and (optionally) score them with sequence-based heuristics.

6.1.1 Pre-requisites

6.1.1.1 IDTrack graphs

You should have built graphs for the organisms you use:

  • mus_musculus and/or sus_scrofa

  • homo_sapiens

6.1.1.2 Optional dependencies for ortholog utilities

Ortholog utilities require optional packages. Install one of:

  • pip install idtrack[ortholog]

  • or pip install idtrack[all-external]

If you don’t install these extras, the ortholog steps will raise a helpful error.

 
# Check optional dependency status (ortholog utilities require gget + biopython)
from __future__ import annotations

from idtrack import _external_mappers

dep_status = _external_mappers.check_optional_dependencies(warn=True)
ORTHOLOG_OK = dep_status.get('gget', False) and dep_status.get('biopython', False)

print('Optional dependency status:', dep_status)
print('Ortholog utilities available:', ORTHOLOG_OK)

6.1.2 Step-by-step: mouse → human

6.1.2.1 Convert a mouse identifier to a mouse Ensembl gene ID (IDTrack)

Start with whatever you have (often an MGI symbol). Convert to a base Ensembl gene ID in mouse.

 
import os
from pathlib import Path
import idtrack

LOCAL_REPOSITORY = Path(os.environ.get('IDTRACK_LOCAL_REPO', './idtrack_cache')).resolve()
LOCAL_REPOSITORY.mkdir(parents=True, exist_ok=True)

api_mouse = idtrack.API(local_repository=str(LOCAL_REPOSITORY))
api_mouse.configure_logger()

mouse_name, mouse_latest = api_mouse.resolve_organism('mouse')
api_mouse.build_graph(organism_name=mouse_name, snapshot_release=mouse_latest)

mouse_query = 'Trp53'  # example MGI symbol; replace with your gene
mouse_to_ensembl = api_mouse.convert_identifier(
    mouse_query,
    to_release=mouse_latest,
    final_database='base_ensembl_gene',
)
mouse_to_ensembl

Take one of the returned target_id entries as your mouse Ensembl gene ID. If more than one candidate is returned, you are in a 1→n case: decide explicitly how to handle it and record the rule, rather than defaulting to the first entry.
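
To make the 1→0 / 1→1 / 1→n distinction explicit before moving on, a small helper along these lines may be useful. It is a minimal sketch, not part of IDTrack: the function name and the label strings are illustrative, and it only assumes that the conversion result exposes a list under 'target_id', as the cells in this notebook do.

```python
def classify_mapping(target_ids):
    """Label a list of target IDs by cardinality and return the unique IDs."""
    unique_ids = sorted(set(target_ids or []))
    if not unique_ids:
        return "one_to_none", unique_ids
    if len(unique_ids) == 1:
        return "one_to_one", unique_ids
    return "one_to_many", unique_ids


# Hypothetical result shaped like the dict used in this notebook
# (the ID is an example value, not real output).
example_result = {"target_id": ["ENSMUSG00000059552"]}
label, candidates = classify_mapping(example_result.get("target_id"))
print(label, candidates)
```

Storing the label next to the chosen ID keeps the ambiguity visible in later steps.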

6.1.2.2 Find human ortholog(s) (ortholog utilities)

We use Bgee orthologs via gget (optional dependency).

 
human_ensembl_gene_ids = []

if not mouse_to_ensembl.get('target_id'):
    print('No mouse Ensembl gene ID found; check your input ID and mouse graph/YAML.')
elif not ORTHOLOG_OK:
    print('Ortholog utilities are not available (install extras: `pip install idtrack[ortholog]`).')
else:
    from idtrack._external_mappers import get_ortholog_table, get_ortholog_ids_for_species

    mouse_ensembl_gene_id = mouse_to_ensembl['target_id'][0]  # takes the first candidate; record this rule if there are several
    ortholog_df = get_ortholog_table(mouse_ensembl_gene_id, verbose=True)

    human_ensembl_gene_ids = sorted(get_ortholog_ids_for_species(ortholog_df, target_species='human'))

human_ensembl_gene_ids
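
The cell above looks up orthologs for a single mouse gene ID. If the previous step returned several candidates, one option is to query each of them and keep the results separate. The sketch below shows that pattern with an injected lookup callable; `collect_orthologs`, the `lookup` interface, and the placeholder IDs are all hypothetical. In this notebook, `lookup` could wrap the `get_ortholog_table` and `get_ortholog_ids_for_species` calls exactly as they are used above.

```python
def collect_orthologs(source_ids, lookup):
    """Map every source gene ID to a sorted list of human ortholog IDs.

    `lookup` is any callable that takes one source gene ID and returns an
    iterable of human Ensembl gene IDs (hypothetical interface).
    """
    per_source = {}
    for source_id in dict.fromkeys(source_ids):  # de-duplicate, keep input order
        per_source[source_id] = sorted(set(lookup(source_id)))
    return per_source


# Stub lookup with made-up placeholder IDs, only to keep the example runnable.
fake_orthologs = {
    "MOUSE_GENE_A": ["HUMAN_GENE_2", "HUMAN_GENE_1"],
    "MOUSE_GENE_B": [],
}
result = collect_orthologs(
    ["MOUSE_GENE_A", "MOUSE_GENE_B", "MOUSE_GENE_A"],
    lambda gene_id: fake_orthologs.get(gene_id, []),
)
print(result)
```

Keeping one entry per source ID, including empty lists, means genes without a human ortholog stay in the report instead of disappearing.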

6.1.2.3 Convert human Ensembl IDs into HGNC symbols (IDTrack)

Now we switch to the human graph and convert the human Ensembl IDs into HGNC symbols (optional but common).

 
human_results = None

if not human_ensembl_gene_ids:
    print('No human ortholog IDs available; skipping human conversion step.')
else:
    api_human = idtrack.API(local_repository=str(LOCAL_REPOSITORY))
    api_human.configure_logger()

    human_name, human_latest = api_human.resolve_organism('human')
    api_human.build_graph(organism_name=human_name, snapshot_release=human_latest)

    # Convert all candidate orthologs (if 1→n or many-to-many, you will see it here)
    human_results = api_human.convert_identifier_multiple(
        list(human_ensembl_gene_ids),
        to_release=human_latest,
        final_database='HGNC Symbol',
    )

# Display at the top level of the cell; an expression inside the else branch is not shown
human_results

6.1.3 Step-by-step: pig → human (same pattern)

Replace api_mouse with a pig API instance and start from your pig identifiers. Many pig pipelines start from Ensembl IDs or Entrez IDs; adjust final_database accordingly.

 
api_pig = idtrack.API(local_repository=str(LOCAL_REPOSITORY))
api_pig.configure_logger()

pig_name, pig_latest = api_pig.resolve_organism('pig')
api_pig.build_graph(organism_name=pig_name, snapshot_release=pig_latest)

pig_query = 'TP53'  # example; replace with your pig gene symbol or ID
pig_to_ensembl = api_pig.convert_identifier(
    pig_query,
    to_release=pig_latest,
    final_database='base_ensembl_gene',
)
pig_to_ensembl

From here, reuse the same ortholog + human conversion steps as in the mouse section.

6.1.4 Advanced: choose among multiple ortholog candidates

When you have multiple orthologs, IDTrack can optionally compute additional features for ranking. This is for advanced use and requires extra dependencies.

 
# Example (advanced): compute sequence-based alignment features for each ortholog candidate
# from idtrack._external_mappers import align_ortholog_pair_with_features
#
# features = align_ortholog_pair_with_features(mouse_ensembl_gene_id, target_species='human', verbose=True)
# features
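
Once each candidate has some numeric score, choosing among them can be made deterministic and auditable. The sketch below assumes you have already reduced whatever features you computed to one number per candidate, with higher meaning better; that reduction, the function names, and the `min_margin` parameter are hypothetical and not part of IDTrack.

```python
def rank_candidates(scores):
    """Return candidate IDs sorted from highest to lowest score.

    Ties are broken by ID so the ordering is deterministic.
    """
    return sorted(scores, key=lambda gene_id: (-scores[gene_id], gene_id))


def pick_best(scores, min_margin=0.0):
    """Return the top candidate, or None if there is no clear winner.

    None is returned when there are no candidates, or when the lead of the
    top candidate over the runner-up is smaller than `min_margin`.
    """
    ranked = rank_candidates(scores)
    if not ranked:
        return None
    if len(ranked) > 1 and scores[ranked[0]] - scores[ranked[1]] < min_margin:
        return None
    return ranked[0]


# Made-up scores for three placeholder candidates.
example_scores = {"HUMAN_GENE_1": 0.90, "HUMAN_GENE_2": 0.95, "HUMAN_GENE_3": 0.40}
print(rank_candidates(example_scores))
print(pick_best(example_scores))
print(pick_best(example_scores, min_margin=0.1))
```

Requiring a margin is one way to avoid committing to a candidate when two scores are nearly equal; the threshold itself is a judgment call that belongs in your recorded rule.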

6.1.5 Practical cautions (please read)

  1. Orthology is context-dependent: paralogs, gene family expansions, and annotation differences matter.

  2. In 1→n and many-to-many cases, do not pick one candidate without recording the rule you used.

  3. Record provenance: snapshot releases + YAML configs + ortholog source/version.
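
One lightweight way to follow the provenance advice is to write a small record next to every mapping output. The sketch below uses only the standard library; the class name and field names are illustrative assumptions, and the None values are placeholders to be filled with the releases and versions you actually used.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class MappingProvenance:
    """Settings needed to reproduce one humanization run (illustrative fields)."""

    source_species: str
    snapshot_release_source: Optional[int]
    snapshot_release_human: Optional[int]
    yaml_config: Optional[str]
    ortholog_source: str
    ortholog_source_version: Optional[str]
    ambiguity_rule: str


provenance = MappingProvenance(
    source_species="mus_musculus",
    snapshot_release_source=None,  # placeholder: release used for the source graph
    snapshot_release_human=None,   # placeholder: release used for the human graph
    yaml_config=None,              # placeholder: path or hash of the YAML config
    ortholog_source="Bgee via gget",
    ortholog_source_version=None,  # placeholder: fill in if known
    ambiguity_rule="keep-all",
)

provenance_json = json.dumps(asdict(provenance), indent=2, sort_keys=True)
print(provenance_json)
```

Saving this JSON alongside the mapping table lets a reader see which rule produced each result without rerunning the notebook.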

6.2 — Comparative Analysis Preparation

Once you have humanized identifiers, you can prepare a comparative workflow that is reproducible and auditable.

Recommended preparation steps:

  1. Within each species: harmonize identifiers into stable Ensembl gene IDs at a fixed snapshot boundary.

  2. Across species: map to human orthologs, but keep ambiguity visible (store 1→n mappings as lists).

  3. Record provenance: snapshot boundaries, assemblies, and the orthology source/method.

  4. Define a policy for ambiguous cases: drop / keep-all / choose-best (and justify it).
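
The three policies named in step 4 can be written down as a single function so that the rule is applied the same way to every gene. This is a minimal sketch under stated assumptions: the function name, the policy strings, and the optional per-candidate `scores` mapping are illustrative and not part of IDTrack.

```python
VALID_POLICIES = ("drop", "keep-all", "choose-best")


def apply_ambiguity_policy(candidates, policy, scores=None):
    """Return the ortholog IDs kept for one source gene under `policy`.

    Genes with zero or one candidate are returned unchanged; the policy
    only matters when there are several candidates.
    """
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    unique = sorted(set(candidates))
    if len(unique) <= 1:
        return unique
    if policy == "drop":
        return []
    if policy == "keep-all":
        return unique
    # choose-best: highest score wins, ties broken by ID for determinism
    scores = scores or {}
    missing = [c for c in unique if c not in scores]
    if missing:
        raise ValueError(f"choose-best needs a score for: {missing}")
    return [min(unique, key=lambda c: (-scores[c], c))]


# Placeholder IDs and made-up scores.
ambiguous = ["HUMAN_GENE_B", "HUMAN_GENE_A"]
print(apply_ambiguity_policy(ambiguous, "drop"))
print(apply_ambiguity_policy(ambiguous, "keep-all"))
print(apply_ambiguity_policy(ambiguous, "choose-best", {"HUMAN_GENE_A": 0.2, "HUMAN_GENE_B": 0.7}))
```

Raising an error for an unknown policy or a missing score keeps the function from falling back to an unrecorded default.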

A practical output format is a tidy table with one row per input gene and explicit columns for provenance. The next cell shows a minimal schema you can reuse.

 
# Example: a tidy, audit-friendly mapping table schema

import pandas as pd

mapping_table = pd.DataFrame(
    [
        {
            'source_species': 'mus_musculus',
            'source_namespace': 'MGI Symbol',
            'source_id': 'Trp53',
            'human_ensembl_gene_id': 'ENSG00000141510',
            'human_hgnc_symbol': 'TP53',
            'orthology_candidates': ['ENSG00000141510'],
            'snapshot_release_human': None,
            'snapshot_release_source': None,
            'notes': 'Example row; fill with your real results.'
        },
        {
            'source_species': 'sus_scrofa',
            'source_namespace': 'Ensembl Gene ID',
            'source_id': 'ENSSSCG00000000001',
            'human_ensembl_gene_id': None,
            'human_hgnc_symbol': None,
            'orthology_candidates': [],
            'snapshot_release_human': None,
            'snapshot_release_source': None,
            'notes': 'Example 1→0 / not found; keep these rows for reporting.'
        },
    ]
)

mapping_table
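
Before using such a table downstream, it can help to check a few consistency rules per row. The sketch below works on plain dictionaries, which you could obtain from the table above with `mapping_table.to_dict('records')`. The function name and the specific rules are illustrative assumptions; adapt them to the policy you chose.

```python
def validate_mapping_row(row):
    """Return a list of problems found in one mapping-table row (empty if none)."""
    problems = []
    chosen = row.get("human_ensembl_gene_id")
    candidates = row.get("orthology_candidates") or []
    if chosen is not None and chosen not in candidates:
        problems.append("chosen human ID is not among orthology_candidates")
    if chosen is None and len(candidates) == 1:
        problems.append("single candidate present but no human ID chosen")
    if chosen is None and row.get("human_hgnc_symbol") is not None:
        problems.append("HGNC symbol given without a human Ensembl gene ID")
    return problems


rows = [
    {  # consistent 1→1 row, same values as the schema example
        "source_id": "Trp53",
        "human_ensembl_gene_id": "ENSG00000141510",
        "human_hgnc_symbol": "TP53",
        "orthology_candidates": ["ENSG00000141510"],
    },
    {  # consistent 1→0 row
        "source_id": "ENSSSCG00000000001",
        "human_ensembl_gene_id": None,
        "human_hgnc_symbol": None,
        "orthology_candidates": [],
    },
    {  # deliberately inconsistent placeholder row
        "source_id": "BAD_EXAMPLE",
        "human_ensembl_gene_id": "HUMAN_GENE_X",
        "human_hgnc_symbol": "X",
        "orthology_candidates": ["HUMAN_GENE_Y"],
    },
]

report = {row["source_id"]: validate_mapping_row(row) for row in rows}
for source_id, problems in report.items():
    print(source_id, "OK" if not problems else problems)
```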