Part 6 — Cross-Species Workflows: Humanization

Last updated: 2026-01-08

This tutorial shows a practical humanization workflow: mapping mouse/pig genes into a human gene space so you can run human-centric downstream analyses (pathways, marker lists, integration, annotation).

Learning objectives

Understand what humanization is (and when it is appropriate).
Run a step-by-step mouse → human and pig → human mapping.
Validate results and handle 1→n orthology ambiguity explicitly.
Prepare outputs in a tidy, analysis-friendly format for comparative workflows.

Warning: Orthology is not always one-to-one. This notebook focuses on making ambiguity visible and manageable.

6.1 — Humanization workflow (what it is, and when to use it)

This notebook shows a reproducible, three-step pipeline:

Step 1: Within-species cleanup (IDTrack)
- Map mouse/pig identifiers to consistent Ensembl gene IDs within that species.
Step 2: Cross-species mapping (orthologs)
- Map mouse/pig Ensembl gene IDs to human ortholog Ensembl gene IDs.
Step 3: Human naming (optional, IDTrack)
- Convert human Ensembl gene IDs into HGNC symbols (or other human namespaces).

This notebook does not decide the ‘correct’ ortholog for you in complex families — it shows how to surface the candidates and (optionally) score them with sequence-based heuristics.

6.1.1 Pre-requisites

6.1.1.1 IDTrack graphs

You should have built graphs for the organisms you use:

mus_musculus and/or sus_scrofa
homo_sapiens

6.1.1.2 Optional dependencies for ortholog utilities

Ortholog utilities require optional packages. Install one of:

pip install "idtrack[ortholog]"
or pip install "idtrack[all-external]"

The quotes keep shells such as zsh from interpreting the square brackets as a glob pattern.

If you don’t install these extras, the ortholog steps will raise a helpful error.

# Check optional dependency status (ortholog utilities require gget + biopython)
from __future__ import annotations

from idtrack import _external_mappers

dep_status = _external_mappers.check_optional_dependencies(warn=True)
ORTHOLOG_OK = dep_status.get('gget', False) and dep_status.get('biopython', False)

print('Optional dependency status:', dep_status)
print('Ortholog utilities available:', ORTHOLOG_OK)

6.1.2 Step-by-step: mouse → human

6.1.2.1 Convert a mouse identifier to a mouse Ensembl gene ID (IDTrack)

Start with whatever you have (often an MGI symbol). Convert to a base Ensembl gene ID in mouse.

import os
from pathlib import Path
import idtrack

LOCAL_REPOSITORY = Path(os.environ.get('IDTRACK_LOCAL_REPO', './idtrack_cache')).resolve()
LOCAL_REPOSITORY.mkdir(parents=True, exist_ok=True)

api_mouse = idtrack.API(local_repository=str(LOCAL_REPOSITORY))
api_mouse.configure_logger()

mouse_name, mouse_latest = api_mouse.resolve_organism('mouse')
api_mouse.build_graph(organism_name=mouse_name, snapshot_release=mouse_latest)

mouse_query = 'Trp53'  # example MGI symbol; replace with your gene
mouse_to_ensembl = api_mouse.convert_identifier(
    mouse_query,
    to_release=mouse_latest,
    final_database='base_ensembl_gene',
)
mouse_to_ensembl

Take one of the returned target_id entries as your mouse Ensembl gene ID. If more than one candidate is returned, you are in a 1→n case: decide how to handle it, and record the rule you used.
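One way to make the 1→n case explicit is to classify every conversion result before using it. The sketch below assumes the result behaves like a dictionary with a `target_id` list, which is how the later cells in this notebook use it. `classify_conversion` is a hypothetical helper, not part of IDTrack, and the IDs are placeholders.

```python
from typing import Any, Mapping


def classify_conversion(result: Mapping[str, Any]) -> dict:
    """Label a conversion result as 1→0, 1→1, or 1→n.

    Assumption: `result` exposes a 'target_id' list, as in this notebook.
    """
    # De-duplicate while keeping the original order of the candidates.
    candidates = list(dict.fromkeys(result.get('target_id') or []))
    if not candidates:
        cardinality = '1→0'
    elif len(candidates) == 1:
        cardinality = '1→1'
    else:
        cardinality = '1→n'
    return {'cardinality': cardinality, 'candidates': candidates}


# Made-up result for illustration only; not real IDTrack output.
example = {'target_id': ['ENSMUSG_EXAMPLE_A', 'ENSMUSG_EXAMPLE_B', 'ENSMUSG_EXAMPLE_A']}
summary = classify_conversion(example)
print(summary)
```

Storing the cardinality next to the candidates means a later step cannot pick the first entry without the ambiguity being on record.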

6.1.2.2 Find human ortholog(s) (ortholog utilities)

We use Bgee orthologs via gget (optional dependency).

human_ensembl_gene_ids = []

if not mouse_to_ensembl.get('target_id'):
    print('No mouse Ensembl gene ID found; check your input ID and mouse graph/YAML.')
elif not ORTHOLOG_OK:
    print('Ortholog utilities are not available (install extras: `pip install idtrack[ortholog]`).')
else:
    from idtrack._external_mappers import get_ortholog_table, get_ortholog_ids_for_species

    mouse_ensembl_gene_id = mouse_to_ensembl['target_id'][0]  # choose one
    ortholog_df = get_ortholog_table(mouse_ensembl_gene_id, verbose=True)

    human_ensembl_gene_ids = sorted(get_ortholog_ids_for_species(ortholog_df, target_species='human'))

human_ensembl_gene_ids

6.1.2.3 Convert human Ensembl IDs into HGNC symbols (IDTrack)

Now we switch to the human graph and convert the human Ensembl IDs into HGNC symbols (optional but common).

human_results = None

if not human_ensembl_gene_ids:
    print('No human ortholog IDs available; skipping human conversion step.')
else:
    api_human = idtrack.API(local_repository=str(LOCAL_REPOSITORY))
    api_human.configure_logger()

    human_name, human_latest = api_human.resolve_organism('human')
    api_human.build_graph(organism_name=human_name, snapshot_release=human_latest)

    # Convert all candidate orthologs (one-to-many cases become visible here)
    human_results = api_human.convert_identifier_multiple(
        human_ensembl_gene_ids,
        to_release=human_latest,
        final_database='HGNC Symbol',
    )

# Display outside the if/else: Jupyter does not render a bare expression inside a block.
human_results

6.1.3 Step-by-step: pig → human (same pattern)

Replace api_mouse with a pig API instance and start from your pig identifiers. Many pig pipelines start from Ensembl IDs or Entrez IDs; adjust final_database accordingly.

api_pig = idtrack.API(local_repository=str(LOCAL_REPOSITORY))
api_pig.configure_logger()

pig_name, pig_latest = api_pig.resolve_organism('pig')
api_pig.build_graph(organism_name=pig_name, snapshot_release=pig_latest)

pig_query = 'TP53'  # example; replace with your pig gene symbol or ID
pig_to_ensembl = api_pig.convert_identifier(
    pig_query,
    to_release=pig_latest,
    final_database='base_ensembl_gene',
)
pig_to_ensembl

From here, reuse the same ortholog + human conversion steps as in the mouse section.
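Because the mouse and pig paths share the same three steps, the pattern can be wrapped in one function. The sketch below is hypothetical: `humanize`, `to_source_ensembl`, `find_human_orthologs`, and `to_human_symbol` are illustrative names, and the three callables are stand-ins that you would implement with the IDTrack and ortholog calls shown in the mouse section. The stub data uses placeholder IDs so the example runs without any graphs.

```python
from typing import Callable, Dict, List


def humanize(
    query: str,
    to_source_ensembl: Callable[[str], List[str]],
    find_human_orthologs: Callable[[str], List[str]],
    to_human_symbol: Callable[[str], List[str]],
) -> List[Dict[str, object]]:
    """Run the three steps and return one record per source Ensembl gene ID."""
    records = []
    for source_gene in to_source_ensembl(query):
        orthologs = sorted(set(find_human_orthologs(source_gene)))
        records.append(
            {
                'query': query,
                'source_ensembl_gene_id': source_gene,
                'human_candidates': orthologs,  # keep all candidates visible
                'human_symbols': {gene: to_human_symbol(gene) for gene in orthologs},
            }
        )
    return records


# Stub lookups with placeholder IDs, for illustration only.
stub_ensembl = {'GeneX': ['SRC_GENE_1']}
stub_orthologs = {'SRC_GENE_1': ['HUMAN_GENE_B', 'HUMAN_GENE_A']}
stub_symbols = {'HUMAN_GENE_A': ['SYMBOL_A'], 'HUMAN_GENE_B': ['SYMBOL_B']}

records = humanize(
    'GeneX',
    lambda q: stub_ensembl.get(q, []),
    lambda g: stub_orthologs.get(g, []),
    lambda g: stub_symbols.get(g, []),
)
print(records)
```

Passing the lookups in as callables keeps the species-specific parts (which API instance, which release) outside the function, so the same code serves mouse and pig.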

6.1.4 Advanced: choose among multiple ortholog candidates

When you have multiple orthologs, IDTrack can optionally compute additional features for ranking. This is for advanced use and requires extra dependencies.

# Example (advanced): compute sequence-based alignment features for each ortholog candidate
# from idtrack._external_mappers import align_ortholog_pair_with_features
#
# features = align_ortholog_pair_with_features(mouse_ensembl_gene_id, target_species='human', verbose=True)
# features

6.1.5 Practical cautions (please read)

Orthology is context-dependent: paralogs, gene family expansions, and annotation differences matter.
In one-to-many or many-to-many cases, do not pick a single ortholog without recording the rule you used.
Record provenance: snapshot releases + YAML configs + ortholog source/version.
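The provenance items above can be captured in a small record that travels with the results. This is a minimal sketch using only the standard library; `MappingProvenance` and its field names are hypothetical, not an IDTrack structure, and the `None` values are placeholders for the releases you actually used.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class MappingProvenance:
    """Hypothetical container for the settings behind one mapping run."""

    source_organism: str
    snapshot_release_source: Optional[int]
    snapshot_release_human: Optional[int]
    ortholog_source: str
    ortholog_source_version: Optional[str] = None
    ambiguity_rule: str = 'keep-all'
    yaml_configs: Dict[str, str] = field(default_factory=dict)


provenance = MappingProvenance(
    source_organism='mus_musculus',
    snapshot_release_source=None,  # fill with the release you actually used
    snapshot_release_human=None,  # fill with the release you actually used
    ortholog_source='Bgee via gget',
    ambiguity_rule='first candidate after sorting',
)

# Serialize with sorted keys so the record diffs cleanly between runs.
provenance_json = json.dumps(asdict(provenance), indent=2, sort_keys=True)
print(provenance_json)
```

Writing this JSON next to the mapping table lets a reader reconstruct which releases and which ambiguity rule produced each result.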

6.2 — Comparative Analysis Preparation

Once you have humanized identifiers, you can prepare a comparative workflow that is reproducible and auditable.

Recommended preparation steps:

Within each species: harmonize identifiers into stable Ensembl gene IDs at a fixed snapshot boundary.
Across species: map to human orthologs, but keep ambiguity visible (store 1→n mappings as lists).
Record provenance: snapshot boundaries, assemblies, and the orthology source/method.
Define a policy for ambiguous cases: drop / keep-all / choose-best (and justify it).
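The three policies named above can be written down as one small function, which makes the chosen rule explicit and testable. This is a sketch under stated assumptions: `apply_ambiguity_policy` is a hypothetical helper, the candidate IDs are placeholders, and the scores are made-up numbers standing in for whatever ranking feature you decide to use.

```python
from typing import Dict, List, Optional


def apply_ambiguity_policy(
    candidates: List[str],
    policy: str,
    scores: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Resolve ortholog candidates under 'drop', 'keep-all', or 'choose-best'."""
    unique = sorted(set(candidates))
    if len(unique) <= 1:
        return unique  # nothing to resolve for 1→0 and 1→1
    if policy == 'drop':
        return []
    if policy == 'keep-all':
        return unique
    if policy == 'choose-best':
        if not scores:
            raise ValueError("'choose-best' needs a score per candidate")
        # Highest score wins; ties fall back to the alphabetically first ID.
        best = min(unique, key=lambda gene: (-scores.get(gene, float('-inf')), gene))
        return [best]
    raise ValueError(f'Unknown policy: {policy!r}')


candidates = ['HUMAN_GENE_B', 'HUMAN_GENE_A']
scores = {'HUMAN_GENE_A': 0.71, 'HUMAN_GENE_B': 0.93}  # made-up scores
for policy in ('drop', 'keep-all', 'choose-best'):
    print(policy, apply_ambiguity_policy(candidates, policy, scores))
```

Whichever policy you use, store its name alongside the results so the choice stays auditable.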

A practical output format is a tidy table with one row per input gene and explicit columns for provenance. The next cell shows a minimal schema you can reuse.

# Example: a tidy, audit-friendly mapping table schema

import pandas as pd

mapping_table = pd.DataFrame(
    [
        {
            'source_species': 'mus_musculus',
            'source_namespace': 'MGI Symbol',
            'source_id': 'Trp53',
            'human_ensembl_gene_id': 'ENSG00000141510',
            'human_hgnc_symbol': 'TP53',
            'orthology_candidates': ['ENSG00000141510'],
            'snapshot_release_human': None,
            'snapshot_release_source': None,
            'notes': 'Example row; fill with your real results.'
        },
        {
            'source_species': 'sus_scrofa',
            'source_namespace': 'Ensembl Gene ID',
            'source_id': 'ENSSSCG00000000001',
            'human_ensembl_gene_id': None,
            'human_hgnc_symbol': None,
            'orthology_candidates': [],
            'snapshot_release_human': None,
            'snapshot_release_source': None,
            'notes': 'Example 1→0 / not found; keep these rows for reporting.'
        },
    ]
)

mapping_table
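To validate a table like this, it can help to count candidates per input gene and to expand the candidate lists into one row per pair. The sketch below uses a small stand-in table with placeholder IDs rather than real results; the column names `n_candidates`, `cardinality`, and `human_candidate` are illustrative choices, not part of the schema above.

```python
import pandas as pd

# Made-up rows that mirror the schema above; placeholder IDs only.
wide = pd.DataFrame(
    {
        'source_id': ['gene_a', 'gene_b', 'gene_c'],
        'orthology_candidates': [['HUMAN_1'], [], ['HUMAN_2', 'HUMAN_3']],
    }
)

# Count candidates per input gene and label the cardinality.
wide['n_candidates'] = wide['orthology_candidates'].apply(len)
wide['cardinality'] = wide['n_candidates'].map(
    lambda n: '1→0' if n == 0 else ('1→1' if n == 1 else '1→n')
)

# One row per (input gene, candidate); an empty list becomes a single NaN row.
long_table = wide.explode('orthology_candidates').rename(
    columns={'orthology_candidates': 'human_candidate'}
)

cardinality_counts = wide['cardinality'].value_counts().to_dict()
print(cardinality_counts)
print(long_table)
```

The wide form keeps one row per input gene for reporting, while the long form is easier to join against human-centric resources; unmapped genes survive in both.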