SCEPTR

What is your transcriptome doing, and where is it investing its resources?
Tumour biopsies. Clinical parasite isolates. Drug-resistant pathogens. Environmental metatranscriptomes. One sample, no replicates, no control required. SCEPTR tells you what your organism is spending its transcriptional budget on.

# Method only (pip install)
pip install sceptr-profiling
sceptr profile --expression data.tsv --category-set bacteria -o results/

# Full framework (raw reads to report)
# Requires Nextflow and Docker:
curl -s https://get.nextflow.io | bash && sudo mv nextflow /usr/local/bin/
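sudo apt-get install -y docker.io && sudo usermod -aG docker $USER
# Debian/Ubuntu shown; log out and back in (or run `newgrp docker`) so the docker group change takes effect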

git clone https://github.com/jsmccabe1/SCEPTR.git && cd SCEPTR
bash setup_databases.sh && docker build -t sceptr:1.0.0 .
./run_sceptr.sh

What SCEPTR Tells You

Not just which pathways are enriched, but how your cell allocates its entire transcriptional budget.

Which Programmes Dominate and Where

Continuous enrichment profiles show each functional category's fold enrichment across the full expression gradient. A programme dominating the top 50 genes looks fundamentally different from one distributed across hundreds of moderately expressed genes. SCEPTR classifies these patterns automatically as apex-concentrated, distributed, or flat, and tests each against a permutation null.
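
The following minimal Python sketch shows what a continuous enrichment profile for one category measures. It assumes EC(k) is the category's fraction among the top-k genes divided by its fraction among all genes, and it uses a Gaussian kernel for smoothing. The function names, the kernel and `sigma` are illustrative choices, not SCEPTR's documented implementation. An apex-concentrated category produces a curve that starts well above 1 and decays towards 1 at k = N.

```python
import numpy as np


def gaussian_smooth(values, sigma):
    """Smooth a curve over rank positions with a Gaussian kernel (assumed smoother).

    Dividing by the convolved weights renormalises the kernel where it runs off
    either end of the curve. The radius is capped so the output keeps its length.
    """
    values = np.asarray(values, dtype=float)
    radius = min(int(3 * sigma), (values.size - 1) // 2)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    weighted = np.convolve(values, kernel, mode="same")
    weights = np.convolve(np.ones_like(values), kernel, mode="same")
    return weighted / weights


def enrichment_curve(in_category, sigma=None):
    """Fold enrichment EC(k) of one category at every rank cutoff k = 1..N.

    in_category: booleans for genes sorted from highest to lowest expression.
    EC(k) = (members in top k / k) / (members overall / N), an assumed definition.
    """
    hits = np.asarray(in_category, dtype=float)
    background = hits.mean()
    if background == 0:
        raise ValueError("category has no members in this transcriptome")
    k = np.arange(1, hits.size + 1)
    curve = (np.cumsum(hits) / k) / background
    return gaussian_smooth(curve, sigma) if sigma else curve


# Ten genes ranked by expression; category members sit at ranks 1, 2 and 7
ranked = np.array([1, 1, 0, 0, 0, 0, 1, 0, 0, 0], dtype=bool)
print(enrichment_curve(ranked).round(2))  # starts near 3.33 at k = 1, ends at 1.0 at k = N
```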

What the Cell Is Spending Its Budget On

The Functional Allocation Profile shows what proportion of the expression apex each programme commands. Fold enrichment tells you which categories are disproportionately represented; budget share tells you where transcriptional resources are actually being spent. A small category can be highly enriched yet command a tiny share of the budget. Both perspectives matter; SCEPTR provides both.
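
The sketch below contrasts the two views on a toy table. It rests on three assumptions made for illustration: each gene has one category label, the apex is the top `apex_size` genes by expression, and budget share is the category's fraction of summed apex expression. The function name and parameters are hypothetical.

```python
import numpy as np


def apex_allocation(expression, labels, apex_size):
    """Fold enrichment and budget share of each category within the apex.

    expression: one value per gene (e.g. TPM); labels: one category per gene;
    apex_size: number of top-expressed genes treated as the apex (hypothetical setting).
    """
    expression = np.asarray(expression, dtype=float)
    labels = np.asarray(labels)
    apex = np.argsort(-expression, kind="stable")[:apex_size]
    apex_total = expression[apex].sum()
    summary = {}
    for name in np.unique(labels):
        member = labels == name
        # Over-representation by gene count: apex fraction vs whole-transcriptome fraction
        fold = member[apex].mean() / member.mean()
        # Fraction of the apex's summed expression carried by this category
        share = expression[apex][member[apex]].sum() / apex_total
        summary[str(name)] = (float(fold), float(share))
    return summary


tpm = [1000, 900, 800, 50, 10, 5, 4, 3, 2, 1]
cats = ["ribosome", "ribosome", "ribosome", "stress", "ribosome",
        "other", "other", "other", "other", "other"]
for name, (fold, share) in apex_allocation(tpm, cats, apex_size=4).items():
    print(f"{name}: fold enrichment {fold:.2f}, budget share {share:.1%}")
```

In this toy table the single stress gene makes the more enriched category (2.5-fold vs 1.875-fold for ribosomal genes). It still commands under 2% of the apex budget, while ribosomal genes take about 98%.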

Whether the Patterns Are Real

Every enrichment profile is tested against a conditional permutation null that preserves each category's coarse expression composition. A significant call means the category shows finer rank structure than its own expression baseline would predict, not that the pathway is biologically activated. Single-sample SCEPTR is descriptive: it tells you where pathways sit on the expression hierarchy, not which ones are responding. For condition-level inference, use the shape-transition framework with replicates.
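
One way to build a null that preserves coarse expression composition is to shuffle category members only within coarse rank bins. Each bin keeps its member count, and only the finer ordering is randomised. The sketch below takes that approach. The equal-size bins, the one-sided apex statistic (mean excess EC(k) over the upper half of the ranking) and the function name are assumptions, not SCEPTR's exact procedure.

```python
import numpy as np


def conditional_permutation_pvalue(in_category, n_strata=10, n_perm=1000, seed=0):
    """One-sided permutation p-value for apex structure in one category, conditional
    on how many members fall in each coarse expression bin (hypothetical sketch).

    in_category: booleans for genes sorted from highest to lowest expression.
    """
    hits = np.asarray(in_category, dtype=bool)
    n = hits.size
    if not hits.any():
        raise ValueError("category has no members")
    k = np.arange(1, n + 1)
    top = max(1, n // 2)

    def apex_statistic(h):
        # Mean excess fold enrichment EC(k) - 1 over the upper half of the ranking
        ec = (np.cumsum(h) / k) / h.mean()
        return float(np.mean(ec[:top] - 1.0))

    strata = np.array_split(np.arange(n), n_strata)  # coarse, equal-size rank bins
    rng = np.random.default_rng(seed)
    observed = apex_statistic(hits)
    exceed = 0
    for _ in range(n_perm):
        null = np.zeros(n, dtype=bool)
        for idx in strata:
            # Same member count in this bin, placed at random ranks within it
            null[rng.permutation(idx)[: hits[idx].sum()]] = True
        if apex_statistic(null) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# 200 ranked genes; five members packed at the very top of the first bin
genes = np.zeros(200, dtype=bool)
genes[:5] = True
print(conditional_permutation_pvalue(genes))
```

Members are only moved within their own bin, so a category gets no credit simply for sitting in a highly expressed bin. A small p-value requires structure finer than the bins.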

Which Genes Drive the Enrichment

Each category in the report lists the specific genes contributing to the enrichment, ranked by expression. Expandable gene tables connect every enrichment signal back to the concrete transcripts driving it, so you can follow a striking pattern all the way down to individual genes.

How Architecture Shifts Between Conditions

Compare enrichment profiles between two conditions (mock vs infected, treated vs control) using gene-label permutation testing. SCEPTR detects not just magnitude changes but shape transitions: a programme reorganised from distributed expression into apex concentration upon infection represents a qualitatively different response from a simple increase in mean expression.
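
The following is a simplified sketch of a between-condition comparison. It reduces each profile to a single apex score (mean EC(k) over the top 10% of ranks) and measures the shift from condition A to B. The null comes from random gene sets of the same size, which is one reading of gene-label permutation. The scalar score, the 10% cutoff, the two-sided test and the function names are assumptions. SCEPTR's shape-transition test works on whole profiles.

```python
import numpy as np


def ranked_hits(expression, member_idx):
    """Category membership re-ordered from highest to lowest expression."""
    in_cat = np.zeros(expression.size, dtype=bool)
    in_cat[member_idx] = True
    return in_cat[np.argsort(-expression, kind="stable")]


def apex_score(hits, apex_fraction=0.1):
    """Mean fold enrichment EC(k) over the top apex_fraction of ranks (hypothetical score)."""
    k = np.arange(1, hits.size + 1)
    ec = (np.cumsum(hits) / k) / hits.mean()
    return float(ec[: max(1, int(hits.size * apex_fraction))].mean())


def compare_conditions(expr_a, expr_b, member_idx, n_perm=1000, seed=0):
    """Shift in apex score from condition A to B, with a two-sided p-value from
    random same-size gene sets scored in the same two rankings."""
    expr_a = np.asarray(expr_a, dtype=float)
    expr_b = np.asarray(expr_b, dtype=float)
    member_idx = np.asarray(member_idx)

    def shift(idx):
        return apex_score(ranked_hits(expr_b, idx)) - apex_score(ranked_hits(expr_a, idx))

    observed = shift(member_idx)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        random_set = rng.choice(expr_a.size, size=member_idx.size, replace=False)
        if abs(shift(random_set)) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: 20 category genes are scattered in condition A and move to the apex in B
rng = np.random.default_rng(1)
expr_a = rng.random(500)
expr_b = expr_a.copy()
members = np.arange(20)
expr_b[members] = 10 + np.arange(20)
delta, p = compare_conditions(expr_a, expr_b, members)
print(f"shift in apex score {delta:.2f}, p = {p:.4f}")
```

Scoring random gene sets in the same two rankings means that global differences between the samples affect the observed shift and the null alike.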

Works From a Single Sample

Because each expression tier is compared against the sample's own transcriptome-wide background, SCEPTR works from one sample with no replicates, no control, and no comparative data. Clinical isolates, irreplaceable field samples, pilot experiments, the first transcriptome of a non-model organism. If you have expression data, SCEPTR tells you what your transcriptome is investing in.

Who SCEPTR Is For

Built for researchers who need to understand what a transcriptome is doing, especially when standard approaches fall short.

Clinical & Parasitology Researchers
Single clinical isolates where each infection is unique and irreproducible. SCEPTR characterises functional investment from a single sample, with no replicates or control condition needed, and surfaces the individual genes driving each signal so you can connect enrichment to candidate drug targets.
Relevant category set: parasite_protozoan

Cancer & Immunology Researchers
Tumour biopsies, infected cell lines, patient-derived samples. SCEPTR detects shape transitions: pathways reorganising from distributed expression into apex concentration between conditions, a qualitative change that magnitude-based methods can miss.
Relevant category sets: cancer, human_host, vertebrate_host

Microbiology & AMR Researchers
Bacterial pathogens under different growth conditions, stress responses, antibiotic exposure. Gram-specific category sets capture LPS biogenesis, type III/VI secretion, sporulation, and antimicrobial resistance programmes with organism-appropriate keywords and GO terms.
Relevant category sets: bacteria_gram_negative, bacteria_gram_positive

Environmental & Marine Biologists
The 650+ MMETSP marine eukaryote transcriptomes. Dinoflagellate symbionts under thermal stress. De novo assemblies with sparse annotation. SCEPTR provides meaningful functional characterisation even when annotation coverage is low, and ships with an optional InterProScan step for Pfam-based annotation where UniProt coverage is too thin.
Relevant category sets: protist_dinoflagellate, general

Category Sets

Organism-specific functional categories validated against Gene Ontology, Swiss-Prot, and GO slim sets.

general: Universal categories for any organism
human_host: 33 detailed human host pathways
vertebrate_host: 17 broad categories for mouse, fish, and birds
cancer: Hallmarks of cancer, EMT, immune evasion
bacteria: 14 broad prokaryotic categories
bacteria_gram_negative: LPS, T3SS/T6SS, porins, siderophores
bacteria_gram_positive: Teichoic acids, sortase, sporulation
parasite_protozoan: Plasmodium, Toxoplasma, Leishmania
helminth_nematode: Cuticle, dauer, neuromuscular, excretory/secretory (ES) products
helminth_platyhelminth: Tegument, neoblasts, lifecycle, egg biology
fungi: Cell wall, secondary metabolism, sporulation
plant: Photosynthesis, cell wall, hormones
protist_dinoflagellate: Symbiodinium, harmful algal bloom (HAB) species
insect: Cuticle, metamorphosis, chemosensation

All keywords were validated through a multi-layer provenance audit; 70.3% are backed by Gene Ontology or Swiss-Prot controlled vocabularies. Bring your own categories with --category_set custom.
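
The format of a custom category set is not described here. The sketch below therefore only illustrates the general idea of assigning genes to categories from annotation keywords and GO terms. The dictionary layout, category names, keywords and function name are hypothetical. The GO IDs (GO:0009408, response to heat; GO:0015979, photosynthesis) are examples and should be checked against the Gene Ontology before use.

```python
import re

# Hypothetical custom categories: names, keywords and GO IDs are illustrative only
CUSTOM_CATEGORIES = {
    "Heat stress": {"keywords": ["heat shock", "chaperone"], "go": {"GO:0009408"}},
    "Photosynthesis": {"keywords": ["photosystem", "light-harvesting"], "go": {"GO:0015979"}},
}


def assign_categories(annotation, go_terms, categories=CUSTOM_CATEGORIES):
    """Return the categories whose keywords match the annotation text
    (case-insensitive, whole words only) or whose GO IDs overlap the gene's GO terms."""
    text = annotation.lower()
    matched = set()
    for name, rule in categories.items():
        keyword_hit = any(
            re.search(r"\b" + re.escape(kw.lower()) + r"\b", text) for kw in rule["keywords"]
        )
        go_hit = bool(rule["go"] & set(go_terms))
        if keyword_hit or go_hit:
            matched.add(name)
    return matched


print(assign_categories("Heat shock protein 70", []))  # {'Heat stress'}
print(assign_categories("Photosystem II protein D1", ["GO:0009408"]))  # matches both categories
```

Whole-word matching keeps short keywords from firing inside longer words. That is one reason keyword lists benefit from the kind of provenance audit described above.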

Two Ways to Use SCEPTR

Use the statistical method on its own, or let the framework handle everything from raw reads.

The Statistical Method

Bring any annotated expression table. Skip all preprocessing and go straight to enrichment profiling.

Your ranked gene list: any expression-ranked genes with functional annotation
Continuous enrichment profiling: EC(k) computed at every gene rank, kernel-smoothed
Permutation significance testing: per-category p-values from whole-profile permutation
DKL (Kullback-Leibler divergence) + functional allocation: specialisation gradient and compositional budget analysis

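The specialisation gradient is not defined in detail above. One plausible reading, sketched below, is the Kullback-Leibler divergence D_KL(P_k || Q) between the category composition of the top-k genes (P_k) and that of the whole transcriptome (Q), tracked across cutoffs k. Higher values mean a more specialised apex. Hard one-label-per-gene assignment, natural logs and the function name are assumptions.

```python
import numpy as np


def specialisation_gradient(labels_ranked, cutoffs):
    """D_KL(P_k || Q) for each cutoff k (hypothetical reading of the gradient).

    P_k: category composition of the top-k genes; Q: composition of all genes.
    labels_ranked: one category label per gene, highest expression first.
    Uses natural logs and the convention 0 * log(0) = 0.
    """
    labels_ranked = np.asarray(labels_ranked)
    names = np.unique(labels_ranked)
    q = np.array([np.mean(labels_ranked == c) for c in names])
    divergences = []
    for k in cutoffs:
        top = labels_ranked[:k]
        p = np.array([np.mean(top == c) for c in names])
        present = p > 0  # skip empty categories: 0 * log(0) contributes nothing
        divergences.append(float(np.sum(p[present] * np.log(p[present] / q[present]))))
    return divergences


ranked_labels = ["ribosome"] * 5 + ["other"] * 5
print(specialisation_gradient(ranked_labels, [5, 10]))  # first value is ln 2, second is 0.0
```

When k reaches N, P_k equals Q and the divergence falls to zero. The shape of the curve between the apex and that endpoint is what a gradient of this kind describes.
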
The Automated Framework

Raw reads to interactive report in a single command. Nextflow + Docker for reproducibility.

QC & expression quantification: FastQC, MultiQC, Salmon quantification
Protein prediction & annotation: TransDecoder ORF prediction or direct CDS translation, DIAMOND search against UniProt, GO term assignment
Contamination filtering: DIAMOND-based screening with optional host removal
Full SCEPTR analysis + report: everything from the method track, packaged in an interactive HTML report

Citation

McCabe, J.S. and Janouškovec, J. (2026). SCEPTR: continuous enrichment profiling reveals functional architecture across the expression gradient.