SCEPTR

What is your transcriptome doing, and where is it investing its resources?
Tumour biopsies. Clinical parasite isolates. Drug-resistant pathogens. Environmental metatranscriptomes. One sample, no replicates, no control required. SCEPTR tells you what your organism is spending its transcriptional budget on.

# Method only (pip install)
pip install sceptr-profiling
sceptr profile --expression data.tsv --category-set bacteria -o results/

# Full framework (raw reads to report)
# Requires Nextflow and Docker:
curl -s https://get.nextflow.io | bash && sudo mv nextflow /usr/local/bin/
sudo apt-get install -y docker.io && sudo usermod -aG docker $USER

git clone https://github.com/jsmccabe1/SCEPTR.git && cd SCEPTR
bash setup_databases.sh && docker build -t sceptr:1.0.0 .
./run_sceptr.sh

14 Category Sets
4 Kingdoms Validated
0 Replicates Required

What SCEPTR Tells You

Not just which pathways are enriched, but how your cell allocates its entire transcriptional budget.

Which Programmes Dominate and Where

Continuous enrichment profiles show each functional category's fold enrichment across the full expression gradient. Translation at 9x in the top 50 genes looks fundamentally different from immune signalling distributed across hundreds of moderately expressed genes. SCEPTR classifies these patterns automatically as apex-concentrated, distributed, or flat, and tests each against a permutation null.
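
As a rough illustration of the idea, the fold enrichment of one category at every rank can be sketched in a few lines of Python. This is a minimal sketch, not SCEPTR's implementation: it assumes one category label per gene and applies no smoothing. The function name `enrichment_profile` and the toy labels are hypothetical.

```python
def enrichment_profile(ranked_labels, category):
    """Fold enrichment of `category` among the top-k genes, for every k.

    ranked_labels: one category label per gene, ordered from the most
    to the least expressed gene (a simplifying assumption: real genes
    may belong to several categories).
    """
    n = len(ranked_labels)
    background = sum(1 for label in ranked_labels if label == category) / n
    if background == 0:
        raise ValueError(f"no genes annotated as {category!r}")

    profile = []
    hits = 0
    for k, label in enumerate(ranked_labels, start=1):
        hits += label == category
        # Share of the top-k genes in the category, relative to its
        # share of the whole transcriptome.
        profile.append((hits / k) / background)
    return profile


# Toy ranked list: translation genes cluster at the top of the gradient.
ranked = ["translation"] * 3 + [
    "immune", "translation", "immune", "other", "other", "immune", "other",
]
profile = enrichment_profile(ranked, "translation")
for k in (1, 5, 10):
    print(f"top {k:2d}: {profile[k - 1]:.2f}x")
```

By construction the profile returns to 1x at the final rank, where the tier and the background are the same set of genes.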

What the Cell Is Spending Its Budget On

The Functional Allocation Profile shows what proportion of the expression apex each programme commands. In a clinical P. falciparum isolate, Translation takes 41% of the apex budget at 3x its background share, with the validated drug target HGXPRT as the #2 most expressed gene. A category can be highly enriched yet occupy a small budget share if it is a small category. Both perspectives matter; SCEPTR provides both.
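
The difference between enrichment and budget share can be made concrete with a small sketch. It assumes that "budget" means expression-weighted share (summed expression per category divided by total expression); counting genes instead is the other plausible reading, and the text above does not say which SCEPTR uses. The function name, gene IDs, and values are hypothetical.

```python
def allocation_profile(genes, apex_size):
    """Compare each category's share of the apex with its background share.

    genes: list of (gene_id, expression, category) tuples in any order.
    apex_size: number of top-expressed genes that form the apex.
    """
    ranked = sorted(genes, key=lambda g: g[1], reverse=True)

    def shares(subset):
        # Expression-weighted share of each category within `subset`.
        total = sum(expr for _, expr, _ in subset)
        out = {}
        for _, expr, cat in subset:
            out[cat] = out.get(cat, 0.0) + expr / total
        return out

    apex = shares(ranked[:apex_size])
    background = shares(ranked)
    result = {}
    for cat, bg_share in background.items():
        apex_share = apex.get(cat, 0.0)
        result[cat] = {
            "apex_share": apex_share,
            "background_share": bg_share,
            "fold": apex_share / bg_share if bg_share > 0 else float("nan"),
        }
    return result


# Hypothetical expression values (e.g. TPM) for ten genes.
genes = [
    ("g1", 900.0, "translation"), ("g2", 600.0, "translation"),
    ("g3", 300.0, "transport"), ("g4", 200.0, "metabolism"),
    ("g5", 100.0, "metabolism"), ("g6", 100.0, "metabolism"),
    ("g7", 100.0, "metabolism"), ("g8", 100.0, "translation"),
    ("g9", 50.0, "metabolism"), ("g10", 50.0, "metabolism"),
]
allocation = allocation_profile(genes, apex_size=4)
for cat, row in allocation.items():
    print(f"{cat:12s} apex {row['apex_share']:.0%}  fold {row['fold']:.2f}x")
```

In this toy data the small transport category has the highest fold enrichment but holds a much smaller share of the apex than translation, which is the distinction the paragraph above draws.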

Whether the Patterns Are Real

Every enrichment profile is tested against a permutation-based null (1,000 shuffles of gene-category assignments, same smoothing applied to both). The report shows 95% null envelopes so you can see exactly where each category departs from random expectation. No arbitrary thresholds, no guesswork.
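
A label-shuffling null of this kind can be sketched as follows. This is an illustrative sketch under stated assumptions, not SCEPTR's code: each gene carries a single label, the smoothing step is omitted, and the envelope uses simple empirical quantiles. The names `fold_profile` and `null_envelope` are hypothetical.

```python
import random


def fold_profile(labels, category):
    """Unsmoothed fold enrichment of `category` at every rank k."""
    background = labels.count(category) / len(labels)
    profile, hits = [], 0
    for k, label in enumerate(labels, start=1):
        hits += label == category
        profile.append((hits / k) / background)
    return profile


def null_envelope(labels, category, n_perm=1000, seed=0):
    """Per-rank 95% envelope from shuffled gene-category assignments."""
    rng = random.Random(seed)
    shuffled = list(labels)
    per_rank = [[] for _ in labels]
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between rank and category
        for k, value in enumerate(fold_profile(shuffled, category)):
            per_rank[k].append(value)

    lower, upper = [], []
    for values in per_rank:
        values.sort()
        # Approximate 2.5% and 97.5% empirical quantiles.
        lower.append(values[int(0.025 * (n_perm - 1))])
        upper.append(values[int(0.975 * (n_perm - 1))])
    return lower, upper


# Toy data: 20 translation genes occupy the top 20 of 200 ranks.
labels = ["translation"] * 20 + ["other"] * 180
observed = fold_profile(labels, "translation")
lower, upper = null_envelope(labels, "translation")
k = 20
print(f"rank {k}: observed {observed[k - 1]:.1f}x, "
      f"null 95% envelope {lower[k - 1]:.1f}x to {upper[k - 1]:.1f}x")
```

Ranks where the observed profile sits outside the envelope are the ones that depart from random expectation.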

Which Genes Drive the Enrichment

Each category in the report lists the specific genes contributing to the enrichment, ranked by expression. For a parasitologist, that means seeing GAPDH and HGXPRT at the top of the Translation apex. For a virologist, ISG15 and MX1 driving the Interferon response. Expandable gene tables connect enrichment back to concrete gene biology.

How Architecture Shifts Between Conditions

Compare enrichment profiles between two conditions (mock vs infected, treated vs control) using gene-label permutation testing. SCEPTR detects not just magnitude changes but shape transitions: a programme reorganised from distributed expression into apex concentration upon infection represents a qualitatively different kind of response than a simple increase in mean expression.

Works From a Single Sample

Because each tier is compared to the sample's own background, SCEPTR works from one sample with no replicates, no control, and no comparative data. Clinical isolates, irreplaceable field samples, pilot experiments, the first transcriptome of a non-model organism. If you have expression data, SCEPTR tells you what your transcriptome is investing in.

Category Sets

Organism-specific functional categories validated against Gene Ontology, Swiss-Prot, and GO slim sets.

general: Universal categories for any organism
human_host: 33 detailed human host pathways
vertebrate_host: 17 broad categories for mouse, fish, birds
cancer: Hallmarks of cancer, EMT, immune evasion
bacteria: 14 broad prokaryotic categories
bacteria_gram_negative: LPS, T3SS/T6SS, porins, siderophores
bacteria_gram_positive: Teichoic acids, sortase, sporulation
parasite_protozoan: Plasmodium, Toxoplasma, Leishmania
helminth_nematode: Cuticle, dauer, neuromuscular, ES products
helminth_platyhelminth: Tegument, neoblasts, lifecycle, egg biology
fungi: Cell wall, secondary metabolism, sporulation
plant: Photosynthesis, cell wall, hormones
protist_dinoflagellate: Symbiodinium, HAB species
insect: Cuticle, metamorphosis, chemosensation

All keywords were validated through a multi-layer provenance audit: 68.9% are backed by Gene Ontology or Swiss-Prot controlled vocabularies. Bring your own categories with --category_set custom.

Who SCEPTR Is For

Built for researchers who need to understand what a transcriptome is doing, especially when standard approaches fall short.

Clinical & Parasitology Researchers
Single clinical isolates where each infection is unique and irreproducible. SCEPTR characterises functional investment from one sample - no replicates, no control condition needed. Expression apex genes include known drug targets like HGXPRT in P. falciparum.
Category sets: parasite_protozoan

Cancer & Immunology Researchers
Tumour biopsies, infected cell lines, patient-derived samples. SCEPTR detects shape transitions: interferon genes reorganised from distributed expression to apex concentration upon SARS-CoV-2 infection, a qualitative change invisible to magnitude-based methods.
Category sets: cancer, human_host, vertebrate_host

Microbiology & AMR Researchers
Bacterial pathogens under different growth conditions, stress responses, antibiotic exposure. Gram-specific category sets capture LPS biogenesis, type III/VI secretion, sporulation, and antimicrobial resistance programmes with organism-appropriate keywords and GO terms.
Category sets: bacteria_gram_negative, bacteria_gram_positive

Environmental & Marine Biologists
The 650+ MMETSP marine eukaryote transcriptomes. Dinoflagellate symbionts under thermal stress. De novo assemblies with sparse annotation. SCEPTR works at annotation rates as low as 17% and provides meaningful functional characterisation where curated pathway databases are unavailable.
Category sets: protist_dinoflagellate, general

Two Ways to Use SCEPTR

Use the statistical method on its own, or let the framework handle everything from raw reads.

The Statistical Method

Bring any annotated expression table. Skip all preprocessing and go straight to enrichment profiling.

Your ranked gene list: Any expression-ranked genes with functional annotation
Continuous enrichment profiling: EC(k) computed at every gene rank, kernel-smoothed
Permutation significance testing: Per-category p-values from whole-profile permutation
DKL + functional allocation: Specialisation gradient and compositional budget analysis
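
To give a feel for the specialisation gradient, the sketch below assumes that DKL denotes the Kullback-Leibler divergence between the category composition of a top-k tier and that of the whole transcriptome, measured in bits. That reading, the function name `tier_divergence`, and the toy labels are assumptions for illustration only.

```python
import math
from collections import Counter


def tier_divergence(ranked_labels, k):
    """KL divergence (bits) of the top-k category mix from the background.

    ranked_labels: one category label per gene, most expressed first.
    """
    tier = Counter(ranked_labels[:k])
    background = Counter(ranked_labels)
    n = len(ranked_labels)
    divergence = 0.0
    for category, count in tier.items():
        p = count / k                 # share of the tier
        q = background[category] / n  # share of the whole list, never 0 here
        divergence += p * math.log2(p / q)
    return divergence


# Toy ranked list of 50 genes with translation concentrated at the apex.
ranked = (
    ["translation"] * 8 + ["metabolism"] * 2 + ["translation"] * 2
    + ["metabolism"] * 18 + ["other"] * 20
)
gradient = {k: tier_divergence(ranked, k) for k in (10, 30, 50)}
for k, value in gradient.items():
    print(f"top {k:2d}: {value:.2f} bits")
```

A tier whose composition matches the background scores zero, so larger values towards the top of the ranking indicate a more specialised apex.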

The Automated Framework

Raw reads to interactive report in a single command. Nextflow + Docker for reproducibility.

QC & expression quantification: FastQC, MultiQC, Salmon pseudo-alignment
Protein prediction & annotation: TransDecoder / CDS translation, UniProt DIAMOND, GO terms
Contamination filtering: DIAMOND-based screening with optional host removal
Full SCEPTR analysis + report: Everything from the method track, packaged in an interactive HTML report

Citation

McCabe, J.S. and Janouškovec, J. (2026). SCEPTR: continuous enrichment profiling reveals functional architecture across the expression gradient.