SCEPTR - Continuous Enrichment Profiling

What is your transcriptome doing, and where is it investing its resources?
Tumour biopsies. Clinical parasite isolates. Drug-resistant pathogens. Environmental metatranscriptomes. One sample, no replicates, no control required. SCEPTR tells you what your organism is spending its transcriptional budget on.

# Method only (pip install)
pip install sceptr-profiling
sceptr profile --expression data.tsv --category-set bacteria -o results/

# Full framework (raw reads to report)
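# Requires Nextflow and Docker (the Docker line below assumes Debian/Ubuntu):
curl -s https://get.nextflow.io | bash && sudo mv nextflow /usr/local/bin/
sudo apt-get install -y docker.io && sudo usermod -aG docker "$USER"
# Log out and back in (or run `newgrp docker`) so the docker group change takes effect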

git clone https://github.com/jsmccabe1/SCEPTR.git && cd SCEPTR
bash setup_databases.sh && docker build -t sceptr:1.0.0 .
./run_sceptr.sh

What SCEPTR Tells You

Not just which pathways are enriched, but how your cell allocates its entire transcriptional budget.

Which Programmes Dominate and Where

Continuous enrichment profiles show each functional category's fold enrichment across the full expression gradient. A programme dominating the top 50 genes looks fundamentally different from one distributed across hundreds of moderately expressed genes. SCEPTR classifies these patterns automatically as apex-concentrated, distributed, or flat, and tests each against a permutation null.
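
As a rough illustration of the idea, the sketch below computes a cumulative fold enrichment at every rank for one category. The definition used here (the category's fraction among the top k genes divided by its fraction in the whole list), the function name, and the toy gene IDs are all assumptions made for illustration. It leaves out smoothing and pattern classification, and it is not SCEPTR's own implementation.

```python
def fold_enrichment_profile(ranked_genes, category_genes):
    """Return a cumulative fold enrichment for k = 1..N.

    ranked_genes: gene IDs sorted by descending expression.
    category_genes: set of gene IDs annotated to one functional category.
    """
    n_total = len(ranked_genes)
    n_in_category = sum(1 for g in ranked_genes if g in category_genes)
    if n_in_category == 0:
        raise ValueError("category has no genes in the ranked list")
    background = n_in_category / n_total  # category fraction in the whole list

    profile = []
    hits = 0
    for k, gene in enumerate(ranked_genes, start=1):
        if gene in category_genes:
            hits += 1
        # Fraction of the top k genes in the category, relative to background.
        profile.append((hits / k) / background)
    return profile


# Toy example: 10 genes, g1 is the most highly expressed.
ranked = [f"g{i}" for i in range(1, 11)]
ribosome = {"g1", "g2", "g6"}  # hypothetical category membership
profile = fold_enrichment_profile(ranked, ribosome)
print([round(x, 2) for x in profile])
```

In this toy case the profile starts above 3 at the very top of the ranking and decays to exactly 1 at the last rank, which is the kind of shape the text calls apex-concentrated. A category spread evenly through the ranking would hover near 1 throughout.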

What the Cell Is Spending Its Budget On

The Functional Allocation Profile shows what proportion of the expression apex each programme commands. Fold enrichment tells you which categories are disproportionately represented; budget share tells you where transcriptional resources are actually being spent. A small category can be highly enriched yet command a tiny share of the budget. Both perspectives matter; SCEPTR provides both.
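
The difference between the two views can be made concrete with a small sketch. It assumes that the "apex" is simply the top N genes by expression and that budget share is a category's fraction of summed expression within that apex. Both are illustrative assumptions, as are the function name and the toy TPM values; SCEPTR's actual definitions may differ.

```python
def apex_summary(expression, categories, apex_size):
    """Compare fold enrichment and budget share within the top `apex_size` genes.

    expression: dict mapping gene ID -> expression value (e.g. TPM).
    categories: dict mapping category name -> set of gene IDs.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    apex = ranked[:apex_size]
    apex_total = sum(expression[g] for g in apex)

    summary = {}
    for name, members in categories.items():
        n_present = sum(1 for g in ranked if g in members)
        in_apex = [g for g in apex if g in members]
        background = n_present / len(ranked)
        # Gene-count view: is the category over-represented in the apex?
        fold = (len(in_apex) / len(apex)) / background if background else float("nan")
        # Expression-mass view: how much of the apex's expression does it hold?
        share = sum(expression[g] for g in in_apex) / apex_total
        summary[name] = {"fold_enrichment": fold, "budget_share": share}
    return summary


# Toy data: four very highly expressed genes, one moderately high gene, 15 low genes.
expression = {"g1": 5000, "g2": 4000, "g3": 3000, "g4": 2000, "g5": 100}
expression.update({f"g{i}": 100 - 5 * (i - 5) for i in range(6, 21)})

categories = {
    "translation": {"g1", "g2", "g3", "g4", "g6", "g7", "g8", "g9", "g10", "g11"},
    "toxin": {"g5"},
}
summary = apex_summary(expression, categories, apex_size=5)
for name, values in summary.items():
    print(name, values)
```

Here the single-gene "toxin" category is 4-fold enriched in the apex but holds under 1% of the apex's expression, while "translation" is only 1.6-fold enriched yet holds over 99%.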

Whether the Patterns Are Real

Every enrichment profile is tested against a conditional permutation null that preserves each category's coarse expression composition. A significant call means the category shows finer rank structure than its own expression baseline would predict, not that the pathway is biologically activated. Single-sample SCEPTR is descriptive: it tells you where pathways sit on the expression hierarchy, not which ones are responding. For condition-level inference, use the shape-transition framework with replicates.
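
One way to picture a conditional null of this kind is to shuffle category labels only within coarse expression bins, so that every permuted profile keeps the same number of category genes in each bin. The sketch below does that with equal-width rank bins and a simple Kolmogorov-Smirnov-style statistic. The binning scheme, the statistic, and all function names are assumptions chosen for illustration, not SCEPTR's actual procedure.

```python
import random


def ks_like_stat(labels):
    """Largest gap between the category's cumulative share and an even spread.

    labels: list of 0/1 flags over genes sorted by descending expression.
    """
    n, n_c = len(labels), sum(labels)
    hits, worst = 0, 0.0
    for k, is_member in enumerate(labels, start=1):
        hits += is_member
        worst = max(worst, abs(hits / n_c - k / n))
    return worst


def shuffle_within_bins(labels, edges, rng):
    """Permute labels inside each rank bin, preserving per-bin category counts."""
    shuffled = []
    for lo, hi in zip(edges, edges[1:]):
        block = labels[lo:hi]  # slicing copies, so the input is untouched
        rng.shuffle(block)
        shuffled.extend(block)
    return shuffled


def conditional_permutation_p(labels, n_bins=4, n_perm=999, seed=0):
    rng = random.Random(seed)
    n = len(labels)
    edges = [round(i * n / n_bins) for i in range(n_bins + 1)]
    observed = ks_like_stat(labels)
    exceed = sum(
        ks_like_stat(shuffle_within_bins(labels, edges, rng)) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy example: 40 ranked genes, the category occupies the top five ranks.
labels = [1] * 5 + [0] * 35
observed, p_value = conditional_permutation_p(labels)
print(observed, p_value)
```

Because every permutation keeps all five category genes inside the top bin, the test does not reward the category merely for being highly expressed. It only asks whether the arrangement within the bin is tighter than chance, which matches the interpretation given above.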

Which Genes Drive the Enrichment

Each category in the report lists the specific genes contributing to the enrichment, ranked by expression. Expandable gene tables connect every enrichment signal back to the concrete transcripts driving it, so you can follow a striking pattern all the way down to individual genes.

How Architecture Shifts Between Conditions

Compare enrichment profiles between two conditions (mock vs infected, treated vs control) using gene-label permutation testing. SCEPTR detects not just magnitude changes but shape transitions: a programme reorganised from distributed expression into apex concentration upon infection represents a qualitatively different kind of response than a simple increase in mean expression.
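
A minimal sketch of a gene-label permutation test for two conditions follows. It assumes the test statistic is the mean absolute gap between a category's cumulative-share curves in the two rankings, and that the null is built by drawing random gene sets of the same size as the category. The statistic, function names, and toy rankings are illustrative assumptions rather than SCEPTR's documented method.

```python
import random


def cumulative_share(ranked_genes, members):
    """Fraction of the category's genes seen by each rank."""
    hits, curve = 0, []
    for gene in ranked_genes:
        hits += 1 if gene in members else 0
        curve.append(hits / len(members))
    return curve


def profile_distance(rank_a, rank_b, members):
    """Mean absolute gap between the category's curves in two conditions."""
    a = cumulative_share(rank_a, members)
    b = cumulative_share(rank_b, members)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def gene_label_permutation_p(rank_a, rank_b, members, n_perm=999, seed=0):
    rng = random.Random(seed)
    genes = list(rank_a)
    observed = profile_distance(rank_a, rank_b, members)
    exceed = 0
    for _ in range(n_perm):
        # Reassign the category label to a random gene set of the same size.
        fake = set(rng.sample(genes, len(members)))
        if profile_distance(rank_a, rank_b, fake) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy example: a category spread through the mock ranking moves to the top on infection.
genes = [f"g{i}" for i in range(30)]
members = {"g10", "g14", "g18", "g22", "g26"}
mock_rank = genes
infected_rank = [g for g in genes if g in members] + [g for g in genes if g not in members]

observed, p_value = gene_label_permutation_p(mock_rank, infected_rank, members)
print(round(observed, 3), p_value)
```

The toy category goes from distributed to apex-concentrated, so its curve changes shape far more than random gene sets do. A statistic based on the whole curve is one way to pick up that kind of reorganisation.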

Works From a Single Sample

Because each tier is compared to the sample's own background, SCEPTR works from one sample with no replicates, no control, and no comparative data. Clinical isolates, irreplaceable field samples, pilot experiments, the first transcriptome of a non-model organism. If you have expression data, SCEPTR tells you what your transcriptome is investing in.

Who SCEPTR Is For

Built for researchers who need to understand what a transcriptome is doing, especially when standard approaches fall short.

Clinical & Parasitology Researchers

Single clinical isolates where each infection is unique and irreproducible. SCEPTR characterises functional investment from one sample, no replicates or control condition needed, and surfaces the individual genes driving each signal so you can connect enrichment to candidate drug targets.

Category sets: parasite_protozoan

Cancer & Immunology Researchers

Tumour biopsies, infected cell lines, patient-derived samples. SCEPTR detects shape transitions: pathways reorganising from distributed expression into apex concentration between conditions, a qualitative change that magnitude-based methods miss.

Category sets: cancer, human_host, vertebrate_host

Microbiology & AMR Researchers

Bacterial pathogens under different growth conditions, stress responses, antibiotic exposure. Gram-specific category sets capture LPS biogenesis, type III/VI secretion, sporulation, and antimicrobial resistance programmes with organism-appropriate keywords and GO terms.

Category sets: bacteria_gram_negative, bacteria_gram_positive

Environmental & Marine Biologists

The 650+ MMETSP marine eukaryote transcriptomes. Dinoflagellate symbionts under thermal stress. De novo assemblies with sparse annotation. SCEPTR provides meaningful functional characterisation even when annotation coverage is low, and ships with an optional InterProScan step for Pfam-based annotation where UniProt coverage is too thin.

Category sets: protist_dinoflagellate, general

Category Sets

Organism-specific functional categories validated against Gene Ontology, Swiss-Prot, and GO slim sets.

general: Universal categories for any organism
human_host: 33 detailed human host pathways
vertebrate_host: 17 broad categories for mouse, fish, birds
cancer: Hallmarks of cancer, EMT, immune evasion
bacteria: 14 broad prokaryotic categories
bacteria_gram_negative: LPS, T3SS/T6SS, porins, siderophores
bacteria_gram_positive: Teichoic acids, sortase, sporulation
parasite_protozoan: Plasmodium, Toxoplasma, Leishmania
helminth_nematode: Cuticle, dauer, neuromuscular, ES products
helminth_platyhelminth: Tegument, neoblasts, lifecycle, egg biology
fungi: Cell wall, secondary metabolism, sporulation
plant: Photosynthesis, cell wall, hormones
protist_dinoflagellate: Symbiodinium, HAB species
insect: Cuticle, metamorphosis, chemosensation

All keywords have been validated through a multi-layer provenance audit: 70.3% are backed by Gene Ontology or Swiss-Prot controlled vocabularies. Bring your own categories with --category_set custom.

Two Ways to Use SCEPTR

Use the statistical method on its own, or let the framework handle everything from raw reads.

The Statistical Method

Bring any annotated expression table. Skip all preprocessing and go straight to enrichment profiling.

Your ranked gene list: Any expression-ranked genes with functional annotation
Continuous enrichment profiling: E_C(k) computed at every gene rank, kernel-smoothed
Permutation significance testing: Per-category p-values from whole-profile permutation
D_KL + functional allocation: Specialisation gradient and compositional budget analysis
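
To give a feel for the D_KL step, the sketch below computes a Kullback-Leibler divergence between the category composition of the top k genes and that of the whole ranked list, at several values of k. Treating the "specialisation gradient" as this series of values is an assumption, as are the one-category-per-gene simplification, the function names, and the toy data. It is an illustration, not SCEPTR's implementation.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def composition(genes, gene_to_category, names):
    """Fraction of `genes` falling in each category, in the order of `names`."""
    counts = [sum(1 for g in genes if gene_to_category[g] == name) for name in names]
    total = sum(counts)
    return [c / total for c in counts]


def specialisation_gradient(ranked_genes, gene_to_category, tiers):
    """D_KL between each top-k tier and the whole-list background."""
    names = sorted(set(gene_to_category.values()))
    background = composition(ranked_genes, gene_to_category, names)
    return {
        k: kl_divergence(composition(ranked_genes[:k], gene_to_category, names), background)
        for k in tiers
    }


# Toy example: 12 ranked genes, three equally sized categories stacked by rank.
ranked = [f"g{i}" for i in range(12)]
gene_to_category = {}
for i, gene in enumerate(ranked):
    gene_to_category[gene] = ["translation", "metabolism", "stress"][i // 4]

gradient = specialisation_gradient(ranked, gene_to_category, tiers=[4, 8, 12])
for k, value in gradient.items():
    print(k, round(value, 3))
```

The divergence is largest for the narrowest tier, where one category fills the whole apex, and falls to zero once the tier covers the full list. A steep decline of this kind would indicate a transcriptome whose top ranks are highly specialised.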

The Automated Framework

Raw reads to interactive report in a single command. Nextflow + Docker for reproducibility.

QC & expression quantification: FastQC, MultiQC, Salmon pseudo-alignment
Protein prediction & annotation: TransDecoder / CDS translation, UniProt DIAMOND, GO terms
Contamination filtering: DIAMOND-based screening with optional host removal
Full SCEPTR analysis + report: Everything from the method track, packaged in an interactive HTML report