SCEPTR - Continuous Enrichment Profiling

What is your transcriptome doing, and where is it investing its resources?
Tumour biopsies. Clinical parasite isolates. Drug-resistant pathogens. Environmental metatranscriptomes. One sample, no replicates, no control required. SCEPTR tells you what your organism is spending its transcriptional budget on.

# Method only (pip install)
pip install sceptr-profiling
sceptr profile --expression data.tsv --category-set bacteria -o results/

# Full framework (raw reads to report)
# Requires Nextflow and Docker:
curl -s https://get.nextflow.io | bash && sudo mv nextflow /usr/local/bin/
sudo apt-get install -y docker.io && sudo usermod -aG docker "$USER"
# Log out and back in so the new docker group membership takes effect

git clone https://github.com/jsmccabe1/SCEPTR.git && cd SCEPTR
bash setup_databases.sh && docker build -t sceptr:1.0.0 .
./run_sceptr.sh

What SCEPTR Tells You

Not just which pathways are enriched, but how your cell allocates its entire transcriptional budget.

Which Programmes Dominate and Where

Continuous enrichment profiles show each functional category's fold enrichment across the full expression gradient. Translation at 9x in the top 50 genes looks fundamentally different from immune signalling distributed across hundreds of moderately expressed genes. SCEPTR classifies these patterns automatically as apex-concentrated, distributed, or flat, and tests each against a permutation null.
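To make the idea of a profile concrete, here is a minimal sketch, not SCEPTR's actual code. It assumes fold enrichment at rank k is the fraction of the top-k genes that belong to a category, divided by that category's fraction of all genes. The function name and toy gene IDs are hypothetical, and the kernel smoothing step is left out.

```python
def enrichment_profile(ranked_genes, category_genes):
    """Fold enrichment of one category among the top-k genes, for every k.

    ranked_genes: gene IDs sorted from highest to lowest expression.
    category_genes: set of gene IDs annotated to the category.
    Assumed definition: (hits in top k / k) / (category genes / all genes).
    """
    n_total = len(ranked_genes)
    background = sum(g in category_genes for g in ranked_genes) / n_total
    if background == 0:
        return [0.0] * n_total
    profile, hits = [], 0
    for k, gene in enumerate(ranked_genes, start=1):
        hits += gene in category_genes
        profile.append((hits / k) / background)
    return profile


# Toy data: 10 genes ranked by expression, 3 of them in the category.
ranked = [f"g{i}" for i in range(1, 11)]
translation = {"g1", "g2", "g7"}
profile = enrichment_profile(ranked, translation)
print([round(x, 2) for x in profile])
```

A category packed into the first few ranks starts high and decays towards 1, while a category spread evenly through the ranking stays close to 1 throughout.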

What the Cell Is Spending Its Budget On

The Functional Allocation Profile shows what proportion of the expression apex each programme commands. In a clinical P. falciparum isolate, Translation takes 41% of the apex budget, 3x its background share, and the validated drug target HGXPRT is the second most highly expressed gene. A small category can be highly enriched yet still occupy only a small share of the budget. Both perspectives matter; SCEPTR provides both.
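The difference between budget share and enrichment can be sketched in a few lines. This is an illustration under stated assumptions, not SCEPTR's implementation: the budget is taken to be summed expression, the apex is the top_n most expressed genes, and the function name, gene IDs, and values are all made up.

```python
from collections import defaultdict


def allocation_shares(expression, gene_category, top_n):
    """Share of summed expression per category, in the top_n genes and overall."""
    ranked = sorted(expression, key=expression.get, reverse=True)

    def shares(genes):
        totals = defaultdict(float)
        for g in genes:
            totals[gene_category.get(g, "Unannotated")] += expression[g]
        grand_total = sum(totals.values())
        return {c: v / grand_total for c, v in totals.items()}

    return shares(ranked[:top_n]), shares(ranked)


# Hypothetical expression values (e.g. TPM) and annotations.
expression = {"a": 900, "b": 500, "c": 300, "d": 100, "e": 100, "f": 100}
gene_category = {"a": "Translation", "b": "Glycolysis", "c": "Translation",
                 "d": "Glycolysis", "e": "Transport", "f": "Transport"}
apex, background = allocation_shares(expression, gene_category, top_n=3)
for cat in sorted(apex):
    ratio = apex[cat] / background[cat]
    print(f"{cat}: {apex[cat]:.0%} of apex, {ratio:.2f}x background share")
```

Reporting the share and the ratio side by side is what keeps a small but strongly enriched category from being mistaken for a dominant one.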

Whether the Patterns Are Real

Every enrichment profile is tested against a permutation-based null: 1,000 shuffles of gene-category assignments, with the same smoothing applied to the observed and shuffled profiles. The report shows 95% null envelopes so you can see exactly where each category departs from random expectation. No arbitrary thresholds, no guesswork.
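A permutation envelope of this kind can be sketched as follows. The details are assumptions rather than SCEPTR's code: a Gaussian kernel over rank with an arbitrary bandwidth, a pointwise 2.5th to 97.5th percentile band, and hypothetical function names. For a single category, drawing a random gene set of the same size is equivalent to shuffling the gene-category labels.

```python
import math
import random


def enrichment_profile(ranked_genes, category_genes):
    """Fold enrichment of the category among the top-k genes, for every k."""
    background = len(category_genes) / len(ranked_genes)
    profile, hits = [], 0
    for k, gene in enumerate(ranked_genes, start=1):
        hits += gene in category_genes
        profile.append((hits / k) / background)
    return profile


def gaussian_smooth(values, bandwidth=2.0):
    """Gaussian kernel smoother over rank (bandwidth is an illustrative choice)."""
    n = len(values)
    smoothed = []
    for i in range(n):
        weights = [math.exp(-0.5 * ((i - j) / bandwidth) ** 2) for j in range(n)]
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return smoothed


def null_envelope(ranked_genes, category_size, n_perm=1000, seed=0):
    """Pointwise 95% band of smoothed profiles under random category labels."""
    rng = random.Random(seed)
    null_profiles = []
    for _ in range(n_perm):
        shuffled = set(rng.sample(ranked_genes, category_size))
        null_profiles.append(gaussian_smooth(enrichment_profile(ranked_genes, shuffled)))
    lower, upper = [], []
    for k in range(len(ranked_genes)):
        column = sorted(p[k] for p in null_profiles)
        lower.append(column[int(0.025 * (n_perm - 1))])
        upper.append(column[int(0.975 * (n_perm - 1))])
    return lower, upper


# Toy data: 40 ranked genes, with the category concentrated near the top.
ranked = [f"g{i}" for i in range(1, 41)]
category = {"g1", "g2", "g3", "g5", "g8"}
observed = gaussian_smooth(enrichment_profile(ranked, category))
lower, upper = null_envelope(ranked, len(category), n_perm=200)
outside = [k + 1 for k, (o, hi) in enumerate(zip(observed, upper)) if o > hi]
print("ranks above the 95% null envelope:", outside)
```

Smoothing the shuffled profiles with the same kernel as the observed one matters: otherwise the envelope would be wider or narrower than the curve it is meant to judge.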

Which Genes Drive the Enrichment

Each category in the report lists the specific genes contributing to the enrichment, ranked by expression. For a parasitologist, that means seeing GAPDH and HGXPRT at the top of the expression apex. For a virologist, it means seeing ISG15 and MX1 driving the Interferon response. Expandable gene tables connect enrichment back to concrete gene biology.

How Architecture Shifts Between Conditions

Compare enrichment profiles between two conditions (mock vs infected, treated vs control) using gene-label permutation testing. SCEPTR detects not just magnitude changes but shape transitions: a programme reorganised from distributed expression into apex concentration upon infection is a qualitatively different kind of response from a simple increase in mean expression.
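One way to picture a two-condition test is sketched below. It is not SCEPTR's implementation: the statistic (mean absolute gap between the two profiles), the reading of "gene-label permutation" as reassigning category labels to random genes, and all names and toy data are assumptions made for illustration.

```python
import random


def enrichment_profile(ranked_genes, category_genes):
    """Fold enrichment of the category among the top-k genes, for every k."""
    background = len(category_genes) / len(ranked_genes)
    profile, hits = [], 0
    for k, gene in enumerate(ranked_genes, start=1):
        hits += gene in category_genes
        profile.append((hits / k) / background)
    return profile


def profile_gap(ranked_a, ranked_b, category_genes):
    """Mean absolute gap between the two conditions' profiles (illustrative statistic)."""
    pa = enrichment_profile(ranked_a, category_genes)
    pb = enrichment_profile(ranked_b, category_genes)
    return sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)


def gap_p_value(ranked_a, ranked_b, category_genes, n_perm=1000, seed=0):
    """Permutation p-value: how often random gene labels give a gap this large."""
    rng = random.Random(seed)
    observed = profile_gap(ranked_a, ranked_b, category_genes)
    exceed = 0
    for _ in range(n_perm):
        shuffled = set(rng.sample(ranked_a, len(category_genes)))
        exceed += profile_gap(ranked_a, ranked_b, shuffled) >= observed
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: interferon genes are spread out in mock, at the apex when infected.
mock = [f"g{i}" for i in range(1, 31)]
interferon = {"g5", "g12", "g19", "g26"}
infected = sorted(interferon, key=mock.index) + [g for g in mock if g not in interferon]

observed, p = gap_p_value(mock, infected, interferon)
print(f"mean profile gap = {observed:.2f}, permutation p = {p:.4f}")
```

Because the statistic compares whole profiles rather than mean expression, a category that moves from scattered ranks into the apex scores highly even if its total expression barely changes.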

Works From a Single Sample

Because each expression tier (the top-ranked genes at a given cutoff) is compared to the sample's own background, SCEPTR works from one sample with no replicates, no control, and no comparative data. Clinical isolates, irreplaceable field samples, pilot experiments, the first transcriptome of a non-model organism. If you have expression data, SCEPTR tells you what your transcriptome is investing in.

Category Sets

Organism-specific functional categories validated against Gene Ontology, Swiss-Prot, and GO slim sets.

general: Universal categories for any organism
human_host: 33 detailed human host pathways
vertebrate_host: 17 broad categories for mouse, fish, birds
cancer: Hallmarks of cancer, EMT, immune evasion
bacteria: 14 broad prokaryotic categories
bacteria_gram_negative: LPS, T3SS/T6SS, porins, siderophores
bacteria_gram_positive: Teichoic acids, sortase, sporulation
parasite_protozoan: Plasmodium, Toxoplasma, Leishmania
helminth_nematode: Cuticle, dauer, neuromuscular, ES products
helminth_platyhelminth: Tegument, neoblasts, lifecycle, egg biology
fungi: Cell wall, secondary metabolism, sporulation
plant: Photosynthesis, cell wall, hormones
protist_dinoflagellate: Symbiodinium, HAB species
insect: Cuticle, metamorphosis, chemosensation

All keywords have been checked in a multi-layer provenance audit; 68.9% are backed by Gene Ontology or Swiss-Prot controlled vocabularies. Bring your own categories with --category_set custom.

Who SCEPTR Is For

Built for researchers who need to understand what a transcriptome is doing, especially when standard approaches fall short.

Clinical & Parasitology Researchers

Single clinical isolates where each infection is unique and irreproducible. SCEPTR characterises functional investment from one sample, with no replicates and no control condition needed. Expression apex genes include known drug targets like HGXPRT in P. falciparum.

Category set: parasite_protozoan

Cancer & Immunology Researchers

Tumour biopsies, infected cell lines, patient-derived samples. SCEPTR detects shape transitions: interferon genes reorganised from distributed expression to apex concentration upon SARS-CoV-2 infection, a qualitative change invisible to magnitude-based methods.

Category sets: cancer, human_host, vertebrate_host

Microbiology & AMR Researchers

Bacterial pathogens under different growth conditions, stress responses, antibiotic exposure. Gram-specific category sets capture LPS biogenesis, type III/VI secretion, sporulation, and antimicrobial resistance programmes with organism-appropriate keywords and GO terms.

Category sets: bacteria_gram_negative, bacteria_gram_positive

Environmental & Marine Biologists

The 650+ MMETSP marine eukaryote transcriptomes. Dinoflagellate symbionts under thermal stress. De novo assemblies with sparse annotation. SCEPTR works at annotation rates as low as 17% and provides meaningful functional characterisation where curated pathway databases are unavailable.

Category sets: protist_dinoflagellate, general

Two Ways to Use SCEPTR

Use the statistical method on its own, or let the framework handle everything from raw reads.

The Statistical Method

Bring any annotated expression table. Skip all preprocessing and go straight to enrichment profiling.

Your ranked gene list

Any expression-ranked genes with functional annotation

Continuous enrichment profiling

E_C(k) computed at every gene rank, kernel-smoothed

Permutation significance testing

Per-category p-values from whole-profile permutation

D_KL + functional allocation

Specialisation gradient and compositional budget analysis
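As a rough illustration of the last step, the sketch below computes a Kullback-Leibler divergence between the category composition of the top-k genes and that of the whole transcriptome, at several values of k. How SCEPTR actually defines its specialisation gradient is not stated here, so the gene-count composition, the choice of k values, and all names and data are assumptions.

```python
import math
from collections import Counter


def category_distribution(genes, gene_category):
    """Fraction of the given genes that falls in each category."""
    counts = Counter(gene_category.get(g, "Unannotated") for g in genes)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def kl_divergence(p, q):
    """D_KL(p || q) in bits; categories absent from p contribute nothing."""
    return sum(pc * math.log2(pc / q[c]) for c, pc in p.items() if pc > 0)


# Toy data: 12 genes ranked by expression, four per category.
ranked = [f"g{i}" for i in range(1, 13)]
gene_category = {"g1": "Translation", "g2": "Translation", "g3": "Translation",
                 "g4": "Glycolysis", "g5": "Translation", "g6": "Glycolysis",
                 "g7": "Transport", "g8": "Transport", "g9": "Glycolysis",
                 "g10": "Transport", "g11": "Transport", "g12": "Glycolysis"}

background = category_distribution(ranked, gene_category)
gradient = {k: kl_divergence(category_distribution(ranked[:k], gene_category), background)
            for k in (3, 6, 12)}
for k, d in gradient.items():
    print(f"top {k:2d} genes: D_KL = {d:.3f} bits")
```

In this toy ranking the divergence is largest at the apex, where one category holds every slot, and falls to zero once the tier covers the whole transcriptome.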

The Automated Framework

Raw reads to interactive report in a single command. Nextflow + Docker for reproducibility.

QC & expression quantification

FastQC, MultiQC, Salmon pseudo-alignment

Protein prediction & annotation

TransDecoder / CDS translation, UniProt DIAMOND, GO terms

Contamination filtering

DIAMOND-based screening with optional host removal

Full SCEPTR analysis + report

Everything from the method track, packaged in an interactive HTML report