
Methods

Analysis Pipeline

Workflow

Data Collection (3 sources)
  → Preprocessing (ComBat-seq + TMM normalization + gene filtering)
    → Integration (Scanpy PCA + UMAP)
      → Spectrum Order Validation (PAGA trajectory + clinical features)
      → Tissue Deconvolution (TAPE)
      → DEG Analysis (edgeR)
      → Pathway Enrichment (gseapy)
      → Molecular Validation (qPCR)

Inclusion/Exclusion Criteria

Three strict criteria governed sample selection:

  1. Only human skeletal muscle tissue accepted (excluding cell lines or organoids)
  2. Bulk RNA-seq performed using high-throughput techniques (no microarray or single-cell data)
  3. Raw count data preserved in original format (no transformed count format)

Data Sources

Source | Samples | Accession
GTEx | 803 | dbGaP phs000424.v8.p2
GEO | 291 | GSE115650, GSE175861, GSE184951, GSE201255, GSE202745, GSE140261
Helsinki | 127 | Local data (39 also in GSE151757)

Preprocessing

Batch Effect Adjustment

  • Method: ComBat-seq (negative-binomial regression-based batch adjustment)
  • R package: sva
  • Batch variable: sequencing platform, 930 mRNA (polyA-selected) vs. 291 total RNA (ribosomal RNA-depleted) samples

Normalization

  • Method: TMM (Trimmed Mean of M-values) via conorm
  • Better suited for between-sample comparisons than TPM/FPKM
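
To make the idea behind TMM concrete, here is a minimal pure-Python sketch of the scaling-factor calculation. It is not the conorm implementation: the function name `tmm_factors`, the choice of the first sample as reference, and the 30% / 5% trimming fractions (the usual edgeR defaults) are assumptions made for illustration.

```python
import math


def tmm_factors(counts, ref=0, m_trim=0.30, a_trim=0.05):
    """Return one TMM scaling factor per sample (illustrative sketch).

    counts: list of samples, each a list of raw counts in the same gene order.
    ref:    index of the reference sample (assumption: first sample).
    """
    lib_sizes = [sum(sample) for sample in counts]
    ref_counts, ref_lib = counts[ref], lib_sizes[ref]
    factors = []
    for sample, lib in zip(counts, lib_sizes):
        rows = []  # (M-value, A-value, precision weight) per usable gene
        for obs, r in zip(sample, ref_counts):
            if obs == 0 or r == 0:
                continue  # log ratios are undefined for zero counts
            m = math.log2((obs / lib) / (r / ref_lib))
            a = 0.5 * (math.log2(obs / lib) + math.log2(r / ref_lib))
            var = (lib - obs) / (lib * obs) + (ref_lib - r) / (ref_lib * r)
            if var > 0:
                rows.append((m, a, 1.0 / var))
        k = len(rows)
        lo_m, lo_a = int(k * m_trim), int(k * a_trim)
        by_m = sorted(range(k), key=lambda i: rows[i][0])
        by_a = sorted(range(k), key=lambda i: rows[i][1])
        # Keep genes that survive both the M-value and the A-value trimming
        keep = set(by_m[lo_m:k - lo_m]) & set(by_a[lo_a:k - lo_a])
        weight_sum = sum(rows[i][2] for i in keep)
        if weight_sum == 0:
            factors.append(1.0)
            continue
        log_factor = sum(rows[i][0] * rows[i][2] for i in keep) / weight_sum
        factors.append(2.0 ** log_factor)
    # Rescale so that the factors multiply to 1
    geo_mean = math.exp(sum(math.log(f) for f in factors) / len(factors))
    return [f / geo_mean for f in factors]
```

Multiplying each library size by its factor gives an effective library size. Because extreme log ratios are trimmed, a single highly expressed gene does not distort the comparison between samples, which is the reason TMM is preferred here over TPM/FPKM.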

Gene Filtering

  • Initial gene set: 16,953 candidate genes across all 1,221 samples
  • Filtering rule: muscle-specific gene counts must be > 0 in all samples
  • Final: 9,231 genes selected
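
The filtering rule above can be sketched in a few lines. The function name `expressed_in_all` and the toy counts are hypothetical; only the rule itself (count > 0 in every sample) comes from the text.

```python
def expressed_in_all(counts_by_gene):
    """Keep genes whose raw count is > 0 in every sample.

    counts_by_gene: dict mapping gene symbol -> list of per-sample counts.
    """
    return [gene for gene, counts in counts_by_gene.items()
            if all(c > 0 for c in counts)]


# Toy matrix with made-up counts for three samples
toy = {
    "ACTA1": [950, 1200, 870],
    "TTN": [400, 0, 390],   # zero in one sample, so it is dropped
    "MB": [310, 280, 295],
}
kept = expressed_in_all(toy)
```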

Integration & Visualization

  • Tool: Scanpy (Python)
  • Pipeline: PCA → UMAP (single-cell-like analysis applied to bulk data)
  • Key insight: Similar expression patterns cluster together; myopathy muscles show a ribbon-like distribution rather than compact clusters

Spectrum Order Validation

In-Silico Validation

  • Pseudo-time analysis (PAGA) was used to predict the trajectory of muscle deterioration from healthy to myopathic tissue
  • The predicted trajectory confirmed the severity spectrum order

Clinical Feature Mapping

Clinical features mapped onto UMAP to validate the spectrum order:

Myopathy | Clinical Feature | Jonckheere Test | p-value
CDM | CTG repeat expansion | JT = 181 | 1.07e-03
LGMD R12 | Mercuri score (cMRI) | JT = 459 | 2.09e-06
LGMD R12 | 10-meter walk test | JT = 369 | 0.011
LGMD R12 | 6-minute walk test | JT = 164 | 0.014
FSHD | Fat fraction (qMRI) | JT = 139 | 0.193
FSHD | Pathology score | JT = 147 | 0.36
FSHD | Clinical severity score | JT = 125 | 0.753
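
For readers unfamiliar with the Jonckheere trend test, the sketch below computes the JT statistic and a two-sided p-value from the normal approximation. It is an illustration only, not the DescTools routine used in the analysis: it assumes samples have been binned into ordered groups along the spectrum, it applies no tie correction to the variance, and the function name is hypothetical.

```python
import math


def jonckheere_terpstra(groups):
    """Return (JT, z, two-sided p) for groups ordered by the expected trend.

    groups: list of lists of clinical measurements, one list per ordered group.
    """
    jt = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for x in groups[i]:
                for y in groups[j]:
                    if x < y:
                        jt += 1.0      # pair agrees with the trend
                    elif x == y:
                        jt += 0.5      # ties count as half
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    # Mean and variance of JT under the null hypothesis of no trend
    mean = (n * n - sum(s * s for s in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    z = (jt - mean) / math.sqrt(var)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return jt, z, p_two_sided
```

A JT value far from its null mean indicates that the clinical feature rises or falls monotonically along the ordered groups, which is what the table tests for each myopathy.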

Differential Expression Analysis

  • Tool: edgeR (R)
  • Reference: Genuinely healthy controls from GTEx (n = 234; cause of death: accident or unexpected death)
  • Thresholds: |log2FC| > 0.5 and FDR < 0.05
  • Results: For general myopathy (n = 292) vs. genuinely healthy (n = 234): 200 up-regulated and 568 down-regulated genes
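
The thresholds above translate into a simple classification rule. The sketch below applies it to made-up values; the gene names and numbers are hypothetical, and the Benjamini-Hochberg adjustment is an assumption (it is the edgeR default, but the text does not name the FDR method).

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify(log2fc, fdr, lfc_cut=0.5, fdr_cut=0.05):
    """Label a gene as up, down, or ns using |log2FC| > 0.5 and FDR < 0.05."""
    if fdr < fdr_cut and log2fc > lfc_cut:
        return "up"
    if fdr < fdr_cut and log2fc < -lfc_cut:
        return "down"
    return "ns"


# Hypothetical genes and test results, for illustration only
genes = ["geneA", "geneB", "geneC", "geneD"]
log2fc = [1.2, -0.9, 0.3, -2.0]
pvals = [0.001, 0.004, 0.002, 0.60]
fdr = benjamini_hochberg(pvals)
calls = {g: classify(l, q) for g, l, q in zip(genes, log2fc, fdr)}
```

Note that a gene must pass both cut-offs: `geneC` is significant but changes too little, and `geneD` changes strongly but is not significant.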

Cell-Type Deconvolution (TAPE)

  • Tool: TAPE (deep-learning-based autoencoder)
  • Reference datasets:
    • Tabula Sapiens (30,746 cells)
    • GSE143704 (22,058 cells)
  • Comparison: Five control groups vs. myopathy groups
  • Findings: Fewer vasculature cells, more adipocytes and COL1A+ fibroblasts in myopathy

Pathway Enrichment

  • Tool: gseapy (Python)
  • Databases: Human Phenotype Ontology, CellMarker Augmented, KEGG, GO, Reactome, WikiPathway
  • Key pathways: Muscle contraction, lipoatrophy, myotube cell involvement, FATZ binding
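
The text does not say which gseapy mode was used. If the enrichment was run as an over-representation test, the underlying statistic is a hypergeometric tail probability, sketched below in plain Python rather than through gseapy. The function name, the pathway size, and the overlap are hypothetical; the background of 9,231 genes and the 768 DEGs (200 up + 568 down) are taken from the sections above.

```python
from math import comb


def hypergeom_enrichment_p(background, pathway_size, selected, overlap):
    """P(X >= overlap) when drawing `selected` genes from `background` genes,
    of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, selected)
    tail = sum(
        comb(pathway_size, i) * comb(background - pathway_size, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(background, selected)


# Background and DEG counts from the text; pathway numbers are made up
p_value = hypergeom_enrichment_p(
    background=9231, pathway_size=100, selected=768, overlap=25
)
```

In practice the p-values for all gene sets in a database would then be corrected for multiple testing.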

qPCR Validation

  • Tissue: Lower leg muscle biopsies from Helsinki (13 patients + 6 controls)
  • Method: RT-qPCR with SYBR Green, normalized to 18S
  • Validated genes:
    • General myopathy: MGST1, AOX1, FASN, PRKCD
    • IBM: CD163
    • Titinopathy: CYP4B1
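
Normalizing to 18S is commonly done with the 2^(-ΔΔCt) method. The text does not state which quantification formula was used, so the sketch below is an assumption, and the Ct values are made up for illustration.

```python
def relative_expression(ct_target_patient, ct_ref_patient,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in patient vs. control by 2^(-ddCt).

    The reference gene (18S here) corrects for differences in input amount.
    """
    delta_patient = ct_target_patient - ct_ref_patient
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_patient - delta_control)


# Hypothetical mean Ct values: target gene and 18S in patient and control
fold_change = relative_expression(
    ct_target_patient=24.0, ct_ref_patient=10.0,
    ct_target_control=26.0, ct_ref_control=10.0,
)
```

With these numbers the target crosses the threshold two cycles earlier in the patient at equal 18S levels, so `fold_change` is 4.0, which means four-fold higher expression.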

Software Versions

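Tool | Version | Purpose
Python | 3.8.1 | Main analysis
R | 4.2.2 | DEG and statistics
Scanpy | not stated | Integration & UMAP
edgeR | not stated | Differential expression
ComBat-seq | not stated | Batch correction
TAPE | not stated | Cell-type deconvolution
gseapy | not stated | Pathway enrichment
DescTools | not stated | Jonckheere trend test
conorm | not stated | TMM normalization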