Pipeline:
sequence compilation across four piRNA databases →
Bowtie2 / STAR genome verification on hg38 →
multi-database family clustering →
annotation pass (gene / TE / miRNA / SNV / RNA-editing / somatic) →
de novo cluster discovery (per-chromosome Poisson model) →
TE-aware sub-family splitting.
All scripts are open source.
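
The de novo cluster discovery step is described only as a per-chromosome Poisson model, so the following is a minimal sketch of one common way such a model can be set up, not the pipeline's actual script. It assumes fixed non-overlapping windows, a per-chromosome background rate equal to the mean number of loci per window, and a Bonferroni correction over windows. The function names, the 10 kb window size, and the 0.05 threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summing the tail directly."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # First tail term, computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i
        # Stop once terms are past the mode and numerically negligible.
        if i > lam and term <= 1e-17 * total:
            break
    return min(total, 1.0)


def call_clusters(positions, chrom_length, window=10_000, alpha=0.05):
    """Flag windows on one chromosome with more loci than the background predicts.

    positions: 0-based start coordinates of mapped piRNA loci (one chromosome).
    The window size and alpha are illustrative assumptions.
    """
    n_windows = math.ceil(chrom_length / window)
    lam = len(positions) / n_windows       # per-chromosome background rate
    threshold = alpha / n_windows          # Bonferroni over windows (assumed)
    counts = Counter(pos // window for pos in positions)
    hits = []
    for idx in sorted(counts):
        p = poisson_upper_tail(counts[idx], lam)
        if p < threshold:
            end = min((idx + 1) * window, chrom_length)
            hits.append((idx * window, end, counts[idx], p))
    return hits


# Toy chromosome: sparse background loci plus one dense stretch.
background = list(range(0, 1_000_000, 20_000))
dense = [500_100 + 300 * j for j in range(30)]
for start, end, n_loci, p in call_clusters(background + dense, 1_000_000):
    print(f"{start}-{end}\t{n_loci} loci\tp={p:.3g}")
```

Estimating the rate separately for each chromosome keeps a locus-rich chromosome from inflating calls on a locus-poor one. A real implementation would likely also merge adjacent significant windows and exclude candidate clusters when estimating the background.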