Skip to content

tdhock/PeakSegFPOP-paper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This repository contains the source code for our papers describing log-linear time algorithms for solving the PeakSeg constrained changepoint detection problem.

arXiv:1703.03352 A log-linear segmentation algorithm for peak detection in genomic data

Updated pre-print: jmlr-paper.pdf, jmlr-supplementary.pdf

The labeled data we used to compute peak detection accuracy are http://members.cbio.mines-paristech.fr/~thocking/chip-seq-chunk-db/

The fold IDs that we used in our four-fold cross-validation experiment to compute test AUC/accuracy are listed in http://members.cbio.mines-paristech.fr/~thocking/chip-seq-chunk-db/4foldcv-test-folds.csv

To recompute the figures from the arXiv paper, execute the code in the following R scripts.

To recompute other figures:

Note that the Makefile shows how to re-make some intermediate RData files based on the ChIP-seq benchmark data.

arXiv:1810.00117 Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data

The benchmark data we used in this paper are much larger than the previous paper. To re-compile the figures and paper you can type “make jss-paper.pdf”

TODOs

12 Aug 2022

2022-06-20-paris-time-complexity.tex makes 2022-06-20-paris-time-complexity.pdf slides with summary of results from three recent papers for talk in Paris.

Updated: 2022-09-28-tucson-slides.tex makes 2022-09-28-tucson-slides.pdf

Title: Time complexity analysis of recently proposed algorithms for optimal changepoint detection

Slides: https://github.com/tdhock/PeakSegFPOP-paper/blob/master/2022-09-28-tucson-slides.pdf

Abstract: Several dynamic programming algorithms have been recently proposed for efficiently solving various problems related to optimal changepoint detection in large data sequences measured over space or time. In this talk we will discuss three recent papers which each provide a theoretical or empirical time complexity analysis. Our time complexity analysis shows that these linear and log-linear time algorithms can be used for analysis of huge data sequences, with ten million observations or more, which are now common in areas such as genomics.

24 July 2019

jss-figure-more-evals.R creates

jss-figure-more-evals.png

2 July 2019

2019-useR-slides.Rnw for useR conf.

22 July 2019

jss.more.evals.R attempts to answer the question from a reviewer: does the number of GFPOP evals depend on the max number of peaks?

18 Oct 2018

  • new scheme for PDPA.infeasible.R: for background segments with the same mean either before or after, join peaks before and after. Before this file was using the small peaks in these infeasible models. Contrast with PDPA.peaks.R which simply ignores infeasible models.
  • new file PDPA.infeasible.error.compare.R which compares the label error of the two PDPA models with the CDPA model. Theoretically the PDPA.infeasible model with new join scheme should have more true positives and false positives than the PDPA.peaks model which simply ignores infeasible models. This file also shows that it reduces the min train error to be similar with CDPA.

28 Mar 2017

7 Feb 2017

Ideas about how to define a state graph for a solver of the Optimal Partitioning problem with affine constraints between adjacent segment means.

library(data.table)
### Define an affine constraint function
### g(from, to) = from.coef*from + to.coef*to + constant
### to be used as g(from, to) <= 0,
### e.g. from <= to is coded as 1*from -1*to + 0 <= 0.
affine.constraint <- function(from.coef, to.coef, constant){
  data.table(from.coef, to.coef, constant)
}

no.constraint <- affine.constraint(0, 0, 0)
non.increasing <- affine.constraint(-1, 1, 0)
non.decreasing <- affine.constraint(1, -1, 0)

state <- function(state.name){
  structure(data.table(state.name), type="state")
}

change <- function(from, to, constraint, penalty=NA){
  structure(data.table(from, to, penalty, constraint), type="change")
}

loss <- function(loss.name){
  structure(data.table(loss.name), type="loss")
}

start <- function(...){
  structure(data.table(state.name=c(...)), type="start")
}

end <- function(...){
  structure(data.table(state.name=c(...)), type="end")
}

unconstrained <- list(
  state("anything"),
  change("anything", "anything", no.constraint))

unconstrained.Gaussian <- c(unconstrained, list(
  loss("Gaussian")))

unconstrained.Poisson <- c(unconstrained, list(
  loss("Poisson")))

PeakSegFPOP <- list(
  loss("Poisson"),
  state("peak"),
  state("background"),
  start("background"),
  change("background", "peak", non.decreasing, penalty=0),
  change("peak", "background", non.increasing),
  end("background"))

PeakSegFPOP.start.or.end.up <- list(
  loss("Poisson"),
  state("peak"),
  state("background"),
  change("background", "peak", non.decreasing),
  change("peak", "background", penalty, non.increasing))

reduced.isotonic.regression <- list(
  state("anything"),
  change("anything", "anything", non.decreasing))

unimodal.regression <- list(
  state("can.change.up.or.down"),
  state("can.change.down"),
  change("can.change.up.or.down", "can.change.up.or.down", non.decreasing),
  change("can.change.up.or.down", "can.change.down", non.increasing),
  change("can.change.down", "can.change.down", non.increasing))

unimodal.at.least.one.up <- c(unimodal.regression, list(
  state("start"),
  start("start"),
  change("start", "can.change.up.or.down", non.decreasing)))

unimodal.at.least.one.up.and.down <- c(unimodal.at.least.one.up, list(
  end("can.change.down")))

checkModel <- function(model.list){
  type.vec <- sapply(model.list, attr, "type")
  model.info <- sapply(unique(type.vec), function(type){
    do.call(rbind, model.list[type.vec==type])
  })
  ## TODO error checking.
  model.info
}
checkModel(unimodal.at.least.one.up.and.down)
checkModel(PeakSegFPOP)

## TODO functions for plotting, solving.
GFPOP(model, data.vec, weight.vec, penalty)

26 Jan 2017

Guillaume’s group meeting presentation slides http://members.cbio.mines-paristech.fr/~thocking/HOCKING-PeakSegFPOP-pipeline-slides.pdf

10 Nov 2016

Data viz with smooth transitions, clarified titles.

Revised interactive data viz.

9 Nov 2016

Interactive data viz to explain supervised penalty learning for peaks.

15 Aug 2016

Test accuracy and AUC data viz, explains why Segmentor gets such a high test accuracy (it has a low true positive and false positive rate) http://bl.ocks.org/tdhock/raw/886575874144c3b172ce6b7d7d770b9f/

10 Aug 2016

3 Aug 2016

13 May 2016

figure-cDPA-PDPA-all.R visualizes the optimality and feasibility of the PDPA and cDPA models, and shows the interval counts in the PDPA http://bl.ocks.org/tdhock/raw/4582904f843cc60639fdfeb9651cac73/

12 May 2016

figure-cDPA-PDPA.R shows the difference between the cDPA and PDPA on real data: the cDPA recovers a sub-optimal solution that obeys the strict inequality peak constraint, and the PDPA recovers the optimal solution for the non-strict inequality peak constraint. http://bl.ocks.org/tdhock/raw/24aa6387901edab1577ce24f1e736ff3/

10 May 2016

4 May 2016

About

Functional pruning optimal partioning algorithm for PeakSeg constrained segmentation model

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages