Real-time definition of non-randomness in the distribution of genomic events

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

  • Ulrich Abel, German Cancer Research Center (DKFZ), Tumor Center Heidelberg-Mannheim (Author)
  • Annette Deichmann, German Cancer Research Center (DKFZ) (Author)
  • Cynthia Bartholomae, German Cancer Research Center (DKFZ) (Author)
  • Kerstin Schwarzwaelder, German Cancer Research Center (DKFZ) (Author)
  • Hanno Glimm, National Center for Tumor Diseases Dresden, National Center for Tumor Diseases (NCT) Heidelberg (Author)
  • Steven Howe, Tumor Center Heidelberg-Mannheim (Author)
  • Adrian Thrasher, University College London, Great Ormond Street Hospital for Children NHS Trust (Author)
  • Alexandrine Garrigue, INSERM - Institut national de la santé et de la recherche médicale (Author)
  • Salima Hacein-Bey-Abina, INSERM - Institut national de la santé et de la recherche médicale, Necker–Enfants Malades Hospital (Author)
  • Marina Cavazzana-Calvo, INSERM - Institut national de la santé et de la recherche médicale, Necker–Enfants Malades Hospital (Author)
  • Alain Fischer, INSERM - Institut national de la santé et de la recherche médicale, Necker–Enfants Malades Hospital (Author)
  • Dirk Jaeger, German Cancer Research Center (DKFZ) (Author)
  • Christof von Kalle, German Cancer Research Center (DKFZ), Cincinnati Children's Hospital Medical Center (Author)
  • Manfred Schmidt, German Cancer Research Center (DKFZ) (Author)

Abstract

Features such as mutations or structural characteristics can be non-randomly or non-uniformly distributed within a genome. So far, computer simulations were required for statistical inferences on the distribution of sequence motifs. Here, we show that these analyses are possible using an analytical, mathematical approach. For the assessment of non-randomness, our calculations only require information including genome size, number of (sampled) sequence motifs and distance parameters. We have developed computer programs evaluating our analytical formulas for the real-time determination of expected values and p-values. This approach permits a flexible cluster definition that can be applied to most effectively identify non-random or non-uniform sequence motif distribution. As an example, we show the effectiveness and reliability of our mathematical approach in clinical retroviral vector integration site distribution.
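
To illustrate the kind of closed-form calculation the abstract describes, the following is a minimal sketch and not the authors' published formulas or programs. It assumes a single linear genome with uniformly and independently placed sites, defines a "cluster" as a pair of sites within a given window, and uses a Poisson approximation for the p-value. All function names and parameters are hypothetical.

```python
import math


def pair_probability(genome_size: float, window: float) -> float:
    """Probability that two independent, uniformly placed sites lie within
    `window` bases of each other on one linear segment of `genome_size`.

    Assumption: chromosome boundaries and mappability are ignored.
    """
    r = window / genome_size
    return 2.0 * r - r * r  # equals 1 - (1 - r)^2


def expected_close_pairs(n_sites: int, genome_size: float, window: float) -> float:
    """Expected number of site pairs closer than `window` under uniformity."""
    return math.comb(n_sites, 2) * pair_probability(genome_size, window)


def poisson_upper_tail(observed: int, mean: float) -> float:
    """P(X >= observed) for X ~ Poisson(mean).

    Assumption: the count of close pairs is treated as approximately Poisson,
    which is reasonable only when close pairs are rare.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= mean / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # Hypothetical inputs: 100 sampled sites, 3 Gb genome, 30 kb window,
    # 3 close pairs observed.
    mean = expected_close_pairs(100, 3e9, 30_000)
    print("expected close pairs:", mean)
    print("approximate p-value:", poisson_upper_tail(3, mean))
```

Because the inputs are only the genome size, the number of sites and a distance parameter, the evaluation is immediate and needs no simulation, which is the practical point the abstract makes.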

Details

Original language: English
Article number: e570
Journal: PLoS ONE
Volume: 2
Issue number: 6
Publication status: Published - 27 Jun 2007
Peer-reviewed: Yes

External IDs

PubMed 17593969
