Mining large distributed log-data in near real-time

Publication: Contribution to conferences › Paper › Contributed

Contributors

Abstract

Analyzing huge amounts of log data is often a difficult task, especially if it has to be done in real time (e.g., fraud detection) or when large amounts of stored data are required for the analysis. Graphs are a data structure often used in log analysis. Examples are clique analysis and communities of interest (COI). However, little attention has been paid to large distributed graphs that allow a high throughput of updates with very low latency.

In this paper, we present a distributed graph mining system that is able to process around 39 million log entries per second on a 50 node cluster while providing processing latencies below 10 ms. We validate our approach by presenting two example applications, namely telephony fraud detection and internet attack detection. A thorough evaluation proves the scalability and near real-time properties of our system.

Details

Original language: English
Pages: 1-8
Publication status: Published - 2011
Peer-reviewed: No

Conference

Title: Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques (SLAML/SOSP) (SLAML '11), ACM, 2011
Short title: SLAML '11
Event number:
Duration: 23 October 2011
Degree of recognition: International event
Location
City: Cascais
Country: Portugal

Keywords

Research priority areas of TU Dresden

DFG Classification of Subject Areas according to Review Boards

Keywords

  • Log processing, distributed graphs, COI