Mining large distributed log-data in near real-time

Research output: Contribution to conferences › Paper › Contributed

Contributors

Abstract

Analyzing huge amounts of log data is often difficult, especially when it has to be done in real time (e.g., for fraud detection) or when the analysis requires large amounts of stored data. Graphs are a data structure frequently used in log analysis; examples include clique analysis and communities of interest (COI). However, little attention has been paid to large distributed graphs that support a high throughput of updates with very low latency.
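Communities of interest are named above as a typical graph analysis over logs. The following is a minimal single-machine Python sketch, assuming a simple formulation that is not taken from the paper: each log entry adds weight to a directed edge between two identifiers (e.g., caller and callee), edge weights are blended with exponential decay at the end of each period, and a node's COI is its top-k neighbors by weight. The class `CommunityOfInterestGraph` and its parameters `decay` and `top_k` are hypothetical, and the sketch says nothing about how the authors distribute the graph across a cluster.

```python
from collections import defaultdict


class CommunityOfInterestGraph:
    """Hypothetical single-machine sketch of a COI graph built from log entries."""

    def __init__(self, decay: float = 0.9, top_k: int = 3) -> None:
        self.decay = decay  # assumed: share of the old weight kept at each period boundary
        self.top_k = top_k  # assumed: number of neighbors that form a node's COI
        # Long-term blended weights: node -> {neighbor: weight}
        self.weights: dict = defaultdict(dict)
        # Activity observed in the still-open period: node -> {neighbor: amount}
        self.current: dict = defaultdict(lambda: defaultdict(float))

    def add_entry(self, src: str, dst: str, amount: float = 1.0) -> None:
        """Apply one log entry (e.g., a call from src to dst) as a constant-time update."""
        self.current[src][dst] += amount

    def close_period(self) -> None:
        """Blend the open period into the long-term weights: w = decay*w + (1-decay)*today."""
        for src in set(self.weights) | set(self.current):
            old = self.weights.get(src, {})
            today = self.current.get(src, {})
            self.weights[src] = {
                dst: self.decay * old.get(dst, 0.0) + (1 - self.decay) * today.get(dst, 0.0)
                for dst in set(old) | set(today)
            }
        self.current.clear()

    def coi(self, node: str) -> list:
        """Return the top_k neighbors by weight (ties broken by neighbor name)."""
        edges = self.weights.get(node, {})
        ranked = sorted(edges.items(), key=lambda kv: (-kv[1], kv[0]))
        return [dst for dst, _ in ranked[: self.top_k]]


if __name__ == "__main__":
    g = CommunityOfInterestGraph(decay=0.5, top_k=2)
    for src, dst, seconds in [("A", "B", 10), ("A", "C", 4), ("A", "D", 1)]:
        g.add_entry(src, dst, seconds)
    g.close_period()
    print(g.coi("A"))  # ['B', 'C']
```

Keeping only a decayed summary per edge makes each incoming entry a cheap, local update, which is the property a high-throughput system needs; a real deployment would also prune near-zero weights and partition nodes across machines, which this sketch does not attempt.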

In this paper, we present a distributed graph mining system that processes around 39 million log entries per second on a 50-node cluster while keeping processing latencies below 10 ms. We validate our approach with two example applications: telephony fraud detection and Internet attack detection. A thorough evaluation demonstrates the scalability and near real-time properties of our system.
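The reported throughput and latency can be related with simple arithmetic. Assuming the load is spread evenly over the 50 nodes and that the reported latency approximates the time an entry spends in the system, Little's law ($L = \lambda W$) bounds the number of entries in flight at any moment. These derived values are illustrations under those assumptions, not measurements from the paper.

```latex
\begin{aligned}
\lambda_{\text{node}} &\approx \frac{39 \times 10^{6}\ \text{entries/s}}{50\ \text{nodes}} = 7.8 \times 10^{5}\ \text{entries/s per node},\\
L &= \lambda W \lesssim \left(39 \times 10^{6}\ \text{s}^{-1}\right)\left(10 \times 10^{-3}\ \text{s}\right) = 3.9 \times 10^{5}\ \text{entries in flight},\\
L_{\text{node}} &\lesssim \frac{3.9 \times 10^{5}}{50} = 7.8 \times 10^{3}\ \text{entries per node}.
\end{aligned}
```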

Details

Original language: English
Pages: 1–8
Publication status: Published, 2011
Peer-reviewed: No

Conference

Title: Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques (SLAML/SOSP) (SLAML '11), ACM, 2011
Abbreviated title: SLAML '11
Conference number
Duration: 23 October 2011
Degree of recognition: International event
Location
City: Cascais
Country: Portugal

Keywords

Research priority areas of TU Dresden

DFG Classification of Subject Areas according to Review Boards

Keywords

  • Log processing, distributed graphs, COI