Mining large distributed log-data in near real-time
Publication: Contribution to conferences › Paper › Contributed
Contributors
Abstract
Analyzing huge amounts of log data is often a difficult task, especially if it has to be done in real time (e.g., fraud detection) or when large amounts of stored data are required for the analysis. Graphs are a data structure often used in log analysis. Examples are clique analysis and communities of interest (COI). However, little attention has been paid to large distributed graphs that allow a high throughput of updates with very low latency.
In this paper, we present a distributed graph mining system that is able to process around 39 million log entries per second on a 50 node cluster while providing processing latencies below 10 ms. We validate our approach by presenting two example applications, namely telephony fraud detection and internet attack detection. A thorough evaluation proves the scalability and near real-time properties of our system.
Details
Original language | English |
---|---|
Pages | 1-8 |
Publication status | Published - 2011 |
Peer-review status | No |
Conference
Title | Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques (SLAML/SOSP) (SLAML '11), ACM, 2011 |
---|---|
Short title | SLAML '11 |
Event number | |
Duration | 23 October 2011 |
Degree of recognition | International event |
Location | |
City | Cascais |
Country | Portugal |
Keywords
Research priority areas of TU Dresden
DFG subject classification by review board
Keywords
- Log processing, distributed graphs, COI