Data Cleaning Technique for Security Logs Based on Fellegi-Sunter Theory
Diana Martínez-Mosquera, Sergio Luján-Mora, Gabriel López, Lauro Santos
The 10th SIGSAND/PLAIS EuroSymposium on Systems Analysis and Design, pp. 3-12, Gdansk (Poland), September 22, 2017. ISBN: 978-3-319-66995-3. https://doi.org/10.1007/978-3-319-66996-0_1
(EURO'17) International conference
Information security is one of the most important aspects an organization should consider. For this reason, and given the variety of existing vulnerabilities, specialized groups known as Computer Security Incident Response Teams (CSIRTs) are responsible for monitoring events and for providing proactive and reactive support related to incidents. Using as a case study the CSIRT of a university with 10,000 users, and considering the high volume of events to be analyzed daily, we propose implementing a Big Data ecosystem. One of the most important activities in information processing is the data cleaning phase: it removes useless data and helps to overcome storage limitations, since the CSIRT is currently limited to a small time frame, usually a few days, and cannot analyze historical security events. Focusing on this cleaning phase, this article analyzes an intuitive technique and proposes a comparative technique based on the Fellegi-Sunter theory. The main conclusion of our research is that some data can be safely ignored, which helps to reduce storage requirements. Moreover, increasing data retention will make it possible to detect some events from historical data.