Data Cleaning Technique for Security Big Data Ecosystem
Diana Martínez-Mosquera, Sergio Luján-Mora
Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security (IoTBDS 2017), p. 380-385, Porto (Portugal), April 24-26 2017. ISBN: 978-989-758-245-5. https://doi.org/10.5220/0006360603800385
(IoTBDS'17) International conference
The growth of information networks has given rise to an ever-multiplying number of security threats; for this reason, some information networks have incorporated a Computer Security Incident Response Team (CSIRT) responsible for monitoring all the events that occur in the network, especially those affecting data security. Thousands or even millions of events can occur every day, and handling such an amount of information requires a robust infrastructure. Many commercial solutions are available to process this kind of information; however, they are either expensive or cannot cope with such a volume. Furthermore, and most importantly, security information is by nature confidential and sensitive; thus, companies should opt to process it internally. Taking as a case study a university's CSIRT responsible for 10,000 users, we propose a security Big Data ecosystem to process a high data volume and guarantee confidentiality. During implementation, one of the first challenges was the cleaning phase after data extraction, where we observed that some data could be safely ignored without affecting the quality of the results, thus reducing storage size requirements. For this cleaning phase, we propose an intuitive technique and a comparative proposal based on the Fellegi-Sunter theory.