Saturday, November 18, 2017

Exploiting Machine Generated Data - Chapter 1

Over the past few decades, the amount of machine data generated per millisecond has increased rapidly throughout the world. Machine-generated data is defined as “information automatically generated by a computer process, application, or other mechanism without the active intervention of humans.” [Wikipedia] Web server logs, call detail records, financial instrument trades and network event logs are some examples of machine-generated data.

These data contain a wealth of information that can be used for a myriad of purposes. Computer scientists have therefore been working continuously on highly scalable tools to extract this insight from the data. Most of the time, however, the information extracted by these tools and algorithms is not presented in a way that operational managers at an organization can use for real-time decision making. A real-time analytics system with an effective presentation layer would help operational managers gain a deeper understanding of the information uncovered in machine data, reduce the time needed to recognize vital events, and combine live feeds with historical data to identify anomalies and make more effective decisions.

Web logs as machine-generated data

In this article, web log records are taken as the machine-generated data. A web log file records activity information each time a web user submits a request to a web server, and such files can contain millions of records. A log file can be located in three places: web servers, web proxy servers and client browsers. Each record contains an IP address, an HTTP status code, the date, the request address, the number of bytes transmitted, the user agent, and so on. These low-level data hold extremely valuable information about security attacks, access patterns, bandwidth usage, etc.
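To make the record structure concrete, here is a minimal Python sketch that splits one log line into the fields listed above. It assumes the widely used Combined Log Format (as written by Apache and NGINX); the article does not say which format its logs use, so the pattern, the function name parse_log_line and the sample line are illustrative assumptions only.

```python
import re
from datetime import datetime

# Assumed layout (Combined Log Format):
# ip ident user [time] "request" status bytes "referer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_log_line(line):
    """Return a dict of fields for one log record, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the size field means no response body was sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

# Made-up sample record using a documentation-only IP address.
sample = ('203.0.113.7 - - [18/Nov/2017:10:15:32 +0000] '
          '"GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"')
record = parse_log_line(sample)
print(record["ip"], record["status"], record["bytes"], record["agent"])
```

Returning None for lines that do not match lets a caller count or skip malformed records instead of stopping on the first bad line, which matters when a file holds millions of records.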

Moreover, by mining these web logs for malicious user behavior patterns, we can identify possible security breaches and present them in real time as “warnings”. This would help system administrators take measured action at that moment to prevent the breaches, and would let them drill down into historical breach information for analysis. This, in turn, would help them find system back doors and loopholes.
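As one hedged illustration of such a warning, the sketch below flags an IP address that produces too many failed (401 or 403) responses within a short sliding window, a pattern that may indicate password guessing. The class name FailureMonitor, the 60-second window and the threshold of 5 are hypothetical choices for this example, not values from the article, and the code assumes records arrive in time order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical thresholds chosen for illustration only.
WINDOW = timedelta(seconds=60)
MAX_FAILURES = 5
FAILURE_STATUSES = {401, 403}

class FailureMonitor:
    """Track recent 401/403 responses per IP in a sliding time window."""

    def __init__(self, window=WINDOW, max_failures=MAX_FAILURES):
        self.window = window
        self.max_failures = max_failures
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures

    def observe(self, ip, time, status):
        """Process one record; return a warning string or None."""
        if status not in FAILURE_STATUSES:
            return None
        recent = self.failures[ip]
        recent.append(time)
        # Drop failures that have fallen out of the window.
        while recent and time - recent[0] > self.window:
            recent.popleft()
        if len(recent) > self.max_failures:
            seconds = int(self.window.total_seconds())
            return f"WARNING: {ip} had {len(recent)} failed requests in {seconds}s"
        return None

# Simulated feed: seven failed requests from one address, one second apart.
monitor = FailureMonitor()
start = datetime(2017, 11, 18, 10, 0, 0)
warnings = []
for i in range(7):
    warning = monitor.observe("203.0.113.7", start + timedelta(seconds=i), 401)
    if warning:
        warnings.append(warning)
print(warnings)
```

Because each record is handled as it arrives and only timestamps inside the window are kept, the same check can run against a live feed or be replayed over historical logs.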

To be continued...