This article lists reasons for using tools such as Splunk, which primarily analyze logs (server, application, etc.) and create reports/events to be processed by different stakeholders.
Log Management Tools – High Level Architecture
Log management tools primarily aggregate logs from different servers (application, database, messaging servers, etc.) and send them to a centralized server, which then analyzes and indexes the logs in a database. End users can then log onto the tool's console and analyze the reports created on top of these logs. The following diagram represents a very high-level architecture along with a classification of end users:
The following are some of the key elements of such tools:
- Log agents, which aggregate the logs (a minimal agent sketch follows this list)
- Indexing engine, which indexes the logs
- Log management server, which processes users' requests for reports
- Deployment engine, which is used to install log agents
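To illustrate the log-agent piece, below is a minimal sketch of an agent that tails a log file and forwards each new line to a central collector over HTTP. The file path and the collector endpoint are hypothetical placeholders; real agents such as Splunk forwarders or Logstash have their own configuration formats and transport protocols.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Paths;

// Minimal "log agent": tails a file and forwards each new line to a central collector.
public class SimpleLogAgent {

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        // Hypothetical log file and collector endpoint -- replace with real values.
        String logFile = "/var/log/app/application.log";
        URI collector = URI.create("http://central-log-server:8080/ingest");

        try (BufferedReader reader = Files.newBufferedReader(Paths.get(logFile))) {
            while (true) {
                String line = reader.readLine();
                if (line == null) {
                    Thread.sleep(1000);   // no new data yet; poll again
                    continue;
                }
                HttpRequest request = HttpRequest.newBuilder(collector)
                        .POST(HttpRequest.BodyPublishers.ofString(line))
                        .build();
                client.send(request, HttpResponse.BodyHandlers.discarding());
            }
        }
    }
}
```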
Different Classifications of Reports
At a high level, log analyzer tools such as Splunk, Logstash, SumoLogic, etc. can help generate the following kinds of reports on top of log files aggregated from different sources (servers):
- Technical reports, which can give information on application/server-related errors, application performance, etc.
- Business reports, primarily derived ones, which could be of help to business analysts/product owners
Why use these tools?
The following are different classes of data that can be read from these log analyzer/management tools:
Server-related Errors
Errors generated by servers when processing requests and responses. For example, one of the most popular metrics with the Apache server is a count of 401, 403, 404, 500, etc. responses. This metric is particularly helpful for application developers to quickly figure out if a page is not found or some unknown error is leading to a 500. These metrics and related reports are primarily helpful to IT admin staff who want to minimize business downtime and achieve an availability of 99.99%.
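As a rough illustration of what such a report boils down to, the sketch below counts HTTP status codes in an Apache access log. The log path and the assumption of the common/combined log format are illustrative only; in practice the log tool itself does this aggregation for you.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Count HTTP status codes (401, 403, 404, 500, ...) in an Apache access log.
public class StatusCodeReport {

    // In the common/combined log format the status code follows the quoted request line.
    private static final Pattern STATUS = Pattern.compile("\" (\\d{3}) ");

    public static void main(String[] args) throws IOException {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : Files.readAllLines(Paths.get("/var/log/apache2/access.log"))) {
            Matcher m = STATUS.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1L, Long::sum);
            }
        }
        counts.forEach((status, count) -> System.out.println(status + ": " + count));
    }
}
```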
Application-related Exceptions/Errors
Errors generated by applications when processing user requests/responses. These could be used for identifying exceptions arising from the processing of users' requests and could help developers proactively fix problems. These metrics would be of interest to both service management staff and developers. Service management staff could create reports and notifications on top of application exceptions and raise tickets/incidents as and when these exceptions arise.
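One simple way to make such exceptions easy to report and alert on is to log them with a consistent, searchable marker. The sketch below uses SLF4J; the "APP-EXCEPTION" marker and the order-processing example are made-up illustrations, not a convention of any particular tool.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderService {

    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    public void placeOrder(String orderId) {
        try {
            // ... business logic that may fail ...
            throw new IllegalStateException("payment gateway timed out");
        } catch (Exception e) {
            // A consistent prefix such as "APP-EXCEPTION" makes it easy for the log
            // tool to build a report or fire a notification on every occurrence.
            log.error("APP-EXCEPTION orderId={} message={}", orderId, e.getMessage(), e);
        }
    }
}
```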
Application Performance
Log analyzer tools such as Splunk come in very handy for deriving performance-related metrics if the application writes the time taken to process each request into the log files. There are tools such as Perf4J (in Java) which help capture cross-cutting concerns, including the time taken by a request to execute one or more methods. Although this comes with the penalty of an extra timing-related write to the log file, it helps a great deal in performance monitoring via log capture. These metrics would be of interest to both service management staff and developers.
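A minimal, Perf4J-style version of this idea is sketched below: time the request handling and write the elapsed milliseconds to the log in a fixed format that the log tool can then aggregate (averages, percentiles, and so on). The tag name and log format here are assumptions for illustration, not Perf4J's own API.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SearchController {

    private static final Logger log = LoggerFactory.getLogger(SearchController.class);

    public void handleSearch(String query) {
        long start = System.nanoTime();
        try {
            // ... actual request processing ...
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Fixed, parse-friendly format so the log tool can chart time taken per operation.
            log.info("PERF tag=search elapsedMs={}", elapsedMs);
        }
    }
}
```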
Business Reports
Business analysts could also get various reports on who is using which functionality. Information on functionality usage can also be derived from analytics tools such as Google Analytics. However, if the business application is SaaS-based, one could track the usage of different clients and their end users vis-a-vis the functionality they use and related issues, if any. These metrics and related reports would be of concern to business analysts and product owners.
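For a SaaS application, one lightweight way to feed such business reports is to write a structured usage event each time a feature is invoked, keyed by client and user. The field names below are illustrative assumptions only.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class UsageTracker {

    private static final Logger log = LoggerFactory.getLogger(UsageTracker.class);

    // Log one structured event per feature invocation; the log tool can then
    // report usage per client, per user, or per feature.
    public void trackUsage(String clientId, String userId, String feature) {
        log.info("USAGE client={} user={} feature={}", clientId, userId, feature);
    }
}
```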
Application Security Concerns
The following are different uses of log analyzers in relation to security:
- Information on admin users and their frequency of logins into the system (see the sketch below)
- Capturing API usage and performing risk analysis
These metrics and related reports would be of primary interest to the security and risk management team, and also to developers.
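As a sketch of the first point, the snippet below counts login events per admin user from an authentication log and flags unusually frequent logins. The log path, line format, and threshold are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Count login events per admin user and flag suspiciously frequent logins.
public class AdminLoginReport {

    private static final int THRESHOLD = 50;   // illustrative alert threshold per log window

    public static void main(String[] args) throws IOException {
        Map<String, Long> logins = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get("/var/log/app/auth.log"))) {
            // Assumed line format: "... event=login role=admin user=<name> ..."
            if (line.contains("event=login") && line.contains("role=admin")) {
                String user = line.replaceAll(".*user=(\\S+).*", "$1");
                logins.merge(user, 1L, Long::sum);
            }
        }
        logins.forEach((user, count) -> {
            String flag = count > THRESHOLD ? "  <-- review" : "";
            System.out.println(user + ": " + count + flag);
        });
    }
}
```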
Comments
You left out LogZilla - http://www.logzilla.net - costs MUCH less than Splunk and can handle around 1B events/day on a single server.
Thanks for the suggestion. I found it to be a very interesting tool. Just added an entry for it.
I have been playing around a bit with Logstash and am fairly convinced that it suits my needs, since it uses Elasticsearch and Kibana for the UI, which makes things easier. The UI of Kibana is highly customizable, and you can have the whole thing up and running in just a few minutes.
Thanks for the update, Ravi.
Thanks for the article.
NXLog is another free and open source log management system that was left out but deserves a place in the list, since it can collect logs from Windows, Linux, Android and other operating systems, and it provides high performance even when scaling to thousands of servers and beyond. If interested, check it out here:
https://nxlog.co/products/nxlog-community-edition