Log Management Tools – High Level Architecture
Log management tools primarily aggregate logs from different servers (application, database, messaging servers, etc.) and send them to a centralized server, which then analyzes and indexes the logs in a database. End users can then log into the console of these tools and analyze the reports created on top of these logs. The following diagram represents a very high-level architecture along with an end-user classification:
Following are some of the key elements of such tools:
- Log agents, which aggregate the logs
- Indexing engine, which indexes the logs
- Log management server, which processes users' requests for reports
- Deployment engine, which is used to install the log agents
Different Classifications of Reports
At a high level, log analyzer tools such as Splunk, Logstash, Sumo Logic, etc. could help generate the following kinds of reports on top of log files aggregated from different sources (servers):
- Technical reports, which can give information on application/server errors, application performance, etc.
- Business reports, primarily derived ones, which could be of help to business analysts/product owners
Who will benefit?
Following are the different classes of stakeholders who would benefit from the reports these log analyzer tools create from log files:
- IT service management personnel working in problem and incident management. They could log into these tools and look at daily problem/incident reports derived from use cases such as server errors, application errors, etc.
- IT admin staff who are responsible for maintaining 99.99% availability of the servers.
- Application developers who want to avoid logging into servers and reading log files on a traditional, hard-to-read console, and who would instead use a web interface to read log files.
- Business users who could create reports covering areas such as most-used functionality. They could also achieve much of this using an analytics tool such as Google Analytics.
- Security teams who want to constantly examine threats based on logs.
Why use these tools?
Following are the different classes of data which could be read from these log analyzer/management tools:
Errors generated by servers when processing requests and responses. For example, one of the most popular metrics with the Apache server is a count of 401, 403, 404, 500, etc. responses. These metrics are particularly helpful for application developers to quickly figure out whether a page is not found, or whether unknown errors are leading to 500s. These metrics and related reports are primarily helpful to IT admin staff who want to minimize business downtime and achieve availability of 99.99%.
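As a minimal sketch of such a metric, the snippet below counts HTTP status codes from Apache access-log lines. The sample lines and the simple regex are assumptions for illustration; a real deployment would have the log analyzer run this kind of aggregation over the full access log.

```python
import re
from collections import Counter

# Hypothetical sample lines in Apache's common log format (for illustration).
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2024:13:55:40 +0000] "GET /missing HTTP/1.1" 404 209',
    '10.0.0.3 - - [10/Oct/2024:13:55:41 +0000] "POST /api HTTP/1.1" 500 512',
    '10.0.0.2 - - [10/Oct/2024:13:55:44 +0000] "GET /missing HTTP/1.1" 404 209',
]

# The status code is the 3-digit field right after the quoted request string.
STATUS_RE = re.compile(r'" (\d{3}) ')

def count_status_codes(lines):
    """Return a Counter of HTTP status codes found in access-log lines."""
    return Counter(m.group(1) for line in lines
                   if (m := STATUS_RE.search(line)))

print(count_status_codes(SAMPLE_LOG))  # e.g. Counter({'404': 2, '200': 1, '500': 1})
```

A tool like Splunk would express the same aggregation as a search query over the indexed access log rather than custom code.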
Errors generated by applications when processing user requests/responses. These could be used to identify exceptions arising from the processing of users' requests and could help developers proactively fix the problems. These metrics would be of interest to both service management staff and developers. Service management staff could create reports and notifications on top of application exceptions and create tickets/incidents as and when these exceptions arise.
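For an application exception to be usable this way, it has to land in the log with its severity and stack trace. The sketch below (names and payload shape are hypothetical) shows the pattern in Python, writing to an in-memory buffer here in place of the log file a log agent would tail:

```python
import io
import logging

def process_request(payload, logger):
    """Process a hypothetical user request, logging any exception with a
    stack trace so a log analyzer can alert on the ERROR level."""
    try:
        return payload["user_id"]          # KeyError if the field is missing
    except KeyError:
        # logger.exception logs at ERROR level and appends the traceback,
        # giving service-management staff enough context to raise a ticket.
        logger.exception("failed to process request: %r", payload)
        return None

# Route log records to an in-memory buffer for this demo; in production this
# would be the application log file that the log agent collects.
buf = io.StringIO()
logger = logging.getLogger("demo-app")
logger.addHandler(logging.StreamHandler(buf))
logger.propagate = False

process_request({}, logger)
print("KeyError" in buf.getvalue())  # True: the traceback is in the log
```

A log management tool can then trigger a notification whenever a line at ERROR level (or containing a traceback) is indexed.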
Log analyzer tools such as Splunk come in very handy for deriving performance-related metrics if the application writes the time taken to process each request into the log files. There are tools such as Perf4J (in Java) which help capture cross-cutting concerns, including the time taken to execute one or more methods. Although this comes with the penalty of a timing-related write to the log file, it helps a great deal in performance monitoring via log capture. These metrics would be of interest to both service management staff and developers.
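The idea can be sketched with a small timing decorator. This is a Python analogue of what Perf4J does for Java methods, not Perf4J itself; the logger name and the `perf tag=... time_ms=...` line format are assumptions chosen so the lines are easy to extract with a log-analyzer search.

```python
import functools
import io
import logging
import time

def timed(logger):
    """Log how long the wrapped function takes (a sketch of the
    Perf4J-style timing concern, implemented as a Python decorator)."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                # A fixed "perf" marker makes these lines trivial to pull
                # out later with a log-analyzer search query.
                logger.info("perf tag=%s time_ms=%.1f", fn.__name__, elapsed_ms)
        return wrapper
    return decorate

# Wire the logger to an in-memory buffer for the demo; in production this
# would be the application log file that the log agent tails.
buf = io.StringIO()
perf_logger = logging.getLogger("perf-demo")
perf_logger.setLevel(logging.INFO)
perf_logger.addHandler(logging.StreamHandler(buf))
perf_logger.propagate = False

@timed(perf_logger)
def slow_add(a, b):
    time.sleep(0.01)  # stand-in for real work
    return a + b

slow_add(2, 3)
print("perf tag=slow_add" in buf.getvalue())  # True
```

Once such lines are indexed, average or percentile response times per method fall out of a simple aggregation in the log analyzer.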
Business analysts could also get various reports on who is using which functionality. Information on functionality usage could likewise be derived from analytics tools such as Google Analytics. However, if the business application is SaaS-based, one could track the usage of different clients and their end users vis-a-vis the functionality used by them and any related issues. These metrics and related reports would be of concern to business analysts and product owners.
Following are different usages of log analyzers in relation to security:
- Information on admin users and their frequency of logins into the system
- Capture API usage and do risk analysis
These metrics and related reports would be of primary interest to the security and risk management team, and also to developers.
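The first of these, admin login frequency, can be sketched as a small aggregation over audit-log lines. The line format (`LOGIN user=... role=...`) is an assumption for illustration; real audit logs vary by application.

```python
from collections import Counter

# Hypothetical audit-log lines; the key=value format is an assumption.
AUDIT_LOG = [
    "2024-10-06T09:00:01 LOGIN user=alice role=admin",
    "2024-10-06T09:05:12 LOGIN user=bob role=user",
    "2024-10-06T11:42:55 LOGIN user=alice role=admin",
]

def admin_login_counts(lines):
    """Count logins per admin user from key=value audit-log lines."""
    counts = Counter()
    for line in lines:
        # Parse the key=value pairs out of each whitespace-separated token.
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        if "LOGIN" in line and fields.get("role") == "admin":
            counts[fields["user"]] += 1
    return counts

print(admin_login_counts(AUDIT_LOG))  # Counter({'alice': 2})
```

An unexpected spike in one admin account's login count is exactly the kind of signal a security team would alert on.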
List of Different Log Analyzers
Following is a list of different log analyzers (commercial & open source):
- Splunk
- Logstash (used with Elasticsearch and Kibana)
- Sumo Logic
- LogZilla – http://www.logzilla.net
- NXLog Community Edition – https://nxlog.co/products/nxlog-community-edition
You left out LogZilla – http://www.logzilla.net – costs MUCH less than Splunk and can handle around 1B events/day on a single server.
Thanks for the suggestion. Found it to be a very interesting tool. Just added an entry for it.
I have been playing around a bit with Logstash and am fairly convinced that it suits my needs, since it uses Elasticsearch and Kibana for the UI, which makes things much easier. The UI of Kibana is highly customizable, and you can have the whole thing up and running in just a few minutes.
Thanks for the update, Ravi.
Thanks for the article.
NXLog is another free and open source log management system that was left out but deserves a place in the list, since it can collect logs from Windows, Linux, Android and other operating systems, and it provides high performance even when scaling to thousands of servers and beyond. If interested, check it out here:
https://nxlog.co/products/nxlog-community-edition