In the DevOps model, monitoring is the last stage of the software development life cycle. It is also, it should be added, a stage that is often neglected or even abandoned completely. DevOps itself is a whole other, lengthy story. I assume that this concept is familiar to you, but if not, to cut a long story short:
DevOps is a set of practices that we use in creating and developing software. One of the results of using these techniques is a reduction in the barriers between people with different roles in the project (e.g. between developers and administrators, as well as between administrators and project managers).
All of this serves to ensure faster adaptation to the changes that are inevitable in any long-term software project. The basis of DevOps is a predictable, looped and maximally automated project lifecycle. However, this doesn’t mean that everyone working on the project does everything at once, or that responsibility for the tasks of individual team members is blurred. Reading it that way is a misunderstanding of the DevOps concept.
What does monitoring have to do with it?
The monitoring stage fits naturally into DevOps. Adapting to changes – for example, responding to the changing needs of application users over time – will not be possible without having full knowledge of the software’s operation and the interactions of its users.
Monitoring leads to the acquisition of so-called situational awareness. The term is borrowed from training programs for uniformed services, and basically means being prepared in advance for a threat. Continuous monitoring of our applications therefore reduces the risk that something will go wrong.
Ok, but what is monitoring in practical terms?
It primarily covers the technical parameters of the application, which are useful to programmers: the number of HTTP requests handled, RAM usage, CPU usage and the amount of space taken up by the application data.
In addition, there are parameters specific to the tasks performed by the application, which are useful to project owners: the number of users logged in, tasks handled, products sold or files generated.
Implementing application monitoring tools brings many benefits. Here are the most important of these:
knowledge of the current availability of the application – if the application goes offline, we’ll find out before users start reporting it to us
access to data on the resources used by the application – depending on the time and other factors, such as the number of active users
help in diagnosing application performance problems
obtaining information about changes in the operation of the application, depending on changes in the operating environment (a change of configuration on the server infrastructure provider’s side may radically affect the operation of the application)
insight into the occurrence of events specific to the business logic of a given application (how many users performed action X in the period from Y to Z)
ability to bring people with different roles in the project up to the same level of knowledge as regards the condition of the application
possibility of further planning based on data
greater transparency for the project owner, who can check for themselves how the application works, which increases trust between stakeholders
Last but not least – on the basis of data collected from the monitoring system, we can isolate trends occurring in the observed system and then react to them in advance.
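To make the idea of reacting to a trend in advance more concrete, here is a minimal sketch that fits a straight line to daily disk-usage samples and estimates how many days are left before a limit is reached. The function name, the sample values and the limit are hypothetical, and a simple linear fit is only an assumption made for illustration; real monitoring tools usually provide their own forecasting functions.

```python
from typing import Optional, Sequence


def days_until_limit(samples_gb: Sequence[float], limit_gb: float) -> Optional[float]:
    """Estimate the number of days until usage reaches limit_gb.

    samples_gb holds one measurement per day, oldest first.
    Returns None when usage is flat or shrinking (no upward trend).
    """
    n = len(samples_gb)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")

    # Ordinary least-squares slope of usage over the day index 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(samples_gb) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples_gb))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance  # GB per day

    if slope <= 0:
        return None
    # Project forward from the latest measurement; if the limit is
    # already exceeded, report zero days.
    return max(0.0, (limit_gb - samples_gb[-1]) / slope)
```

With samples of 10, 12, 14 and 16 GB and a limit of 20 GB, the estimate is two days, which is the kind of early warning that lets us add disk space before users notice anything.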
However, going slightly beyond the bounds of DevOps itself, examples of good practices in software development projects show that it’s worth supplementing monitoring with feedback from application users. In this way, we will obtain the full spectrum of data, enabling us to take the right actions going forward.
One of the basic principles related to monitoring is:
“You can’t manage what you don’t measure.”
Therefore, for the best application management practices, we need appropriate tools to measure various aspects of the application’s operation. Let’s take a closer look at these.
Alerting – what is it good for?
The purpose of monitoring is certainly not to constantly stare at a dashboard with charts showing the application’s operation. That is what alerting is for. It’s a function that notifies us when an event has occurred which should worry us or which simply needs our attention (e.g. it requires taking a specific action). Most of the currently popular tools provide the option to notify us of events via:
email messages
notifications in company communication tools (Slack, Mattermost)
phone and SMS notifications
Efficient alerting allows us to avoid flooding our email inbox with thousands of messages about a single incident. In addition, thanks to monitoring tools, we should have a complete picture of the situation when such an event actually occurs. It’s also useful to be able to work backwards through the data, taking the selected event as the starting point.
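One common way of keeping the inbox from being flooded is to send at most one notification per metric within a cooldown period. The sketch below shows this idea under stated assumptions: the class name, the message format and the cooldown rule are all hypothetical and do not reflect how any particular monitoring tool implements alerting.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ThresholdAlerter:
    """Alert when a metric exceeds its limit, at most once per cooldown."""

    limit: float
    cooldown_seconds: float
    _last_sent: Dict[str, float] = field(default_factory=dict)

    def check(self, metric: str, value: float, now: float) -> Optional[str]:
        """Return an alert message, or None when nothing should be sent."""
        if value <= self.limit:
            return None
        last = self._last_sent.get(metric)
        if last is not None and now - last < self.cooldown_seconds:
            # Still inside the cooldown window: suppress the repeat.
            return None
        self._last_sent[metric] = now
        return f"ALERT: {metric}={value} exceeds limit {self.limit}"
```

The returned message would then be handed to whichever channel the team uses (email, chat or SMS). Passing the current time in as `now` keeps the logic easy to test.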
“Moar” data!
Any tool that provides access to more data about the operation of an application can become part of our monitoring system. For example, the well-known Google Analytics tool fulfils such a role. In this article, however, we’ll focus on tools that work largely on the server side, which makes them more reliable and lets them monitor the operation of the application in near real time.
Grafana and Prometheus
These two tools complement each other perfectly and are very often used together. Prometheus collects data and stores it as time series in its own database (it can also pass the data on to several external storage systems). Grafana, on the other hand, is a tool for visualising data collected from various sources (including the data stored by Prometheus).
Implementation consists in adding an API endpoint to a given application, which returns current statistics on the operation of the application. This endpoint is polled at regular intervals (e.g. every few seconds) by Prometheus, which then saves the downloaded data in a format that allows for its analysis at a later date.
Before the actual implementation, however, you should consider what measurements you want to take (and then visualise). It will also be necessary to modify the application itself (that is, to instrument it) so that the endpoint exposes the data we want. The ready-made libraries available for various programming languages come in useful here. They enable the collection of measurements at the level of the application source code.
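To give a feel for what such an endpoint returns, here is a minimal sketch that uses only the Python standard library to expose two counters in the Prometheus text exposition format. The counter names, the `/metrics` path handling and the port are assumptions made for illustration; in a real project you would normally rely on an official Prometheus client library rather than formatting the output by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

# Hypothetical in-process counters, incremented elsewhere in the application.
COUNTERS: Dict[str, float] = {"http_requests_total": 0, "orders_placed_total": 0}


def render_metrics(counters: Dict[str, float]) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name in sorted(counters):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {counters[name]}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics with the current counter values."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve /metrics until interrupted; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint, and Prometheus would then be configured to poll it at a fixed interval. Note that the first counter is a technical metric and the second a business one, so both kinds can be exposed side by side.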
The second element is Grafana. You can use it to visualise data from many different sources (not only Prometheus), including MySQL and PostgreSQL databases. Grafana allows you to create custom dashboards with charts generated from previously collected data.
Both tools belong to the world of free software. In the case of Grafana, it is possible to use the SaaS service.
Advantages for the team:
the ability to gather custom types of metrics specific to the business logic of a given application
Grafana has ready-to-import sets of dashboards for popular tech stacks
both Prometheus and Grafana include modules that enable alerts when configurable limit values of any metrics are exceeded
Disadvantages:
implementation definitely requires programming work, as it involves modification of the application itself
in the case of using the self-hosted version: investment in infrastructure and time spent on implementing both tools
this solution requires prior learning by technical staff and the transfer of knowledge to those who will ultimately use it
When you work with our team, however, you don’t need to pay too much attention to the above-mentioned drawbacks. Why not? Well, 1000software.house is an experienced programming team, including when it comes to implementing and configuring this solution from A to Z. We will gladly and – most importantly – effectively take care of the issue of software monitoring 🙂
Elastic Stack
This is a set of tools that work in tandem. Its core components, Elasticsearch and Logstash, run on the Java Virtual Machine (hence the relatively high hardware requirements). Elastic Stack consists of:
Elasticsearch – NoSQL database designed to scale across multiple servers and store massive amounts of data
Kibana – a tool for visualisation of data stored in the Elasticsearch database
Beats and Logstash – components for extracting data from application logs, performing initial analysis and sending them to Elasticsearch
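Since the data reaches Elasticsearch by way of application logs, it helps if the application writes logs in a structured form that is easy to parse. The sketch below is one possible approach, using only the Python standard library: every log record becomes a single JSON object per line. The class and function names, as well as the convention of passing additional data under a `fields` key, are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: extra data arrives via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to stream (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A call such as `build_logger("shop").info("order placed", extra={"fields": {"order_id": 42}})` then produces one JSON line containing the message, the level and the order ID, ready to be picked up by a log shipper.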
If we want to use Elastic Stack in a self-hosted version, the fact that its tools are produced by one company is significant; thanks to this, start-up is not a problem even for an intermediate-level system administrator. However, problems may arise when we want to store a large amount of data or handle a large amount of traffic. In this case, we need to be aware that maintaining this solution in an efficient way requires more attention. So, if your team doesn’t have a full-time systems administrator, it’s better to go with the SaaS version.
Kibana has a wide range of data displays. For example, it can be used to visualise geographic coordinates relatively easily in the form of heat maps. It is impossible to describe in just a few words the possibilities that Elastic Stack offers. Therefore, I encourage you to have a read through the extensive documentation for this project on your own.
Advantages for the team:
extensive documentation on the product developers’ website
anyone who has access to the Kibana panel can analyse the data on their own and create their own visualisations
a very extensive and universal tool – the ability to collect all kinds of data
well-integrated components – a clear path to guide you through start-up
Disadvantages:
in a self-hosted version – more work for administrators in case of heavy traffic and large amounts of data
the universality of this tool means that its implementation requires many decisions and well-thought-out use
if using the free version of the software, you have to take into account the limitations concerning access management and authorisations for many users (important for larger enterprises and corporations)
Elastic Stack – similarly to Grafana and Prometheus – is a tool that must first be learned by technical staff in order to pass the knowledge on to other team members.
Sentry
The primary function of Sentry is collecting information and sending configurable alerts about unexpected errors occurring in our applications. Sentry allows you to log errors occurring both on the server side and in the web browser. Integration with Sentry is relatively simple – there are ready-to-use libraries available for virtually every popular programming language, and the integration itself consists in pasting a ready-made fragment of code into the appropriate place in the application.
In addition to collecting information about errors and exceptions, Sentry allows you to log events specific to a given application and you can – with some limitations, of course – install it in your own infrastructure. However, it should be remembered that this is a proprietary product (with a long history of open-source), which has full functionality only on the most expensive paid plans.
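To illustrate the general idea behind this kind of error tracking, the sketch below catches an exception, turns it into a structured event and records it before re-raising. Please note that this is not the Sentry SDK: all names here are hypothetical, and the in-memory list merely stands in for a remote service to which a real library would send the event.

```python
import functools
import traceback
from typing import Any, Callable, Dict, List

# Stand-in for a remote error tracker (an assumption for this sketch).
CAPTURED_EVENTS: List[Dict[str, Any]] = []


def capture_exception(exc: BaseException, **context: Any) -> Dict[str, Any]:
    """Turn an exception into a structured event and record it."""
    event = {
        "type": type(exc).__name__,
        "message": str(exc),
        "stacktrace": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "context": context,
    }
    CAPTURED_EVENTS.append(event)
    return event


def report_errors(func: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator: record any exception raised by func, then re-raise it."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            capture_exception(exc, function=func.__name__)
            raise

    return wrapper
```

The exception is re-raised on purpose, so the application behaves exactly as before; the only difference is that the error, along with its stack trace and context, has been recorded somewhere the team can see it.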
Advantages for the team:
relatively simple integration with existing projects
transparent alerting of errors occurring in the applications
notification of new errors via email in a smart and customisable way
Disadvantages:
complete information about errors in Sentry is often difficult for non-technical staff to interpret
when using the SaaS service: advanced data analysis is available only on higher plans, and the history of collected errors on the lowest plan goes back only up to 3 months
Is there anything else?
Of course there is! We have presented only a few tools from the entire spectrum available. Below you’ll find links to other useful services that we know, have tested and consider worthy of your time:
UptimeRobot – a small tool for testing the availability of the application and the validity of TLS certificates. It can send alerts via SMS (on a paid plan)
Datadog – one of the most popular services for comprehensive monitoring of all types of systems and applications
your infrastructure provider – because why not? We often forget that most server infrastructure providers offer the possibility to view consumption charts for selected resources. It’s a bit too much to expect that these would show data specific to our applications, and also be equipped with alerting functions. However, even these types of charts help to determine the overall condition of our application.
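As an illustration of the certificate validity check mentioned in the list above, here is a minimal sketch using only the Python standard library. The function names are hypothetical, and this is only an assumption about how such a check can be done, not a description of how any of the listed services works internally.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter date.

    not_after uses the format returned by getpeercert(),
    e.g. 'Jan  5 09:34:43 2018 GMT'. A negative result means
    the certificate has already expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host and return the notAfter field of its TLS certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Combining the two, `days_until_expiry(fetch_not_after("example.com"))` gives the number of days remaining, which could be compared against a warning limit of, say, 14 days.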
And that’s about it for now. I hope that you found this article helpful and that it cleared up many issues related to software monitoring. Finally, I’d like to thank you for reading and invite you to contact us if you already know that monitoring is an essential element in running an online business. The experts at 1000software.house and I will be happy to take care of this for you!
Useful links/references:
https://grafana.com
https://prometheus.io
https://www.elastic.co
https://uptimerobot.com
https://sentry.io
https://en.wikipedia.org/wiki/DevOps
https://www.digitalocean.com/community/tutorials/an-introduction-to-metrics-monitoring-and-alerting