We developed and released our own Drupal Monitoring framework (https://www.drupal.org/project/monitoring) and have actively used and maintained it since 2013. It is one of the most widely used projects for helping site maintainers monitor the health of their sites. Around 1500 projects rely on it to collect insights directly from Drupal, such as built-in status checks, the frequency of certain errors, performance indicators and other metrics. It is extensible: custom metrics and checks can be implemented to verify that integrations and other functionality work as expected.
We also use health checks to verify that projects are available and fast, monitor SSL certificate expiration dates, and more.
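To illustrate what an SSL expiration check involves, here is a minimal Python sketch. It is not our production tooling; the function names `days_until_expiry` and `fetch_not_after` are hypothetical and chosen only for this example. It relies on the standard library's `ssl.cert_time_to_seconds` to parse the certificate's `notAfter` field.

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[datetime] = None) -> float:
    """Return the number of days between `now` and a certificate's notAfter time.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a TLS connection to `host` and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A check could then call `days_until_expiry(fetch_not_after("example.org"))` and raise a warning when the result drops below a chosen number of days; the host name and the warning window are placeholders.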
Over the last 10 years, we have used a number of self-hosted solutions to track, aggregate and visualize all this information, including Icinga, Sensu and Grafana. Most recently, we migrated to SigNoz, an all-in-one solution for metrics, log aggregation and performance traces that is based on the OpenTelemetry standard. We keep up to date with current best practices around monitoring and update our workflow accordingly.
If a project is unresponsive or a metric reaches a critical threshold, our team is alerted and can start investigating immediately. We inform our clients about these incidents and their resolution, often before they have even noticed the problem themselves.
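The idea of threshold-based alerting can be sketched in a few lines of Python. This is a simplified illustration under assumed names (`Thresholds`, `classify`) and assumed status labels, not the actual logic of any of the tools mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical limits for a metric where higher values are worse."""
    warning: float
    critical: float


def classify(value: float, thresholds: Thresholds) -> str:
    """Map a metric value to a status; an alert would fire on "CRITICAL"."""
    if value >= thresholds.critical:
        return "CRITICAL"
    if value >= thresholds.warning:
        return "WARNING"
    return "OK"
```

For example, with `Thresholds(warning=10, critical=50)` applied to a count of errors per hour, a value of 60 would be classified as critical and trigger an alert to the team.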
We are very proud of the work we do to catch as many issues as possible on our websites, so that all of them, even those that seem insignificant at first glance, can be fixed before they become larger problems.