Logging, monitoring and alerting are essential parts of deploying software. In my experience, it pays to put best practices in place at an early stage of a project. Whenever code is changed we should make sure that we are:
- Logging everything that could be of interest
- Monitoring different metrics, such as errors
- Alerting the developers whenever an error occurs
If done correctly, any error becomes apparent to the developers and the business at an early stage and can be solved right away. If not, “silent errors” will occur, causing your application to fail and resulting in a poor user experience.
1) 🧾 Logging
As a developer
I want to be able to see a log of notable events
So that I can analyse and debug what's happening
2) ☕️ Monitoring
As a developer
I want to have an overview of important metrics
So that I can see how we are doing
3) 🚨 Alerting
As a developer
I want to be notified if a particular event occurs
So that I can act on that information
Getting rid of silent errors
Silent errors are errors that do not trigger any kind of reporting, and therefore occur without you knowing about them. They are usually reported by the users themselves, which makes for a very bad user experience and is bad for business.
Errors will happen, let’s stay on top of them
When building an application you are often dealing with third parties that you do not control. When a third-party API throws an error, we should handle it gracefully and send an alert so that we can fix the issue if possible. Monitoring the HTTP status codes per endpoint is also very helpful.
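To make "monitoring the HTTP status codes per endpoint" concrete, here is a minimal sketch of a counter that tallies responses per endpoint and status code. The class `StatusCodeCounter`, its methods and the endpoint paths are hypothetical names made up for illustration. In practice these numbers would more likely come from your logs or a metrics backend than from memory.

```typescript
// Hypothetical in-memory counter for HTTP status codes per endpoint.
// Illustration only: a real setup would derive these counts from logs or metrics.
class StatusCodeCounter {
  private readonly counts: Record<string, Record<number, number>> = {};

  // Record one response for the given endpoint.
  record(endpoint: string, status: number): void {
    const perEndpoint = this.counts[endpoint] ?? {};
    perEndpoint[status] = (perEndpoint[status] ?? 0) + 1;
    this.counts[endpoint] = perEndpoint;
  }

  // Number of responses seen for an endpoint with a given status code.
  count(endpoint: string, status: number): number {
    return this.counts[endpoint]?.[status] ?? 0;
  }

  // Share of an endpoint's responses that had a 5xx status (0 if nothing was recorded).
  serverErrorRate(endpoint: string): number {
    const perEndpoint = this.counts[endpoint];
    if (!perEndpoint) return 0;
    let total = 0;
    let errors = 0;
    for (const key of Object.keys(perEndpoint)) {
      const status = Number(key);
      const n = perEndpoint[status] ?? 0;
      total += n;
      if (status >= 500 && status <= 599) errors += n;
    }
    return total === 0 ? 0 : errors / total;
  }
}

// Usage: record each response, then read the numbers you want to chart or alert on.
const statusCounter = new StatusCodeCounter();
statusCounter.record("/api/orders", 200);
statusCounter.record("/api/orders", 502);
const orderErrorRate = statusCounter.serverErrorRate("/api/orders");
```

A per-endpoint error rate like `orderErrorRate` is the kind of metric that is worth putting on a chart and attaching an alert threshold to.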
Don’t fix the silent error right away
Whenever you are made aware of a silent error, the first instinct might be to fix the problem right away. I would argue that the first action should be to deal with the error gracefully and make sure that we are logging it in an appropriate way. Ask yourself:
- Can I find this error in the logs?
- Should I monitor this error?
- Should I have received an alert?
By addressing the “logging” aspects first, you get better visibility of the error before you go ahead and fix it. This makes the codebase more stable and the developer experience better.
A practical example
Let’s say that a third-party API failed, resulting in the application crashing. I recommend that you:
- Make sure to gracefully deal with the error
- Make sure that you log the error
- Make sure that you get an alert when the error occurs
- If suitable, start monitoring the error (and possibly similar errors)
...and only when this is done should you fix the problem.
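The first three steps above could be sketched roughly as follows. Everything here is hypothetical: `handleThirdPartyFailure`, the `Reporting` interface and the `ErrorReport` shape are illustrative names, and the `log` and `alert` functions stand in for whatever logger and alert channel you actually use.

```typescript
// Hypothetical shapes; swap in your own logger and alert channel.
interface ErrorReport {
  source: string; // which third party failed
  message: string; // error message, safe to log
  occurredAt: string; // ISO 8601 timestamp
}

interface Reporting {
  log: (report: ErrorReport) => void; // e.g. write a structured log line
  alert: (report: ErrorReport) => void; // e.g. post to a chat channel
}

// Meant to be called from a catch block: logs the failure, sends an alert,
// and returns a fallback value so the caller can keep going instead of crashing.
function handleThirdPartyFailure<T>(
  source: string,
  error: unknown,
  fallback: T,
  reporting: Reporting,
  now: Date = new Date()
): T {
  const report: ErrorReport = {
    source,
    message: error instanceof Error ? error.message : String(error),
    occurredAt: now.toISOString(),
  };
  reporting.log(report); // step 2: make sure the error is in the logs
  reporting.alert(report); // step 3: make sure someone is notified
  return fallback; // step 1: degrade gracefully
}

// Example call site (fetchPrices is a hypothetical third-party call):
//
//   try {
//     return await fetchPrices();
//   } catch (error) {
//     return handleThirdPartyFailure("pricing-api", error, [], reporting);
//   }
```

Passing `log` and `alert` in as parameters keeps the handler independent of any particular tool, which also makes it easy to check in a test that a failure really does produce both a log entry and an alert.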
Of course there are scenarios where a hotfix is necessary, but ignoring the steps above will most likely result in a codebase where errors similar to the one that showed up will occur in the future. If you make sure to address the logging, you ensure that you stay on top of errors in the future and can fix them as soon as possible. The time spent setting up a good standard for logging will result in a better product, happier developers and more time to work on actual features.
What tools to use?
There are tons of tools out there for logging, monitoring and alerting, and what’s best for you depends on your tech stack. For one of our latest projects, a web application built with Next.js and deployed through Vercel, we are using the tools below.
1) Logging to Logflare
We log to Logflare through Vercel, where we have configured Logflare as a log drain. You can find more about the Logflare integration with Vercel here.
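With a log drain, what matters most on the application side is that log lines are easy to search and chart later. One common approach is to write each event as a single JSON line. The sketch below assumes that console output from your functions ends up in the drain. The helper names (`formatLogLine`, `logEvent`) and the field names are hypothetical and not something Logflare or Vercel require.

```typescript
type LogLevel = "info" | "warn" | "error";

// Builds one JSON log line. The field names are an assumption for illustration.
// Context is spread first so it cannot overwrite level, message or timestamp.
function formatLogLine(
  level: LogLevel,
  message: string,
  context: Record<string, unknown> = {},
  now: Date = new Date()
): string {
  return JSON.stringify({
    ...context,
    level,
    message,
    timestamp: now.toISOString(),
  });
}

// Writes the line to the console method that matches its level.
function logEvent(
  level: LogLevel,
  message: string,
  context: Record<string, unknown> = {}
): void {
  const line = formatLogLine(level, message, context);
  if (level === "error") {
    console.error(line);
  } else if (level === "warn") {
    console.warn(line);
  } else {
    console.log(line);
  }
}

// Example call (the endpoint and status are made-up values):
//   logEvent("error", "Third-party API request failed", { endpoint: "/api/orders", status: 502 });
```

Consistent fields such as `level` and `endpoint` make it much easier to filter the logs and to build charts on top of them.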
2) Monitoring with Google Data Studio
We are using Google Data Studio to generate reports that we can monitor. We do this by connecting Logflare as a data source. It’s super easy to create charts for the metrics you need to monitor.
3) Error Reporting to Sentry
We are tracking errors using Sentry. It catches any runtime errors in our React application as well as in our backend API. We also manually log specific errors that might occur. Whenever a new error occurs, an alert is sent to Slack.