Press [s] to read the speaker notes (will open a new window).
We will talk about logs…
… even if it's not “trendy”
… because they are valuable
… and often used inconsistently
Every "real-world" application has some kind of logging infrastructure.
Sometimes it's quick and simple (error_log), sometimes it's more sophisticated and configurable (Monolog).
Unfortunately, many applications use logging inconsistently. The logging infrastructure usually
grows organically and isn't thought through. It's rarely the most loved or fun part of development.
Lots of us log blindly, without thinking about who will read these logs or why.
In this talk, we will take some time to see how to make our logs more valuable (and why).
Log exploitation (with tools such as ELK stacks, Graylog, …) will not be addressed.
How can we write useful logs?
Definition: to log
“to make an official record of things that happen”
Macmillan Dictionary
First of all, let's define what we mean when we say "logs".
As we see in this definition, logs can be almost anything. Any event that is relevant
to explain the behavior of an application can be "logged" or saved somewhere for later
use.
What that tells us is that our logs might have different purposes and we have to take
that into account when we design our logging strategy.
Logging strategies
performance and resources
errors
user actions
business events
I think that logs can serve different purposes. The most common being:
* performance and resources: memory consumption, CPU/IO/network usage, external API calls and their response times, … Helps to monitor, debug and improve the general performance of the system.
* errors: exceptions, warnings, unexpected responses, … Helps detect and troubleshoot issues with the system or its environment.
* user actions: actions performed by your users. Useful to help users when they come across an issue.
* business events: some log entries can be transformed into business-related metrics and statistics. Can be used as a poor man’s BI tool.
Therefore, the first step to writing useful logs is to determine what they will be used for, and by whom.
Log, log, log!
Ensure that all the relevant events are logged!
… relevant for whom, and why?
In terms of logging strategy, we (TEA) are mainly interested in user actions on the platform and
potential errors.
For our needs, relevant events include business/user events such as
orders, refunds, API response times, errors, book integrations, …
To help you decide whether something is worth logging, ask yourself
the following question: why would logging this be useful, and to whom?
Exercise: ask yourself "what does my application log?"
Spoiler: if you want to answer "everything", you have a problem.
Log messages
Logs should be explicit.
Order "REF-42 " failed: payment gateway unreachable .
Write sentences describing the event and including the relevant details.
Think about the person who will read the log.
An order failed? Which one? Why? When?
A bare "order failed" message is almost useless, as it provides no information
that can be used to explain what happened or what action should be taken.
When developers write a log, they have in mind the context (in terms of code
and application flow) in which the log call is inserted. We often tend to
write log messages that rely on this context… which is NOT present anymore when
the log entry is read.
To avoid having cryptic logs emitted by the application, log messages have to
be as explicit as possible.
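The talk's examples are PHP/Monolog; as an analogy, here is a minimal sketch with Python's standard `logging` module (the logger name and helper function are illustrative):

```python
import logging

logger = logging.getLogger("orders")

def log_order_failure(order_ref: str, reason: str) -> str:
    # Spell out the "which" and the "why" in the message itself,
    # instead of a context-free "Order failed".
    message = f'Order "{order_ref}" failed: {reason}'
    logger.warning(message)
    return message
```

Calling `log_order_failure("REF-42", "payment gateway unreachable")` yields the explicit message from the slide, readable without any knowledge of the emitting code.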
Chronological logs
Logs should be timestamped.
Order "REF-42" failed on 12/12/17 at 14h30 : payment gateway unreachable.
The "when" question wasn't answered by the last log entry.
Timestamps help us correlate logs with other events (was there
a global outage in our banking service at that time?)
That's why we include dates in our log entries.
But there is still an issue here: the date format is ambiguous!
Human readable, machine parseable
Logs should be human readable and unambiguously parseable.
[2017-12-12T14:30:51.965Z] Order "REF-42" failed: payment gateway unreachable.
In order to avoid ambiguities, we will only use standard formats (ISO 8601 for instance)
and we will always indicate the date as the first information of the log entry (for consistency).
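A sketch of producing such an unambiguous prefix with Python's `datetime` (the helper name is mine):

```python
from datetime import datetime, timezone

def log_prefix(now: datetime) -> str:
    # ISO 8601, in UTC, with millisecond precision: unambiguous,
    # sortable, and parseable by virtually every tool.
    utc = now.astimezone(timezone.utc)
    return "[" + utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z]"
```

Every entry then starts with the same fixed-width, lexicographically sortable timestamp.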
Human readable, machine parseable
Logs should be human readable and unambiguously parseable.
[2017-12-12T14:30:51.965Z] Order {order_ref} failed: {reason} -- {"order_ref": "REF-42", "reason": "payment gateway unreachable"}
Let the tool used for log exploitation decide whether it's better to interpolate these
placeholders or not.
Also, in some languages string concatenation can be relatively expensive, especially
if there are a lot of logs.
For error reporting tools (Sentry, Airbrake, …), non-interpolated messages are easier
to aggregate and parse.
If you use tools like Logstash, the only way to extract the order ref and the reason from an
interpolated message is with regexes: it's slow (and can be complex, depending on the log entry to parse).
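By analogy, a tiny Python sketch of keeping the message template and its values separate (the `format_entry` helper is hypothetical):

```python
import json

def format_entry(message: str, context: dict) -> str:
    # Ship the raw template plus its values; the consumer
    # (a human, Logstash, an error tracker, …) decides whether
    # to interpolate the placeholders.
    return f"{message} -- {json.dumps(context)}"
```

For example, `format_entry("Order {order_ref} failed: {reason}", {"order_ref": "REF-42", "reason": "payment gateway unreachable"})` reproduces the entry shown on the slide.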
Human readable, machine parseable
JSON is fine.
NEVER write multi-line logs.
Using JSON is easy for both developers and machines. But really, any plain-text, normalized format will do.
Just ensure that you NEVER write multi-line log entries.
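As an illustration, a single-line JSON formatter sketched with Python's `logging` (Monolog ships a comparable `JsonFormatter`):

```python
import json
import logging

class JsonLineFormatter(logging.Formatter):
    """Render every record as exactly one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # json.dumps escapes embedded newlines, so even a multi-line
        # message ends up on a single physical line.
        return json.dumps(payload)
```

Even a message containing a stack trace stays on one line, which keeps the output trivially parseable.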
Context matters
[2017-12-12T14:30:51.965Z] Order {order_ref} failed: {reason} -- {"order_ref": "REF-42", "reason": "payment gateway unreachable", "customer_id": 42, "payment_transaction_id": 4224}
Context matters
Never log sensitive data.
Passwords, tokens, emails, … (depending on your application, an email address can be considered sensitive data)
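One way to enforce this, sketched in Python (the key list is an assumption; adapt it to what counts as sensitive in your application):

```python
SENSITIVE_KEYS = {"password", "token", "email"}  # adjust to your application

def redact(context: dict) -> dict:
    # Mask sensitive values before the context ever reaches a log file.
    return {
        key: "[REDACTED]" if key in SENSITIVE_KEYS else value
        for key, value in context.items()
    }
```

Running every log context through such a filter centralizes the policy instead of relying on each call site to remember it.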
Divide and conquer: severity levels
Logs should be categorized by severity level.
[2017-12-12T14:30:51.965Z] [WARNING] Order {order_ref} failed: {reason} -- {"order_ref": "REF-42", "reason": "payment gateway unreachable"}
Not all log entries have the same importance or severity.
Some of them require an immediate action, whereas others
are just there to ease debugging and can be stripped in production.
Be sure to include this information in your log output and properly
configure your application depending on its environment.
Divide and conquer: severity levels
As defined by RFC 5424 (the Syslog protocol)
debug: Detailed debug information
info: Interesting events (user logs in, SQL queries, …)
notice: Normal but significant events
warning: Exceptional occurrences that are not errors
error: Runtime errors that do not require immediate action
critical: Critical conditions, action must be taken
alert: Action must be taken immediately
emergency: System is unusable
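For instance, a per-environment sketch with Python's `logging` (whose levels cover only a subset of RFC 5424; the environment names are illustrative):

```python
import logging

def configure_logging(environment: str) -> int:
    # Keep everything in development; strip debug noise in production.
    level = logging.DEBUG if environment == "dev" else logging.WARNING
    logging.getLogger("app").setLevel(level)
    return level
```

The same code emits the same log calls everywhere; only the configured threshold changes per environment.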
Divide and conquer: subsystems
Logs should be categorized by subsystem.
[2017-12-12T14:30:51.965Z] [WARNING] [metadata] Export generation failed: {reason} -- {"reason": "export directory not writable", "export_ref": 42, "export_directory": "/home/sftp/exports"}
tag the subsystem that emits the log
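In Monolog this tag is the "channel" name; a rough Python equivalent uses one named logger per subsystem and includes the name in the output format:

```python
import logging

# One logger per subsystem; its name is emitted with every entry.
formatter = logging.Formatter("[%(levelname)s] [%(name)s] %(message)s")
metadata_logger = logging.getLogger("metadata")
billing_logger = logging.getLogger("billing")
```

Filtering a log stream down to one subsystem then becomes a simple match on the bracketed tag.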
Actionable logs
[2017-12-12T14:30:51.965Z] [WARNING] [metadata] Export generation failed: {reason} -- {"reason": "export directory not writable", "export_ref": 42, "export_directory": "/home/sftp/exports", "_action": "The export could not be generated because the target directory is not writable. Fix the permissions on the directory and retry the export (`./bin/console tea:export --request=42 --retry`)"}
Correlate logs
[2017-12-12T14:30:51.965Z] [WARNING] [metadata] Export generation failed: {reason} -- {"reason": "export directory not writable", "export_ref": 42, "export_directory": "/home/sftp/exports"} {"request_id": "7d5e8092-a13d-45b8-adb4-b26c18806825"}
See Monolog's "processors" to identify the events associated with a given request.
Within the same application, but also across applications and servers (business
applications or infrastructure servers such as nginx).
For this, a UUID is fine. The earlier it is generated, the better.
I often include two sets of metadata in my log entries:
* the first represents the log context and is associated with the event itself;
* the other represents extra metadata automatically added to any event emitted by the application.
Monolog handles that nicely using "Processors": https://github.com/Seldaek/monolog/blob/master/doc/02-handlers-formatters-processors.md#processors
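A rough Python counterpart to such a processor is a `logging.Filter` that stamps a request id onto every record (the class is a sketch; the UUID below is the one from the slide):

```python
import logging
import uuid

class RequestIdFilter(logging.Filter):
    """Attach the same request_id to every record emitted while
    handling one request, like a Monolog processor would."""

    def __init__(self, request_id=None):
        super().__init__()
        # Generate the UUID as early as possible in the request's life.
        self.request_id = request_id or str(uuid.uuid4())

    def filter(self, record):
        record.request_id = self.request_id
        return True  # never drop the record, just enrich it
```

Attach it with `logging.getLogger("app").addFilter(RequestIdFilter())`; a formatter can then reference `%(request_id)s` to correlate all entries of one request.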
Summary
determine a logging strategy: what? for whom? why?
write explicit and unambiguous messages
use standard notations
include as much relevant context as you can