Today’s apps and services run across hundreds or even thousands of servers, and managing the logs they produce means following best practices to avoid being overwhelmed. Aside from being deliberate in how you create your log messages, following best practices also involves using log management tools with the capabilities you need to monitor your infrastructure and troubleshoot issues.
In this article, I’ll cover a list of best practices developers responsible for all but the smallest software should be following. This list covers everything from carefully choosing which data to include in log messages to the place where log files are stored. There’s nothing more useless than a log file without the data you need or one you can’t access quickly when you need it.
7 Best Practices for Log Management
- Aggregate Your Logs
- Use Structured Log Messages
- Include Context in Log Messages
- Store as Much Data as You Can
- Monitor in Real-Time
- Apply Access Controls
- Use a Cloud-Based Log Aggregator and Analyzer
1. Aggregate Your Logs
Long gone are the days when you could SSH into each individual server to inspect its logs. With the massive scale at which many apps and services run today, you need to pull them into a central location to make analysis easier. Cloud-based log aggregation services such as SolarWinds® Papertrail™ and SolarWinds® Loggly® are ideally suited for this. Whether your logs are coming from standard servers, mobile phones, or embedded devices like routers, cloud-based log aggregation services can ingest your log files and automatically parse them. These tools provide a single interface so you can search across your logs from one place.
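As a rough illustration of what shipping logs to a central location can look like from the application side, here is a minimal Python sketch that forwards records to a remote syslog endpoint over UDP using the standard library. The function name build_central_logger and the host and port in the usage comment are placeholders, not values from any particular service; a hosted aggregator will give you its own destination and may require TLS or a different transport.

```python
import logging
import logging.handlers
import socket


def build_central_logger(name, host, port):
    """Return a logger that forwards every record to a syslog endpoint over UDP.

    `host` and `port` are whatever destination your aggregator assigns you;
    nothing here is specific to one vendor.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix each message with the sending host so that, once logs from many
    # servers are mixed together, you can still tell where a line came from.
    handler.setFormatter(
        logging.Formatter(
            f"{socket.gethostname()} %(name)s: %(levelname)s %(message)s"
        )
    )
    logger.addHandler(handler)
    return logger


# Example usage (hypothetical destination):
#   log = build_central_logger("checkout", "logs.example.com", 514)
#   log.info("payment accepted")
```

UDP is fire-and-forget, so a sketch like this can silently drop messages under load; that trade-off is one reason many teams run a local log shipper or use a TCP/TLS transport instead.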
2. Use Structured Log Messages
When you’re developing a new feature or fixing a bug, it’s tempting to write whatever log message you think makes sense without giving much thought to using it later. And if you’re dealing with legacy apps, you’ve probably inherited a code base with a mess of free-text log statements. However, it’s difficult to parse log messages without a structure because parsers don’t know how to break up the strings into individual pieces.
Structured logging provides a way to clearly separate the components of your log messages, making them much simpler to search later. Here’s an example using structured logging from the NLog framework:
logger.Debug("{shopitem} added to basket by {user}", new { Id = 6, Name = "Jacket", Color = "Orange" }, "Kenny");
Here, the template message is clearly separated from the data applied to it. Many tools can use this to make searching faster and easier.
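The same idea can be sketched outside of NLog. The Python example below (Python rather than C#, purely for illustration) keeps the template and the data separate and emits one JSON object per event. The JsonFormatter class and the "fields" key passed through extra are conventions invented for this sketch, not part of the standard logging API.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object with named fields."""

    def format(self, record):
        # "fields" is this sketch's own convention: callers pass it via `extra`.
        fields = getattr(record, "fields", {})
        payload = {
            "level": record.levelname,
            "template": record.msg,  # the unformatted template
            "message": str(record.msg).format(**fields),  # template + data
        }
        payload.update(fields)  # each field is also searchable on its own
        return json.dumps(payload, default=str)


logger = logging.getLogger("shop")
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

logger.debug(
    "{shopitem} added to basket by {user}",
    extra={"fields": {"shopitem": "Jacket", "user": "Kenny"}},
)
```

Because every event carries the same template string, a log tool can group all "added to basket" events together and then filter on user or shopitem without any regular expressions. Note that this sketch raises a KeyError if a template names a field that was not supplied.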
3. Include Context in Log Messages
If you need to search through more than a handful of log messages, unique identifiers such as IP addresses and user IDs are vital in helping you filter out unwanted data. Including context is much easier once you’re using structured logging because the fields providing context can be specified by your format.
But context is more than just the usual set of identifiers; including context also means including data to differentiate one event from another. This can be anything from the source line where the log message was generated to the expression in your code that caused a check to fail and an error condition to be detected. These details will help you out in the future when you’re staring at logs and troubleshooting problems.
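To make this concrete, here is a small Python sketch that attaches a user ID and client IP to every message and records the source line and the failed expression. The helper make_request_logger, the withdraw function, and the field names user_id and client_ip are all made up for this example; only logging.LoggerAdapter and the format attributes come from the standard library.

```python
import io
import logging

# filename/lineno identify the source line; user_id/client_ip are the
# identifiers supplied by the adapter below.
CONTEXT_FORMAT = (
    "%(asctime)s %(levelname)s %(filename)s:%(lineno)d "
    "user=%(user_id)s ip=%(client_ip)s %(message)s"
)


def make_request_logger(name, stream, user_id, client_ip):
    """Return a logger that stamps every message with request context."""
    base = logging.getLogger(name)
    base.setLevel(logging.DEBUG)
    base.propagate = False  # keep the example's output in one place
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(CONTEXT_FORMAT))
    base.addHandler(handler)
    return logging.LoggerAdapter(
        base, {"user_id": user_id, "client_ip": client_ip}
    )


def withdraw(log, balance, amount):
    """Hypothetical business function that logs the check that failed."""
    if not balance >= amount:
        # Record the expression itself plus the values that made it fail.
        log.error(
            "check failed: balance >= amount (balance=%s, amount=%s)",
            balance,
            amount,
        )
        return balance
    log.info("withdrew %s", amount)
    return balance - amount


buffer = io.StringIO()
request_log = make_request_logger("bank", buffer, user_id=42, client_ip="203.0.113.7")
withdraw(request_log, balance=10, amount=25)
print(buffer.getvalue(), end="")
```

The caller never has to remember to pass the identifiers: the adapter adds them to every record, so each line can be filtered by user or IP later.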
4. Store as Much Data as You Can
Logs are used in two ways: as the basis for real-time metrics and other statistics that show you the health of your services, and as a way to look into the past execution of your code. This second use case requires you to store as much of your log data as you can, as you never know when you might need it. Uncovering insights into the evolution of your app’s behavior and performance over time also requires you to be able to visualize log data over a period of days or weeks to spot patterns and trends.
And let’s not forget log retention policies, which often require you to keep your log data around for years to comply with audit requirements and related rules. In these situations, purging old logs too quickly can have dramatic consequences.
Of course, with huge volumes of log data, you’ll need a way to store it so your files are accessible to the entire team. The way you store it should also allow you to increase capacity. Storing your logs in the cloud allows your storage capacity to scale in line with your log data demands without compromising on how quickly you can access it—adding more capacity or log data doesn’t slow down retrieval. Storage services like Amazon S3 can enable you to keep your log data safe in the cloud by encrypting it at rest using AES-256 encryption.
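The retention requirement mentioned above boils down to a date comparison, which the following Python sketch illustrates. The function split_by_retention, the file names, and the 365-day figure are invented for the example; your actual retention period comes from whatever audit rules apply to you, and a managed storage service will usually offer its own lifecycle settings for this.

```python
from datetime import datetime, timedelta, timezone


def split_by_retention(last_written, retention_days, now):
    """Split log files into (keep, purge) lists based on a retention period.

    `last_written` maps a file name to the time of its newest entry.
    A file is only eligible for purging once it is older than the cutoff.
    """
    cutoff = now - timedelta(days=retention_days)
    keep, purge = [], []
    for name, written_at in sorted(last_written.items()):
        if written_at < cutoff:
            purge.append(name)
        else:
            keep.append(name)
    return keep, purge


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
files = {
    "app-2022-01.log": datetime(2022, 1, 31, tzinfo=timezone.utc),
    "app-2024-05.log": datetime(2024, 5, 31, tzinfo=timezone.utc),
}
keep, purge = split_by_retention(files, retention_days=365, now=now)
print("keep:", keep)
print("purge:", purge)
```

Passing now in explicitly, rather than reading the clock inside the function, makes the rule easy to test before it is allowed to delete anything.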
5. Monitor in Real-Time
To build real-time metrics systems and troubleshoot problems as they happen, you need a way to monitor your logs in real-time. There’s no way to see how your app is behaving (or not behaving if you’re chasing a bug) down to the second or millisecond without constantly updating your view of events, and this ability is critical when you’re investigating service outages or catastrophic failures affecting your users.
Cloud-based log tools allow you to view incoming logs as they’re received using something similar to the tail command. This is how the live tail feature in Papertrail works. The live tail feature allows you to search through incoming logs and pause, filter, and scroll while events are being received. Whenever the data changes, your view of the events is updated to reflect those changes. This can help ensure you’re always looking at the latest data.
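For intuition, the core of a tail-style viewer with filtering can be sketched in a few lines of Python. This is not how Papertrail implements live tail; it is only a local approximation of the idea, and the names read_new_lines and follow are made up for the example.

```python
import io
import time


def read_new_lines(stream, keyword=None):
    """Return the lines added to `stream` since the last call.

    If `keyword` is given, only lines containing it are returned.
    A line the writer has not finished yet may be returned incomplete;
    a production tool would buffer until it sees the newline.
    """
    lines = []
    while True:
        line = stream.readline()
        if not line:  # nothing more to read right now
            break
        if keyword is None or keyword in line:
            lines.append(line.rstrip("\n"))
    return lines


def follow(path, keyword=None, poll_seconds=0.5):
    """Print matching lines as they are appended to the file at `path`."""
    with open(path, "r", encoding="utf-8") as stream:
        stream.seek(0, io.SEEK_END)  # start at the end, like `tail -f`
        while True:
            for line in read_new_lines(stream, keyword):
                print(line)
            time.sleep(poll_seconds)


# Example usage (runs until interrupted):
#   follow("/var/log/app.log", keyword="ERROR")
```

Polling a single file works on one machine; a hosted live tail does the equivalent across every system sending it logs, which is what makes it useful during an outage.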
6. Apply Access Controls
As teams grow and multiple members need access to a specific subset of log files, you’re going to need to consider establishing access control for those files. It’s unusual for every team member to need equal access to every file, so log management tools provide ways to assign access rights to users for individual log files. If this is too complex for your use case, you can group log files together based on things like the type of server sending them (did they come from production or staging?) or the geographical origin of the logs (did they come from a US data center or an APAC data center?).
Deciding who should be able to purge old logs is crucial, as only trusted members of your team should be able to delete log files. Log management tools help you configure all this and more.
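The grouping idea above can be sketched as a simple permission table. The roles, actions, and environment names in this Python example are hypothetical; real log management tools have their own permission models, so treat this only as an illustration of the "deny unless granted" principle.

```python
# Each role maps to the (action, environment) pairs it has been granted.
# All names here are invented for the example.
GRANTS = {
    "developer": {("read", "staging")},
    "on_call": {("read", "staging"), ("read", "production")},
    "log_admin": {
        ("read", "staging"),
        ("read", "production"),
        ("purge", "staging"),
        ("purge", "production"),
    },
}


def is_allowed(role, action, environment):
    """Return True only if the role was explicitly granted the action.

    Unknown roles and ungranted actions are denied by default.
    """
    return (action, environment) in GRANTS.get(role, set())


print(is_allowed("developer", "read", "staging"))
print(is_allowed("on_call", "purge", "production"))
print(is_allowed("log_admin", "purge", "production"))
```

Keeping purge as a separate action from read means you can give many people visibility into production logs while reserving deletion for a small, trusted group.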
7. Use a Cloud-Based Log Aggregator and Analyzer
Logs power many of the systems we take for granted every day, such as real-time monitoring, alerting, and diagnostics. Managing logs effectively means applying best practices when deciding what log messages to write, where to store them, and how to search through them.
Building the infrastructure to aggregate and analyze logs at scale is difficult, so you should take advantage of existing solutions such as Papertrail and Loggly. These solutions provide an easy way to put these best practices in place. Cloud-based log management tools accept logs from various hardware and in several popular file formats. These files can be parsed automatically and quickly searched using intuitive search interfaces and query syntaxes.
Including as many details as possible in your log messages and using structured logging will pay dividends when you need to use those logs to investigate and analyze problems. Finally, keeping your log files safe and easily accessible is possible with services like Amazon S3, which can scale to virtually unlimited capacity while keeping access times fast.
Sign up for a free trial version of these solutions and figure out which one is most suitable for you. You can start monitoring your logs for free with Papertrail and Loggly.