Modern Application Log Handling with Structured Logs

Modern Application Log Handling with Structured Logs

Logs are one of the three pillars of observability. They allow you to understand what happened in your application and are valuable in debugging issues.

However, as the number of users of an application increases, the volume of the generated logs can grow exponentially. This makes it harder to query and find the log entries that are related to the issue at hand. Additionally, the more log messages are generated the higher the storage needs and the associated costs.

In order to make querying logs more precise, most log management tools nowadays support structured logs. That is, applications can emit log messages in JSON or key-value formats and the log management tool will parse them and allow querying them by key. This makes it easier to target specific log messages in your queries than searching for raw text.

If you add the proper context as key-value pairs to a log message, it is now easier to find all log messages that are related to that context. In terms of a request-based process, such as a web application, some of the most useful information in this context is a request ID that is assigned to the request and is added to all log messages related to this request.

Structlog


In Python a library that makes this simple is structlog. Structlog allows you to emit structured logs (as well as easier to read formats for local development). All application logs can then be forwarded to your log management tool for querying.

Structlog uses a list of processors to process each log message, where each processor can filter, change or format the log message.

Here is an example of how to configure and use structlog:

import structlog
    
structlog.configure(
    processors=[
        structlog.stdlib.add_log_level,
        structlog.processors.JSONRenderer()
    ],
)
    
logger = structlog.get_logger()
    
logger.info("user-created", user_id="10")

which emits the following message:

{"user_id": "10", "event": "user-created", "level": "info"}

Every log message can have as many key-value arguments and all of them will be added to the output. This way you can search for log messages with a specific key and value.

As mentioned above, one type of key that is useful to have for request-based processes, such as a web application, is a requestID associated with the request. However, adding that ID to all log messages can be tedious, requiring you to pass it to functions as an argument or make it available as a thread-local variable and remember to append it to all log messages your application uses.

One way to solve that problem with structlog is to define context variables that will be added automatically to all log messages emitted. The only trick is to clear the context before each request starts, so that each request uses its own logging context. Here is an example of how to do that for a hypothetical process that uses threads to handle requests:

import contextlib
import time
import uuid
import structlog
import structlog.contextvars

logger = structlog.get_logger()

@contextlib.contextmanager
def new_logging_context():
    structlog.contextvars.clear_contextvars()
    yield


def setup_logging():
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.stdlib.add_log_level,
            structlog.processors.JSONRenderer()
        ],
    )


def handle():
    logger.info("starting-to-handle-request")
    time.sleep(0.2)
    logger.info("finished-handling-request")
    

def request():
    with new_logging_context():
        structlog.contextvars.bind_contextvars(request_id=str(uuid.uuid4()))
        handle()


def main():
    setup_logging()
    threads = []
    count = 0
    while count < 5:
        thread = threading.Thread(target=request)
        thread.start()
        threads.append(thread)
        count += 1
    for thread in threads:
        thread.join()


main()

A sample output of the above example is:

{"request_id": "d3eaf4fb-058f-443a-93b4-d0dcb113e2f3", "event": "starting-to-handle-request", "level": "info"}
{"request_id": "3f32645c-edbd-42fa-ac4d-e5d7c5f273ef", "event": "starting-to-handle-request", "level": "info"}
{"request_id": "4df3046c-d0e8-408b-81f6-38f83ca32877", "event": "starting-to-handle-request", "level": "info"}
{"request_id": "aa9a2398-539b-456b-a15e-6942be7b5fd6", "event": "starting-to-handle-request", "level": "info"}
{"request_id": "15f1ffc7-7d24-457b-84eb-919e4a02ff87", "event": "starting-to-handle-request", "level": "info"}
{"request_id": "aa9a2398-539b-456b-a15e-6942be7b5fd6", "slept_for": 0.17411926914473186, "event": "finished-handling-request", "level": "info"}
{"request_id": "4df3046c-d0e8-408b-81f6-38f83ca32877", "slept_for": 0.4072749644353413, "event": "finished-handling-request", "level": "info"}
{"request_id": "d3eaf4fb-058f-443a-93b4-d0dcb113e2f3", "slept_for": 0.5612234200637156, "event": "finished-handling-request", "level": "info"}
{"request_id": "3f32645c-edbd-42fa-ac4d-e5d7c5f273ef", "slept_for": 0.8508387844837196, "event": "finished-handling-request", "level": "info"}
{"request_id": "15f1ffc7-7d24-457b-84eb-919e4a02ff87", "slept_for": 0.9082839638874155, "event": "finished-handling-request", "level": "info"}

In the above example we added a new processor at the beginning of the list, structlog.contextvars.merge_contextvars. This processor will automatically add any context variables to the log message.

Additionally, the application uses a context manager, new_logging_context that calls structlog.contextvars.clear_contextvars to clear the logging context before each loop.

Reducing the Number of Log Messages

Using structured logs and adding the appropriate context to all log messages helps with finding the right logs and quickly debugging any issues. But an application may still generate a lot of logs, making their management (querying and storage) expensive.

One approach to reducing the size of the logs is to emit a single log message per request that summarizes what happened with the request. One way to do this is to use result objects and return them from the called functions.

In the example below the function handle returns an instance of the Result class that has information about what happened when it ran. That instance is used by the main function to log an appropriate log message.

from dataclasses import dataclass, asdict
import contextlib
import random
import structlog
import structlog.contextvars
import time
import uuid

logger = structlog.get_logger()

@contextlib.contextmanager
def new_logging_context():
    structlog.contextvars.clear_contextvars()
    yield


def setup_logging():
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.stdlib.add_log_level,
            structlog.processors.JSONRenderer()
        ],
    )


def step_one():
    return True


def step_two():
    return random.choice([True, False])


@dataclass
class Result:
    step_one_result: bool = None
    step_two_result: bool = None

    def to_log_dict(self):
        return asdict(self)
    
@dataclass
class Result:
    created_user: bool = None
    assigned_to_plan: bool = None

    def to_log_dict(self):
        return asdict(self)


def handle():
    result = Result()
    result.created_user = create_user()
    result.assigned_to_plan = assign_plan()
    return result


def main():
    setup_logging()
    count = 0
    while count < 5:
        with new_logging_context():
            result = handle()
            logger.info("handle-request", **result.to_log_dict())
            count += 1


main()

A sample output for the above script is:

{"created_user": true, "assigned_to_plan": true, "event": "handle-request", "level": "info"}
{"created_user": true, "assigned_to_plan": false, "event": "handle-request", "level": "info"}
{"created_user": true, "assigned_to_plan": true, "event": "handle-request", "level": "info"}
{"created_user": true, "assigned_to_plan": false, "event": "handle-request", "level": "info"}
{"created_user": true, "assigned_to_plan": false, "event": "handle-request", "level": "info"}

This approach has the added advantage that even for complex operations you can tell at a glance what happened with a request instead of having to go through many log messages and reconstructing the steps the function took.