Understanding Prometheus

In this article we will go through the main characteristics of the Prometheus monitoring system. We will take you through its history first, then its design and its main components. You will understand the main benefits of using Prometheus over the alternatives on the market.

Let’s start!

History

The name Prometheus comes from Greek mythology. Prometheus was the Titan god of fire, known for stealing fire from the gods and giving it to humanity in the form of technology, knowledge and civilisation.

Prometheus is also well known for his intelligence and for being a champion of mankind.

Coming back to our main topic, monitoring systems, we can see the correlation. In some sense, the Prometheus monitoring system has also stolen fire from the gods and given it to us in the form of technology and knowledge, so that we can monitor our systems easily!

Prometheus was designed from scratch after years of experience with monitoring systems, which gives it an advantage over its older competitors. It is also a Cloud Native Computing Foundation (CNCF) project, the second one to join the foundation after Kubernetes.

It was initially developed at SoundCloud to fulfil observability requirements that couldn’t be met with existing tools such as StatsD or Graphite. The project was open source from the very beginning and was inspired by Borgmon, a monitoring tool used at Google at the time.

Some former Google employees with experience using Borgmon were involved in the development of Prometheus, and that’s where its main influence comes from.

Now that we know something about how Prometheus was started, let’s try to understand how it was designed!

Design

The Prometheus architecture, without going into too much detail, is composed of three main components: the scraping service, the alert manager and the HTTP API.

Below is a basic diagram of how a Prometheus system is structured. We’ll briefly cover each of its components later.

[Figure: Prometheus architecture diagram. Image credit: Author]

For now, let’s try to understand some basic concepts about Prometheus design.

Prometheus has been designed to favour availability over accuracy in its metrics. It has also been designed so that a single instance can handle millions of time series.

A metric in Prometheus is basically a group of time series. What is a time series, then? It’s a stream of timestamped values that share the same metric name and the same set of labels. A metric is therefore composed of a set of time series, one for each distinct combination of label values.
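To make the relationship between metrics and time series more concrete, here is a minimal Python sketch of a toy in-memory store. It is only an illustration of the data model described above, not how Prometheus stores data internally. The class name TinyStore, the metric name http_requests_total and its labels are all made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A series is identified by its metric name plus its sorted label pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Build a hashable identity for a time series."""
    return (name, tuple(sorted(labels.items())))


class TinyStore:
    """Toy store (hypothetical): each series maps to a list of (timestamp, value)."""

    def __init__(self) -> None:
        self.series: Dict[SeriesKey, List[Tuple[float, float]]] = defaultdict(list)

    def append(self, name: str, labels: Dict[str, str], ts: float, value: float) -> None:
        self.series[series_key(name, labels)].append((ts, value))

    def series_for_metric(self, name: str) -> List[SeriesKey]:
        """All time series that belong to one metric name."""
        return [key for key in self.series if key[0] == name]


store = TinyStore()
store.append("http_requests_total", {"method": "GET", "status": "200"}, 1000.0, 5)
store.append("http_requests_total", {"method": "GET", "status": "200"}, 1015.0, 9)
store.append("http_requests_total", {"method": "POST", "status": "500"}, 1000.0, 1)

# One metric name, two distinct label sets, therefore two time series.
print(len(store.series_for_metric("http_requests_total")))
```

The first two samples share the same name and labels, so they land in the same time series. The third one has different label values, so it creates a second series under the same metric.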

Every Prometheus node is independent, which means that every node has its own local storage. At first this might look like a drawback, but is it really?

In fact, Prometheus simplifies things considerably by making nodes independent. We can monitor a considerable number of applications with just one node, and we don’t need to worry about merging time series from different nodes.

Now that we have a basic understanding of its design, let’s see how Prometheus stores data!

Storage

It’s very important to understand how Prometheus stores time series, as this gives you an understanding of what kind of metrics you should or should not register in Prometheus. It also helps to understand why a certain query could be slow.

Prometheus groups samples into blocks of two hours. For every two-hour window there is a directory that contains all the time series data for that timeframe.

Each of these directories contains a chunks subdirectory, a metadata file called meta.json, a tombstones file and an index file that maps metric names and labels to the corresponding time series in the chunks subdirectory.

./data
├── 01BKGTZQ1SYQJTR4PB43C8PD98
│   ├── chunks
│   │   └── 000001
│   ├── tombstones
│   ├── index
│   └── meta.json
└── wal
    ├── 000000002
    └── checkpoint.00000001
        └── 00000000

The samples in the chunks are grouped together into one or more segments of up to 512MB each.

When series are deleted, the deletion records are stored in a tombstones file instead of the data being removed from the chunk files immediately, so the data can be removed at a later point.

One important thing to know is that the current block for incoming samples (covering up to two hours since the block was created) is not fully persisted. It is backed by a WAL (write-ahead log) that can be replayed when Prometheus restarts after a crash, to avoid losing data.

WAL files are stored in the wal directory in segments of 128MB. These files contain raw data that hasn’t been compacted yet, so they are considerably bigger than regular chunk files. Prometheus retains a minimum of three WAL files, although busier servers may retain many more in order to keep at least two hours of raw data.

These initial two-hour blocks are eventually compacted into longer blocks by background jobs.
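To picture the two-hour grouping described in this section, the following Python sketch assigns samples to two-hour windows. It is a simplified model only: it assumes windows aligned to multiples of two hours and ignores the index, the WAL and compaction. The function names and sample values are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BLOCK_MS = 2 * 60 * 60 * 1000  # two hours, in milliseconds


def block_start(timestamp_ms: int) -> int:
    """Start of the two-hour window that contains the timestamp (assumed alignment)."""
    return timestamp_ms - (timestamp_ms % BLOCK_MS)


def group_by_block(samples: List[Tuple[int, float]]) -> Dict[int, List[Tuple[int, float]]]:
    """Group (timestamp_ms, value) samples by the window they fall into."""
    blocks: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for ts, value in samples:
        blocks[block_start(ts)].append((ts, value))
    return dict(blocks)


samples = [
    (0, 1.0),
    (BLOCK_MS - 1, 2.0),      # last millisecond of the first window
    (BLOCK_MS, 3.0),          # first millisecond of the second window
    (3 * BLOCK_MS + 5, 4.0),  # a later window
]
blocks = group_by_block(samples)
print(sorted(blocks))
```

A query over a wide time range has to touch every window that overlaps it, which is one reason why the number of blocks matters for query performance.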

It’s very important to mention that local storage is not clustered or replicated, so periodic backups and the use of RAID for redundancy are recommended. This protects our data in case we are unable to recover our Prometheus node.

Alternatively, external storage could also be used with a prior thorough analysis of the possible options, together with the implications of choosing each of them.

These are the very basics of Prometheus storage. If you’re interested in knowing more about how everything works, you can read our article “How a Prometheus Query Works”. Now let’s see some of the limitations we’ll encounter when working with Prometheus!

Limitations

One of the limitations of Prometheus is that, because it is a pull-based system, the scrape timestamps can vary for each time series. This means that the time between two scrapes is never exactly constant. Most of the time the difference is small, but depending on network conditions it can be larger.

One more thing to keep in mind is that for gauge metrics, whose values can go up and down over time, we could miss values if the scrape interval is longer than the interval at which the gauge changes.

For example, the image below shows what happens if our scrape interval is configured to run every t seconds and intermediate values are generated between scrapes.

[Figure: Prometheus gauges limitation. Image credit: Author]

As you can see, the resulting graph in our dashboards could be missing important details, so please always keep this in mind when using gauges and set your Prometheus scrape interval accordingly.
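The same effect can be reproduced with a few lines of Python. This is only a toy simulation with made-up values: a hypothetical gauge that changes once per second and a scraper that samples it at a fixed interval.

```python
from typing import List


def scrape(gauge_values: List[int], interval: int) -> List[int]:
    """Values a scraper would see when sampling every `interval` seconds."""
    return gauge_values[::interval]


# Hypothetical gauge, one value per second, with a short spike at seconds 3 and 4.
gauge = [10, 10, 10, 95, 90, 10, 10, 10, 10, 10, 10, 10]

seen_every_5s = scrape(gauge, 5)  # samples taken at seconds 0, 5 and 10
seen_every_1s = scrape(gauge, 1)  # samples taken every second

print(max(gauge), max(seen_every_5s), max(seen_every_1s))
```

With a five-second interval the scraper only ever sees the value 10, so the spike never reaches our dashboards. With a one-second interval the spike is captured, at the cost of storing more samples.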

The good news is that this is much less of a concern with other metric types like counters. Counters only ever increase, so the total increase between two scrapes is still captured even when intermediate values are not scraped.

Another important limitation in Prometheus is query performance. Queries can get quite slow, mainly because Prometheus stores time series in a set of append-only files. The time it takes to run a query depends on the number of files and time series we have to access, and on the number of samples we have to fetch for each time series. To simplify, a query’s performance depends mainly on the cardinality of its labels and on the number of blocks included in the time range we’re querying.

This means we should avoid high-cardinality values for labels, as they hugely increase the number of time series in a metric and hurt query performance. The same applies to querying very wide time ranges, as the query will span multiple blocks.
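A quick back-of-the-envelope calculation shows why label cardinality matters so much. The sketch below computes an upper bound on the number of time series in a metric as the product of the number of distinct values of each label. The label names and counts are invented, and the real number of series depends on which combinations actually occur.

```python
from math import prod
from typing import Dict


def max_series(label_value_counts: Dict[str, int]) -> int:
    """Upper bound on the time series of one metric: product of distinct values per label."""
    return prod(label_value_counts.values())


# Hypothetical metric with two low-cardinality labels.
low = {"method": 4, "status": 5}

# Same metric after adding a hypothetical high-cardinality label.
high = {**low, "user_id": 100_000}

print(max_series(low), max_series(high))
```

Adding a single label with 100,000 possible values takes the metric from at most 20 time series to at most 2,000,000, and every query on that metric may have to look at all of them.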

Let’s look now at one of its main components, the scraping service!

Scraping service

The scraping service is probably one of the most important parts of a Prometheus system. Without it we wouldn’t have any metrics at all.

Prometheus has been designed as a pull-based system, which means that Prometheus is responsible for gathering metrics from all the configured services or applications, something we call “scraping”.

What are the advantages of this approach? The main one is that your instances don’t care whether Prometheus is running or not. Prometheus is responsible for scraping all the configured targets; the only thing applications have to do is expose their metrics. Once that’s done, Prometheus is able to scrape metrics from these applications.
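To give an idea of what “exposing metrics” can look like on the application side, here is a minimal Python sketch that renders a metric in the Prometheus text format and serves it over HTTP using only the standard library. The metric name app_requests_total, its value and the port in the comment are made up. In a real application you would most likely use an official client library instead of writing this by hand.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, List, Tuple


def render_metric(name: str, help_text: str, metric_type: str,
                  samples: List[Tuple[Dict[str, str], float]]) -> str:
    """Render one metric in the text format (label values are not escaped here)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves a single counter on /metrics."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metric(
            "app_requests_total", "Total requests handled.", "counter",
            [({"method": "GET"}, 42)],
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To try it locally (the port is an arbitrary choice):
#   from http.server import HTTPServer
#   HTTPServer(("", 8000), MetricsHandler).serve_forever()

text = render_metric("app_requests_total", "Total requests handled.", "counter",
                     [({"method": "GET"}, 42)])
print(text)
```

Notice that the application never needs to know where Prometheus lives. It only answers requests on its metrics endpoint, and it keeps working the same way whether anything is scraping it or not.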

To get these metrics, we have to configure scraping in the Prometheus config. There are two things to configure: the scrape interval and the scrape configurations themselves.

To tell Prometheus how frequently we want to scrape metrics, we define this in the global section in the following way:

global:
  scrape_interval:     15s

Once we’ve done that, we need to specify which metrics endpoints we want Prometheus to monitor. For example:

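scrape_configs:
  # Prometheus scraping its own metrics on the default /metrics path
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090

  - job_name: 'my-service-metrics'
    metrics_path: /private/metrics   # overrides the default /metrics path
    scrape_interval: 15s             # per-job override of the global interval
    scrape_timeout: 10s              # must not exceed scrape_interval
    honor_labels: true               # on conflict, keep labels from the scraped data
    static_configs:
      - targets:
          - my-service:8080          # hypothetical host:port, replace with your own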

Above we’ve defined a scrape config that tells Prometheus to monitor itself by scraping its own metrics. Keep in mind that, by default, Prometheus scrapes metrics on the /metrics endpoint. We’ve overridden that in the previous example by specifying /private/metrics as the endpoint to be scraped for the my-service-metrics job.

You can also override scrape_interval for a particular job if for any reason you need a different scrape interval from the one specified in the global configuration.

You can check the official docs for more information.

Alert Manager

Alert Manager is the component responsible for handling alerts, but not for detecting them. This should be clear from the very beginning.

We configure alerts to tell Prometheus which metrics to watch to check that everything is working well. If any of these metrics breaches the limits configured in our alert, Prometheus notifies Alert Manager that an alert has been triggered.

Alert Manager is responsible for grouping these alerts, detecting possible duplicates and sending notifications through any of the third-party integrations available for it.

The first thing we have to do is tell Prometheus where to find the configuration for our alerts and where to find Alert Manager so it can be notified. This can be done with the following:

rule_files:
  - /etc/prometheus/rules/alert-rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: [ 'my-alertmanager']

Once we have that, it’s time to configure our first alert. An alert example could be something like this:

groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency

You can find more about alerting rules here.
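The “for: 10m” clause in the rule above means the condition has to hold continuously for ten minutes before the alert fires. The Python sketch below is a simplified model of that behaviour, not the actual Prometheus implementation. The function name, the five-minute evaluation interval and the sequence of results are assumptions made for the example.

```python
from typing import List, Optional


def alert_states(breached_by_eval: List[bool], eval_interval_s: int, for_s: int) -> List[str]:
    """State after each rule evaluation: inactive, pending or firing."""
    states: List[str] = []
    active_since: Optional[int] = None
    for i, breached in enumerate(breached_by_eval):
        now = i * eval_interval_s
        if not breached:
            # The condition stopped holding, so the waiting period starts over.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Hypothetical results of evaluating the alert expression every 5 minutes.
results = [True, True, False, True, True, True]
states = alert_states(results, eval_interval_s=300, for_s=600)
print(states)
```

In this run the alert only fires at the last evaluation, because the short recovery in the middle resets the waiting period. This is what keeps brief spikes from paging anyone.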

HTTP API

Another important component in Prometheus is its HTTP API. This API is the main entry point to fetch the data we need from our metrics, whether from our dashboards, from the Prometheus Expression Browser or from any other HTTP client.

Although the API makes it easier to fetch metrics, Prometheus has its own query language, PromQL. This means that we have to pass a PromQL query to the API in order to get metrics from our Prometheus server.

In the screenshot shown below, you can see how the Prometheus Expression Browser calls the Prometheus API:

[Figure: Prometheus Expression Browser screenshot]

You can see in the browser console how it calls the /api/v1/query?query=...&time=... endpoint of the API. The Prometheus Expression Browser can be found under the /graph path on your Prometheus server if you need to play around with queries.

If you want to call the API directly, you could do something like this:

curl 'http://localhost:9090/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'

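The time parameter accepts either an RFC 3339 timestamp, as in the example above, or a Unix timestamp in seconds, and defines the evaluation time you’re interested in. If it is omitted, the current server time is used.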
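If you prefer to call the API from code, the following Python sketch builds the same request URL as the curl example and parses an instant query response. It is a minimal sketch that uses only the standard library. The helper names are our own, and the sample response body is hand-written to mirror the documented response shape rather than captured from a real server.

```python
import json
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode


def build_query_url(base_url: str, query: str, time: Optional[str] = None) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    params = {"query": query}
    if time is not None:
        params["time"] = time
    return f"{base_url}/api/v1/query?{urlencode(params)}"


def parse_instant_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    # Each result holds the series labels and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


url = build_query_url("http://localhost:9090", "up", "2015-07-01T20:10:51.781Z")

# Hand-written sample body, for illustration only.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "prometheus",
                        "instance": "localhost:9090"},
             "value": [1435781451.781, "1"]},
        ],
    },
})
series = parse_instant_vector(sample_body)
print(url)
print(series)
```

To run it against a real server, you could fetch the URL with urllib.request.urlopen and pass the decoded body to parse_instant_vector. Sample values come back as strings in the JSON response, which is why the sketch converts them to floats.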

If you are interested in learning about the Prometheus monitoring system in more depth, we highly recommend this book: “Prometheus: Up & Running”.

Conclusion

In this article we’ve learned how Prometheus can help us monitor our services or applications and how it’s been designed to serve that purpose.

In future articles we’ll look at more specific aspects of Prometheus and how we can use it to our advantage.

That’s all from us today! We hope you’ve found this article useful and hopefully learned something new.

Please follow us if you’re interested in reading more articles like this one!

Thanks for reading!