
Automate Metric Documentation #890

Closed
wants to merge 2 commits into from

stephaniehingtgen

stephaniehingtgen commented Jul 2, 2021

@bwplotka @kakkoyun
Overview
This PR implements a new function, WriteMetricsMarkdown, which creates a markdown file containing all of the registered metrics with their corresponding help text and labels.

Explanation
The goal of this function is to let developers automatically generate, and keep up to date, a METRICS.md file.

Previously, to generate this type of file, a developer had to scrape the metrics endpoint for the help text and then build the markdown file by hand. This also didn't guarantee that every metric was covered: a metric vector does not appear in the metrics endpoint until it has been used with at least one set of label values.

With this function, the process can be automated, and completeness is ensured because the function lists all registered metrics (including vectors that have not yet been used). It also keeps the documentation up to date as metrics are added.

Example
This code would be added to a user's code base:

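```go
package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
	registry := prometheus.NewRegistry()
	histogram := prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name: "example_histogram",
			Help: "example histogram",
		},
		[]string{"example"},
	)
	registry.MustRegister(histogram)
	// WriteMetricsMarkdown is the function proposed in this PR.
	prometheus.WriteMetricsMarkdown(registry, "METRICS.md", []string{})
}
```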

The result would be a markdown file like this:

```markdown
# Metrics

## `example` Metrics
| *Metric* | *Description* | *Labels* |
|--|--|--|
| `example_histogram` | example histogram | `example` |

## `go` Metrics
| *Metric* | *Description* | *Labels* |
|--|--|--|
| `go_gc_duration_seconds` | A summary of the pause duration of garbage collection cycles. | |
| `go_goroutines` | Number of goroutines that currently exist. | |
| `go_info` | Information about the Go environment. | |
| `go_memstats_alloc_bytes` | Number of bytes allocated and still in use. | |
| `go_memstats_alloc_bytes_total` | Total number of bytes allocated, even if freed. | |
| `go_memstats_buck_hash_sys_bytes` | Number of bytes used by the profiling bucket hash table. | |
| `go_memstats_frees_total` | Total number of frees. | |
| `go_memstats_gc_cpu_fraction` | The fraction of this program's available CPU time used by the GC since the program started. | |
| `go_memstats_gc_sys_bytes` | Number of bytes used for garbage collection system metadata. | |
| `go_memstats_heap_alloc_bytes` | Number of heap bytes allocated and still in use. | |
| `go_memstats_heap_idle_bytes` | Number of heap bytes waiting to be used. | |
| `go_memstats_heap_inuse_bytes` | Number of heap bytes that are in use. | |
| `go_memstats_heap_objects` | Number of allocated objects. | |
| `go_memstats_heap_released_bytes` | Number of heap bytes released to OS. | |
| `go_memstats_heap_sys_bytes` | Number of heap bytes obtained from system. | |
| `go_memstats_last_gc_time_seconds` | Number of seconds since 1970 of last garbage collection. | |
| `go_memstats_lookups_total` | Total number of pointer lookups. | |
| `go_memstats_mallocs_total` | Total number of mallocs. | |
| `go_memstats_mcache_inuse_bytes` | Number of bytes in use by mcache structures. | |
| `go_memstats_mcache_sys_bytes` | Number of bytes used for mcache structures obtained from system. | |
| `go_memstats_mspan_inuse_bytes` | Number of bytes in use by mspan structures. | |
| `go_memstats_mspan_sys_bytes` | Number of bytes used for mspan structures obtained from system. | |
| `go_memstats_next_gc_bytes` | Number of heap bytes when next garbage collection will take place. | |
| `go_memstats_other_sys_bytes` | Number of bytes used for other system allocations. | |
| `go_memstats_stack_inuse_bytes` | Number of bytes in use by the stack allocator. | |
| `go_memstats_stack_sys_bytes` | Number of bytes obtained from system for stack allocator. | |
| `go_memstats_sys_bytes` | Number of bytes obtained from system. | |
| `go_threads` | Number of OS threads created. | |
```

Notes
If there is concern about the additional map and its memory cost, I can change this to skip uninitialized vectors and require that they be initialized before this function runs. Then I could use the registry's Gather method and loop over the metric families it returns.

Signed-off-by: Stephanie Hingtgen <[email protected]>
stephaniehingtgen changed the title from "Automatic Metric Documentation" to "Automate Metric Documentation" on Jul 2, 2021
@stephaniehingtgen
Author

The test failing on CI is TestSummaryDecay, which doesn't cover any code I touched, and it passes locally for me. Could someone re-run that test?

@bwplotka
Member

Thanks for this and your work!

I love this and its use case; it is definitely important. However, I believe tools for this already exist, for example https://github.com/yeya24/promlinter, written by our friend @yeya24. We might just need to document better how to use it and popularize it. We could even add a small integration test! If something is missing, we could contribute it there. What do you think?

What I specifically don't like in this PR is the addition of a markdown map to the heavily used and optimized registry struct. I would rather keep that struct efficient and spend some extra CPU computing everything from the existing information, if possible.

Having an external tool is even better IMO (:

@bwplotka
Member

I think we could add some extra options to https://github.com/yeya24/promlinter to render nice markdown 🤗

@yeya24
Contributor

yeya24 commented Jul 24, 2021

Actually, I love this idea. Having linting/markdown support in the instrumentation library removes a limitation of the static linter: it has no runtime information, so dynamically created instrumentation is not supported.

But having a `prometheus.WriteMetricsMarkdown(registry, "METRICS.md", []string{})` line in the code seems weird. Ideally, a CLI tool would do this.

As for the extra performance cost, I think it is okay, since a single Prometheus registry usually doesn't have that many metrics. @stephaniehingtgen Do you have any benchmarks showing how much extra memory it costs?

@bwplotka
Member

> But having a `prometheus.WriteMetricsMarkdown(registry, "METRICS.md", []string{})` line in the code seems weird. Ideally, a CLI tool would do this.

What do you propose? (:

@bwplotka
Member

bwplotka commented Aug 2, 2021

Open questions right now:

  • Can we use an outside tool to generate it?
  • How do we deal with wrapped metrics (label or prefix)?

@stephaniehingtgen
Author

stephaniehingtgen commented Aug 6, 2021

> @stephaniehingtgen Do you have any benchmarks showing how much extra memory it costs?

By adding this after the WriteMetricsMarkdown call:

```go
var m runtime.MemStats
runtime.ReadMemStats(&m)
// For info on each, see: https://golang.org/pkg/runtime/#MemStats
fmt.Printf("Alloc = %v b", m.Alloc)
fmt.Printf("\tTotalAlloc = %v b", m.TotalAlloc)
fmt.Printf("\tSys = %v b", m.Sys)
```

I get these results:

(before): Alloc = 2798344 b TotalAlloc = 6337688 b Sys = 75580424 b

(after): Alloc = 2841664 b TotalAlloc = 6362824 b Sys = 75318280 b

This is a registry with 70 metrics registered.


> But having a `prometheus.WriteMetricsMarkdown(registry, "METRICS.md", []string{})` line in the code seems weird. Ideally, a CLI tool would do this.

Totally open to changing the function name & parameters - let me know what you think would be better :)


> Can we use an outside tool to generate it?

From what I can tell, it seems like there are two things that the outside tool cannot do:

  1. Create documentation automatically: it seems the tools have to be run separately rather than having it done automatically when the program is run. This leaves a chance that a developer may forget to run the tool to update the markdown, which would make documentation out of date.
  2. Be complete: if a vector has not yet been used, it will not show up in /metrics, and thus it will be missed.

Please correct me if I'm wrong on either of those fronts.


> How do we deal with wrapped metrics (label or prefix)?

Could you clarify this a bit? I'm not exactly sure what this is asking.

@bwplotka
Member

Thanks for checking on those!

Indeed, performance is fine.

> Can we use an outside tool to generate it?
>
> From what I can tell, it seems like there are two things that the outside tool cannot do:
>
> 1. Create documentation automatically: it seems the tools have to be run separately rather than having it done automatically when the program is run. This leaves a chance that a developer may forget to run the tool to update the markdown, which would make documentation out of date.
> 2. Be complete: if a vector has not yet been used, it will not show up in /metrics, and thus it will be missed.

I don't think documentation automation is a strong argument against a separate CLI; it should be easy to set up CI for all of this, as we do for any other linting. I'm also not sure how this function helps with automation. Can you explain what the process of exporting the list of metrics would be? When would we call this Write function? At startup? (If so, not all metrics might be registered at that point.)

> How do we deal with wrapped metrics (label or prefix)?
> Could you clarify this a bit? I'm not exactly sure what this is asking.

This is something that https://github.com/yeya24/promlinter can't do: if a project wraps registerers to add custom labels or a metric name prefix (see `func WrapRegistererWithPrefix(prefix string, reg Registerer) Registerer`), static analysis will have a hard time finding that extra information, since it is only evaluated at runtime.

I still have high hopes that a static CLI tool can traverse the code and find even the usages of those wraps.

There might be an alternative too:

@stephaniehingtgen
Author

> I still have high hopes that a static CLI tool can traverse the code and find even the usages of those wraps.
>
> There might be an alternative too:

Sounds good! I'll go ahead and close this PR. Thanks for talking through it with me! 😄
