
Initial example of generating Prometheus SDK metrics from Otel semconv with the weaver tool. #1


Merged
merged 2 commits into from
Jan 31, 2025


bwplotka
Owner

@bwplotka bwplotka commented Jan 29, 2025

First iteration of the use of weaver to define and generate Prometheus SDK metrics in Go.

It was surprisingly fast and simple to develop, kudos to the Otel community for this 💪🏽 It took me ~2.5h to craft this. The weaver tool is ultra fast, with super helpful diagnostics and errors (and I wasn't even using a prod-optimized build).

Generally this forge doc was a good place to start for me. Some initial list of things to potentially improve/contribute to weaver/semconv:

  • Documentation
    • It has been 10+ years since I last wrote j2 templates, so some good docs would be nice to link. I was using a bunch of different resources e.g. 1, 2.
    • I spent some time searching the internet for where the kebab_case function comes from and what other functions like it exist. It turns out they are defined in weaver; ideally we link to this file or similar.
    • I eventually found it, and it's linked somewhere, but semconv schema should be on the top of docs json, md more concise.
    • More accessible link to weaver YAML schema (link).
    • The filter parameter is quite confusing at the start. Perhaps it's worth at least documenting the behaviour without a filter or with a simple filter, e.g. groups (and the fact that you get a ctx variable with each setting).
  • Semconv schema
    • IDs and references
      • For code generation it's often desirable to know the "namespace" and the shortest ID for each metric/attribute. In the examples I see many different styles of ID format. This can be a problem when generators/templates simply assume a certain format, e.g. in my example I take only the last part of the ID after the final . for metric identification. For the namespace (e.g. for a unique Go package name) I would take something that gives me my-app from somewhere. There's a little bit of chaos here and I'm not sure what would help; maybe explicit namespace and short_name fields for metrics and labels? The prefixed attribute names in the official Otel semconv do not help here 🙈 Another solution would be to have some weaver j2 helper functions for that.
      • Generated code is generally over-verbose and has long constructs (not idiomatic for Go). Anything that makes it shorter would help.
    • Still not clear why some unit values are inside curly brackets e.g. "{goroutine}" and some are plain strings like “By”.
    • Requirement levels are overwhelming. Why use semantic versioning for registries if we have to set a requirement level for every single element (including label names and label values(?))?
    • Use with Prometheus metrics
      • It's actually doable, but ofc you have to understand that an attribute is simply a label and an instrument is a metric type. Definitely ok for the first pass, but perhaps there's some room for a simplified schema that generates the Otel semconv schema 🙈
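
To make the "attribute is a label, instrument is a metric type" point concrete, here is a minimal Go sketch of that translation. The helper names (promType, promName) are hypothetical and not part of weaver or client_golang, and mapping updowncounter to a gauge, as well as replacing "." and "-" with "_", are assumptions of this sketch rather than something this PR implements.

```go
package main

import (
	"fmt"
	"strings"
)

// promType maps a semconv instrument to a Prometheus metric type.
// Treating "updowncounter" as a gauge is an assumption of this sketch.
func promType(instrument string) (string, error) {
	switch instrument {
	case "counter":
		return "counter", nil
	case "gauge", "updowncounter":
		return "gauge", nil
	case "histogram":
		return "histogram", nil
	default:
		return "", fmt.Errorf("unknown instrument %q", instrument)
	}
}

// nameReplacer rewrites characters that are not valid in Prometheus names.
var nameReplacer = strings.NewReplacer(".", "_", "-", "_")

// promName turns a dotted semconv metric or attribute name into a
// Prometheus-style metric or label name.
func promName(semconvName string) string {
	return nameReplacer.Replace(semconvName)
}

func main() {
	for _, m := range []struct{ name, instrument string }{
		{"my-app.custom_elements", "counter"},
		{"http.client.request.duration", "histogram"},
	} {
		t, err := promType(m.instrument)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println(promName(m.name), t)
	}
}
```

A real generator would also have to deal with unit and _total suffixes, which this sketch ignores.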

Next steps for our demo/work @vesari

  • Make it more type safe (see todos in the code)
  • Play with diff, rename strategies.

@bwplotka bwplotka changed the title Initial example. Initial example of generating Prometheus SDK metrics from Otel semconv with the weaver tool. Jan 30, 2025
@bwplotka bwplotka requested a review from vesari January 30, 2025 09:50
@bwplotka bwplotka marked this pull request as ready for review January 30, 2025 09:50
"of a very important metric that everyone is using.",
}, []string{"integer", "category", "fraction"})
customStableMetric.WithLabelValues("101", "AType", "1.22314").Inc()
switch *metricDefinition {
Owner Author

Hope the intention is clear. This binary will help us showcase the case where e.g. the same type of container runs in different versions and the metric name changed between them, e.g.

my-app [email protected]
my-app [email protected]

@jsuereth

This looks great!

I need to process your feedback and add issues for us to work through. Thought I'd answer an "easy" one though.

Still not clear why some unit values are inside curly brackets e.g. "{goroutine}" and some are plain strings like “By”.

This is how otel semconv defined units: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#instrument-units

It comes from the UCUM standard for units.

@jsuereth

Requirement levels are overwhelming. Why use semantic versioning for registries if we have to set a requirement level for every single element (including label names and label values(?))?

This could be the complexity of spans/events leaking through to metrics. On metrics we tend to default everything to required or "opt_in" (i.e. the user must provide a feature flag for the label). On Spans or Events (logs) it's more likely that the set of labels has recommended or not-always-available things. Since weaver is designed for all of them, and attributes (labels) are shared between them all, you see that hit metric definitions.

cc @lquerel @lmolkova for thoughts on simplifying metric definitions or defaulting to required.

@dashpole

Still not clear why some unit values are inside curly brackets e.g. "{goroutine}" and some are plain strings like “By”.

See "curly braces" in https://ucum.org/ucum. Curly braces are an "annotation", and are equivalent to a unit of 1. OTel typically uses {foo} to mean "count of foo". But per UCUM, it is equivalent to not having a unit.
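
As a rough illustration of that distinction, the Go sketch below separates pure annotations such as "{goroutine}" from real units such as "By". The function names are hypothetical, and the check is deliberately simplistic: it does not parse full UCUM expressions (for example "{request}/s" is not treated as an annotation here).

```go
package main

import (
	"fmt"
	"strings"
)

// isAnnotation reports whether the whole unit string is a single
// curly-brace annotation such as "{goroutine}".
func isAnnotation(unit string) bool {
	if len(unit) <= 2 || !strings.HasPrefix(unit, "{") || !strings.HasSuffix(unit, "}") {
		return false
	}
	// Reject strings with nested or repeated braces like "{a}{b}".
	return !strings.ContainsAny(unit[1:len(unit)-1], "{}")
}

// describe renders a unit the way the comment above reads it:
// "{foo}" means "count of foo", anything else is kept as a unit.
func describe(unit string) string {
	if isAnnotation(unit) {
		return "count of " + unit[1:len(unit)-1]
	}
	return "unit " + unit
}

func main() {
	for _, u := range []string{"{goroutine}", "By", "{request}/s"} {
		fmt.Printf("%s: %s\n", u, describe(u))
	}
}
```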

@lmolkova

lmolkova commented Jan 30, 2025

Thanks for the feedback! Really appreciate it!

A few questions to clarify

For code generation it's often desirable to know the "namespace" and the shortest ID for each metric/attribute.

Do you mean for something like foo.bar.baz having namespace = foo.bar and short_id = baz? What would you do in the codegen for it?

class FooBar {
   Instrument createBaz(...)
   ...
}

?

The prefixed attribute names in the official Otel semconv do not help here

you mean that otel attributes are fully qualified and it's not the case for prometheus?

Requirement levels are overwhelming. Why use semantic versioning for registries if we have to set a requirement level for every single element (including label names and label values(?))?

Back to @jsuereth's comment: check out http request duration. We document (for instrumentations and consumers) that error.type is only set when an error happens. It's not obvious :)

For codegen purposes the difference they make is that:

  • required attributes would be required (non-nullable, positional) parameters in the recordHttpClientRequestDuration(..) function call
  • conditionally required and recommended would be optional/nullable/kwargs/etc
  • opt-in would be something codegen would add by first checking with config if the feature is enabled

I'm curious how you see the world without them :)
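
The three cases listed above could look roughly like this in generated Go. This is only a sketch under assumptions: Config, Option and RequestDurationLabels are hypothetical names, the choice of attributes is illustrative, and the function returns a plain label map instead of recording into a real SDK.

```go
package main

import "fmt"

// Config carries the opt-in feature flags, keyed by attribute name.
type Config struct {
	OptIn map[string]bool
}

// Option represents a non-required attribute passed by the caller.
type Option struct {
	key, value string
	optIn      bool // true if the attribute must be enabled in Config
}

// WithErrorType is conditionally required: only set when an error happened.
func WithErrorType(v string) Option {
	return Option{key: "error.type", value: v}
}

// WithURLTemplate is treated as opt-in in this sketch.
func WithURLTemplate(v string) Option {
	return Option{key: "url.template", value: v, optIn: true}
}

// RequestDurationLabels takes the required attribute positionally,
// optional ones as options, and drops opt-in ones unless enabled.
func RequestDurationLabels(cfg Config, method string, opts ...Option) map[string]string {
	labels := map[string]string{"http.request.method": method}
	for _, o := range opts {
		if o.optIn && !cfg.OptIn[o.key] {
			continue // opt-in attribute not enabled by configuration
		}
		labels[o.key] = o.value
	}
	return labels
}

func main() {
	cfg := Config{OptIn: map[string]bool{"url.template": true}}
	fmt.Println(RequestDurationLabels(cfg, "GET", WithURLTemplate("/users/{id}")))
	fmt.Println(RequestDurationLabels(Config{}, "GET", WithErrorType("timeout"), WithURLTemplate("/users/{id}")))
}
```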

@lquerel

lquerel commented Jan 30, 2025

@bwplotka @vesari

I need to go through all of your feedback. Your insights are extremely valuable and will most certainly lead to many improvements.

FYI, there is an example of a type-safe API for Rust in the Weaver repository, which I believe aligns with the direction you want to take. I had started exploring a Go version of this type-safe approach. As soon as I get my hands on it, I’ll share it with you.

The Rust experiment is visible here -> https://github.com/open-telemetry/weaver/tree/main/crates/weaver_codegen_test

Thank you!

@lquerel

lquerel commented Jan 31, 2025

@bwplotka @vesari I retrieved a very old version of Weaver that predates the migration to the OTEL repo and contains a version of my experiment with a type-safe API for Go. The generated API followed the OTEL model, but the general approach is likely applicable to Prometheus as well.
Due to the limitations of the Go type system and the specific semantics of struct initialization in Go (where struct fields not explicitly declared during initialization are automatically assigned default values), I had to take a slightly different approach than in the Rust version. However, in the end, this approach provided an API surface that effectively prevented most potential misuse.

https://github.com/f5/otel-weaver/tree/main/templates/go/otel

The .tera files are essentially Jinja templates (just implemented using a different template engine). The following three files are the most relevant for your use case:

  • templates/go/optional_attrs.macro.tera: Generates the interfaces (markers), functions, and wrapper structs used to represent optional parameters.
  • templates/go/required_attrs.macro.tera: Generates the functions and wrapper structs used to represent required parameters.
  • templates/go/otel/meter/metric.tera: Generates the packages and functions used to represent different metrics.

The wrapper structs are used to create a distinct custom type for each attribute. So, even though both http_method and http_url are strings for example, they are treated as two separate types at the API level. The wrapper struct acts as a decorator around the string type, which should be zero-cost once compiled.

Interface markers were used to ensure that attributes could only be used in the appropriate context.
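
A minimal Go sketch of the idea, not the code those templates generate: it uses defined string types instead of wrapper structs for brevity, and all names (HTTPMethod, HTTPURL, DBSystem, RequestOptionalAttr, RecordRequest) are hypothetical.

```go
package main

import "fmt"

// HTTPMethod and HTTPURL are both strings underneath, but the compiler
// treats them as distinct types, so variables of one cannot be passed
// where the other is expected.
type HTTPMethod string
type HTTPURL string

// DBSystem does not implement RequestOptionalAttr, so it cannot be
// passed to RecordRequest as an optional attribute.
type DBSystem string

// RequestOptionalAttr is the marker interface: only attributes that are
// valid optional labels for the request metric implement it.
type RequestOptionalAttr interface {
	requestOptionalAttr()
	Label() (name, value string)
}

func (u HTTPURL) requestOptionalAttr()    {}
func (u HTTPURL) Label() (string, string) { return "http_url", string(u) }

// RecordRequest takes the required attribute positionally and any
// number of optional attributes that carry the marker.
func RecordRequest(method HTTPMethod, optional ...RequestOptionalAttr) map[string]string {
	labels := map[string]string{"http_method": string(method)}
	for _, a := range optional {
		name, value := a.Label()
		labels[name] = value
	}
	return labels
}

func main() {
	fmt.Println(RecordRequest(HTTPMethod("GET"), HTTPURL("/users")))
	// RecordRequest(HTTPMethod("GET"), DBSystem("postgres")) // does not compile
}
```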

Let me know if you need more details. I could also potentially port this experiment to the latest version of Weaver.

@lquerel

lquerel commented Jan 31, 2025

For the documentation improvements, I created the following issue to track progress.

open-telemetry/weaver#583

Collaborator

@vesari vesari left a comment

Just left a question about a future TODO. Wonderful work, thank you!

return promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
Name: "my_app_custom_elements_total",
Help: "Custom counter metric for my app counting important elements. It serves as an example of a very important metric that everyone is using.",
// Unit: "elements" // TODO(bwplotka): Add Unit as one of the supported options.
Collaborator

Not strictly related to this PR: have you already thought how we should add Unit support?

Owner Author

Yup, we add a field and put that in the UNIT field I presume (but it's not mandatory). So it could be part of your initial PR, if you want to get that done 💪🏽

Signed-off-by: bwplotka <[email protected]>
@bwplotka
Owner Author

Amazing, thanks everybody for the feedback!

The unit definition now makes sense, noted 👍🏽

Thanks for creating the issue with all the feedback around docs @lquerel 💪🏽

Answering your questions, @lmolkova:

Namespacing:

For code generation it's often desirable to know the "namespace" and the shortest ID for each metric/attribute.

Do you mean for something like foo.bar.baz having namespace = foo.bar and short_id = baz? What would you do in the codegen for it?

class FooBar {
  Instrument createBaz(...)
  ...
}

?

Yea, that's one way of getting a short yet unique name for the group of attributes in a single Go package (or group of constructs in any other language). With id=foo.bar.baz, namespace = foo.bar and short_id = baz, this gives us, e.g. in Go:

package foo_bar

func NewBazCounter(...) ...

Instead of

package semconv // The same package name for all generated metrics from all registries.

func NewFooBarBazCounter(...) ...

However, that id format is obviously not consistent, e.g. if we take http.client.request.duration. This is a bit similar to protobuf, where you have a package and not every proto message is prefixed with the package name; a message name only has to be unique within its package.
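
For illustration, the "split on the last dot" assumption can be sketched in Go as follows. splitID and goPackageName are hypothetical helpers, not weaver functions, and the second example shows why the convention is fragile: the "namespace" of http.client.request.duration comes out as http.client.request.

```go
package main

import (
	"fmt"
	"strings"
)

// splitID splits a dotted ID at its last "." into a namespace and a
// short name. IDs without a dot get an empty namespace.
func splitID(id string) (namespace, shortName string) {
	i := strings.LastIndex(id, ".")
	if i < 0 {
		return "", id
	}
	return id[:i], id[i+1:]
}

// pkgReplacer rewrites characters that cannot appear in a Go package name.
var pkgReplacer = strings.NewReplacer(".", "_", "-", "_")

// goPackageName derives a Go package name from a namespace.
func goPackageName(namespace string) string {
	return pkgReplacer.Replace(namespace)
}

func main() {
	for _, id := range []string{"foo.bar.baz", "http.client.request.duration"} {
		ns, short := splitID(id)
		fmt.Printf("id=%s package=%s short=%s\n", id, goPackageName(ns), short)
	}
}
```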

ANYWAY, not a blocker. This might be a little bit inconvenient for Go, in Java long names might be desired etc. Trying to optimize for the dev experience using that generated code.

The prefixed attribute names in the official Otel semconv do not help here

you mean that otel attributes are fully qualified and it's not the case for prometheus?

Correct, and I understand why they are so (needed for more reliable correlation), but it's just not helping in my goal for the shortest unique string. (Plus it might be cumbersome to use in any metric query language, but that's a separate issue 🙈 ).

As per requirement levels:

For codegen purposes the difference they make is that:

  • required attributes would be required (non-nullable, positional) parameters in the recordHttpClientRequestDuration(..) function call
  • conditionally required and recommended would be optional/nullable/kwargs/etc
  • opt-in would be something codegen would add by first checking with config if the feature is enabled
    I'm curious how you see the world without them :)

Thanks for explaining, those advanced cases make sense generally. However, the common Prometheus SDK paths usually make it simpler: you always have a static number of dimensions/labels per binary, which makes sense in the local binary scope. You are right that some attribute/label optionality could make sense here 🤔 I thought we could use registry versioning to say all metrics there are required and have a separate registry for experimental ones...

Anyway, I removed all requirement fields from my semconv metrics and assume everything is required to show simple usage.

@bwplotka
Owner Author

@lquerel noted the Go generation for Otel you did in https://github.com/f5/otel-weaver/tree/main/templates/go/otel

You follow the typed methods, which is what I have here too https://github.com/f5/otel-weaver/blob/main/templates/go/otel/meter/metric.tera#L54

This is great, but my thoughts go a bit further: what can we optimize here given the underlying Prometheus client_golang SDK internals? Eventually all label values are strings, so it's a bit expensive to translate to string on every metric reference. Anyway, lots of cool fun options with the generated code here! 💪🏽
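
One possible direction, sketched in Go under heavy assumptions: when a label has a closed set of values, generated code could index a pre-resolved child per value and only touch the string form at exposition time. requestCounter below is a toy stand-in, not the client_golang API, and StatusClass and its values are hypothetical.

```go
package main

import "fmt"

// StatusClass is a hypothetical typed label with a closed set of values.
type StatusClass int

const (
	Status2xx StatusClass = iota
	Status4xx
	Status5xx
)

// statusClassLabels holds the string form of each value, rendered once
// at generation time instead of on every increment.
var statusClassLabels = [3]string{"2xx", "4xx", "5xx"}

// requestCounter keeps one child per known label value, so the hot path
// is an array index with no string conversion or map lookup.
// It is not safe for concurrent use; a real counter would use atomics.
type requestCounter struct {
	byClass [3]uint64
}

// Inc increments the child for the given typed label value.
func (c *requestCounter) Inc(s StatusClass) { c.byClass[s]++ }

// Snapshot renders the string label values, as exposition would.
func (c *requestCounter) Snapshot() map[string]uint64 {
	out := make(map[string]uint64, len(c.byClass))
	for i, v := range c.byClass {
		out[statusClassLabels[i]] = v
	}
	return out
}

func main() {
	c := &requestCounter{}
	c.Inc(Status2xx)
	c.Inc(Status2xx)
	c.Inc(Status5xx)
	fmt.Println(c.Snapshot())
}
```

Labels with an open set of values (for example arbitrary integers) would still need a conversion somewhere, so this only covers the enum-like case.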

@bwplotka bwplotka merged commit 22a4beb into main Jan 31, 2025
@lquerel

lquerel commented Jan 31, 2025

@bwplotka I completely agree with you on this. The goal of this proof of concept was to add a type safety layer on top of the generic SDK client.

The medium-term objective is to create optimized SDK clients for a specific registry (per service/app), effectively removing all existing abstraction layers. This would significantly reduce the overhead related to instrumentation.

It’s just that, at the time, I didn’t have the time to do it.

If label values must all be strings for Prometheus, that’s precisely the kind of thing that can be specifically adapted/optimized in the generated code.

6 participants