Define ServiceScopeConfig in ServiceSettings #3464
Conversation
Skipping CI for Draft Pull Request.
/test all
This looks like what we got general consensus on in https://docs.google.com/document/d/1Wg6sx9ZUJL4AsHj5wM1kMx3E436s5wg2qoMqoI-bqbQ/edit?tab=t.0 across multiple TOC and WG meetings so I'm approving
mesh/v1alpha1/config.proto
Outdated
@@ -444,6 +444,47 @@ message MeshConfig {
//
// For example: foo.bar.svc.cluster.local, *.baz.svc.cluster.local
repeated string hosts = 2;

// Scope configuration to be applied to matching services.
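For reference, a sketch of how the draft under review might be configured, assembled from the diff above and the selector example quoted later in the thread. The exact nesting was still being debated at this point, so field placement here is an assumption, not the merged API:

```yaml
# Hypothetical MeshConfig fragment based on the draft in this PR;
# the nesting of serviceScopes under serviceSettings was still under discussion.
serviceSettings:
  - serviceScopes:
      - servicesSelector:
          matchExpressions:
            - key: istio.io/global
              operator: Exists
        scope: GLOBAL   # matching services become discoverable mesh-wide
```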
What is the relation between this new setting and the existing hosts/settings?
My thought was that users set one or the other. A oneOf might be better to express that
At this point ServiceScopes will only apply to ambient multicluster. In traditional multicluster today the default availability is global. For ambient multicluster, the default availability is local. For alpha, I think it is unnecessary to support ServiceScopes for both configurations operating with different default availabilities.
Is it possible there could be conflict where the host is foo.bar.svc.cluster.local and the namespace or service has istio.io/global label enabled?
Also, do we need to support both namespace and service level? what if they have conflicts?
I think if ServiceScopeConfig is defined, even with a 0 value, the host should be ignored in ambient mode. @keithmattix was that your understanding?
Service selectors overwrite namespace selectors. This behavior should be consistent with the ambient enablement API.
Yes, I think hosts should be ignored if ambient is enabled in env vars
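For context, the existing sidecar-mode setting the reviewers agree should be ignored when ambient is enabled is the hostname-based ServiceSettings entry. A minimal sketch, using the example hostnames from the proto comments above (the `settings`/`clusterLocal` nesting is my reading of the existing message, not something shown in this diff):

```yaml
# Existing MeshConfig ServiceSettings API (sidecar mode): services are
# marked cluster-local by hostname rather than by label selector.
serviceSettings:
  - settings:
      clusterLocal: true
    hosts:
      - "foo.bar.svc.cluster.local"
      - "*.baz.svc.cluster.local"
```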
mesh/v1alpha1/config.proto
Outdated
//
// ```yaml
// serviceSettings:
//   serviceScopes:
Suggested change:
- //   serviceScopes:
+ // - serviceScopes:
serviceSettings is a list of MeshConfig_ServiceSettings. Not entirely sure we need the double nested list?
@keithmattix do you have a preference? I am in favor of it not being double nested
Hmm looking at the proto, I almost feel like ServiceScopes should just be a sibling to ServiceSettings. The other fields in service settings don't make sense for service scope, and eventually, I think the latter will supplant the former
Might be harder to enforce oneof though...
Fair enough...in that case yeah we don't want the extra nesting
I had another thought: I'm not sure if we can do oneOf because new versions of istiod reading old meshconfig would no longer be able to parse it correctly...
Signed-off-by: Jackie Elliott <[email protected]>
@louiscryan Could you please take a look?
Thanks for sharing - we've reviewed this API in several WG meetings and there is an existing doc that shares this API https://docs.google.com/document/d/1Wg6sx9ZUJL4AsHj5wM1kMx3E436s5wg2qoMqoI-bqbQ/edit?tab=t.0#heading=h.nehszyorutrv which you approved. I wish these fundamental concerns had been discussed sooner. I am not sure how to move forward here without having another discussion on the fundamentals of the design. I'll let @keithmattix chime in here on why we're not using the existing sidecar mode fields for multicluster configuration.
On Fri, May 9, 2025 at 10:28 AM Jackie Maertens (Elliott) < ***@***.***> wrote:
*jaellio* left a comment (istio/api#3464)
I took another read before attempting to approve - but I've also looked at
the previous examples and the existing use of the API we are changing, not
only the additions.
I don't think I can approve this - maybe I'm failing to understand
something that is obvious to everyone else, but as an Istio developer - or
even as a user - the original intent of the API, to define 'cluster local'
scope on the client side ("if the service matches, clients will use
cluster local - else global"), mixed with the additions - which appear to
impact discovery and the server side - is completely confusing and
contradictory.
Thanks for sharing - we've reviewed this API in several WG meetings and
there is an existing doc that shares this API
https://docs.google.com/document/d/1Wg6sx9ZUJL4AsHj5wM1kMx3E436s5wg2qoMqoI-bqbQ/edit?tab=t.0#heading=h.nehszyorutrv.
I wish these fundamental concerns had been discussed sooner.
In isolation the document and intent are very good - but it is hard to
review a proposal if the larger context and details are missing.
To clarify: I am not blocking this CL. If 2 other TOC members believe it is
the right thing to do - it can be merged, it's just that I don't get it,
but unanimity is not required.
I do believe the intent and the direction of the doc are valid and valuable
- and using a label selector is a strict and clear improvement over what we
currently have, but my head hurts when
I see all the contradictions and inconsistencies, and this is the kind of API
that covers an area that is already extremely hard to understand for most
people ( not only users, but many long time
Istio developers ). Add on top the already extremely difficult subject of
migration from sidecars to ambient - due in large part to subtle and
not-so-subtle differences in behavior, which we
rarely consider in depth when reviewing new features and APIs.
There is no technical reason - and nothing in the document - that requires
us to take a MeshConfig API that is used for a particular purpose and with
a particular semantic and
add fields with contradictory semantic. It is perfectly fine to add a brand
new message and leave the old one as is, at least this avoids the
contradictions.
… I'll let @keithmattix <https://github.com/keithmattix> chime in here on
why we're not using the existing sidecar mode fields for multicluster
configuration.
These are the fundamental questions that I think you are looking to be answered:

1. Why are there separate APIs for ambient mode and sidecar mode global and local configuration?
2. How does the existing API for multicluster sidecar mode (hosts) differ from the proposed API in this PR? The hosts field in ServiceSettings relies on L7 hostnames; since ztunnel operates at L4, scoping discoverability to hostnames adds additional complexity. Aside from this complexity, sidecar mode multicluster operates with global being the default discoverability. In ambient mode multicluster our goal is for local to be the default discoverability. This fundamentally different default would cause significant confusion for users using the same API for sidecar and ambient mode.
3. If the ServiceScopeConfig API is defining discoverability of services, how are users supposed to express routability and enforce policy? Routability and policy are expressed through DestinationRule on waypoints and Kubernetes traffic distribution.
4. Should ServiceScopeConfig be defined in ServiceSettings or in a separate message? I believe John was in favor of keeping ServiceScopeConfig within ServiceSettings. If this is undesirable to other TOC members we could create a separate message.
5. What pain points exist in sidecar mode multicluster that we are trying to address with this API for ambient mode multicluster? The goal of ServiceScopeConfig is to define a service discoverability API that aligns with the design of Istio ambient mode (L4/L7 separation, per-node vs per-pod proxies) and ambient mode multicluster (namespace sameness, shift from global to local default discoverability).
Now that the doc is no longer in isolation, is the biggest concern that if the ServiceScopeConfig API is defined in ServiceSettings it will add too much confusion with the existing hosts field? Additionally, it would make it very complicated or impossible to migrate (in place/within a cluster) if we only allowed hosts or ServiceScopeConfig to be set. This last point assumes we support in-place sidecar mode to ambient mode multicluster migration.
I agree, but you bring up valid points. I am trying to figure out how in the future we could have exposed these fundamental concerns sooner and more clearly (through more documentation on my part, perhaps more clarity in the original doc on the big picture of how this design fits in with users' understanding of sidecar mode multicluster, and pushing for earlier reviews).
I agree, it is complicated, and I want to make sure this API has a reasonable user experience and is comprehensible for future maintainers.
I agree - there is no limitation on creating a separate message. Originally, it did make sense to keep the multicluster discoverability information together in ServiceSettings.
On Fri, May 9, 2025 at 12:18 PM Jackie Maertens (Elliott) < ***@***.***> wrote:
*jaellio* left a comment (istio/api#3464)
These are the fundamental questions that I think you are looking to be
answered:
1. Why are there separate APIs for ambient mode and sidecar mode
global and local configuration?
2. How does the existing API for multicluster sidecar mode (hosts)
differ from the proposed API in this PR?
The hosts field in ServiceSettings for sidecar mode multicluster relies
on L7 hostnames. Since ztunnel is operating at L4, it adds additional
complexity to scope discoverability to hostnames. The VIP (obtained through
DNS) would need to be translated by ztunnel back into the original hostname
and the hostname information from the hosts would either need to be
communicated directly or via some mapping of hostname to filtered endpoints
over WDS.
The 'hosts' field in ServiceSetting has nothing to do with L7 in Istio
multicluster. It can be an L4 or L7 - with multi-network, it is using SNI
routing with the FQDN, otherwise it's using the VIP - just like ambient.
I don't think we ever did any L7 processing for multi-cluster.
The VIP does need to be translated to a service either way - you do need to
know the labels and policies, there is no way around it in ambient either -
and settings in a producer cluster are even less likely to
help with the IP that is resolved in the client cluster. Arguably it is
easier to map from the VIP to the service ( which we do frequently ) than
to map the VIP to labels in the remote cluster.
But how we implement it is separate from what the API defines: the intent
is for a particular service to be global or local, and that is exactly the
same semantic in ambient and sidecar.
Aside from this complexity, sidecar mode multicluster operates with global
being the default discoverability. In ambient mode multicluster our goal is
for local to be the default discoverability. This fundamentally different
default would cause significant confusion for users using the same API for
sidecar and ambient mode.
Yet the API is in the same struct - just using different fields.
The whole point of this API is to give user explicit, fine control over
connection to a service that is changing the default - it doesn't matter
what the default was, if the setting
exists, that's what the user expects to happen.
Is a sidecar user migrating to ambient supposed to go over each service
they already configured using hostname - and find the labels in the remote
cluster ( and keep that
up to date in case they change, for each client ) and maintain both
hostname (for sidecar) and label selector (for ambient) because it is too
hard for us to map VIP and hostname ?
3. If the ServiceScopeConfig API is defining discoverability of
services, how are users supposed to express routability and enforce policy?
Routability and policy are expressed through DestinationRule on waypoints
and Kubernetes traffic distribution
And through this exact API we are changing - which exists to specify how to
route requests to a host, local or global.
Yes, DestinationRule has locality LB and other advanced settings - but for
some reasons I don't remember - not the local/global semantics.
I was not aware that we require a client-side egress Waypoint to do the
routing decision - the choice to send to a local endpoint or to a remote
cluster
is happening in the client cluster, either in ztunnel or on an egress
gateway.
4. Should ServiceScopeConfig be defined in ServiceSettings or in a
separate message?
I believe John was in favor of keeping ServiceScopeConfig within
ServiceSettings. If this is undesirable to other TOC members we could
create a separate message.
I also believe it should be in ServiceSettings - but not by making the
existing content of ServiceSetting mutually exclusive and having opposite
semantics to the newly added
fields.
As I mentioned above, keeping the ServiceSettings and clusterLocal field
and semantics - as well as hosts - and adding a label selector is what I
would prefer and
can work ( if we chose to implement it ) for both ambient and sidecar with
no ambiguity or confusion.
5. What pain points exist in sidecar mode multicluster that we are
trying to address with this API for ambient mode multicluster?
The goal of ServiceScopeConfig is to define a service discoverability API
that aligns with the design of Istio ambient mode (l4/l7 separation, per
node vs per pod proxies) and ambient mode multicluster (namespace sameness,
shift from global to local default discoverability)
The choice for clients to send requests in same cluster or remote cluster
is L4 in both sidecar and ambient. Sidecar also provides ability to
configure local/global (this API), and if user is explicitly setting it,
the default is not relevant.
Not sure how 'namespace sameness' is related - the service accounts and
FQDN may be the same, but applying policies that take into account the
cluster ( or location, etc ) is common practice in Istio,
and discoverability remains a very separate thing from routing decisions by
clients - again, we are not hiding endpoints or changing how they are
discovered, Istiod still needs to discover and pull all
those endpoints ( to support the sidecars ) - we are just configuring
client side endpoint selection.
On Fri, May 9, 2025 at 12:33 PM Jackie Maertens (Elliott) < ***@***.***> wrote:
*jaellio* left a comment (istio/api#3464)
In isolation the document and intent are very good - but it is hard to
review a proposal if the larger context and details are missing.
Now that the doc is no longer in isolation, is the biggest concern that if
that ServiceScopeConfig API is defined in ServiceSettings it will add too
much confusion with the existing host field? Additionally, it would make it
very complicated or impossible to migrate (in place/within a cluster) if we
only allowed hosts or ServiceScopeConfig to be set. This last point is
assuming we support in place sidecar mode multicluster and ambient mode
multicluster migration.
To clarify: I am not blocking this CL. If 2 other TOC members believe it
is the right thing to do - it can be merged, it's just that I don't get it,
but unanimity is not required.
I agree, but you bring up valid points. I am trying to figure out how in
the future we could have exposed these fundamental concerns sooner and more
clearly (through more documentation on my part, perhaps more clarity in the
original doc on the big picture and pushing for earlier reviews).
I do believe the intent and the direction of the doc are valid and
valuable - and using a label selector is a strict and clear improvement
over what we currently have, but my head hurts when I see all the
contradictions and inconsistencies, and this is the kind of API that covers an
area that is already extremely hard to understand for most people ( not
only users, but many long time Istio developers ). Add on top the already
extremely difficult subject of migration from sidecars to ambient - due in
large part to subtle and not-so-subtle differences in behavior, which we
rarely consider in depth when reviewing new features and APIs.
I agree it is complicated, and I want to make sure this API has a
reasonable user experience and is comprehensible for maintainers.
It is not for me - but there are plenty of other things I don't understand
and I'm not blocking this if other maintainers understand it :-)
There is no technical reason - and nothing in the document - that requires
us to take a MeshConfig API that is used for a particular purpose and with
a particular semantic and add fields with contradictory semantic. It is
perfectly fine to add a brand new message and leave the old one as is, at
least this avoids the contradictions.
I agree - there is no limitation on creating a separate message.
Originally, it did make sense to keep the multicluster discoverability
information together in ServiceSettings.
My preference is to keep the same message/API - and the semantics and
behavior we have - with the addition of label selectors. The message is
intended to control how clients select endpoints - local or global -
and has nothing to do with discovery or all the other things in the new
fields.
If the label selector and this 'scope' is about discovery - and does
something different than 'clients send traffic to same cluster or not' - a
separate message is the better choice.
The worst is to use the same API but in the opposite direction and
semantics, with all existing fields mutually exclusive with new fields that
have the same name ( local / global scope versus
clusterLocal: true/false ). IMO 'clusterLocal: true/false' is far more
intuitive and reflects what the API does - while words with a large scope -
like 'scope' / 'global', 'discovery' don't help.
As far as I know
Okay, let's back up to level set on the big picture goal and view of the world:
@costinm does this clarify the big picture for you?
@linsun @louiscryan Would appreciate your thoughts here as well to make sure I am communicating the big picture of this API change. Trying to come up with some actionable takeaways so this change isn't indefinitely blocked either due to unapproachable complexity or fundamental opposition which hasn't been expressed in previous working group meetings.
ServiceSetting does not impact discoverability. Just load balancing (endpoint selection). And default sidecar multicluster does not restrict routes (doesn't use one for flat network, and programs SNI routes for everything for multi network). So normally the user only needs to opt out services that need to be local, nothing else.
It clarifies the problems with the API in trying to do too much with too little clarity. As I mentioned, maybe others will understand it but please don't mess up the relatively simple API we have - client side LB only and not programming gateways or LB or what not.
@costinm I believe @jaellio is correct here. If you look at the code, serviceSettings.host (which I'll also point out is an undocumented feature) is plumbed through to endpointbuilder to filter out endpoints sent to the data plane. I understand that to be discoverability of the endpoints; maybe you have a different word for it that we can use to ensure we're all on the same page?
Also, default sidecar multicluster actually does require user interaction to expose services via a multi-network east/west gateway; see this istio.io Gateway resource, specifically the hosts field: https://github.com/istio/istio/blob/9e70ee786fc69aaff46dc3e4fe314bb7fa7b0d08/samples/multicluster/expose-services.yaml#L16.
So as @jaellio explained, we have 2 completely separate APIs for managing the "scope" where a service is made available: one is client side (consumer) and one is server-side (producer). Furthermore, one only exists in multinetwork. The point of the proposed ServiceScope API is to make a single, declarative statement about the scope of a service and its endpoints. Then, based on that declaration, different mesh components (istiod, the e/w gateway) will be configured in specific ways. That's a much smoother API IMO, and based on previous WG meetings, my understanding is that the rest of @istio/technical-oversight-committee agree.
On Sat, May 10, 2025 at 2:08 PM Keith Mattix II ***@***.***> wrote:
*keithmattix* left a comment (istio/api#3464)
@costinm <https://github.com/costinm> I believe @jaellio
<https://github.com/jaellio> is correct here. If you look at the code
<https://github.com/istio/istio/blob/9e70ee786fc69aaff46dc3e4fe314bb7fa7b0d08/pilot/pkg/xds/endpoints/endpoint_builder.go#L507>,
serviceSettings.host (which I'll also point out is an undocumented feature)
is plumbed through to endpointbuilder to filter out endpoints sent to the
data plane. I understand that to be discoverability of the endpoints; maybe
you have a different word for it that we can use to ensure we're all on the
same page?
Selecting the endpoints to be returned is usually part of load balancing
(and subsetting). In a 'central istiod' serving multiple clusters - an
Istiod 'discovers' all the endpoints from all clusters, and
picks (selects) specific endpoints to return to a client based on client
network, cluster - and subsets, locality LB, other policies like
cluster.local.
Istiod needs to discover all endpoints because it may serve both sidecars
and ambient - and multiple clusters. How it selects the subset relevant to
a client is
mainly driven by DestinationRule - which like the original API in
MeshConfig are client-side and tied to the client cluster.
The fundamental problem here is the scope of the API - original API and
DestinationRule are client scoped, but from your description the new fields
are server (producer cluster) scoped
and based on the wording they somehow impact discovery instead of
LB/picking/selection of endpoints, that's why it's so dangerous to mix it
on top of the original API.
BTW - I didn't mention this because there is already a lot of confusion
with the words used, but there are very important reasons why hostname was
used in the
original API, which was driven by the use case of configuring kube-system
and other non-istio-managed services, where the admin has little knowledge
or
control over the labels. It's fine to use label selectors on services you
own and control ( if you control both the labeling and the selector ) - but
almost
impossible to reliably write label selectors for things that the vendor or
some external entities control, like kube-system. We use label selector to
select our own workloads,
but almost always FQDNs or CIDR ranges when we don't control it. Having
both is great - removing host will create problems.
Also, default sidecar multicluster actually does require user interaction
to expose services via a multi-network east/west gateway; see this
istio.io Gateway resource, specifically the hosts field:
https://github.com/istio/istio/blob/9e70ee786fc69aaff46dc3e4fe314bb7fa7b0d08/samples/multicluster/expose-services.yaml#L16
.
That API is used for multi-network multi-cluster, and it's one time /
setup, not per service ( no need to specify label selectors or specific
services).
There is user interaction for single-network multi-cluster ( setting the
Secrets for watching ) and a different kind for the manual setup of
multi-cluster using
ServiceEntry/WorkloadEntry.
So as @jaellio <https://github.com/jaellio> explained, we have 2 completely
separate APIs for managing the "scope" where a service is made available:
one is client side (consumer) and one is server-side (producer).
Furthermore, one only exists in multinetwork. The point of the proposed
ServiceScope API is to make a single, declarative statement about the
scope of a service and its endpoints. Then, based on that declaration,
different mesh components (istiod, the e/w gateway) will be configured in
specific ways. That's a much smoother API IMO, and based on previous WG
meetings, my understanding is that the rest of
@istio/technical-oversight-committee
<https://github.com/orgs/istio/teams/technical-oversight-committee> agree
Squashing this new API with completely different semantics on top of the
old one - no matter how much smoother the new API is - doesn't make sense
to me, but the rest of the TOC can approve
this CL as well if they think it's much better (and worth breaking so many
patterns around pretty much everything we have been doing since the start
of the project).
My request was that if you have to do things so drastically different -
leave the old API alone and don't twist it into something that means 10
different opposite things at once,
and explain clearly how it is different from the previous patterns of
scoping ( client separate from server - because of security and many other
reasons, boundaries, rationale for using
labels or hostnames in use cases - and all other things we did are no
longer good ). Because I'm pretty sure users will also be confused, I
highly doubt I'm the only one.
If hosts/ServiceSettings is a heavily utilized and critical API, why isn't it documented? Why is it hidden? We're not breaking the use of the existing API for sidecar mode. We're offering an alternative API for ambient mode.
In this proposed API, istiod will still discover ALL endpoints, and will selectively send endpoints to ztunnels/waypoints based on the service's scope. We are discussing "discoverability" from the perspective of the ztunnels/waypoints. They will only be able to select or discover endpoints based on the destination service's defined scope (and therefore what endpoints have been shared by istiod). For clarity (even though the existing API isn't documented), we could create a separate message for ambient mode service settings.
Since there is so much knowledge being shared here about ServiceSettings and the hosts field, it would be great to document it. @costinm could you add this documentation? I think it will reduce confusion for us and users.
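As a concrete illustration of the producer-side model described above (assuming the istio.io/global label used in the selector examples elsewhere in this thread; the service name and namespace here are hypothetical), a service would opt in to global scope with a plain Kubernetes label that a matching servicesSelector could then pick up:

```yaml
# Hypothetical producer-side Service opting in to global scope.
# A MeshConfig servicesSelector such as
#   matchExpressions: [{key: istio.io/global, operator: Exists}]
# would match this Service and mark it GLOBAL.
apiVersion: v1
kind: Service
metadata:
  name: reviews        # hypothetical name
  namespace: bookinfo  # hypothetical namespace
  labels:
    istio.io/global: "true"
spec:
  selector:
    app: reviews
  ports:
    - port: 9080
```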
mesh/v1alpha1/config.proto
Outdated
@@ -442,8 +442,59 @@ message MeshConfig {
// The services to which the Settings should be applied. Services are selected using the hostname
// matching rules used by DestinationRule.
//
// The hosts field is ignored if ambient mode is enabled.
Bit late to the party, but it would also be good to edit the docs of message Settings to specify sidecar multicluster as opposed to ambient MC.
I don't think we're restricting Settings to sidecar mode only. For ambient, we'll support both ServiceSettings and ServiceScopeConfigs
between ServiceSettings and ServiceScopeConfigs. Signed-off-by: Jackie Elliott <[email protected]>
I don't think it's used by a lot of users - but it is critical for the ones that use it. It is hidden because it was expected to have feedback - and be documented in DestinationRule or as part of a first class API with the right ownership, with MeshConfig as a placeholder. We followed the same path for locality load balancing - first hidden, then documented in MeshConfig and eventually moved to DR.
Ownership is important - DestinationRule, like HttpRoute, can be namespaced and modified by the owner of the service - while MeshConfig is restricted to the mesh admin. The relation between locality load balancing and the cluster local load balancing was also not very clearly defined.
So DestinationRule subsetting and locality load balancing are also 'discoverability' - from some perspective ? I don't understand the insistence on using the term 'discoverability from ztunnel perspective' when we never used this term in DestinationRule and the other APIs. If you really must use this word - use it with the context (ztunnel or waypoint discoverability) so users don't get confused or believe that Istiod or K8S discoverability are impacted.
AFAIK the intention was for this feature to follow the same path as locality load balancing - with use by advanced users first, but eventual promotion to a namespaced v1 API. Incidentally, ServiceExport from MCS is a namespaced API as well - and may work better for a producer setting than a mesh config fragment (and is neutral to implementation choices - i.e. not ambient-only).
On Mon, May 12, 2025 at 9:03 AM Keith Mattix II ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In mesh/v1alpha1/config.proto
+ // namespacesSelector:
+ // matchExpressions:
+ // - key: istio.io/global
+ // operator: In
+ // values: [true]
+ // servicesSelector:
+ // matchExpressions:
+ // - key: istio.io/global
+ // operator: Exists
+ // scope: GLOBAL
+ // ```
+ message ServiceScopeConfig {
+ // The scope of the matching service. Used to determine if the service is available locally
+ // (cluster local) or globally (mesh-wide).
+ enum Scope {
+ LOCAL = 0;
The existing API has a single bool for local vs. global. Are you two
proposing that we invert the default for this boolean for ambient workloads
vs. sidecar workloads? That seems brittle...
How is true/false more brittle than an enum with 2 values, 0 and 1 ?
Usually true/false and 0/1 are equivalent.
The default value in the API - i.e. what happens if a user does not include
the field at all - is different from the default behavior in the mesh when
no configuration exists at all.
If user just has a
host: foo.kube-system.svc.cluster.local
without specifying 'cluster_local: true' or a scope - it would mean
opposite things for sidecar and ambient, i.e. sidecars will treat it as
global (default cluster_local is false) and ambient as local (0), which is
far from ideal.
The default if nothing is specified at all is what we have defined in
ambient ( there is no multicluster so all services are local) and what has
been the default in sidecar ( all services global when using
Secrets to watch remote clusters, explicit config when using
Service/WorkloadEntry)
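The zero-value concern above can be seen directly in the proto. This is a sketch based on the enum quoted earlier in this thread, not the merged API: in proto3, an unset enum field is indistinguishable from its 0 value, so the two APIs end up with opposite effective defaults when a user omits the field.

```proto
// Sketch of the draft quoted in this thread (not the merged API).
message ServiceScopeConfig {
  enum Scope {
    // proto3 default: an unset scope and an explicit LOCAL
    // are indistinguishable on the wire.
    LOCAL = 0;
    GLOBAL = 1;
  }
  Scope scope = 1;
}

// Existing sidecar-mode setting for comparison: an unset cluster_local
// defaults to false, i.e. global - the opposite effective default.
// bool cluster_local = 1;
```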
Signed-off-by: Jackie Elliott <[email protected]>
Added some wording suggestions. I think I'd like to see something more general about how this API should be used by mesh admins to set the criteria for what services are global vs. local
On Tue, May 13, 2025, 11:47 Jackie Maertens (Elliott) wrote, in mesh/v1alpha1/config.proto:
@@ -450,6 +450,57 @@ message MeshConfig {
   // Settings to be applied to select services.
   repeated ServiceSettings service_settings = 50;
+  // Ambient mode multicluster scope configuration to be applied to matching services via namespace and/or
+  // service selectors. This configuration is used to define the scope of services configured by this
I am concerned about using "discovered by this cluster's control planes".
This might be too implementation focused, but technically the control plane
will discover all services but only share select services with the
dataplane.
Also, I'd like to clarify the meaning of a "local" scope for a matching
service. If service A matches a local scope selector, which of the following
is true:
1. Any workloads on the local cluster A making requests to service A
will only be routed locally (even if service A is globally scoped on other
clusters). Service A will not be exposed at the e/w gateway (if one exists).
2. Any workloads on the local cluster A making requests to service A
will be routed locally and to any clusters where service A is globally
scoped. Any workload on a different cluster B where service A is globally
scoped will be able to be routed locally or to any other clusters where
service A is global (not cluster A). Service A will not be exposed at the
e/w gateway (if it exists).
So far, we have never used a mesh config even from a different istiod revision
in the same cluster - and surely not a mesh config from a remote cluster. The
security and reliability risks are very high, compounded with
cross-namespace selector risks. We spent years removing the cross-namespace
uses in the original API.
I am not against adding a 4th way to configure if a service is exposed on a
gateway - after 2 Gateway APIs and ServiceExport, but crossing the cluster
boundaries to watch MeshConfigs is absolutely wrong.
I am not suggesting we cross cluster boundaries to watch MeshConfigs in different clusters. We are operating under the requirement that all MeshConfig ServiceScopeConfigs across clusters are the same. When istiod gets all services across all clusters, the selectors defined in the local MeshConfig are applied.
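Under that requirement, each cluster's MeshConfig would carry an identical stanza along these lines (a sketch assembled from the quoted diff; the top-level field name and the label key are assumptions, not a confirmed API):

```yaml
# Hypothetical MeshConfig fragment; must be identical across all clusters.
serviceScopeConfigs:
- namespacesSelector:
    matchExpressions:
    - key: istio.io/global    # illustrative label key
      operator: In
      values: ["true"]
  servicesSelector:
    matchExpressions:
    - key: istio.io/global
      operator: Exists
  scope: GLOBAL
```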
Cosmetic / doc suggestions - the structure and the rest is good.
// When in ambient mode, if ServiceSettings are defined they will be considered in addition to the
// ServiceScopeConfigs. If a service is defined by ServiceSetting to be cluster local and matches a
// global service scope selector, the service will be considered cluster local. If a service is
// considered global by ServiceSettings and does not match a global service scope selector
I don't know if this is the right behavior: ServiceSettings operate by hostname, with the primary use case being services that are not under the control of the mesh admin (like platform services). If a user starts with a 'local by default' model (for example using ServiceScope with a wildcard or broad selector), we want ServiceSettings to opt in by hostname.
The second (and maybe more important) issue is that the scope of both settings is usually .cluster.local names (or the custom suffix used by k8s) - services using global names (like .internal or prod.example.com), which are common for services used in mixed environments or VPC-scoped DNS, or using ServiceEntry and DNS interception, are almost always global.
I see the original doc was vague on what happens to ServiceEntry / non-K8S FQDNs - they are still valid Istio services, but can't be in scope for either ServiceSettings or ServiceScope (unless we use the labels on the ServiceEntry - external services don't have labels)
// global service scope selector, the service will be considered cluster local. If a service is
// considered global by ServiceSettings and does not match a global service scope selector
// the service will be considered local. Local scope takes precedence over global scope. Since
// ServiceScopeConfigs is local by default, all services are considered local unless it is considered
See above - I would add 'takes precedence ... for .cluster.local services'. We don't want to accidentally break ServiceEntry and external services.
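The precedence rules being discussed can be sketched as follows (a minimal sketch assuming the semantics stated in the quoted doc comment; the function and parameter names are illustrative, not Istio code):

```python
from typing import Optional

LOCAL, GLOBAL = "LOCAL", "GLOBAL"

def resolve_scope(matches_global_selector: bool,
                  settings_cluster_local: Optional[bool]) -> str:
    """settings_cluster_local is None when no ServiceSetting matches the host."""
    # A ServiceSetting marking the service cluster-local wins over any
    # global scope selector match: local takes precedence over global.
    if settings_cluster_local is True:
        return LOCAL
    # A service is global only if it matches a global scope selector;
    # a global ServiceSetting alone is not sufficient.
    if matches_global_selector:
        return GLOBAL
    # ServiceScopeConfigs default to local.
    return LOCAL

print(resolve_scope(True, True))    # LOCAL
print(resolve_scope(True, None))    # GLOBAL
print(resolve_scope(False, False))  # LOCAL
```

Whether this precedence should apply only to .cluster.local hostnames, as suggested in this thread, is a separate open question.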
@@ -450,6 +458,54 @@ message MeshConfig {
   // Settings to be applied to select services.
   repeated ServiceSettings service_settings = 50;

+  // Configuration for ambient mode multicluster service scope. This setting allows mesh administrators
+  // to define the criteria by which the cluster's control plane determines which services in other
+  // clusters in the mesh are treated as global (accessible across multiple clusters) versus local
I am reading this correctly as 'services in other clusters' == services using K8S Service and Pods in other clusters, and implicitly excluding ServiceEntry? If yes - please make it explicit to avoid confusion. If no - why, and how do we avoid breaking them?
// and/or other matching criteria. This is particularly useful in multicluster service mesh deployments
// to control service visibility and access across clusters. This API is not intended to enforce
// security policies. Resources like DestinationRules should be used to enforce authorization policies.
// If a service matches a global service scope selector, the service's endpoints will be globally
Typo - AuthorizationPolicy enforces authorization; DestinationRule configures load balancing (including locality) and other client-side policies.
Good catch, I'll fix this typo
// If a service matches a global service scope selector, the service's endpoints will be globally
// exposed. If a service is locally scoped, its endpoints will only be exposed to local cluster
// services.
//
Do you want to add a line about services matching multiple selectors? Which one takes priority? Maybe an example, or a line on what happens if matchExpression is missing - does it match all, and is that the default?
// Match expression for namespaces.
LabelSelector namespace_selector = 1;

// Match expression for services.
Only services in namespaces matching the namespace_selector will be used (i.e. AND). If namespace_selector is missing - all namespaces?
Yes, this is consistent with the ambient enablement API. An empty selector (one selector must be non-empty) is the equivalent of a match all.
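The AND semantics and empty-selector behavior described here can be sketched as follows (an illustrative sketch of Kubernetes-style matchExpressions with the In and Exists operators; not actual Istio code):

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """An empty selector (no matchExpressions) matches everything."""
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        if op == "Exists" and key not in labels:
            return False
        if op == "In" and labels.get(key) not in expr["values"]:
            return False
    return True

def in_scope(ns_selector: dict, svc_selector: dict,
             ns_labels: dict, svc_labels: dict) -> bool:
    # Both selectors must match (AND); an empty selector matches all.
    return (selector_matches(ns_selector, ns_labels)
            and selector_matches(svc_selector, svc_labels))

ns_sel = {"matchExpressions": [
    {"key": "istio.io/global", "operator": "In", "values": ["true"]}]}
svc_sel = {"matchExpressions": [
    {"key": "istio.io/global", "operator": "Exists"}]}

print(in_scope(ns_sel, svc_sel,
               {"istio.io/global": "true"}, {"istio.io/global": "x"}))  # True
print(in_scope(ns_sel, {}, {"team": "a"}, {}))                          # False
print(in_scope({}, {}, {}, {}))                                         # True
```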
// (cluster local) or globally (mesh-wide).
enum Scope {
  LOCAL = 0;
  GLOBAL = 1;
If the comment above specifies 'global' to mean mesh-wide, why not use MESH instead of GLOBAL?
Usually GLOBAL is larger than MESH, we are twisting enough words. Also GLOBAL can create confusion with regional/zonal ( covered by other APIs ) - while mesh is a bit more clear as intent.
Part of adding support for Ambient multicluster.
Based on https://docs.google.com/document/d/1Wg6sx9ZUJL4AsHj5wM1kMx3E436s5wg2qoMqoI-bqbQ/edit?tab=t.0#heading=h.nehszyorutrv
#3463