Define ServiceScopeConfig in ServiceSettings #3464
Conversation
Skipping CI for Draft Pull Request.
/test all
This looks like what we got general consensus on in https://docs.google.com/document/d/1Wg6sx9ZUJL4AsHj5wM1kMx3E436s5wg2qoMqoI-bqbQ/edit?tab=t.0 across multiple TOC and WG meetings so I'm approving
mesh/v1alpha1/config.proto
Outdated
@@ -444,6 +444,47 @@ message MeshConfig {
//
// For example: foo.bar.svc.cluster.local, *.baz.svc.cluster.local
repeated string hosts = 2;

// Scope configuration to be applied to matching services.
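For reference, a sketch of how the draft under review might be configured, assembled from the diff above and the selector example quoted later in the thread. The exact nesting was still being debated at this point, so field placement here is an assumption, not the merged API:

```yaml
# Hypothetical MeshConfig fragment based on the draft in this PR;
# the nesting of serviceScopes under serviceSettings was still under discussion.
serviceSettings:
  - serviceScopes:
      - servicesSelector:
          matchExpressions:
            - key: istio.io/global
              operator: Exists
        scope: GLOBAL   # matching services become discoverable mesh-wide
```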
What is the relation between this new setting and the existing hosts/settings?
My thought was that users set one or the other. A oneOf might be better to express that
At this point ServiceScopes will only apply to ambient multicluster. In traditional multicluster today the default availability is global. For ambient multicluster, the default availability is local. For alpha, I think it is unnecessary to support ServiceScopes for both configurations operating with different default availabilities.
Is it possible there could be conflict where the host is foo.bar.svc.cluster.local and the namespace or service has istio.io/global label enabled?
Also, do we need to support both namespace and service level? what if they have conflicts?
I think if ServiceScopeConfig is defined, even with a 0 value, the host should be ignored in ambient mode. @keithmattix was that your understanding?
Service selectors overwrite namespace selectors. This behavior should be consistent with the ambient enablement API.
Yes, I think hosts should be ignored if ambient is enabled in env vars
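For context, the existing sidecar-mode setting the reviewers agree should be ignored when ambient is enabled is the hostname-based ServiceSettings entry. A minimal sketch, using the example hostnames from the proto comments above (the `settings`/`clusterLocal` nesting is my reading of the existing message, not something shown in this diff):

```yaml
# Existing MeshConfig ServiceSettings API (sidecar mode): services are
# marked cluster-local by hostname rather than by label selector.
serviceSettings:
  - settings:
      clusterLocal: true
    hosts:
      - "foo.bar.svc.cluster.local"
      - "*.baz.svc.cluster.local"
```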
mesh/v1alpha1/config.proto
Outdated
//
// ```yaml
// serviceSettings:
//   serviceScopes:
Suggested change:
- //   serviceScopes:
+ // - serviceScopes:
serviceSettings is a list of MeshConfig_ServiceSettings. Not entirely sure we need the double nested list?
@keithmattix do you have a preference? I am in favor of it not being double nested
Hmm looking at the proto, I almost feel like ServiceScopes should just be a sibling to ServiceSettings. The other fields in service settings don't make sense for service scope, and eventually, I think the latter will supplant the former
Might be harder to enforce oneof though...
Fair enough...in that case yeah we don't want the extra nesting
I had another thought: I'm not sure if we can do oneOf because new versions of istiod reading old meshconfig would no longer be able to parse it correctly...
Signed-off-by: Jackie Elliott <[email protected]>
@louiscryan Could you please take a look?
Thanks for sharing - we've reviewed this API in several WG meetings and there is an existing doc that shares this API https://docs.google.com/document/d/1Wg6sx9ZUJL4AsHj5wM1kMx3E436s5wg2qoMqoI-bqbQ/edit?tab=t.0#heading=h.nehszyorutrv which you approved. I wish these fundamental concerns had been discussed sooner. I am not sure how to move forward here without having another discussion on the fundamentals of the design. I'll let @keithmattix chime in here on why we're not using the existing sidecar mode fields for multicluster configuration.
On Fri, May 9, 2025 at 10:28 AM Jackie Maertens (Elliott) < ***@***.***> wrote:
*jaellio* left a comment (istio/api#3464)
I took another read before attempting to approve - but I've also looked at
the previous examples and the existing use of the API we are changing, not
only the additions.
I don't think I can approve this - maybe I'm failing to understand
something that is obvious to everyone else, but as an Istio developer - or
even as a user - the original intent of the API, to define 'cluster local'
scope on the client side ("if the service matches, clients will use
cluster local - else global"), mixed with the additions - which appear to
impact discovery and the server side - is completely confusing and
contradictory.
Thanks for sharing - we've reviewed this API in several WG meetings and
there is an existing doc that shares this API
https://docs.google.com/document/d/1Wg6sx9ZUJL4AsHj5wM1kMx3E436s5wg2qoMqoI-bqbQ/edit?tab=t.0#heading=h.nehszyorutrv.
I wish these fundamental concerns had been discussed sooner.
In isolation the document and intent are very good - but it is hard to
review a proposal if the larger context and details are missing.
To clarify: I am not blocking this CL. If 2 other TOC members believe it is
the right thing to do - it can be merged, it's just that I don't get it,
but unanimity is not required.
I do believe the intent and the direction of the doc are valid and valuable
- and using a label selector is a strict and clear improvement over what we
currently have, but my head hurts when
I see all the contradictions and inconsistencies, and this is the kind of API
that covers an area that is already extremely hard to understand for most
people ( not only users, but many long time
Istio developers ). Add on top the already extremely difficult subject of
migration from sidecars to ambient - due in large part to subtle and
not-so-subtle differences in behavior, which we
rarely consider in depth when reviewing new features and APIs.
There is no technical reason - and nothing in the document - that requires
us to take a MeshConfig API that is used for a particular purpose and with
a particular semantic and
add fields with contradictory semantic. It is perfectly fine to add a brand
new message and leave the old one as is, at least this avoids the
contradictions.
… I'll let @keithmattix <https://github.com/keithmattix> chime in here on
why we're not using the existing sidecar mode fields for multicluster
configuration.
These are the fundamental questions that I think you are looking to be answered:

1. Why are there separate APIs for ambient mode and sidecar mode global and local configuration?
2. How does the existing API for multicluster sidecar mode (hosts) differ from the proposed API in this PR? The hosts field in ServiceSettings relies on L7 hostnames; since ztunnel operates at L4, scoping discoverability to hostnames adds additional complexity. Aside from this complexity, sidecar mode multicluster operates with global being the default discoverability. In ambient mode multicluster our goal is for local to be the default discoverability. This fundamentally different default would cause significant confusion for users using the same API for sidecar and ambient mode.
3. If the ServiceScopeConfig API is defining discoverability of services, how are users supposed to express routability and enforce policy? Routability and policy are expressed through DestinationRule on waypoints and Kubernetes traffic distribution.
4. Should ServiceScopeConfig be defined in ServiceSettings or in a separate message? I believe John was in favor of keeping ServiceScopeConfig within ServiceSettings. If this is undesirable to other TOC members we could create a separate message.
5. What pain points exist in sidecar mode multicluster that we are trying to address with this API for ambient mode multicluster? The goal of ServiceScopeConfig is to define a service discoverability API that aligns with the design of Istio ambient mode (L4/L7 separation, per-node vs per-pod proxies) and ambient mode multicluster (namespace sameness, shift from global to local default discoverability).
Now that the doc is no longer in isolation, is the biggest concern that if the ServiceScopeConfig API is defined in ServiceSettings it will add too much confusion with the existing hosts field? Additionally, it would make it very complicated or impossible to migrate (in place/within a cluster) if we only allowed hosts or ServiceScopeConfig to be set. This last point assumes we support in-place sidecar mode to ambient mode multicluster migration.
I agree, but you bring up valid points. I am trying to figure out how in the future we could have exposed these fundamental concerns sooner and more clearly (through more documentation on my part, perhaps more clarity in the original doc on the big picture of how this design fits in with users' understanding of sidecar mode multicluster, and pushing for earlier reviews).
I agree, it is complicated, and I want to make sure this API has a reasonable user experience and is comprehensible for future maintainers.
I agree - there is no limitation on creating a separate message. Originally, it did make sense to keep the multicluster discoverability information together in ServiceSettings.
On Fri, May 9, 2025 at 12:18 PM Jackie Maertens (Elliott) < ***@***.***> wrote:
*jaellio* left a comment (istio/api#3464)
These are the fundamental questions that I think you are looking to be
answered:
1. Why are there separate APIs for ambient mode and sidecar mode
global and local configuration?
2. How does the existing API for multicluster sidecar mode (hosts)
differ from the proposed API in this PR?
The hosts field in ServiceSettings for sidecar mode multicluster relies
on L7 hostnames. Since ztunnel is operating at L4, it adds additional
complexity to scope discoverability to hostnames. The VIP (obtained through
DNS) would need to be translated by ztunnel back into the original hostname
and the hostname information from the hosts would either need to be
communicated directly or via some mapping of hostname to filtered endpoints
over WDS.
The 'hosts' field in ServiceSetting has nothing to do with L7 in Istio
multicluster. It can be an L4 or L7 - with multi-network, it is using SNI
routing with the FQDN, otherwise it's using the VIP - just like ambient.
I don't think we ever did any L7 processing for multi-cluster.
The VIP does need to be translated to a service either way - you do need to
know the labels and policies, there is no way around it in ambient either -
and settings in a producer cluster are even less likely to
help with the IP that is resolved in the client cluster. Arguably it is
easier to map from the VIP to the service ( which we do frequently ) than
to map the VIP to labels in the remote cluster.
But how we implement it is separate from what the API defines: the intent
is for a particular service to be global or local, and that is exactly the
same semantic in ambient and sidecar.
Aside from this complexity, sidecar mode multicluster operates with global
being the default discoverability. In ambient mode multicluster our goal is
for local to be the default discoverability. This fundamentally different
default would cause significant confusion for users using the same API for
sidecar and ambient mode.
Yet the API is in the same struct - just using different fields.
The whole point of this API is to give user explicit, fine control over
connection to a service that is changing the default - it doesn't matter
what the default was, if the setting
exists, that's what the user expects to happen.
Is a sidecar user migrating to ambient supposed to go over each service
they already configured using hostname - and find the labels in the remote
cluster ( and keep that
up to date in case they change, for each client ) and maintain both
hostname (for sidecar) and label selector (for ambient) because it is too
hard for us to map VIP and hostname ?
3. If the ServiceScopeConfig API is defining discoverability of
services, how are users supposed to express routability and enforce policy?
Routability and policy are expressed through DestinationRule on waypoints
and Kubernetes traffic distribution
And through this exact API we are changing - which exists to specify how to
route requests to a host, local or global.
Yes, DestinationRule has locality LB and other advanced settings - but for
some reasons I don't remember - not the local/global semantics.
I was not aware that we require a client-side egress Waypoint to do the
routing decision - the choice to send to a local endpoint or to a remote
cluster
is happening in the client cluster, either in ztunnel or on an egress
gateway.
4. Should ServiceScopeConfig be defined in ServiceSettings or in a
separate message?
I believe John was in favor of keeping ServiceScopeConfig within
ServiceSettings. If this is undesirable to other TOC members we could
create a separate message.
I also believe it should be in ServiceSettings - but not by making the
existing content of ServiceSetting mutually exclusive and having opposite
semantics to the newly added
fields.
As I mentioned above, keeping the ServiceSettings and clusterLocal field
and semantics - as well as hosts - and adding a label selector is what I
would prefer and
can work ( if we chose to implement it ) for both ambient and sidecar with
no ambiguity or confusion.
5. What pain points exist in sidecar mode multicluster that we are
trying to address with this API for ambient mode multicluster?
The goal of ServiceScopeConfig is to define a service discoverability API
that aligns with the design of Istio ambient mode (l4/l7 separation, per
node vs per pod proxies) and ambient mode multicluster (namespace sameness,
shift from global to local default discoverability)
The choice for clients to send requests in same cluster or remote cluster
is L4 in both sidecar and ambient. Sidecar also provides ability to
configure local/global (this API), and if user is explicitly setting it,
the default is not relevant.
Not sure how 'namespace sameness' is related - the service accounts and
FQDN may be the same, but applying policies that take into account the
cluster ( or location, etc ) is common practice in Istio,
and discoverability remains a very separate thing from routing decisions by
clients - again, we are not hiding endpoints or changing how they are
discovered, Istiod still needs to discover and pull all
those endpoints ( to support the sidecars ) - we are just configuring
client side endpoint selection.
On Fri, May 9, 2025 at 12:33 PM Jackie Maertens (Elliott) < ***@***.***> wrote:
*jaellio* left a comment (istio/api#3464)
In isolation the document and intent are very good - but it is hard to
review a proposal if the larger context and details are missing.
Now that the doc is no longer in isolation, is the biggest concern that if
that ServiceScopeConfig API is defined in ServiceSettings it will add too
much confusion with the existing host field? Additionally, it would make it
very complicated or impossible to migrate (in place/within a cluster) if we
only allowed hosts or ServiceScopeConfig to be set. This last point is
assuming we support in place sidecar mode multicluster and ambient mode
multicluster migration.
To clarify: I am not blocking this CL. If 2 other TOC members believe it
is the right thing to do - it can be merged, it's just that I don't get it,
but unanimity is not required.
I agree, but you bring up valid points. I am trying to figure out how in
the future we could have exposed these fundamental concerns sooner and more
clearly (through more documentation on my part, perhaps more clarity in the
original doc on the big picture and pushing for earlier reviews).
I do believe the intent and the direction of the doc are valid and
valuable - and using a label selector is a strict and clear improvement
over what we currently have, but my head hurts when I see all the
contradictions and inconsistencies, and this is the kind of API that covers an
area that is already extremely hard to understand for most people ( not
only users, but many long time Istio developers ). Add on top the already
extremely difficult subject of migration from sidecars to ambient - due in
large part to subtle and not-so-subtle differences in behavior, which we
rarely consider in depth when reviewing new features and APIs.
I agree it is complicated, and I want to make sure this API has a
reasonable user experience and is comprehensible for maintainers.
It is not for me - but there are plenty of other things I don't understand
and I'm not blocking this if other maintainers understand it :-)
There is no technical reason - and nothing in the document - that requires
us to take a MeshConfig API that is used for a particular purpose and with
a particular semantic and add fields with contradictory semantic. It is
perfectly fine to add a brand new message and leave the old one as is, at
least this avoids the contradictions.
I agree - there is no limitation on creating a separate message.
Originally, it did make sense to keep the multicluster discoverability
information together in ServiceSettings.
My preference is to keep the same message/API - and the semantics and
behavior we have - with the addition of label selectors. The message is
intended to control how clients select endpoints - local or global -
and has nothing to do with discovery or all the other things in the new
fields.
If the label selector and this 'scope' is about discovery - and does
something different than 'clients send traffic to same cluster or not' - a
separate message is the better choice.
The worst is to use the same API but in the opposite direction and
semantics, with all existing fields mutually exclusive with new fields that
have the same name ( local / global scope versus
clusterLocal: true/false ). IMO 'clusterLocal: true/false' is far more
intuitive and reflects what the API does - while words with a large scope -
like 'scope' / 'global', 'discovery' don't help.
As far as I know
Okay, let's back up to level set on the big picture goal and view of the world:
@costinm does this clarify the big picture for you?
@linsun @louiscryan Would appreciate your thoughts here as well to make sure I am communicating the big picture of this API change. Trying to come up with some actionable takeaways so this change isn't indefinitely blocked either due to unapproachable complexity or fundamental opposition which hasn't been expressed in previous working group meetings.
ServiceSetting does not impact discoverability. Just load balancing (endpoint selection). And default sidecar multicluster does not restrict routes (doesn't use one for flat network, and programs SNI routes for everything for multi network). So normally the user only needs to opt out services that need to be local, nothing else.
It clarifies the problems with the API in trying to do too much with too little clarity. As I mentioned, maybe others will understand it but please don't mess up the relatively simple API we have - client side LB only and not programming gateways or LB or what not.
@costinm I believe @jaellio is correct here. If you look at the code, serviceSettings.host (which I'll also point out is an undocumented feature) is plumbed through to endpointbuilder to filter out endpoints sent to the data plane. I understand that to be discoverability of the endpoints; maybe you have a different word for it that we can use to ensure we're all on the same page?
Also, default sidecar multicluster actually does require user interaction to expose services via a multi-network east/west gateway; see this istio.io Gateway resource, specifically the hosts field: https://github.com/istio/istio/blob/9e70ee786fc69aaff46dc3e4fe314bb7fa7b0d08/samples/multicluster/expose-services.yaml#L16.
So as @jaellio explained, we have 2 completely separate APIs for managing the "scope" where a service is made available: one is client side (consumer) and one is server-side (producer). Furthermore, one only exists in multinetwork. The point of the proposed ServiceScope API is to make a single, declarative statement about the scope of a service and its endpoints. Then, based on that declaration, different mesh components (istiod, the e/w gateway) will be configured in specific ways. That's a much smoother API IMO, and based on previous WG meetings, my understanding is that the rest of @istio/technical-oversight-committee agree.
On Sat, May 10, 2025 at 2:08 PM Keith Mattix II ***@***.***> wrote:
*keithmattix* left a comment (istio/api#3464)
@costinm <https://github.com/costinm> I believe @jaellio
<https://github.com/jaellio> is correct here. If you look at the code
<https://github.com/istio/istio/blob/9e70ee786fc69aaff46dc3e4fe314bb7fa7b0d08/pilot/pkg/xds/endpoints/endpoint_builder.go#L507>,
serviceSettings.host (which I'll also point out is an undocumented feature)
is plumbed through to endpointbuilder to filter out endpoints sent to the
data plane. I understand that to be discoverability of the endpoints; maybe
you have a different word for it that we can use to ensure we're all on the
same page?
Selecting the endpoints to be returned is usually part of load balancing
(and subsetting). In a 'central istiod' serving multiple clusters - an
Istiod 'discovers' all the endpoints from all clusters, and
picks (selects) specific endpoints to return to a client based on client
network, cluster - and subsets, locality LB, other policies like
cluster.local.
Istiod needs to discover all endpoints because it may serve both sidecars
and ambient - and multiple clusters. How it selects the subset relevant to
a client is
mainly driven by DestinationRule - which like the original API in
MeshConfig are client-side and tied to the client cluster.
The fundamental problem here is the scope of the API - original API and
DestinationRule are client scoped, but from your description the new fields
are server (producer cluster) scoped
and based on the wording they somehow impact discovery instead of
LB/picking/selection of endpoints, that's why it's so dangerous to mix it
on top of the original API.
BTW - I didn't mention this because there is already a lot of confusion
with the words used, but there are very important reasons why hostname was
used in the
original API, which was driven by the use case of configuring kube-system
and other non-istio-managed services, where the admin has little knowledge
or
control over the labels. It's fine to use label selectors on services you
own and control ( if you control both the labeling and the selector ) - but
almost
impossible to reliably write label selectors for things that the vendor or
some external entities control, like kube-system. We use label selector to
select our own workloads,
but almost always FQDNs or CIDR ranges when we don't control it. Having
both is great - removing host will create problems.
Also, default sidecar multicluster actually does require user interaction
to expose services via a multi-network east/west gateway; see this
istio.io Gateway resource, specifically the hosts field:
https://github.com/istio/istio/blob/9e70ee786fc69aaff46dc3e4fe314bb7fa7b0d08/samples/multicluster/expose-services.yaml#L16
.
That API is used for multi-network multi-cluster, and it's one time /
setup, not per service ( no need to specify label selectors or specific
services).
There is user interaction for single-network multi-cluster ( setting the
Secrets for watching ) and a different kind for the manual setup of
multi-cluster using
ServiceEntry/WorkloadEntry.
So as @jaellio <https://github.com/jaellio> explained, we have 2 completely
separate APIs for managing the "scope" where a service is made available:
one is client side (consumer) and one is server-side (producer).
Furthermore, one only exists in multinetwork. The point of the proposed
ServiceScope API is to make a single, declarative statement about the
scope of a service and its endpoints. Then, based on that declaration,
different mesh components (istiod, the e/w gateway) will be configured in
specific ways. That's a much smoother API IMO, and based on previous WG
meetings, my understanding is that the rest of
@istio/technical-oversight-committee
<https://github.com/orgs/istio/teams/technical-oversight-committee> agree
Squashing this new API with completely different semantics on top of the
old one - no matter how much smoother the new API is - doesn't make sense
to me, but the rest of the TOC can approve
this CL as well if they think it's much better (and worth breaking so many
patterns around pretty much everything we have been doing since the start
of the project).
My request was that if you have to do things so drastically different -
leave the old API alone and don't twist it into something that means 10
different opposite things at once,
and explain clearly how it is different from the previous patterns of
scoping ( client separate from server - because of security and many other
reasons, boundaries, rationale for using
labels or hostnames in use cases - and all other things we did are no
longer good ). Because I'm pretty sure users will also be confused, I
highly doubt I'm the only one.
If hosts/ServiceSettings is a heavily utilized and critical API, why isn't it documented? Why is it hidden? We're not breaking the use of the existing API for sidecar mode. We're offering an alternative API for ambient mode.
In this proposed API, istiod will still discover ALL endpoints, and will selectively send endpoints to ztunnels/waypoints based on the service's scope. We are discussing "discoverability" from the perspective of the ztunnels/waypoints. They will only be able to select or discover endpoints based on the destination service's defined scope (and therefore what endpoints have been shared by istiod). For clarity (even though the existing API isn't documented), we could create a separate message for ambient mode service settings.
Since there is so much knowledge being shared here about ServiceSettings and the hosts field, it would be great to document it. @costinm could you add this documentation? I think it will reduce confusion for us and users.
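As a concrete illustration of the producer-side model described above (assuming the istio.io/global label used in the selector examples elsewhere in this thread; the service name and namespace here are hypothetical), a service would opt in to global scope with a plain Kubernetes label that a matching servicesSelector could then pick up:

```yaml
# Hypothetical producer-side Service opting in to global scope.
# A MeshConfig servicesSelector such as
#   matchExpressions: [{key: istio.io/global, operator: Exists}]
# would match this Service and mark it GLOBAL.
apiVersion: v1
kind: Service
metadata:
  name: reviews        # hypothetical name
  namespace: bookinfo  # hypothetical namespace
  labels:
    istio.io/global: "true"
spec:
  selector:
    app: reviews
  ports:
    - port: 9080
```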
mesh/v1alpha1/config.proto
Outdated
@@ -442,8 +442,59 @@ message MeshConfig {
// The services to which the Settings should be applied. Services are selected using the hostname
// matching rules used by DestinationRule.
//
// The hosts field is ignored if ambient mode is enabled.
Bit late to the party, but it would also be good to edit the docs of message Settings to specify sidecar multicluster as opposed to ambient MC.
I don't think we're restricting Settings to sidecar mode only. For ambient, we'll support both ServiceSettings and ServiceScopeConfigs
between ServiceSettings and ServiceScopeConfigs. Signed-off-by: Jackie Elliott <[email protected]>
I don't think it's used by a lot of users - but it is critical for the ones that use it. It is hidden because it was expected to have feedback - and be documented in DestinationRule or as part of a first class API with the right ownership, with MeshConfig as a placeholder. We followed the same path for locality load balancing - first hidden, then documented in MeshConfig and eventually moved to DR.
Ownership is important - DestinationRule, like HttpRoute, can be namespaced and modified by the owner of the service - while MeshConfig is restricted to the mesh admin. The relation between locality load balancing and the cluster local load balancing was also not very clearly defined.
So DestinationRule subsetting and locality load balancing are also 'discoverability' - from some perspective ? I don't understand the insistence on using the term 'discoverability from ztunnel perspective' when we never used this term in DestinationRule and the other APIs. If you really must use this word - use it with the context (ztunnel or waypoint discoverability) so users don't get confused or believe that Istiod or K8S discoverability are impacted.
AFAIK the intention was for this feature to follow the same path as locality load balancing - with use by advanced users first, but eventual promotion to a namespaced v1 API. Incidentally, ServiceExport from MCS is a namespaced API as well - and may work better for a producer setting than a mesh config fragment (and is neutral to implementation choices - i.e. not ambient-only).
On Mon, May 12, 2025 at 9:03 AM Keith Mattix II ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In mesh/v1alpha1/config.proto
+ // namespacesSelector:
+ // matchExpressions:
+ // - key: istio.io/global
+ // operator: In
+ // values: [true]
+ // servicesSelector:
+ // matchExpressions:
+ // - key: istio.io/global
+ // operator: Exists
+ // scope: GLOBAL
+ // ```
+ message ServiceScopeConfig {
+ // The scope of the matching service. Used to determine if the service is available locally
+ // (cluster local) or globally (mesh-wide).
+ enum Scope {
+ LOCAL = 0;
The existing API has a single bool for local vs. global. Are you two
proposing that we invert the default for this boolean for ambient workloads
vs. sidecar workloads? That seems brittle...
How is true/false more brittle than an enum with 2 values, 0 and 1 ?
Usually true/false and 0/1 are equivalent.
The default value in the API - i.e. what happens if a user does not include
the field at all - is different from the default behavior in the mesh when
no configuration exists at all.
If user just has a
host: foo.kube-system.svc.cluster.local
without specifying 'cluster_local: true' or a scope - it would mean
opposite things for sidecar and ambient, i.e. sidecars will treat it as
global (default cluster_local is false) and ambient as local (0), which is
far from ideal.
The default if nothing is specified at all is what we have defined in
ambient ( there is no multicluster so all services are local) and what has
been the default in sidecar ( all services global when using
Secrets to watch remote clusters, explicit config when using
Service/WorkloadEntry)
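The zero-value concern above can be seen directly in the proto. This is a sketch based on the enum quoted earlier in this thread, not the merged API: in proto3, an unset enum field is indistinguishable from its 0 value, so the two APIs end up with opposite effective defaults when a user omits the field.

```proto
// Sketch of the draft quoted in this thread (not the merged API).
message ServiceScopeConfig {
  enum Scope {
    // proto3 default: an unset scope and an explicit LOCAL
    // are indistinguishable on the wire.
    LOCAL = 0;
    GLOBAL = 1;
  }
  Scope scope = 1;
}

// Existing sidecar-mode setting for comparison: an unset cluster_local
// defaults to false, i.e. global - the opposite effective default.
// bool cluster_local = 1;
```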
Signed-off-by: Jackie Elliott <[email protected]>
Added some wording suggestions. I think I'd like to see something more general about how this API should be used by mesh admins to set the criteria for what services are global vs. local
On Tue, May 13, 2025, 11:47 Jackie Maertens (Elliott) wrote, in mesh/v1alpha1/config.proto:
@@ -450,6 +450,57 @@ message MeshConfig {
   // Settings to be applied to select services.
   repeated ServiceSettings service_settings = 50;
+  // Ambient mode multicluster scope configuration to be applied to matching services via namespace and/or
+  // service selectors. This configuration is used to define the scope of services configured by this
I am concerned about using "discovered by this cluster's control planes".
This might be too implementation focused, but technically the control plane
will discover all services but only share select services with the
dataplane.
Also, I'd like to clarify the meaning of a "local" scope for a matching
service. If service A matches a local scope selector, which of the following
is true:
1. Any workloads on the local cluster A making requests to service A
will only be routed locally (even if service A is globally scoped on other
clusters). Service A will not be exposed at the e/w gateway (if one exists).
2. Any workloads on the local cluster A making requests to service A
will be routed locally and to any clusters where service A is globally
scoped. Any workload on a different cluster B where service A is globally
scoped will be able to be routed locally or to any other clusters where
service A is global (not cluster A). Service A will not be exposed at the
e/w gateway (if it exists).
So far, we have never used a mesh config even from a different istiod revision
in the same cluster - and surely not a mesh config from a remote cluster. The
security and reliability risks are very high, compounded with
cross-namespace selector risks. We spent years removing the cross-namespace
uses in the original API.
I am not against adding a 4th way to configure if a service is exposed on a
gateway - after 2 Gateway APIs and ServiceExport, but crossing the cluster
boundaries to watch MeshConfigs is absolutely wrong.
I am not suggesting we cross cluster boundaries to watch MeshConfigs in different clusters. We are operating under the requirement that all MeshConfig ServiceScopeConfigs across clusters are the same. When istiod gets all services across all clusters, the selectors defined in the local MeshConfig are applied.
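Under that requirement, each cluster's MeshConfig would carry an identical stanza along these lines (a sketch assembled from the quoted diff; the top-level field name and the label key are assumptions, not a confirmed API):

```yaml
# Hypothetical MeshConfig fragment; must be identical across all clusters.
serviceScopeConfigs:
- namespacesSelector:
    matchExpressions:
    - key: istio.io/global    # illustrative label key
      operator: In
      values: ["true"]
  servicesSelector:
    matchExpressions:
    - key: istio.io/global
      operator: Exists
  scope: GLOBAL
```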
Cosmetic / doc suggestions - the structure and the rest is good.
// When in ambient mode, if ServiceSettings are defined they will be considered in addition to the
// ServiceScopeConfigs. If a service is defined by ServiceSetting to be cluster local and matches a
// global service scope selector, the service will be considered cluster local. If a service is
// considered global by ServiceSettings and does not match a global service scope selector
I don't know if this is the right behavior: ServiceSettings operate by hostname, with the primary use case being services that are not under the control of the mesh admin (like platform services). If a user starts with a 'local by default' model (for example using ServiceScope with a wildcard or broad selector), we want ServiceSettings to opt in by hostname.
The second (and maybe more important) issue is that the scope of both settings is usually .cluster.local names (or the custom suffix used by k8s) - services using global names (like .internal or prod.example.com), which are common for services used in mixed environments or VPC-scoped DNS, or using ServiceEntry and DNS interception, are almost always global.
I see the original doc was vague on what happens to ServiceEntry / non-K8S FQDNs - they are still valid Istio services, but can't be in scope for either ServiceSettings or ServiceScope (unless we use the labels on the ServiceEntry - external services don't have labels)
// global service scope selector, the service will be considered cluster local. If a service is
// considered global by ServiceSettings and does not match a global service scope selector
// the service will be considered local. Local scope takes precedence over global scope. Since
// ServiceScopeConfigs is local by default, all services are considered local unless it is considered
See above - I would add 'takes precedence ... for .cluster.local services'. We don't want to accidentally break ServiceEntry and external services.
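The precedence rules being discussed can be sketched as follows (a minimal sketch assuming the semantics stated in the quoted doc comment; the function and parameter names are illustrative, not Istio code):

```python
from typing import Optional

LOCAL, GLOBAL = "LOCAL", "GLOBAL"

def resolve_scope(matches_global_selector: bool,
                  settings_cluster_local: Optional[bool]) -> str:
    """settings_cluster_local is None when no ServiceSetting matches the host."""
    # A ServiceSetting marking the service cluster-local wins over any
    # global scope selector match: local takes precedence over global.
    if settings_cluster_local is True:
        return LOCAL
    # A service is global only if it matches a global scope selector;
    # a global ServiceSetting alone is not sufficient.
    if matches_global_selector:
        return GLOBAL
    # ServiceScopeConfigs default to local.
    return LOCAL

print(resolve_scope(True, True))    # LOCAL
print(resolve_scope(True, None))    # GLOBAL
print(resolve_scope(False, False))  # LOCAL
```

Whether this precedence should apply only to .cluster.local hostnames, as suggested in this thread, is a separate open question.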
@@ -450,6 +458,54 @@ message MeshConfig {
   // Settings to be applied to select services.
   repeated ServiceSettings service_settings = 50;

+  // Configuration for ambient mode multicluster service scope. This setting allows mesh administrators
+  // to define the criteria by which the cluster's control plane determines which services in other
+  // clusters in the mesh are treated as global (accessible across multiple clusters) versus local
I am reading this correctly as 'services in other clusters' == services using K8S Service and Pods in other clusters, and implicitly excluding ServiceEntry? If yes - please make it explicit to avoid confusion. If no - why, and how do we avoid breaking them?
// and/or other matching criteria. This is particularly useful in multicluster service mesh deployments
// to control service visibility and access across clusters. This API is not intended to enforce
// security policies. Resources like DestinationRules should be used to enforce authorization policies.
// If a service matches a global service scope selector, the service's endpoints will be globally
Typo - AuthorizationPolicy enforces authorization; DestinationRule configures load balancing (including locality) and other client-side policies.
Good catch, I'll fix this typo
// If a service matches a global service scope selector, the service's endpoints will be globally
// exposed. If a service is locally scoped, its endpoints will only be exposed to local cluster
// services.
//
Do you want to add a line about services matching multiple selectors? Which one takes priority? Maybe an example, or a line on what happens if matchExpression is missing - does it match all, and is that the default?
// Match expression for namespaces.
LabelSelector namespace_selector = 1;

// Match expression for services.
Only services in namespaces matching the namespace_selector will be used (i.e. AND). If namespace_selector is missing - all namespaces?
Yes, this is consistent with the ambient enablement API. An empty selector (one selector must be non-empty) is the equivalent of a match all.
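The AND semantics and empty-selector behavior described here can be sketched as follows (an illustrative sketch of Kubernetes-style matchExpressions with the In and Exists operators; not actual Istio code):

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """An empty selector (no matchExpressions) matches everything."""
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        if op == "Exists" and key not in labels:
            return False
        if op == "In" and labels.get(key) not in expr["values"]:
            return False
    return True

def in_scope(ns_selector: dict, svc_selector: dict,
             ns_labels: dict, svc_labels: dict) -> bool:
    # Both selectors must match (AND); an empty selector matches all.
    return (selector_matches(ns_selector, ns_labels)
            and selector_matches(svc_selector, svc_labels))

ns_sel = {"matchExpressions": [
    {"key": "istio.io/global", "operator": "In", "values": ["true"]}]}
svc_sel = {"matchExpressions": [
    {"key": "istio.io/global", "operator": "Exists"}]}

print(in_scope(ns_sel, svc_sel,
               {"istio.io/global": "true"}, {"istio.io/global": "x"}))  # True
print(in_scope(ns_sel, {}, {"team": "a"}, {}))                          # False
print(in_scope({}, {}, {}, {}))                                         # True
```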
// (cluster local) or globally (mesh-wide).
enum Scope {
  LOCAL = 0;
  GLOBAL = 1;
If the comment above specifies 'global' to mean mesh-wide, why not use MESH instead of GLOBAL?
Usually GLOBAL is larger than MESH, we are twisting enough words. Also GLOBAL can create confusion with regional/zonal ( covered by other APIs ) - while mesh is a bit more clear as intent.
Part of adding support for Ambient multicluster.
Based on https://docs.google.com/document/d/1Wg6sx9ZUJL4AsHj5wM1kMx3E436s5wg2qoMqoI-bqbQ/edit?tab=t.0#heading=h.nehszyorutrv
#3463