Commit 9105afa

Introduce ADR for Project CodeFlare release process
Signed-off-by: Anish Asthana <[email protected]>
1 parent 2b59429 commit 9105afa

2 files changed: 123 additions, 0 deletions
@@ -0,0 +1,123 @@
# Standardized release process for Project CodeFlare

| | |
| -------------- | ------------------------ |
| Date | 04/28/2023 |
| Scope | |
| Status | Proposed |
| Authors | [Anish Asthana](@anishasthana), [Mustafa Eyceoz](@Maxusmusti) |
| Supersedes | N/A |
| Superseded by: | N/A |
| Issues | |
| Other docs: | PCF-ADR-002 |
## What

This ADR introduces a unified release process, testing, and versioning scheme for Project CodeFlare.
[PCF-ADR-002](./PCF-ADR-0002-release-story-and-branch-maintenance.md) is related to this ADR but is purely supplemental. ADR-002 covers the branching strategy to be used by all components, whereas this ADR focuses on higher-level details.
## Why

Project CodeFlare currently has different release processes and versioning schemes for all of its subprojects. This creates confusion for users when they wish to use the stack -- which component versions should they be using? How do external dependencies like KubeRay factor in?
There are also no standards around testing for the project.
This ADR aims to document the overall release process for Project CodeFlare.
## Goals

* Establish a common versioning scheme for all CodeFlare components
* Establish a release process for Project CodeFlare
* Establish guidelines for CodeFlare support matrix
## Non-Goals

* Require all components to restart versioning from scratch
* Outline technical details for tests or CI infrastructure.
## How

We are proposing a unified release cycle for Project CodeFlare.
We will release a new version of Project CodeFlare after every sprint (currently every 3 weeks), following normal semantic versioning. All components under the CodeFlare umbrella are expected to start using semantic versioning for future releases. Components are not expected to start versioning again from scratch (i.e., v0.0.1). As part of a new release for Project CodeFlare, we will update the support matrix, which outlines specific component versions and versions for important dependencies such as KubeRay. The components to be documented are as follows:

* CodeFlare Operator
    * The CodeFlare Operator version will be aligned with (i.e., equivalent to) the Project CodeFlare version
* CodeFlare SDK
    * The CodeFlare Notebook Image will follow the same versioning.
    * Supported TorchX version will be documented under the SDK.
* InstaScale
* MCAD
* KubeRay
    * On a new KubeRay release, CodeFlare will update its supported version by the end of the next CodeFlare sprint
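
To make the idea of a support matrix concrete, here is a minimal Python sketch that renders pinned component versions as a Markdown table. It is an illustration only: the `SupportMatrix` class is hypothetical, and every version number is a placeholder rather than real release data. The one rule taken from this ADR is that the operator version equals the Project CodeFlare version.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SupportMatrix:
    """Pinned component versions for one Project CodeFlare release (hypothetical helper)."""

    codeflare_version: str
    components: Dict[str, str] = field(default_factory=dict)

    def to_markdown(self) -> str:
        # The operator version is aligned with the project version,
        # so it is derived here rather than stored separately.
        rows = {"CodeFlare Operator": self.codeflare_version, **self.components}
        lines = ["| Component | Version |", "| --- | --- |"]
        lines += [f"| {name} | {version} |" for name, version in rows.items()]
        return "\n".join(lines)


# Placeholder versions, for illustration only.
matrix = SupportMatrix(
    codeflare_version="v0.1.0",
    components={
        "CodeFlare SDK": "v0.2.0",
        "InstaScale": "v0.3.0",
        "MCAD": "v0.4.0",
        "KubeRay": "v0.5.0",
    },
)
print(matrix.to_markdown())
```

Keeping the matrix as data makes it straightforward to regenerate the README table on each release instead of editing it by hand.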
### Images

For published images, we are proposing repositories include the following three tags:

1. stable – Updated once every sprint as part of an official release
2. dev – Updated every time a new PR is merged into the main branch of a repository
3. Normal semantic version tags – Created whenever components require a new image to be published. This may happen more than once per sprint.
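
The tagging rules above can be sketched as a small Python function that maps a repository event to the image tags it should update. The `Event` names and the `tags_to_update` function are hypothetical and exist only for this illustration; the ADR does not prescribe any tooling.

```python
from enum import Enum
from typing import List, Optional


class Event(Enum):
    MERGE_TO_MAIN = "merge_to_main"          # a PR was merged into the main branch
    COMPONENT_RELEASE = "component_release"  # a component needs a new image published
    SPRINT_RELEASE = "sprint_release"        # official Project CodeFlare release


def tags_to_update(event: Event, version: Optional[str] = None) -> List[str]:
    """Return the image tags that should be (re)pointed for a given event."""
    if event is Event.MERGE_TO_MAIN:
        return ["dev"]
    if event is Event.SPRINT_RELEASE:
        return ["stable"]
    # Component releases publish an immutable semantic version tag.
    if version is None:
        raise ValueError("a component release needs a semantic version tag")
    return [version]


print(tags_to_update(Event.MERGE_TO_MAIN))
print(tags_to_update(Event.COMPONENT_RELEASE, "v0.1.1"))
print(tags_to_update(Event.SPRINT_RELEASE))
```

Note that `dev` and `stable` are moving tags, while semantic version tags are expected to stay fixed once published.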
### Testing

Each component must implement unit and e2e tests that are run on all PRs.
The CodeFlare operator must include an integration test suite that will ensure stability of the overall stack.
In addition to running the integration test suite on every PR to the CodeFlare operator, the integration tests are run nightly in the CodeFlare operator repository using the `dev` tag for all underlying components.
Component repositories will not be required to run the integration test suite. This may change at a later date once the maturity of Project CodeFlare and the integration test suite increases.

The components themselves also have dependencies on each other, so a change in one may require updates to several others. For example, an update to the AppWrapper CRD in the MCAD repository will require us to update all the components below it in the chain. On the other hand, the codeflare-sdk will not accept changes that will break integration with MCAD.

![codeflare_dependencies](images/PCF-ADR-003-dependencies.png)

With this in mind, we will make sure that the e2e tests for components pull in the `dev` tag for all components above them in the chain. This will help us catch issues earlier in the sprint.
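
As a rough sketch of how an e2e job might work out which `dev` images to pull, the Python below walks a dependency graph upward from a component. The graph edges and the `registry.example/codeflare` prefix are assumptions made for this example; the authoritative chain is the one in the dependency figure, and only the MCAD to codeflare-sdk relationship is stated in the text.

```python
from typing import Dict, List

# Hypothetical edges: each component maps to the components directly above it.
UPSTREAM: Dict[str, List[str]] = {
    "mcad": [],
    "instascale": ["mcad"],      # assumed for illustration
    "codeflare-sdk": ["mcad"],   # the SDK must not break MCAD integration
}


def upstream_components(component: str, graph: Dict[str, List[str]]) -> List[str]:
    """Collect every component above `component`, directly or transitively."""
    seen: List[str] = []
    stack = list(graph.get(component, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(graph.get(current, []))
    return sorted(seen)


def dev_images(component: str, graph: Dict[str, List[str]],
               registry: str = "registry.example/codeflare") -> List[str]:
    """Image references (placeholder registry) that the e2e tests should pull."""
    return [f"{registry}/{name}:dev" for name in upstream_components(component, graph)]


print(dev_images("codeflare-sdk", UPSTREAM))
```

A component at the top of the chain has no upstream entries, so its e2e tests pull no additional `dev` images.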
### Release Process

1. Ask components to generate a new release.
    1. This will just be the new semantic version tag.
2. Update image references in CodeFlare Operator repository
    1. Make sure that any CRD updates in the MCAD repository are copied into the CodeFlare operator repository
    2. Once merged, create a new operator image.
3. Generate new operator bundle in CodeFlare Operator
    1. Update the image reference for the bundle
    2. As part of this, we will be testing the latest component versions
    3. Update support matrix in operator README.
4. Create tag in operator repository and update release notes of operator version to include new support matrix.
5. Open a pull request to OpenShift community operators repository with latest bundle.
6. Once merged, update component stable tags to point at the latest image release.
7. Announce the new release in Slack and on mailing lists, if any.
8. Update the Distributed Workloads component in ODH.
    1. At a minimum, we will need to update the README for Distributed Workloads to refer to the latest versions of each component
    2. Updates could include custom resources, KubeRay version, or usage/installation instructions.
    3. Update needs to occur within a week of CodeFlare release
    4. Make sure that the platform SIG is aware of these updates, so that the ODH release notes are updated as expected
Hotfix Process?
If a critical issue is discovered that requires an out-of-band release, we will follow the normal release process, bumping the z-stream (patch) version for Project CodeFlare as well as for the affected component.
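
The z-stream bump for a hotfix can be sketched in Python as follows. This is an illustrative helper, not project tooling: the `bump` function is hypothetical, it accepts only plain `X.Y.Z` versions with an optional `v` prefix, and the version numbers are placeholders.

```python
import re

_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def bump(version: str, part: str = "patch") -> str:
    """Bump one part of an X.Y.Z version, resetting the lower parts to zero."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a plain semantic version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    prefix = "v" if version.startswith("v") else ""
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1  # the "z stream" in x.y.z
    else:
        raise ValueError(f"unknown version part: {part!r}")
    return f"{prefix}{major}.{minor}.{patch}"


# A hotfix bumps the patch version of both the project and the affected component.
project, component = "v0.3.1", "v1.4.0"  # placeholder versions
print(bump(project), bump(component))
```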
## Open Questions

1. How do versioning and releases for the Distributed Workloads component for ODH fit into this?
    * Currently thinking it will just follow Project CodeFlare versions.
2. Do we want to include a code/feature freeze date?
    * Leaning towards keeping this window as small as possible – something like 2 days before the end of a given sprint.
3. There’s work ongoing to include UI integrations for MCAD in the odh-dashboard. How does that fit into our release process?
4. There’s a lot of scope for automation improvements as well as testing additions. How do we prioritize these inclusions?
5. The CodeFlare operator currently uses pinned versions for component references. Should we just use the component "stable" tags instead?
    * Currently thinking "No" -- using explicit version references makes it very clear what version of what component is included in a given release of the stack.
## Alternatives

We didn't consider any other alternatives.
## Stakeholder Impacts

| Group                         | Key Contacts       | Date       | Impacted? |
| ----------------------------- | ------------------ | ---------- | --------- |
| CodeFlare SDK                 | Mustafa Eyceoz     | 04/28/2023 | yes       |
| MCAD                          | Abhishek Malvankar | 04/28/2023 | yes       |
| InstaScale                    | Abhishek Malvankar | 04/28/2023 | yes       |
| CodeFlare Operator            | Anish Asthana      | 04/28/2023 | yes       |
## Reviews

Reviews on the pull request will suffice for the approval process. At least 2 approvals are required prior to this ADR being merged. The ADR must also remain open for at least one week.

images/PCF-ADR-003-dependencies.png

18.9 KB