mirror_ocp_release: fixes for concurrent jobs #626
base: main
@@ -11,20 +11,26 @@
  when:
    - mor_force or not _mor_target.stat.exists
  block:
    - name: "Extract installer and metadata from release image"
      ansible.builtin.shell: >
        flock -x {{ mor_cache_dir }}/{{ mor_version }}/release_extract.lock -c '
With this new approach, use of filesystem locks is not needed anymore.
Have you considered scenarios where two jobs run concurrently, both extracting to a temporary location and writing to mor_cache_dir? How do you prevent conflicts in such cases? Implementing a mechanism like a lock might help avoid these issues.
That's a fair point. I was accepting the risk of facing such scenarios, given that moving files around the filesystem is much faster than running the operations directly on the mor_cache_dir.
But at this point it's true that we could just limit the protected zone to the task where we copy the files to the cache directory once they have been processed in the temporary directory.
Here comes another thought. The way this implementation works, the problematic task would be the one where we run the copy module to move the files from the temporary directory to the cache directory. Alternatively, or in combination with the lock, we can add the parameter "force: false" to the module call, so Ansible won't replace a file that already exists, even if the contents are different.
The question here is whether it'd be safe to assume the artifact won't change between jobs deploying the same OCP release.
My guess is the files won't change if the jobs are running concurrently, but would those artifacts change between jobs running with, say, days of difference?
Unfortunately, I can't find a way of implementing locks that would allow us to run Ansible tasks in the locked zone.
In other words, when using locks the lock code must be part of the same shell script.
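For illustration only, here is a minimal sketch of what limiting the lock to the copy step could look like if the copy were done from the same shell script that takes the lock. The lock file name and the use of mkdir/cp in place of the copy module are assumptions made for this sketch; they are not part of this change.

```yaml
# Hypothetical sketch: the lock and the copy live in one shell command,
# because the lock cannot span separate Ansible tasks.
- name: Copy artifacts to release directory under an exclusive lock
  ansible.builtin.shell: >
    flock -x "{{ mor_cache_dir }}/{{ mor_version }}.lock" -c
    'mkdir -p "{{ mor_cache_dir }}/{{ mor_version }}" &&
    cp -a "{{ _mor_tmp.path }}/." "{{ mor_cache_dir }}/{{ mor_version }}/"'
```

The sketch assumes mor_cache_dir already exists and that the paths contain no single quotes. The trade-off is losing the copy module's idempotency reporting and its "force" handling.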
  {{ mor_oc }} adm release extract
  --registry-config {{ mor_auths_file }}
  --command={{ mor_installer }}
  --from {{ mor_pull_url }}
- --to "{{ mor_cache_dir }}/{{ mor_version }}";
+ --to "{{ _mor_tmp.path }}";
Artifacts are first extracted into the temporary directory for the job.
    get_checksum: false
  register: target
  when:
    - not mor_force
Since we're extracting the artifacts into a temporary directory, the file won't exist in advance.
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 43s
- name: Copy artifacts to release directory
  ansible.builtin.copy:
    src: "{{ _mor_tmp.path }}/"
    dest: "{{ mor_cache_dir }}/{{ mor_version }}"
At the end of the job execution the artifacts are copied to the job's target directory. For this, we use a regular task at the end of the role instead of handlers, so we don't wait for the end of the play to copy the artifacts.
7e5d9f5 to e7675e7
    state: directory
    prefix: mor-
  register: _mor_tmp
  notify: "Remove temporary directory"
We use a handler to remove the temporary directory so we make sure it runs at the end of the play.
The goal is to have it run on both successful and failed jobs. For this to work, the play calling this role must set "force_handlers: true"; otherwise the handler will only run on success.
We chose this approach over a block with an "always" section, so we don't need to add extra tasks to track the failed step properly.
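As a rough sketch of how the pieces described above fit together (the play name, the hosts value and the file layout in the comments are assumptions for illustration; the role name is taken from the PR title):

```yaml
# Play calling the role (hypothetical playbook)
- name: Mirror an OCP release
  hosts: localhost
  force_handlers: true   # run notified handlers even if a later task fails
  roles:
    - role: mirror_ocp_release

# Role tasks (assumed location: tasks/main.yml)
- name: Create temporary directory
  ansible.builtin.tempfile:
    state: directory
    prefix: mor-
  register: _mor_tmp
  notify: "Remove temporary directory"

# Role handlers (assumed location: handlers/main.yml)
- name: "Remove temporary directory"
  ansible.builtin.file:
    path: "{{ _mor_tmp.path }}"
    state: absent
```

The handler is only notified once the tempfile task has run, so a failure before that point leaves nothing to clean up.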
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 53s
e7675e7 to 903f001
@@ -35,4 +27,5 @@
  ansible.builtin.command: /usr/sbin/restorecon -R "{{ mor_dir }}/{{ mor_uri | basename }}"
  become: true
  when: _mor_selinux.rc == 0
  # we may need to run this task over the target directory rather than mor_dir (= _mor_tmp.path)
In the original scenario, the artifacts are extracted into an httpd served directory, so restoring the contexts is needed for the files to be properly served. Restoring the contexts on the temporary directory may not have any effect.
Actually, the next tasks in the artifacts.yml file after including fetch.yml set the new context container_file_t on the extracted artifacts. I assume this is done there so it applies to both cases: OCP versions greater than or equal to 4.8 and versions lower than 4.8. But that means the tasks in fetch.yml are not needed.
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 48s
903f001 to 11767d3
- name: "Apply new SELinux file context to file"
  ansible.builtin.command: /usr/sbin/restorecon -R "{{ mor_dir }}/{{ mor_uri | basename }}"
  become: true
  when: _mor_selinux.rc == 0
Restoring the selinux context does not make sense when extracting the artifacts on a temporary directory.
Also, the first tasks in artifacts.yml after including fetch.yml override the context and set it to container_file_t, which should be valid even after moving the artifacts to the target directory served from the cache container.
A different discussion is whether these tasks should be run before or after copying the artifacts to the target directory.
We have to restore this block of code, since fetch.yml is also included from images.yml to pull the disk image directly into the cache store (version directory ignored) so then it's directly served by the cache container.
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 46s
11767d3 to 4995c13
4995c13 to 13e17c5
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 49s
13e17c5 to bd32593
5c7fcac to c53b9cc
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 45s
- name: Copy artifacts to release directory
  ansible.builtin.copy:
    src: "{{ _mor_tmp.path }}/"
    dest: "{{ mor_cache_dir }}/{{ mor_version }}/"
Please apply the SELinux context to the file in the release directory.
Done.
Hey @nsilla, I have some concerns that this change might undermine the caching mechanism, leading to unnecessary downloads of releases/ISOs on every job. I understand the complexity here, and I also recognize that some installers already perform similar tasks during installation. Just my 2 cents; let's hear what the rest of the team thinks.
19e8084 to b3dc5c9
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 50s
b3dc5c9 to 89d08c8
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 51s
    src: "{{ _mor_tmp.path }}/"
    dest: "{{ mor_cache_dir }}/{{ mor_version }}/"
    mode: preserve
    force: false
By setting the "force" parameter to "false" we prevent the module from leaving a file in an inconsistent state if the file already exists.
This is useful, for instance, to prevent binary execution errors if the binary file is modified while it is being executed.
There are some concerns regarding this approach, though:
- if force is set to true, existing files are only replaced if their contents changed, which should not happen between files belonging to the same release number.
- since the Ansible copy module uses atomic moves, the target file's checksum should not change during the copy process, so it shouldn't be possible for Ansible to mark a file as replaceable just because its content is not yet stable.
- thus, the force: false option is only an extra precaution we take.
- this option would prevent files from being updated if they are modified, even within the same OCP release number.
89d08c8 to 355027e
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 49s
355027e to ceeec63
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 46s
ceeec63 to 33a30f4
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 48s
33a30f4 to eb3c103
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 46s
SUMMARY
Fixes CILAB-2034: when multiple jobs run to mirror the same version on the same host, the status of the mirroring directory may be unstable depending on the order the tasks are run on each job.
This change implements temporary directories per role call so the artifacts are only moved to the final location at the end of the execution.
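The flow described in this summary can be sketched roughly as follows. This is an approximation assembled from the fragments discussed in the review, not the exact tasks of the role; the use of ansible.builtin.command with argv (instead of the shell task in the diff) is an assumption made to keep the sketch short.

```yaml
# Assumes _mor_tmp was registered earlier by an ansible.builtin.tempfile task.
- name: Extract installer and metadata into the temporary directory
  ansible.builtin.command:
    argv:
      - "{{ mor_oc }}"
      - adm
      - release
      - extract
      - --registry-config
      - "{{ mor_auths_file }}"
      - "--command={{ mor_installer }}"
      - --from
      - "{{ mor_pull_url }}"
      - --to
      - "{{ _mor_tmp.path }}"

# Only the finished artifacts reach the shared cache directory.
- name: Copy artifacts to release directory
  ansible.builtin.copy:
    src: "{{ _mor_tmp.path }}/"
    dest: "{{ mor_cache_dir }}/{{ mor_version }}/"
    mode: preserve
    force: false   # do not replace files another job has already placed
```

Whether the copy task also needs remote_src depends on where the role runs relative to the cache host, which is not shown here.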
ISSUE TYPE
Tests
TestBos2Sno: sno sno:components=ocp=4.18.5,