performance issue on openshift #357

Open · saad0805 opened this issue Sep 14, 2023 · 11 comments
@saad0805

The Red Hat default value for read_ahead_kb is 64 KB.

For all persistent volumes using the CSI driver we see a value of 32 MB, and this is causing degraded I/O for our applications on the OCP cluster.

Is this defined somewhere? Why is the Red Hat default value overwritten?

We are using the 2.3.0 operator on Red Hat OpenShift.

@datamattsson
Collaborator

Thanks for raising this. I've examined our procedures and we change read_ahead_kb for Nimble devices, but we set it to 128 KB, not 32 MB. I can't find any other traces of read_ahead_kb being touched.

@saad0805
Author

Thanks, that is weird then.

When we were on OCP 4.10 / CSI operator 2.2, read_ahead_kb was at 128 KB.

Since upgrading the CSI driver to 2.3 and OCP to 4.12, read_ahead_kb has changed to 32 MB:

$ cat /sys/class/block/dm-*/queue/read_ahead_kb

32768
32768
32768
32768
32768
32768

We tried manually changing the value for a persistent volume:

echo 4096 > /sys/block/dm-27/queue/read_ahead_kb

then emptied the cache:

echo 1 > /proc/sys/vm/drop_caches

But when we restart the pod, the value is overwritten and reverts to 32 MB.

For info, we are using HPE 3PAR storage. Worker nodes are on Synergy.

All local disks are at the Red Hat default value of 4096 KB:

$ cat /sys/class/block/sda/queue/read_ahead_kb
4096

We have other servers in the same Synergy frame running RHEL 7, and virtual machines on VMware running RHEL 7 and RHEL 8 with volumes on the same storage; they are at 4096 KB as well.

From your point of view, if the CSI driver is not setting this, is it set at the K8s, Red Hat kernel, storage, or server level?

StorageClass parameters:

parameters:
  accessProtocol: fc
  allowMutations: compression, hostSeesVLUN
  compression: "true"
  cpg: XXXX
  csi.storage.k8s.io/controller-expand-secret-name: primera3par-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: kube-system
  csi.storage.k8s.io/controller-publish-secret-name: primera3par-secret
  csi.storage.k8s.io/controller-publish-secret-namespace: kube-system
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-publish-secret-name: primera3par-secret
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
  csi.storage.k8s.io/node-stage-secret-name: primera3par-secret
  csi.storage.k8s.io/node-stage-secret-namespace: kube-system
  csi.storage.k8s.io/provisioner-secret-name: primera3par-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  hostSeesVLUN: "true"
  provisioning_type: dedup

CSI driver spec:

spec:
  csp:
    affinity: {}
    labels: {}
    nodeSelector: {}
    tolerations: []
  logLevel: info
  node:
    affinity: {}
    labels: {}
    nodeSelector: {}
    tolerations: []
  disable:
    alletra6000: true
    alletra9000: false
    nimble: true
    primera: false
  disableNodeConformance: false
  iscsi:
    chapPassword: ''
    chapUser: ''
  imagePullPolicy: IfNotPresent
  disableNodeGetVolumeStats: false
  controller:
    affinity: {}
    labels: {}
    nodeSelector: {}
    tolerations: []
  registry: quay.io
  kubeletRootDir: /var/lib/kubelet/

multipath.conf:

devices {
    device {
        vendor               "3PARdata"
        product              "VV"
        features             "0"
        prio                 alua
        path_selector        "round-robin 0"
        rr_weight            "uniform"
        path_grouping_policy group_by_prio
        no_path_retry        18
        hardware_handler     "1 alua"
        path_checker         tur
        detect_prio          yes
        rr_min_io_rq         1
        fast_io_fail_tmo     10
        dev_loss_tmo         infinity
        failback             immediate
    }
}

@datamattsson
Collaborator

It could be some udev rule that sets it. Have you grepped around in /etc/udev/rules.d/?
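
For reference, a sweep of the standard udev rule locations (these are the usual default paths on RHEL-family systems) might look like:

$ grep -r read_ahead_kb /etc/udev/rules.d/ /run/udev/rules.d/ /usr/lib/udev/rules.d/

You can also simulate an event for one of the affected devices to see which rules udev processes for it (dm-27 is just the example device from earlier in the thread):

$ udevadm test /sys/block/dm-27 2>&1 | grep -i read_ahead

A match under /usr/lib/udev/rules.d/ would point at a vendor or OS package rather than anything hand-written.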

@saad0805
Author

In fact, I have grepped in /etc, but it is nowhere to be found.


@datamattsson
Collaborator

What does Red Hat have to say about the matter? Are they pointing at the CSI driver?

@datamattsson
Collaborator

This is what I'm seeing on OCP 4.13 with HPE CSI Driver v2.4.0-beta:

$ cat /sys/class/block/dm-*/queue/read_ahead_kb
128

I am using Nimble in this particular case, though, where we set it to 128.

@datamattsson
Collaborator

This is what appears on a Primera, same OCP and CSI driver etc.:

$ cat /sys/class/block/dm-*/queue/read_ahead_kb
8160

@datamattsson
Collaborator

I've determined we can't do anything from the CSI driver's perspective. Custom udev rules need to be created for 3PAR devices on the worker nodes.

Create the file below at /etc/udev/rules.d/99-3par-tune.rules and run udevadm control --reload-rules. Also run udevadm trigger if you have attached devices.

##
# Copyright 2023 Hewlett Packard Enterprise Development LP.
#
##

ACTION!="add|change", GOTO="3par_tuning_end"
SUBSYSTEM!="block", GOTO="3par_tuning_end"
KERNEL!="sd*|dm-*", GOTO="3par_tuning_end"
KERNEL=="dm-*", ENV{DM_UUID}!="mpath-360002ac*", GOTO="3par_tuning_end"
ENV{DEVTYPE}=="partition", GOTO="3par_tuning_end"

# Uncomment the remaining lines beginning with ATTR to enable those rules as well,
# then run "udevadm control --reload-rules" and "udevadm trigger" to apply them to all 3PAR devices.

# set max_sectors_kb to max_hw_sectors_kb.
#ATTR{queue/max_sectors_kb}="4096"
# set read_ahead_kb to 64
ATTR{queue/read_ahead_kb}="64"
# set nr_requests to 512.
#ATTR{queue/nr_requests}="512"
# set scheduler to noop.
#ATTR{queue/scheduler}="noop"
# disable add_random.
#ATTR{queue/add_random}="0"
# disable rotational.
#ATTR{queue/rotational}="0"
# set rq_affinity to 2.
#ATTR{queue/rq_affinity}="2"

LABEL="3par_tuning_end"
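
After reloading, a quick way to confirm the rule took effect (the trigger here is limited to block devices):

$ udevadm control --reload-rules
$ udevadm trigger --subsystem-match=block
$ cat /sys/class/block/dm-*/queue/read_ahead_kb

Each dm-* device matched by the rule should now report 64.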

@saad0805
Author

Thank you Michael for your help.
After exchanging with HPE and Red Hat support, we finally found out that the issue was a change in how read_ahead_kb is calculated in the Linux kernel. This changed as of RHEL 8.5 GA. So yes, the only way is to apply a custom udev rule.

What we are still not sure about is whether this fix will hold when a pod is restarted or scheduled onto another worker node. We already tried this with the TuneD operator, but the value reverts to 32 MB as soon as we restart a pod.

We will test this and update you.

@datamattsson
Collaborator


I think the udev rule needs to be injected and enabled by a MachineConfig. I've not made one of these myself before, but here's the documentation on how to do it: https://docs.openshift.com/container-platform/4.12/post_installation_configuration/machine-configuration-tasks.html

Edit: The udev rule will be injected on all worker nodes and udev will intercept all 3PAR devices. Pod restarts won't affect the effective values set by udev.
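
An untested sketch of what that MachineConfig might look like (the name and file mode are illustrative; the base64 payload would be the rule file above, produced with e.g. base64 -w0 99-3par-tune.rules):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-3par-tune
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/udev/rules.d/99-3par-tune.rules
          mode: 420
          overwrite: true
          contents:
            # <BASE64_RULE_CONTENT> is a placeholder for the base64-encoded rule file
            source: data:text/plain;charset=utf-8;base64,<BASE64_RULE_CONTENT>

The Machine Config Operator rolls this out to every node with the worker role, so the rule persists across reboots and pod rescheduling.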
