-
Notifications
You must be signed in to change notification settings - Fork 59
Metrics: PSS memory for a particular load level #891
Conversation
There was a swarm failure not part of
Searched a little bit about this and found this issue: |
# Currently default nuttcp has a bug | ||
# see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745051 | ||
# Image name | ||
image=gabyct/nuttcp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still don't have the Dockerfile
for this image in the repository. I've raised #892 to track this as the image is already being used by another test.
This use of custom image is a concern because we cannot recreate it until we have the Dockerfile
.
In my opinion, we should use standard images where available, but if not, we must submit the Dockerfile
on the same PR as any new tests that use it.
Please can you ensure #892 and the very similar #848 are resolved (it should be just a matter of raising a PR with a single Dockerfile
for each issue I think?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jodh-intel here it is the Dockerfile #895
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jodh-intel in that Dockerfile we have the dependencies for gabyct/network and gabyct/nuttcp something that I will rename after it is merged
# see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745051 | ||
# Image name | ||
image=gabyct/nuttcp | ||
# This is required in order to reduce standard deviation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What unit is this time value in (seconds)? What sort of SD are we seeing?
|
||
server_command="/root/nuttcp -S" | ||
$DOCKER_EXE run -tid --name=${server_name} ${image} bash > /dev/null | ||
server_address=$($DOCKER_EXE inspect --format "{{.NetworkSettings.IPAddress}}" ${server_name}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be getting quite a collection of these tests now (good), but they all "look" very similar. I wonder if there is any opportunity to refactor them into some shared functions (which could live in test-common.bash
) then the tests themselves could become small stubs that call the shared functions. What do you think?
@grahamwhaley - do you have any thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jodh-intel @GabyCT Indeed, pretty much all the network tests look almost identical.
I agree that we should be able to refactor a bunch of this code out, with all the benefits that should bring.
I don't think we need to block this PR due to that, but @GabyCT I will make a request - for the next PR, rather than a copy/modify can we do a factor-out of the common code parts into the common library and then backport. I'm happy for that to happen as a pair of PRs - one to add a new test and the common parts and then a follow up one to backport using that common code to the existing tests for instance.
# | ||
# Description: | ||
# Measures Proportional Set Size memory while running an | ||
# inter (docker<->docker) 1Gbps network bandwidth using nuttcp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presuming we chose 1Gbps here for a reason, can we add some explanation into this comment please?
image=gabyct/nuttcp | ||
# This is required in order to reduce standard deviation | ||
total_time=6 | ||
# This time (seconds) is required when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I suggest a comment rewrite here. Maybe something like:
'We wait for the test system to settle into a steady mode before we measure the PSS. Thus, we have two times - the length of time the test runs for, and the time after which we sample the PSS`
middle_time=3 | ||
|
||
# Rate limit (speed at which transmitter send data, megabytes) | ||
rate_limit=10000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain how this 10,000 translates to the '1Gbps' - I'm not sure I can figure it out?
$DOCKER_EXE run -tid --name=${server_name} ${image} bash > /dev/null | ||
server_address=$($DOCKER_EXE inspect --format "{{.NetworkSettings.IPAddress}}" ${server_name}) | ||
|
||
client_command="/root/nuttcp -R${rate_limit}m -T${total_time} ${server_address}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff here shows we have some mixed whitespace - just a couple of lines with spaces instead of tabs - can you fix those up for us please (as I suspect we may get an update submitted anyhow).
server_address=$($DOCKER_EXE inspect --format "{{.NetworkSettings.IPAddress}}" ${server_name}) | ||
|
||
client_command="/root/nuttcp -R${rate_limit}m -T${total_time} ${server_address}" | ||
$DOCKER_EXE exec ${server_name} bash -c "${server_command}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this server exec command line be better placed up with the rest of the server bits a few lines up, rather than mixed here amongst the client commands. I know you may be sequencing the run/exec here - but I think this can be placed higher up for better semantic context without impacting the test itself - you think?
echo >&2 "WARNING: sleeping for $middle_time seconds in order to have server and client stable" | ||
sleep ${middle_time} | ||
|
||
${memory_command} -P @QEMU_PATH@ | tail -n 2 > "$total_memory" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you add the --no-header here (as used at this link), you can drop the head
maybe:
https://github.com/01org/cc-oci-runtime/blob/master/tests/metrics/density/docker_memory_usage.sh.in#L70
Also, I'm not convinced about the split of the smem options between this line and the pss_memory() function setting up the memory_command variable - it feels a bit 'fragmented', particularly as there is only a single use in the file. If you can think of a way to make that cleaner, please do.
Some comments left for some cleanup and stylistic. Fundamentally the test runs and looks to produce pretty stable results. |
@jodh-intel and @grahamwhaley changes were applied, thanks for the feedback |
qa-passed |
lgtm I think we might have a bit of a wait on landing this though as we need @grahamwhaley's ack and he's not about atm. |
|
||
# Rate limit (speed at which transmitter send data, megabytes) | ||
# We will measure PSS with a specific transfer rate | ||
rate_limit=10000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @GabyCT Thanks for the fixups. I'm still not sure if this 10,000 rate_limit equates to 1Gb(/s) - if it were 1000, then that would make sense (as we are measuring megabytes), or if we were aiming for 1Gbit, and this were 100 Mbytes (taking 1byte on the wire as ~10 bits....), I could understand. So, either I think we may have a typo here (10,000 instead of 1000?), or can we have some more explanation or correction of the text - just so the 1Gb and the 10,000 correlate? :-) Thanks.
@grahamwhaley thank you for the review, yeah it was a typo, changes applied |
Proportional Set Size (PSS) memory while running a nuttcp 1 Gb transfer rate. The PSS measurement is taken by using smem. Signed-off-by: Gabriela Cervantes <[email protected]>
qa-passed |
lgtm |
Hi @grahamwhaley - please can you re-review so we can get this landed? |
lgtm |
@jodh-intel and @grahamwhaley, I had some issues doing the rebase :( besides I noticed that this script was outdated as it was not using the common script like the other ones so I submitted this #939 that is basically the same measurement but now without conflicts in the Makefile and using the common script of the networking tests, sorry for the inconvenience. |
qa-passed |
Proportional Set Size (PSS) memory while running a nuttcp 1 Gb transfer rate.
The PSS measurement is taken by using smem.
Signed-off-by: Gabriela Cervantes [email protected]