This repository was archived by the owner on Mar 28, 2018. It is now read-only.

Metrics: PSS memory for a particular load level #891

Closed
wants to merge 2 commits into from

Conversation

GabyCT
Contributor

@GabyCT GabyCT commented May 8, 2017

Proportional Set Size (PSS) memory while running a nuttcp 1 Gb transfer rate.
The PSS measurement is taken by using smem.

Signed-off-by: Gabriela Cervantes [email protected]

@chavafg
Contributor

chavafg commented May 8, 2017

qa-failed

Rejected with PullApprove

@chavafg
Contributor

chavafg commented May 8, 2017

There was a swarm failure not part of cc-oci-runtime.

not ok 30 Checking MTU values in different interfaces
# (from function `setup' in test file /home/cloud/go/src/github.com/01org/cc-oci-runtime/cc-oci-runtime-2.1.7/tests/integration/docker/mtu.bats, line 43)
#   `$DOCKER_EXE swarm init ${swarm_interface_arg}' failed
# Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.
# Error response from daemon: error updating cluster settings: rpc error: code = 2 desc = update out of sequence
# Error response from daemon: service testswarm not found
# ID                           HOSTNAME                     STATUS  AVAILABILITY  MANAGER STATUS
# 91zpee0q04dqbwvnp6dwg6kv6 *  fedora-cc-ci-vm.localdomain  Ready   Active        Leader
# Node left the swarm.
# Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.

Searched a little bit about this and found this issue:
moby/moby#30794

@chavafg
Contributor

chavafg commented May 8, 2017

qa-passed

Approved with PullApprove

# Currently default nuttcp has a bug
# see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745051
# Image name
image=gabyct/nuttcp
Contributor

We still don't have the Dockerfile for this image in the repository. I've raised #892 to track this as the image is already being used by another test.

This use of custom image is a concern because we cannot recreate it until we have the Dockerfile.

In my opinion, we should use standard images where available, but if not, we must submit the Dockerfile on the same PR as any new tests that use it.

Please can you ensure #892 and the very similar #848 are resolved (it should be just a matter of raising a PR with a single Dockerfile for each issue I think?)

Contributor Author

@jodh-intel here is the Dockerfile: #895

Contributor Author

@jodh-intel that Dockerfile has the dependencies for both gabyct/network and gabyct/nuttcp, something that I will rename after it is merged.

# see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745051
# Image name
image=gabyct/nuttcp
# This is required in order to reduce standard deviation
Contributor

What unit is this time value in (seconds)? What sort of SD are we seeing?


server_command="/root/nuttcp -S"
$DOCKER_EXE run -tid --name=${server_name} ${image} bash > /dev/null
server_address=$($DOCKER_EXE inspect --format "{{.NetworkSettings.IPAddress}}" ${server_name})
Contributor

We seem to be getting quite a collection of these tests now (good), but they all "look" very similar. I wonder if there is any opportunity to refactor them into some shared functions (which could live in test-common.bash) then the tests themselves could become small stubs that call the shared functions. What do you think?

@grahamwhaley - do you have any thoughts on this?

@jodh-intel @GabyCT Indeed, pretty much all the network tests look almost identical.
I agree that we should be able to refactor a bunch of this code out, with all the benefits that should bring.
I don't think we need to block this PR due to that, but @GabyCT I will make a request - for the next PR, rather than a copy/modify can we do a factor-out of the common code parts into the common library and then backport. I'm happy for that to happen as a pair of PRs - one to add a new test and the common parts and then a follow up one to backport using that common code to the existing tests for instance.
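
As a rough illustration of the factoring-out suggested above, the steps repeated across the network tests could move into helpers along these lines. The function names are hypothetical and are not taken from the repository's test-common.bash; the docker and nuttcp command lines are the ones quoted in this PR.

```shell
#!/bin/bash
# Hypothetical helpers that could live in test-common.bash.
# Function names are illustrative only, not taken from the repository.

DOCKER_EXE="${DOCKER_EXE:-docker}"

# Start a detached container that idles in bash so commands can be exec'd into it.
start_container() {
    local name="$1" image="$2"
    $DOCKER_EXE run -tid --name="${name}" "${image}" bash > /dev/null
}

# Print the IP address of a running container.
container_address() {
    local name="$1"
    $DOCKER_EXE inspect --format "{{.NetworkSettings.IPAddress}}" "${name}"
}

# Build the nuttcp client command line for a rate limit, a duration and a server address.
nuttcp_client_command() {
    local rate_limit="$1" total_time="$2" server_address="$3"
    echo "/root/nuttcp -R${rate_limit}m -T${total_time} ${server_address}"
}
```

Each test could then shrink to a stub that calls `start_container`, `container_address` and `nuttcp_client_command` with its own parameters.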

#
# Description:
# Measures Proportional Set Size memory while running an
# inter (docker<->docker) 1Gbps network bandwidth using nuttcp

Presuming we chose 1Gbps here for a reason, can we add some explanation into this comment please?

image=gabyct/nuttcp
# This is required in order to reduce standard deviation
total_time=6
# This time (seconds) is required when

Can I suggest a comment rewrite here. Maybe something like:
'We wait for the test system to settle into a steady mode before we measure the PSS. Thus, we have two times - the length of time the test runs for, and the time after which we sample the PSS.'
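
The two-timer idea in that suggested comment can be sketched as follows. This is only an illustration: the `sleep` workload and the `echo` sample are stand-ins (assumptions) for the nuttcp client and the smem call, and `sample_at_steady_state` is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: the workload and sample commands stand in for nuttcp and smem.

total_time=2    # seconds the workload runs for
middle_time=1   # seconds to wait before taking the sample

# Run a workload in the background, wait for it to settle, take one sample,
# then wait for the workload to finish.
sample_at_steady_state() {
    local workload="$1" sample="$2"
    ${workload} &
    local workload_pid=$!
    sleep "${middle_time}"
    ${sample}
    wait "${workload_pid}"
}

sample_at_steady_state "sleep ${total_time}" "echo sampled"
```

Keeping `middle_time` smaller than `total_time` is what ensures the sample is taken while the transfer is still running.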

middle_time=3

# Rate limit (speed at which transmitter send data, megabytes)
rate_limit=10000

Can you explain how this 10,000 translates to the '1Gbps' - I'm not sure I can figure it out?

$DOCKER_EXE run -tid --name=${server_name} ${image} bash > /dev/null
server_address=$($DOCKER_EXE inspect --format "{{.NetworkSettings.IPAddress}}" ${server_name})

client_command="/root/nuttcp -R${rate_limit}m -T${total_time} ${server_address}"

The diff here shows we have some mixed whitespace - just a couple of lines with spaces instead of tabs - can you fix those up for us please (as I suspect we may get an update submitted anyhow).

server_address=$($DOCKER_EXE inspect --format "{{.NetworkSettings.IPAddress}}" ${server_name})

client_command="/root/nuttcp -R${rate_limit}m -T${total_time} ${server_address}"
$DOCKER_EXE exec ${server_name} bash -c "${server_command}"

Would this server exec command line be better placed up with the rest of the server bits a few lines up, rather than mixed here amongst the client commands? I know you may be sequencing the run/exec here, but I think this can be placed higher up for better semantic context without impacting the test itself. What do you think?

echo >&2 "WARNING: sleeping for $middle_time seconds in order to have server and client stable"
sleep ${middle_time}

${memory_command} -P @QEMU_PATH@ | tail -n 2 > "$total_memory"

I think if you add the --no-header here (as used at this link), you can drop the head maybe:
https://github.com/01org/cc-oci-runtime/blob/master/tests/metrics/density/docker_memory_usage.sh.in#L70
Also, I'm not convinced about the split of the smem options between this line and the pss_memory() function setting up the memory_command variable - it feels a bit 'fragmented', particularly as there is only a single use in the file. If you can think of a way to make that cleaner, please do.
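
One possible way to keep the smem options together, sketched under assumptions: `pss_for_process` and `average` are hypothetical names, and the sketch assumes smem's `--no-header`, `-c` (column selection) and `-P` (process filter) options behave as described in its manual page.

```shell
#!/bin/bash
# Hypothetical single-place definition of the smem invocation.
# --no-header drops the column header, -c selects the columns to print,
# -P filters processes by regular expression.
pss_for_process() {
    local process_regex="$1"
    smem --no-header -c "pss" -P "${process_regex}"
}

# Average one-number-per-line input, truncating to an integer.
average() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%d\n", sum / n }'
}
```

A caller could then pipe one into the other, for example `pss_for_process "${qemu_path}" | average`, where `qemu_path` is an assumed variable holding the QEMU binary path.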

@grahamwhaley

Some comments left for cleanup and style. Fundamentally the test runs and looks to produce pretty stable results.

@GabyCT
Contributor Author

GabyCT commented May 15, 2017

@jodh-intel and @grahamwhaley changes were applied, thanks for the feedback

@chavafg
Contributor

chavafg commented May 15, 2017

qa-passed

@jodh-intel
Contributor

jodh-intel commented May 15, 2017

lgtm

I think we might have a bit of a wait on landing this though as we need @grahamwhaley's ack and he's not about atm.

Approved with PullApprove


# Rate limit (speed at which transmitter send data, megabytes)
# We will measure PSS with a specific transfer rate
rate_limit=10000

Hi @GabyCT Thanks for the fixups. I'm still not sure if this 10,000 rate_limit equates to 1Gb(/s) - if it were 1000, then that would make sense (as we are measuring megabytes), or if we were aiming for 1Gbit, and this were 100 Mbytes (taking 1byte on the wire as ~10 bits....), I could understand. So, either I think we may have a typo here (10,000 instead of 1000?), or can we have some more explanation or correction of the text - just so the 1Gb and the 10,000 correlate? :-) Thanks.
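
For reference, the arithmetic behind this question can be checked with a small helper. It assumes, as the nuttcp usage text describes, that the `m` suffix on `-R` means megabits per second; `mbps_to_gbps` is a hypothetical name.

```shell
#!/bin/bash
# Convert a rate in Mbit/s to Gbit/s using integer arithmetic, three decimals.
mbps_to_gbps() {
    local mbps="$1"
    printf '%d.%03d\n' $((mbps / 1000)) $((mbps % 1000))
}

mbps_to_gbps 10000   # prints 10.000
mbps_to_gbps 1000    # prints 1.000
```

Under that assumption, `-R10000m` would request 10 Gbit/s, while `-R1000m` matches the 1 Gbit/s named in the description.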

@GabyCT
Contributor Author

GabyCT commented May 17, 2017

@grahamwhaley thank you for the review, yeah it was a typo, changes applied

Proportional Set Size (PSS) memory while running a nuttcp 1 Gb
transfer rate. The PSS measurement is taken by using smem.

Signed-off-by: Gabriela Cervantes <[email protected]>
@chavafg
Contributor

chavafg commented May 17, 2017

qa-passed

@jodh-intel
Contributor

lgtm

@jodh-intel
Contributor

Hi @grahamwhaley - please can you re-review so we can get this landed?

@grahamwhaley

lgtm
but now there is a merge conflict on Makefile.am - @GabyCT , can you do the necessary rebase, and then we can merge!

@GabyCT
Contributor Author

GabyCT commented Jun 1, 2017

@jodh-intel and @grahamwhaley, I had some issues doing the rebase :( Besides, I noticed that this script was outdated, as it was not using the common script like the other ones. So I submitted #939, which is basically the same measurement but without conflicts in the Makefile and using the common script of the networking tests. Sorry for the inconvenience.

@GabyCT GabyCT closed this Jun 1, 2017
@GabyCT GabyCT removed the in progress label Jun 1, 2017
@chavafg
Contributor

chavafg commented Jun 1, 2017

qa-passed

4 participants