
@ptosi ptosi commented May 15, 2019

Introduction

As discussed in #971, this PR modifies the perf instrument so that it accepts any perf command. It is related to ARM-software/devlib#388, which implements its back-end. It also requires ARM-software/devlib#387 to properly run some perf subcommands (ARM-software/devlib#386).

The idea is to accept YAML input (as part of the agenda) matching the format that perf itself expects:

$ perf help

 usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

The instrument then only needs to format this command from the input; it does not require dedicated logic for every COMMAND.

Single Command

An excerpt from an agenda describing a command (here, perf record) would look like:

                command: record
                flags:
                    - all-cpus
                    - no-inherit
                kwflags:
                    freq: 1000
                    event:
                        - r8
                        - cs
                    cpu:
                        - 0
                        - 4
                args:
                    command: sleep
                    args: 1000
                stdout: '/dev/null'
                stderr: '&1'

The accepted keys (command, flags, kwflags, ...) are the names of the parameters to devlib.utils.cli.Command.__init__ (see the back-end implementation). This gives the highest level of flexibility: it supports any version of perf and any perf subcommand, is relatively simple to implement and maintain, and is relatively robust (see the discussion of failure cases below). Having this level of control over the parts of the perf command also lets us take full advantage of YAML anchors. The readability of this approach (as seen from the agenda) partially comes from the expressiveness of the perf flags themselves: notice that events are passed as clearly as in the previous implementation, yet no event-specific logic had to be implemented in the instrument.
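A minimal sketch of that mapping, assuming the back-end from ARM-software/devlib#388 is available (the exact signature of Command lives in that PR), with the excerpt above condensed to flow style:

import yaml
from devlib.utils.cli import Command  # back-end proposed in ARM-software/devlib#388

excerpt = """
command: record
flags: [all-cpus, no-inherit]
kwflags:
    freq: 1000
    event: [r8, cs]
    cpu: [0, 4]
args:
    command: sleep
    args: 1000
stdout: /dev/null
stderr: '&1'
"""

# The accepted keys are exactly the parameter names of Command.__init__,
# so the instrument needs no perf-specific logic here:
record = Command(**yaml.safe_load(excerpt))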

Full Run

On top of this, the proposed perf instrument takes three dictionaries of commands (pre_commands, commands, post_commands), according to when these commands have to be run: instead of hardcoding this information in the instrument for each subcommand (cf. #971), we let the user provide it, which is a small amount of low-risk work for them. The keys of the command dictionaries are their labels (as introduced by the previous implementation; see the further discussion in the back-end PR).

For example, a full agenda:

config:
    augmentations:
    - perf
    iterations: 1
    perf:
        force_install: true
        commands:
            first:
                command: record
                flags:
                    - all-cpus
                    - no-inherit
                kwflags:
                    freq: 1000
                    event:
                        - r8
                        - cs
                    cpu:
                        - 0
                        - 4
                args:
                    command: sleep
                    args: 1000
                stdout: '/dev/null'
                stderr: '&1'
        post_commands:
            first:
                command: report
                kwflags:
                    sort: dso
                    field-separator: ;
                stdout: report.stdout

workloads:
-   name: dhrystone
    params:
        cleanup_assets: true
        cpus: 0
        delay: 0
        duration: 0
        mloops: 0
        threads: 4

which will run the following on the device (notice the properly escaped --field-separator):

# perf record --all-cpus --no-inherit --freq=1000 --event=r8,cs --cpu=0,4 -- sleep 1000 1>/dev/null 2>&1
# perf report --sort=dso '--field-separator=;' -- 1>report.stdout

according to the following rules:

  • flags are prepended with - or -- based on their length;
  • kwflags values are CSV lists attached with = (valid for all perf commands; this seems to be a sometimes-implicit standard for flags taking values, probably because all subcommands share the same option-parsing front-end);
  • args is recursively parsed as a command, allowing perf to be used as originally intended (i.e. launching a command through it and capturing only that command);
  • stdout and stderr are used as expected and handle UNIX &-based redirection (see the discussion about file names, labels, and which files which commands can see); extra keys such as stdin or pipes could be considered, but I wonder how generic we should go here. A sketch of these rules follows.
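To make these rules concrete, here is a minimal, hypothetical sketch of them in Python (an illustration of the rules as stated, not the actual devlib.utils.cli implementation):

import shlex

def format_command(spec):
    parts = [str(spec['command'])]
    # Rule 1: flags get '-' or '--' depending on their length.
    for flag in spec.get('flags', []):
        parts.append(('-' if len(flag) == 1 else '--') + flag)
    # Rule 2: list values become CSV, attached with '=' and shell-quoted
    # when needed (e.g. '--field-separator=;').
    for key, value in spec.get('kwflags', {}).items():
        if isinstance(value, (list, tuple)):
            value = ','.join(str(v) for v in value)
        dashes = '-' if len(key) == 1 else '--'
        parts.append(shlex.quote('{}{}={}'.format(dashes, key, value)))
    # Rule 3: 'args' is recursively a command, introduced by '--'.
    args = spec.get('args')
    if isinstance(args, dict):
        parts += ['--', format_command(args)]
    elif args is not None:
        parts.append(str(args))
    # Rule 4: stdout/stderr map onto UNIX redirections ('&1' -> '2>&1').
    if 'stdout' in spec:
        parts.append('1>{}'.format(spec['stdout']))
    if 'stderr' in spec:
        parts.append('2>{}'.format(spec['stderr']))
    return ' '.join(parts)

Prefixing the result with perf (i.e. 'perf ' + format_command(spec)) reproduces the perf record invocation above, modulo details such as the trailing -- the real back-end emits for perf report.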

NB: The user (writing the agenda) needs access to the file names so that they can use the -o/-i flags to properly pipe their record into their report.

The force_install flag is kept from the previous implementation. However, what is the "standard WA way" of providing the perf binary? We could write the path in the agenda (e.g. for comparing versions of perf); we could keep a collection of perf binaries in a standard location and pick the ideal one (e.g. based on the kernel version of the target) from within the WA perf instrument; we could have WA fetch the binary from a standard location on the host and overwrite it between runs (but this requires automating the automation, WA, and feels messy)... Ideally, we would have both control over which binaries are used and automation for finding the optimal version (as I believe the perf-kernel interface evolves with the kernel).

Parsing of the output into WA metrics hasn't been started yet. I propose to have per-command logic in this case (I don't think there is another way) for the supported perf subcommands (probably stat and report). Because the output of perf report is so flexible and variable (typically a table or a graph), what would be the expected output in terms of WA metrics? I currently have a parser that generates a pandas.DataFrame from the tabular output of perf report, but this seems "too different" from the JSON-based outputs WA instruments seem to be using. Feedback required!

Full Run (with the power of YAML)

Another agenda, similar to the example mentioned in the docstring of the previous implementation, runs the same perf stat on the big and LITTLE clusters (using YAML anchors; #976):

    perf:
        commands:
            little: &little
                command: stat
                flags:
                    - all-cpus
                    - no-inherit
                kwflags: &little_kwflags
                    freq: 1000
                    event:
                        - r8
                        - cs
                    cpu: 0-3
                args:
                    command: sleep
                    args: 1000
                stdout: 'little.stat'
                stderr: '&1'
            big:
                <<: *little
                kwflags:
                    <<: *little_kwflags
                    cpu: 4-7
                stdout: 'big.data'
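
Assuming standard YAML merge-key semantics, big inherits everything from *little except kwflags.cpu and stdout, so these two labels should expand to something like:

# perf stat --all-cpus --no-inherit --freq=1000 --event=r8,cs --cpu=0-3 -- sleep 1000 1>little.stat 2>&1
# perf stat --all-cpus --no-inherit --freq=1000 --event=r8,cs --cpu=4-7 -- sleep 1000 1>big.data 2>&1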

Drawbacks

Obviously, the flexibility of this instrument comes at a cost, as this approach won't necessarily fail early:

  • Errors at the first "YAML dictionary level" (post_commands, commands, pre_commands, force_install) fail early as usual (through the Parameter class);
  • Errors at the second "YAML dictionary level" (command, flags, kwflags, ...) fail relatively early, at instantiation-time of Command (almost same behaviour as Parameter);
  • Errors at the third "YAML dictionary level" (values of flags, keys of kwflags, format of the final command string) fail late, only once the corresponding command is run on the device.

Secondly, because perf is a tool intended to run another command, this instrument assumes that the user writing the agenda is allowed by the owner of the target to run arbitrary commands on it. If they are not, additional security measures should be implemented. For example, it is possible to write an agenda running the following destructive "performance-investigating" command:

perf record -- rm -rf /

TODO:

  • Write a "porting guide" from old-PerfInstrument to PerfStatInstrument
  • Deprecate PerfStatInstrument
  • Document the instrument
  • Decide how labels are used on the device and the host, how output files are named (and how much control the user has over it), and whether there are per-label subdirectories
  • Implement parsers for stat
  • Implement parsers for report
  • Figure out why Parameter(..., kind=YamlCommandDescriptor) behaves weirdly and required somewhat hacking the constructor
  • Unlock teardown: should we remove everything from the target?
  • Change (or not) force_install

@setrofim

This is too opaque: configuring it would be a nightmare for someone unfamiliar with perf. Rather than having a single "command" map, parameters should be exposed individually, validated, and populated with sensible defaults whenever possible.

As a rule of thumb -- I should be able to run wa show perf, and the resulting page should be enough to let me at least start using the instrument without also having to look up the man page for perf and/or the devlib docs. When I run wa create agenda perf dhrystone, the resulting agenda should generate some kind of reasonable output for perf.


ptosi commented May 17, 2019

You're right: this was written under the assumption that the user would know how to use perf (which is why this PR tries to give them as much control as possible).

The difficulty with replicating perf parameters individually as WA inputs comes from how many subcommands there are, and how many flags each has. It becomes even more complicated to get right once we try to validate them, as the mere availability of a flag (let alone the validity of the passed value) depends not only on the kernel version but also on the build flags of perf itself.

As for defaults, perf already provides a default behaviour for each subcommand (and default values for flags that require values, e.g. event); isn't it confusing to add our own WA defaults, different from the perf ones? Obviously, there is no default subcommand (running perf is equivalent to perf --help), but having one would barely make sense, in my opinion, because of how different the subcommands are.

I really like wa show, as it's an easy way to figure out what to put in the agenda without having to go through documentation (or source code!). For this instrument, it would give the meaning of force_install, pre_commands, commands and post_commands, define the formats they are expected to be in, and explain how these relate to what ends up being run on the device. Of course, because we can't exhaustively define what goes inside the values of the commands dicts (see above), there is little wa show can say about them. In fact, as you say, I intend the help of this instrument to end with an RTFM-ish recommendation to refer to perf --help. The devlib documentation isn't necessary for using this instrument, though.

In the end, if the issue is with wa create agenda perf, couldn't we simply have a default commands entry (probably calling stat -a -- sleep 1000, as in the previous implementation)? I haven't looked at how create is implemented, so this could be hard to do (e.g. if it simply takes all the default values of the Parameters).

@setrofim

The difficulty with replicating perf parameters individually as WA inputs comes from how many subcommands there are, and how many flags each has. It becomes even more complicated to get right once we try to validate them, as the mere availability of a flag (let alone the validity of the passed value) depends not only on the kernel version but also on the build flags of perf itself.

Yes, I would not suggest trying to handle everything in perf that way -- that would ultimately be futile. Instead, I would consider some higher-level use cases for why someone might want to use perf (e.g. "I want to get PMU counter values for the execution of this workload") and create a high-level interface for that, which would work, say, 90% of the time. Then allow it to be augmented with the pass-through configuration you have at the moment, which will be the fallback for when the simple setup fails (or when a user who knows what they are doing wants to do something different).

But the pass-through configuration really should be treated as a fallback, not as the intended primary way of using the instrument in the "common case".

In the end, if the issue is with wa create agenda perf, couldn't we simply have a default commands entry (probably calling stat -a -- sleep 1000, as in the previous implementation)? I haven't looked at how create is implemented, so this could be hard to do (e.g. if it simply takes all the default values of the Parameters).

Yes, that's pretty much how the create command works; hence exposing parameters with sensible defaults is important for usability.


ptosi commented May 17, 2019

But the pass-through configuration really should be treated as a fallback, not as the intended primary way of using the instrument in the "common case".

That's feasible ... and would make everyone happy.

So this effectively means merging the current implementation of the instrument (from master) with this new one? We would have the "clear and easy-to-use" parameters for stat (with nice default values, allowing agendas to be created easily) and pre_commands/commands/post_commands (defaulting to None) for advanced usage (which includes any subcommand other than stat), with the "old" parameters being ignored once commands is not None?

This approach would then also allow backward compatibility (instead of what I'm doing right now) and is very easy to integrate, as the constructor could simply build a command dictionary from these parameters and the rest of the execution flow would remain unchanged!
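
A minimal sketch of that constructor idea (the parameter names events and optionstring mirror the previous stat-only implementation, but everything below is illustrative, not the final code):

DEFAULT_EVENTS = ['migrations', 'cs']

def stat_commands_from_simple_params(events=DEFAULT_EVENTS, optionstring='-a',
                                     label='default'):
    # Synthesise the 'commands' dictionary that the generic execution
    # flow already understands, so legacy agendas keep working unchanged.
    return {
        label: {
            'command': 'stat',
            'flags': [optionstring.lstrip('-')],  # '-a' is rendered back as '-a'
            'kwflags': {'event': events},
            'args': {'command': 'sleep', 'args': 1000},
            'stdout': '{}.stat'.format(label),  # illustrative output file name
        }
    }

The constructor would call this only when the advanced commands parameter is left unset.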

@setrofim

Yes, that's the sort of thing I had in mind!

@ptosi ptosi force-pushed the add-generic-perf branch 2 times, most recently from 9c28d74 to 55b0e5d on May 29, 2019
Introduce an implementation of the PerfInstrument that is more generic
than the previous one and which is expected to be able to handle all
potential calls to perf (irrespective of the subcommand, flags, options
or arguments being used) but which maintains backward compatibility with
the previous implementation, targeting perf-stat.
@ptosi ptosi force-pushed the add-generic-perf branch from 55b0e5d to 9464597 on June 4, 2019
Add tests with parser inputs (i.e. perf stat stdout outputs) and parser
outputs (i.e. arrays of WA metrics) for the `perf stat` parser of
PerfInstrument. This will be useful when modifying the code of the
parser, to verify its robustness.

NB: These tests are not exhaustive.
def setup(self, context):
    self.collector.reset()
    # Record the version of the perf binary in the run metadata.
    version = self.collector.execute('--version').strip()
    context.update_metadata('versions', self.name, version)
Contributor Author

I hope this is being logged in the right "metadata".

I've noticed the following (optional) idea that hasn't been addressed (from the original post):

The force_install flag is kept from the previous implementation. However, what is the "standard WA way" of providing the perf binary? We could write the path in the agenda (e.g. for comparing versions of perf); we could keep a collection of perf binaries in a standard location and pick the ideal one (e.g. based on the kernel version of the target) from within the WA perf instrument; we could have WA fetch the binary from a standard location on the host and overwrite it between runs (but this requires automating the automation, WA, and feels messy)... Ideally, we would have both control over which binaries are used and automation for finding the optimal version (as I believe the perf-kernel interface evolves with the kernel).

Based on ARM-software/devlib#395 and ARM-software/devlib#396, this may be useful but might be complicated to implement(?)

Contributor

Yes, this is the correct metadata.

WA has a standard mechanism for "resource resolution" that allows providing alternative versions of resources via various means.

https://workload-automation.readthedocs.io/en/latest/developer_information.html#dynamic-resource-resolution

See how dhrystone workload obtains its executable for example:

https://github.com/ARM-software/workload-automation/blob/master/wa/workloads/dhrystone/__init__.py#L78
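
Paraphrasing what the linked dhrystone code does (from memory of the WA API; the link above is authoritative):

# Resolution is keyed on an arbitrary string -- here, the target ABI:
host_exe = context.get_resource(Executable(self, self.target.abi, 'dhrystone'))
self.target_exe = self.target.install(host_exe)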

Contributor Author

But dhrystone picks its binaries based on the ABI, which is pretty easy to check with an ELF reader; what I was considering was picking based on the version of perf, which (AFAIK) can only be obtained by running the binaries themselves ... which requires something able to run them. Another approach would be to store this information in the file system (e.g. in the file name).

Let's drop this idea due to its complexity.

Contributor

But dhrystone picks its binaries based on the ABI, which is pretty easy to check with an ELF reader;

Actually, it picks the binary based on an arbitrary string, which happens to be the ABI -- we're not checking the ELF header, just resolving based on directory structure. So the same can be done for perf (it's basically what you're saying about encoding the version in the name, except we use directories rather than modifying the file name).

Contributor Author

I see, my bad for not looking further through the call stack of the code you shared!

I believe this is a good idea. However, this feature isn't required for using perf, as a single version is enough in almost all cases. Since users of WA-perf seem to have been satisfied with the single-version approach until now, let's keep this feature for a potential future PR, once we are certain it will be necessary and used!

A (name, value) tuple for the matched counter (value is 0 if an
error occurred).
"""
name = f'{classifiers["label"]}_{match["name"]}'.replace(' ', '_')
Contributor

As per the earlier discussion, we'd like to continue supporting Python 3.5 for the foreseeable future in order to avoid forcing potentially painful migrations on existing WA users. Because of that, 3.6 features, such as f-strings, should not be used in WA code.

Contributor Author

I wasn't sure what the conclusion of ARM-software/devlib#389 was; I have changed the f-strings to format calls.
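
For reference, the f-string flagged above becomes, with str.format:

# Python 3.5-compatible equivalent of the line quoted earlier:
name = '{}_{}'.format(classifiers['label'], match['name']).replace(' ', '_')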

Contributor

Yeah, that discussion kinda petered out without a concrete conclusion. However, I assume that the earlier issue regarding support for Ubuntu 16.04 stands, and since there is nothing essential in 3.6, we're going to stick with 3.5 for the next release, at least.

the comment while ``'comment_units'`` holds the rest of the comment
(typically the units). Only available for the events for which
``perf`` added a comment.
"""
Contributor Author

How can we get this documentation to render to HTML? Appending it to the class docstring doesn't really feel right.

Contributor

Put this in the description class attribute of the plugin.

Contributor Author

Done

@ptosi ptosi changed the title from "[RFC] Add generic perf Instrument" to "Add generic perf Instrument" on Jun 25, 2019
@marcbonnici marcbonnici (Contributor) left a comment

Currently, if an error occurs with the perf command (for example, a misspelt event), no error is displayed to the user, who instead has to examine the results / the raw output from perf to see what went wrong. Some form of error checking should be added to alert the user if there was a problem during the invocation of perf.

'PerfInstrument',
]

DEFAULT_EVENTS = ['migration', 'cs']
Contributor

The migration default event should be migrations.

Contributor Author

Done


ptosi commented Jul 1, 2019

I've added logging of the artefacts extracted from the device.


ptosi commented Jul 9, 2019

Due to the complexity of this implementation, the resulting instability I have experienced, and the difficulties of integrating it with the WA framework, this PR will be closed in favour of an upcoming one built on top of the Python scripts provided for simpleperf. Concerning perf, #971 may be enough to cover users' needs.

@ptosi ptosi closed this Jul 9, 2019