-
Notifications
You must be signed in to change notification settings - Fork 3
Add LookupDict for faster and simpler lookups #105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6451d14
to
809ad38
Compare
809ad38
to
d2c5d5f
Compare
I converted all I think that performance-wise this will not do anything, other than lazy loading the dicts that were previously created at the start. But I think it makes the code more readable with a common interface to access and search objects like assets and steps. It also reduces the code lines. And the LookupDict can also be used in other places like the mal-simulator and also supports getting objects by arbitrary fields if a need arises in the future. I haven't added tests for the LookupDict. Do you think I should? Can you help with this? |
I also think that it makes sense to merge file_utils.py into utils.py. |
logger.debug('Looking up node with id %s', node_id) | ||
return self._id_to_node.get(node_id) | ||
|
||
def get_node_by_full_name(self, full_name: str) -> Optional[AttackGraphNode]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally liked the methods get_node_by_id and get_node_by_full_name. They made it clear what attributes you could fetch a node by. Now the user would do attack_graph.nodes.fetch(attr, value)
without knowing which attributes are supported. Or are all attributes of nodes supported? If so, could we still have the get_node_by_x and use the LookupDict in those? Or at least have AttackGraph.get_node_by_attribute_value(attr: str, value: Any)
so the user does not have to go through attack_graph.nodes.fetch()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All valid attributes are supported. Removing those methods is what this PR does. Having them back and then going through the lookup dict undoes this.
I don't like the get methods as:
- they are too java-ish
- they take up lines of code for no reason
- whenever you need to get an item by another attribute that is not yet implemented you have to add new methods. With the lookup dict you can use whatever attribute you like whenever you like.
- they don't support comparisons other than equals
- each one needs its own unit test
- they make a bug more probable
- they make refactorings and optimizations harder as changes need to be made all over the code
- they don't support plural as in fetch all that match
- they require to bloat the class with as many private attributes.
As long as the user understands what fetch() or lookup() do, they won't have difficulty in reading the code. They are also cognitively more light as you don't have to read and read long lines, long method names etc. You read nodes.fetch(key, value)
and you understand the same thing as "get node by key" but via a single consistent API that works across any object and that you read and learn once as compared to reading 10 different methods per class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with removing the get_by_id and get_by_full_name, but we should still have one method in attack graph which uses the LookupDict so users don't have to interact with it directly.
As long as the user understands what fetch() or lookup() do, they won't have difficulty in reading the code
Perhaps, but I think having a method that wraps around nodes.fetch() called something like attack_graph.get_nodes_by_attribute_value(attribute: str, value: str)
would be easier to find/understand. And from the AttackGraphs perspective, that is what the LookupDict.fetch does.
Just to illustrate my thinking:
From the users perspective if we have no method in AttackGraph:
- They have an AttackGraph and want to fetch a node with a certain full_name
- They might look for a method in the attack graph that does this
- They will not find it, so they will think that they need to implement something themselves
- They look at attackgraph.nodes and realize it is a LookupDict
- They try to understand how the LookupDict works and find the .fetch()
- They use the .fetch('full_name', x)
If we instead had a method attack_graph.get_nodes_by_attribute_value(attribute: str, value: str)
in the attack_graph (which uses the LookupDict.fetch), they would find it more easily.
I am skeptical of making the API less explicit by having common methods in instance variables (.nodes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where documentation helps, with examples how you do things. And IDE completion is also useful here, it will show a fetch or lookup method with full docstrings and type hints.
I do not favor adding such a method. This code:
get_node_by_attribute_value(attribute, value)
I think does not add much compared to:
nodes.fetch(attribute, value)
especially once you have seen, read and used it once.
nodes
is LookupDict
and it supports an interface. If it were a dict it would support a different one (a subset). Do you think of adding wrappers around the dict.get() method or the dict.remove() method in that case? If not, why do it now? It is a custom API that not everyone is familiar with but so is the rest of the codebase and people have to study it first. It's not some hard piece of code, just that you use nodes.fetch
to get by an attribute.
Also, adding a get_ method would provide users with two ways to do the same thing which is not ideal.
A further motivation for doing this PR was that this LookupDict will be used elsewhere too, not just for nodes. Nodes is just the first application of this. For example, in model we have get_asset_by_id
and get_asset_by_id
. These can be removed both just by using a LookupDict. The interface will be one the users already now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would add wrappers around the dicts if they were dicts. That is how it worked before with get_node_by_id and get_node_by_full_name works like this I believe (the dicts were private variables of the AttackGraph).
I agree the LookupDict could be used in more places. But I mean that in the AttackGraph it has only one use, which is to fetch nodes by attribute values / key values, which is the context the method (name) would add.
But I don't seem to be able to convince you, so maybe I will just back off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a different programming style. I believe the changes will increase the quality of the python code, but that may not be the case at all. As a maintainer, you have a greater say, but I would hate to write back that kind of code. Maybe you want to add these changes or maybe @andrewbwm can help break the deadlock? But many of the changes I propose are of this kind, removing things that I find verbose and not pythonic and accepting some but not others will lead to an inconsistent style in the code. My mistake is that I didn't sync with you first before embarking on this. I was driven by a desire to make the code run faster and to do that it helps to have it simplified and more "linear".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is it. I don't think it is easy to say who is right or wrong, we just prioritize different things. Yes, maybe @andrewbwm can chime in to give some third perspective!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewbwm 🔔 or 🔕 ?
1167192
to
42af1b6
Compare
This is a rebased version of #95. See comments there for the state at this particular point in time.