Add LookupDict for faster and simpler lookups #105

andrewbwm · 2025-01-24T18:44:18Z

This is a rebased version of #95. See comments there for the state at this particular point in time.

nkakouros · 2025-02-11T21:23:15Z

I converted all get_by methods that made sense. The attacker list is left as a list and then there is the get_attacker_by_id method. I think it makes sense to convert the list to a dict and loop with .values() so that that method goes away too.

I think that performance-wise this will not do anything, other than lazy loading the dicts that were previously created at the start. But I think it makes the code more readable with a common interface to access and search objects like assets and steps. It also reduces the code lines. And the LookupDict can also be used in other places like the mal-simulator and also supports getting objects by arbitrary fields if a need arises in the future.

I haven't added tests for the LookupDict. Do you think I should? Can you help with this?

nkakouros · 2025-02-11T21:23:50Z

I also think that it makes sense to merge file_utils.py into utils.py.

maltoolbox/attackgraph/attackgraph.py

maltoolbox/utils.py

mrkickling · 2025-02-13T11:45:35Z

maltoolbox/attackgraph/attackgraph.py

-        logger.debug('Looking up node with id %s', node_id)
-        return self._id_to_node.get(node_id)
-
-    def get_node_by_full_name(self, full_name: str) -> Optional[AttackGraphNode]:


I personally liked the methods get_node_by_id and get_node_by_full_name. They made it clear what attributes you could fetch a node by. Now the user would do attack_graph.nodes.fetch(attr, value) without knowing which attributes are supported. Or are all attributes of nodes supported? If so, could we still have the get_node_by_x and use the LookupDict in those? Or at least have AttackGraph.get_node_by_attribute_value(attr: str, value: Any) so the user does not have to go through attack_graph.nodes.fetch() ?

All valid attributes are supported. Removing those methods is what this PR does. Having them back and then going through the lookup dict undoes this.

I don't like the get methods as:

they are too java-ish

they take up lines of code for no reason

whenever you need to get an item by another attribute that is not yet implemented you have to add new methods. With the lookup dict you can use whatever attribute you like whenever you like.

they don't support comparisons other than equals

each one needs its own unit test

they make a bug more probable

they make refactorings and optimizations harder as changes need to be made all over the code

they don't support plural as in fetch all that match

they require to bloat the class with as many private attributes.

As long as the user understands what fetch() or lookup() do, they won't have difficulty in reading the code. They are also cognitively more light as you don't have to read and read long lines, long method names etc. You read nodes.fetch(key, value) and you understand the same thing as "get node by key" but via a single consistent API that works across any object and that you read and learn once as compared to reading 10 different methods per class.

I am ok with removing the get_by_id and get_by_full_name, but we should still have one method in attack graph which uses the LookupDict so users don't have to interact with it directly.

As long as the user understands what fetch() or lookup() do, they won't have difficulty in reading the code

Perhaps, but I think having a method that wraps around nodes.fetch() called something like attack_graph.get_nodes_by_attribute_value(attribute: str, value: str) would be easier to find/understand. And from the AttackGraphs perspective, that is what the LookupDict.fetch does.

Just to illustrate my thinking:

From the users perspective if we have no method in AttackGraph:

They have an AttackGraph and want to fetch a node with a certain full_name

They might look for a method in the attack graph that does this

They will not find it, so they will think that they need to implement something themselves

They look at attackgraph.nodes and realize it is a LookupDict

They try to understand how the LookupDict works and find the .fetch()

They use the .fetch('full_name', x)

If we instead had a method attack_graph.get_nodes_by_attribute_value(attribute: str, value: str) in the attack_graph (which uses the LookupDict.fetch), they would find it more easily.

I am skeptical of making the API less explicit by having common methods in instance variables (.nodes).

This is where documentation helps, with examples how you do things. And IDE completion is also useful here, it will show a fetch or lookup method with full docstrings and type hints.

I do not favor adding such a method. This code:

get_node_by_attribute_value(attribute, value)

I think does not add much compared to:

nodes.fetch(attribute, value)

especially once you have seen, read and used it once.

nodes is LookupDict and it supports an interface. If it were a dict it would support a different one (a subset). Do you think of adding wrappers around the dict.get() method or the dict.remove() method in that case? If not, why do it now? It is a custom API that not everyone is familiar with but so is the rest of the codebase and people have to study it first. It's not some hard piece of code, just that you use nodes.fetch to get by an attribute.

Also, adding a get_ method would provide users with two ways to do the same thing which is not ideal.

A further motivation for doing this PR was that this LookupDict will be used elsewhere too, not just for nodes. Nodes is just the first application of this. For example, in model we have get_asset_by_id and get_asset_by_id. These can be removed both just by using a LookupDict. The interface will be one the users already now.

Yes, I would add wrappers around the dicts if they were dicts. That is how it worked before with get_node_by_id and get_node_by_full_name works like this I believe (the dicts were private variables of the AttackGraph).

I agree the LookupDict could be used in more places. But I mean that in the AttackGraph it has only one use, which is to fetch nodes by attribute values / key values, which is the context the method (name) would add.

But I don't seem to be able to convince you, so maybe I will just back off.

We have a different programming style. I believe the changes will increase the quality of the python code, but that may not be the case at all. As a maintainer, you have a greater say, but I would hate to write back that kind of code. Maybe you want to add these changes or maybe @andrewbwm can help break the deadlock? But many of the changes I propose are of this kind, removing things that I find verbose and not pythonic and accepting some but not others will lead to an inconsistent style in the code. My mistake is that I didn't sync with you first before embarking on this. I was driven by a desire to make the code run faster and to do that it helps to have it simplified and more "linear".

Yes, that is it. I don't think it is easy to say who is right or wrong, we just prioritize different things. Yes, maybe @andrewbwm can chime in to give some third perspective!

@andrewbwm 🔔 or 🔕 ?

maltoolbox/utils.py

andrewbwm requested review from mrkickling and nkakouros January 24, 2025 18:44

andrewbwm added enhancement New feature or request optimization labels Jan 24, 2025

andrewbwm added this to the Future Planning milestone Jan 24, 2025

andrewbwm linked an issue Jan 24, 2025 that may be closed by this pull request

Optimize lookups with dicts instead of lists #50

Open

nkakouros mentioned this pull request Jan 27, 2025

Update for lang-graph rewrite (maltoolbox 0.2) mal-lang/mal-simulator#89

Merged

nkakouros force-pushed the lookup-proxy branch from 6451d14 to 809ad38 Compare January 29, 2025 17:59

nkakouros force-pushed the lookup-proxy branch from 809ad38 to d2c5d5f Compare February 11, 2025 21:15

nkakouros marked this pull request as ready for review February 11, 2025 21:16

mrkickling approved these changes Feb 12, 2025

View reviewed changes

mrkickling self-requested a review February 12, 2025 15:51

mrkickling reviewed Feb 12, 2025

View reviewed changes

maltoolbox/attackgraph/attackgraph.py Outdated Show resolved Hide resolved

mrkickling reviewed Feb 12, 2025

View reviewed changes

maltoolbox/attackgraph/attackgraph.py Show resolved Hide resolved

mrkickling reviewed Feb 12, 2025

View reviewed changes

maltoolbox/utils.py Show resolved Hide resolved

nkakouros reviewed Feb 12, 2025

View reviewed changes

maltoolbox/utils.py Outdated Show resolved Hide resolved

nkakouros mentioned this pull request Feb 13, 2025

Handle node and attacker id in the classes themselves #119

Closed

mrkickling reviewed Feb 13, 2025

View reviewed changes

maltoolbox/utils.py Outdated Show resolved Hide resolved

mrkickling reviewed Feb 13, 2025

View reviewed changes

maltoolbox/utils.py Outdated Show resolved Hide resolved

mrkickling reviewed Feb 13, 2025

View reviewed changes

maltoolbox/utils.py Outdated Show resolved Hide resolved

nkakouros mentioned this pull request Feb 17, 2025

Handle ids in classes themselves #120

Open

Add and use LookupDict

42af1b6

nkakouros force-pushed the lookup-proxy branch from 1167192 to 42af1b6 Compare February 17, 2025 12:51

Merge file_utils into utils

6593af6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add LookupDict for faster and simpler lookups #105

Add LookupDict for faster and simpler lookups #105

Uh oh!

andrewbwm commented Jan 24, 2025

Uh oh!

nkakouros commented Feb 11, 2025

Uh oh!

nkakouros commented Feb 11, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mrkickling Feb 13, 2025

Uh oh!

nkakouros Feb 13, 2025

Uh oh!

mrkickling Feb 19, 2025

Uh oh!

nkakouros Feb 19, 2025

Uh oh!

mrkickling Feb 19, 2025

Uh oh!

nkakouros Feb 19, 2025

Uh oh!

mrkickling Feb 20, 2025

Uh oh!

nkakouros Feb 20, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Add LookupDict for faster and simpler lookups #105

Are you sure you want to change the base?

Add LookupDict for faster and simpler lookups #105

Uh oh!

Conversation

andrewbwm commented Jan 24, 2025

Uh oh!

nkakouros commented Feb 11, 2025

Uh oh!

nkakouros commented Feb 11, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!