Slow, consumes too much and leaks memory #82

@goblin

Current Behavior

A simple test program that adds 100 000 nodes to a graph in a fresh ArangoDB database is very slow (about 3 minutes of wall-clock time) and uses far too much RAM (about 350 MB). Regular NetworkX does the same work in 0.38 seconds of user CPU time and 60 MB, while storing all nodes in memory rather than in a database on disk. Memory use also grows with the number of nodes added, which suggests a memory leak.
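One way to check whether memory really grows with node count, rather than being a fixed startup cost, is to sample Python-level allocations at intervals while adding nodes. The following is a minimal sketch using the standard tracemalloc module. The names sample_growth and DictGraph are hypothetical and introduced here only for illustration; the DictGraph stand-in exists so the snippet runs without a database. tracemalloc only sees allocations made through Python's allocator, so its numbers will not match the resident set size reported by GNU time.

```python
import tracemalloc


def sample_growth(graph, total, step):
    """Add `total` nodes to `graph`, recording traced memory every `step` nodes.

    `graph` is any object with an add_node(name) method.
    Returns a list of (nodes_added, traced_bytes) tuples.
    """
    samples = []
    tracemalloc.start()
    try:
        for i in range(total):
            graph.add_node(f"node_{i}")
            if (i + 1) % step == 0:
                current, _peak = tracemalloc.get_traced_memory()
                samples.append((i + 1, current))
    finally:
        tracemalloc.stop()
    return samples


class DictGraph:
    """Hypothetical in-memory stand-in, used only to make the sketch runnable."""

    def __init__(self):
        self.nodes = {}

    def add_node(self, name):
        self.nodes[name] = {}


if __name__ == "__main__":
    for count, traced in sample_growth(DictGraph(), 10_000, 2_000):
        print(f"{count:>6} nodes: {traced / 1024:.0f} KiB traced")
```

Passing an nxa.Graph(name='MyGraph1') object in place of DictGraph() would show whether traced memory keeps climbing even though the nodes are supposed to live in the database.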

Expected Behavior

According to https://arangodb.com/introducing-the-arangodb-networkx-persistence-layer/, it should:

Handle Big Graphs Without Breaking a Sweat

Store massive graphs that would otherwise overwhelm memory in NetworkX, thanks to ArangoDBʼs ability to scale up.

... which definitely should NEVER use more memory than NetworkX. It could be a little bit slower due to the DB overhead, but not by such a huge factor (for instance SQLite3 can easily insert 100 000 rows in less than a second).
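For reference, the SQLite comparison can be tried with the standard-library sqlite3 module. This is a rough sketch, not a like-for-like benchmark: the table name nodes, its one-column schema, and the helper names insert_rows and count_rows are assumptions made here. All rows go into a single transaction, which is typically what keeps bulk inserts fast. No timing is claimed, since the result depends on the disk and machine.

```python
import os
import sqlite3
import tempfile
import time


def insert_rows(db_path, n):
    """Insert n single-column rows in one transaction; return elapsed seconds."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS nodes (name TEXT PRIMARY KEY)")
        start = time.perf_counter()
        with conn:  # commits once when the block exits without error
            conn.executemany(
                "INSERT INTO nodes (name) VALUES (?)",
                ((f"node_{i}",) for i in range(n)),
            )
        return time.perf_counter() - start
    finally:
        conn.close()


def count_rows(db_path):
    """Return the number of rows currently stored in the nodes table."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Use an on-disk file so the comparison is not against a pure in-memory store.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "nodes.db")
        elapsed = insert_rows(path, 100_000)
        print(count_rows(path), f"rows in {elapsed:.2f} s")
```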

Steps to Reproduce

Save the attached tiny scripts in a directory (say ~/t), create new databases in arangosh using:

db._createDatabase('testdb1')
db._createDatabase('testdb2')

and then run:

(v) user@box:~/t$ export DATABASE_HOST=http://127.0.0.1:8529
(v) user@box:~/t$ export DATABASE_PASSWORD=root
(v) user@box:~/t$ export DATABASE_USERNAME=root
(v) user@box:~/t$ export DATABASE_NAME=testdb1
(v) user@box:~/t$ /usr/bin/time -v ./test_nx_arango.py 
[18:25:57 +0100] [INFO]: NetworkX-cuGraph is unavailable: No module named 'cupy'.
[18:25:57 +0100] [INFO]: Graph 'MyGraph1' created.
100000
	Command being timed: "./test_nx_arango.py"
	User time (seconds): 143.44
	System time (seconds): 7.35
	Percent of CPU this job got: 84%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:58.27
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 348476
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 230
	Minor (reclaiming a frame) page faults: 83816
	Voluntary context switches: 201036
	Involuntary context switches: 2283
	Swaps: 0
	File system inputs: 66888
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

(v) user@box:~/t$ export DATABASE_NAME=testdb2
(v) user@box:~/t$ /usr/bin/time -v ./test_nx_arango_batch.py 
[18:29:50 +0100] [INFO]: NetworkX-cuGraph is unavailable: No module named 'cupy'.
[18:29:50 +0100] [INFO]: Graph 'MyGraph1' created.
100000
	Command being timed: "./test_nx_arango_batch.py"
	User time (seconds): 144.68
	System time (seconds): 6.37
	Percent of CPU this job got: 85%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:57.05
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 355252
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 85774
	Voluntary context switches: 199885
	Involuntary context switches: 2611
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

(v) user@box:~/t$ /usr/bin/time -v ./test_nx.py 
100000
	Command being timed: "./test_nx.py"
	User time (seconds): 0.38
	System time (seconds): 0.07
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.46
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 59880
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 14032
	Voluntary context switches: 1
	Involuntary context switches: 11
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

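Observe that both nx_arangodb runs show a much larger Maximum resident set size (348476 and 355252 kbytes versus 59880 kbytes) and far more elapsed time (2:58.27 and 2:57.05 versus 0:00.46) than test_nx.py, which uses pure NetworkX. The batch variant using add_nodes_from is no faster and no lighter than the one-node-at-a-time loop.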
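To put numbers on the gap, the two relevant fields can be pulled out of the /usr/bin/time -v output and compared. The snippet below is a small sketch; parse_time_v and to_seconds are hypothetical helper names, and the sample strings contain only the two lines copied from the runs above.

```python
def to_seconds(stamp):
    """Convert 'h:mm:ss' or 'm:ss.xx' (GNU time's elapsed format) to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_v(text):
    """Return (wall_clock_seconds, max_rss_kbytes) from `time -v` output."""
    wall = rss = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Elapsed (wall clock) time"):
            # The label itself contains colons, so split on the final "): ".
            wall = to_seconds(line.rsplit("): ", 1)[1])
        elif line.startswith("Maximum resident set size"):
            rss = int(line.rsplit(": ", 1)[1])
    return wall, rss


# Values copied from the test_nx_arango.py and test_nx.py runs in this report.
ARANGO_RUN = """
    Elapsed (wall clock) time (h:mm:ss or m:ss): 2:58.27
    Maximum resident set size (kbytes): 348476
"""
PLAIN_RUN = """
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.46
    Maximum resident set size (kbytes): 59880
"""

if __name__ == "__main__":
    a_wall, a_rss = parse_time_v(ARANGO_RUN)
    p_wall, p_rss = parse_time_v(PLAIN_RUN)
    print(f"wall clock: {a_wall / p_wall:.0f}x slower")
    print(f"peak RSS:   {a_rss / p_rss:.1f}x larger")
    print(f"throughput: {100_000 / a_wall:.0f} nodes/s")
```

With the figures above this works out to a slowdown of several hundred times and a peak resident set size between five and six times larger.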

Environment

OS: Debian 12.9
Python version: 3.11.2
NetworkX version: 3.4
NetworkX-ArangoDB version: 1.3.0
NetworkX-cuGraph version (if applicable): N/A
ArangoDB version: 3.12.4-1

Memory usage was measured with GNU time 1.9-0.2 (Debian package "time", invoked as /usr/bin/time -v) rather than the bash builtin, which does not report memory usage.
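If GNU time is not installed, a comparable peak figure can be read from inside the script with the standard resource module (Unix only). This is a sketch under the assumption that the kernel reports ru_maxrss the way Linux does; peak_rss_kb is a hypothetical helper name, and the list comprehension is only a stand-in workload.

```python
import resource
import sys


def peak_rss_kb():
    """Return this process's peak resident set size in kilobytes.

    On Linux ru_maxrss is reported in kilobytes; on macOS it is in bytes,
    so it is scaled down there.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024
    return peak


if __name__ == "__main__":
    before = peak_rss_kb()
    data = [f"node_{i}" for i in range(100_000)]  # stand-in workload
    after = peak_rss_kb()
    print(f"peak RSS before: {before} kB, after: {after} kB")
```

Calling peak_rss_kb() at the end of each test script would give a number that should be close to the "Maximum resident set size" line printed by /usr/bin/time -v.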

Additional context

See attached files, or just paste them right in:

test_scripts.tar.gz

test_nx_arango.py:

#! /usr/bin/env python3

import nx_arangodb as nxa
G = nxa.Graph(name='MyGraph1')
for i in range(100_000):
    G.add_node(f'node_{i}')
print(G.number_of_nodes())

test_nx_arango_batch.py:

#! /usr/bin/env python3

import nx_arangodb as nxa
G = nxa.Graph(name='MyGraph1')
G.add_nodes_from([ f'node_{i}' for i in range(100_000) ])
print(G.number_of_nodes())

test_nx.py:

#! /usr/bin/env python3

import networkx as nx
G = nx.Graph(name='MyGraph1')
for i in range(100_000):
    G.add_node(f'node_{i}')
print(G.number_of_nodes())
