Description
Current Behavior
Using a simple test program to add 100 000 nodes to a graph in a fresh ArangoDB database is very slow (about 3 minutes) and consumes far too much RAM (about 350 MB) compared to regular NetworkX, which takes 0.38 seconds and consumes 60 MB while storing all nodes in memory rather than in a database on disk. Memory usage also grows with the number of nodes added, which suggests a memory leak.
Expected Behavior
According to https://arangodb.com/introducing-the-arangodb-networkx-persistence-layer/, it should:
Handle Big Graphs Without Breaking a Sweat
Store massive graphs that would otherwise overwhelm memory in NetworkX, thanks to ArangoDBʼs ability to scale up.
... which definitely should NEVER use more memory than NetworkX. It could be a little bit slower due to the DB overhead, but not by such a huge factor (for instance SQLite3 can easily insert 100 000 rows in less than a second).
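For reference, the SQLite comparison can be checked with a minimal sketch like the one below. It is not part of the attached scripts: the table name `nodes`, the column `key`, and the helper `insert_rows` are made up for illustration, and it assumes that a single transaction with `executemany` is a fair stand-in for a bulk insert. The timing will vary by machine, so no expected figure is given.

```python
import os
import sqlite3
import tempfile
import time


def insert_rows(n: int = 100_000) -> tuple[int, float]:
    """Insert n rows into an on-disk SQLite table; return (row count, seconds)."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "test.db"))
        # Hypothetical schema: one text key per node, like the node names below.
        conn.execute("CREATE TABLE nodes (key TEXT PRIMARY KEY)")
        start = time.perf_counter()
        with conn:  # one transaction for the whole batch, committed on exit
            conn.executemany(
                "INSERT INTO nodes (key) VALUES (?)",
                ((f"node_{i}",) for i in range(n)),
            )
        elapsed = time.perf_counter() - start
        (count,) = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()
        conn.close()
    return count, elapsed


if __name__ == "__main__":
    count, elapsed = insert_rows()
    print(f"{count} rows inserted in {elapsed:.3f} s")
```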
Steps to Reproduce
Save the attached tiny scripts in a directory (say ~/t), create new databases in arangosh using:
db._createDatabase('testdb1')
db._createDatabase('testdb2')
and then run:
(v) user@box:~/t$ export DATABASE_HOST=http://127.0.0.1:8529
(v) user@box:~/t$ export DATABASE_PASSWORD=root
(v) user@box:~/t$ export DATABASE_USERNAME=root
(v) user@box:~/t$ export DATABASE_NAME=testdb1
(v) user@box:~/t$ /usr/bin/time -v ./test_nx_arango.py
[18:25:57 +0100] [INFO]: NetworkX-cuGraph is unavailable: No module named 'cupy'.
[18:25:57 +0100] [INFO]: Graph 'MyGraph1' created.
100000
Command being timed: "./test_nx_arango.py"
User time (seconds): 143.44
System time (seconds): 7.35
Percent of CPU this job got: 84%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:58.27
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 348476
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 230
Minor (reclaiming a frame) page faults: 83816
Voluntary context switches: 201036
Involuntary context switches: 2283
Swaps: 0
File system inputs: 66888
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
(v) user@box:~/t$ export DATABASE_NAME=testdb2
(v) user@box:~/t$ /usr/bin/time -v ./test_nx_arango_batch.py
[18:29:50 +0100] [INFO]: NetworkX-cuGraph is unavailable: No module named 'cupy'.
[18:29:50 +0100] [INFO]: Graph 'MyGraph1' created.
100000
Command being timed: "./test_nx_arango_batch.py"
User time (seconds): 144.68
System time (seconds): 6.37
Percent of CPU this job got: 85%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:57.05
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 355252
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 85774
Voluntary context switches: 199885
Involuntary context switches: 2611
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
(v) user@box:~/t$ /usr/bin/time -v ./test_nx.py
100000
Command being timed: "./test_nx.py"
User time (seconds): 0.38
System time (seconds): 0.07
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.46
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 59880
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 14032
Voluntary context switches: 1
Involuntary context switches: 11
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
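Observe that both nx_arangodb scripts show a much larger Maximum resident set size and take far longer than test_nx.py, which uses pure NetworkX. Note also that the add_nodes_from variant is no faster than calling add_node in a loop.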
Environment
OS: Debian 12.9
Python version: 3.11.2
NetworkX version: 3.4
NetworkX-ArangoDB version: 1.3.0
NetworkX-cuGraph version (if applicable): N/A
ArangoDB version: 3.12.4-1
I've also used GNU time 1.9-0.2 from the Debian package `time`, rather than the bash builtin, to show memory usage.
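If GNU time is not available, roughly the same figures can be collected from inside Python. The sketch below is an assumption-laden helper, not part of the attached scripts: `measure` and `peak_rss_kb` are hypothetical names, and a plain `set` stands in for the graph so the block runs without a database. It records elapsed time and peak resident set size at checkpoints, which may help show whether memory keeps growing with the node count. `ru_maxrss` is reported in kilobytes on Linux (bytes on macOS).

```python
import resource
import time


def peak_rss_kb() -> int:
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure(add_node, total: int = 100_000, step: int = 20_000):
    """Call add_node(name) `total` times.

    Returns a list of (nodes added, elapsed seconds, peak RSS) tuples,
    one per `step` nodes.
    """
    samples = []
    start = time.perf_counter()
    for i in range(1, total + 1):
        add_node(f"node_{i - 1}")
        if i % step == 0:
            samples.append((i, time.perf_counter() - start, peak_rss_kb()))
    return samples


if __name__ == "__main__":
    nodes = set()  # stand-in for a graph; pass G.add_node to test a real one
    for count, seconds, rss in measure(nodes.add):
        print(f"{count:>7} nodes  {seconds:8.3f} s  peak RSS {rss} kB")
```

To compare the two libraries, pass `G.add_node` from a `nx.Graph` or a `nxa.Graph` in place of `nodes.add`. Because `ru_maxrss` is a high-water mark, the values never decrease; what matters is how fast they climb.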
Additional context
See attached files, or just paste them right in:
test_nx_arango.py:
#! /usr/bin/env python3
import nx_arangodb as nxa
G = nxa.Graph(name='MyGraph1')
for i in range(100_000):
    G.add_node(f'node_{i}')
print(G.number_of_nodes())
test_nx_arango_batch.py:
#! /usr/bin/env python3
import nx_arangodb as nxa
G = nxa.Graph(name='MyGraph1')
G.add_nodes_from([ f'node_{i}' for i in range(100_000) ])
print(G.number_of_nodes())
test_nx.py:
#! /usr/bin/env python3
import networkx as nx
G = nx.Graph(name='MyGraph1')
for i in range(100_000):
    G.add_node(f'node_{i}')
print(G.number_of_nodes())