Added support for Py3, Networkx 2, Keras 2, some fixes, Readme enhancements #12

Merged: 5 commits, Oct 25, 2017
15 changes: 14 additions & 1 deletion .gitignore
@@ -1,3 +1,16 @@
*.edgelist
*.idx2node
intermediate/
.idea
marvel_*
tempGraph.*
ES_EVENT_LOG_v11.*
py2test/
py3test/

# THIS FILE IS GENERATED FROM GEM SETUP.PY
gem/version.py

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -101,4 +114,4 @@ gem/intermediate/*
*.json

# images
*.png
*.png
78 changes: 59 additions & 19 deletions README.md
@@ -8,7 +8,7 @@ GEM implements the following graph embedding techniques:
* [Laplacian Eigenmaps](http://yeolab.weebly.com/uploads/2/5/5/0/25509700/belkin_laplacian_2003.pdf)
* [Locally Linear Embedding](http://www.robots.ox.ac.uk/~az/lectures/ml/lle.pdf)
* [Graph Factorization](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40839.pdf)
* [Higher-Prder Proximity preserved Embedding (HOPE)](http://www.kdd.org/kdd2016/papers/files/rfp0184-ouA.pdf)
* [Higher-Order Proximity preserved Embedding (HOPE)](http://www.kdd.org/kdd2016/papers/files/rfp0184-ouA.pdf)
* [Structural Deep Network Embedding (SDNE)](http://www.kdd.org/kdd2016/papers/files/rfp0191-wangAemb.pdf)
* [node2vec](http://www.kdd.org/kdd2016/papers/files/rfp0218-groverA.pdf)

@@ -28,39 +28,79 @@ The graphs are saved using `nx.write_gpickle` in the networkx format and can be
* **gem/c_ext**: Python interface for source files in c_src using [Boost.Python](http://www.boost.org/doc/libs/1_64_0/libs/python/doc/html/index.html)

## Dependencies
GEM is tested to work on Python 2.7.
GEM is tested to work on Python 2.7 and Python 3.6.

The required dependencies are: Numpy >= 1.12.0, SciPy >= 0.19.0, Networkx >= 1.11, Scikit-learn >= 0.18.1.

To run SDNE, GEM requires Theano >= 0.9.0 and Keras = 2.0.2.

For Python 3, make sure the interpreter was compiled with `./configure --enable-shared`, and that `/usr/local/bin/python` is in your `LD_LIBRARY_PATH`.
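
If you are unsure whether an existing Python 3 build is shared, one quick check is to ask `sysconfig` for the relevant build flag. This is only an illustrative snippet, not part of GEM:

```python
import sysconfig

# Prints 1 if the interpreter was built with ./configure --enable-shared,
# 0 (or None) for a statically linked build.
print(sysconfig.get_config_var('Py_ENABLE_SHARED'))
```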

## Install
The package uses setuptools, which is a common way of installing python modules. To install in your home directory, use:

```bash
python setup.py install --user
```

To install for all users on Unix/Linux:
```bash
sudo python setup.py install
```

## Usage
Run Graph Factorization on Karate graph and evaluate it on graph reconstruction:

from gem.embedding.gf import GraphFactorization as gf
from gem.evaluation import evaluate_graph_reconstruction as gr
from gem.utils import graph_util

# Instatiate the embedding method with hyperparameters
em = gf(2, 100000, 1*10**-4, 1.0)

# Load graph
graph = graph_util.loadGraphFromEdgeListTxt('gem/data/karate.edgelist')
You can also use `python3` instead of `python`.

## Usage
Run the methods on the Karate graph and evaluate them on graph reconstruction:

```python
import matplotlib.pyplot as plt

from gem.utils import graph_util, plot_util
from gem.evaluation import visualize_embedding as viz
from gem.evaluation import evaluate_graph_reconstruction as gr
from time import time

from gem.embedding.gf import GraphFactorization
from gem.embedding.hope import HOPE
from gem.embedding.lap import LaplacianEigenmaps
from gem.embedding.lle import LocallyLinearEmbedding
from gem.embedding.node2vec import node2vec
from gem.embedding.sdne import SDNE

# File that contains the edges. Format: source target
# Optionally, you can add weights as third column: source target weight
edge_f = 'gem/data/karate.edgelist'
# Specify whether the edges are directed
isDirected = True

# Load graph
G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=isDirected)
G = G.to_directed()

models = []
# You can comment out the methods you don't want to run
models.append(GraphFactorization(2, 100000, 1*10**-4, 1.0))
models.append(HOPE(4, 0.01))
models.append(LaplacianEigenmaps(2))
models.append(LocallyLinearEmbedding(2))
models.append(node2vec(2, 1, 80, 10, 10, 1, 1))
models.append(SDNE(d=2, beta=5, alpha=1e-5, nu1=1e-6, nu2=1e-6, K=3, n_units=[50, 15,], rho=0.3, n_iter=50, xeta=0.01, n_batch=500,
                   modelfile=['./intermediate/enc_model.json', './intermediate/dec_model.json'],
                   weightfile=['./intermediate/enc_weights.hdf5', './intermediate/dec_weights.hdf5']))

for embedding in models:
    print ('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
    t1 = time()
    # Learn embedding - accepts a networkx graph or file with edge list
    Y, t = embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
    print (embedding._method_name + ':\n\tTraining time: %f' % (time() - t1))
    # Evaluate on graph reconstruction
    MAP, prec_curv = gr.evaluateStaticGraphReconstruction(G, embedding, Y, None)
    # Visualize
    viz.plot_embedding2D(embedding.get_embedding(), di_graph=G, node_colors=None)
    plt.show()
```
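
The SDNE `modelfile` and `weightfile` paths above point into `./intermediate/`, so that directory needs to exist before training. Also, `learn_embedding` returns the embedding `Y` as a NumPy array, so it can be persisted independently of GEM. A minimal sketch (the file name is illustrative):

```python
import os
import numpy as np

# Create the directory used by the SDNE model/weight files above.
if not os.path.exists('intermediate'):
    os.makedirs('intermediate')

# After training, save the last learned embedding (an n_nodes x d array) and reload it.
np.savetxt('intermediate/karate_embedding.txt', Y)
Y_loaded = np.loadtxt('intermediate/karate_embedding.txt')
```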


## Cite
@article{goyal2017graph,
3 changes: 2 additions & 1 deletion gem/c_exe/readme.txt
@@ -1,2 +1,3 @@
1. Recompile from https://github.com/snap-stanford/snap and copy node2vec executable to this folder
2. To grant executable permission, run: chmod +x ./c_exe/node2vec
2. To grant executable permission, run: chmod +x ./c_exe/node2vec

27 changes: 15 additions & 12 deletions gem/embedding/gf.py
@@ -1,6 +1,6 @@
disp_avlbl = True
from os import environ
if 'DISPLAY' not in environ:
import os
if 'DISPLAY' not in os.environ:
disp_avlbl = False
import matplotlib
matplotlib.use('Agg')
@@ -12,28 +12,31 @@

import sys
sys.path.append('./')
sys.path.append(os.path.realpath(__file__))

from static_graph_embedding import StaticGraphEmbedding
from .static_graph_embedding import StaticGraphEmbedding
from gem.utils import graph_util, plot_util
from gem.evaluation import visualize_embedding as viz
from time import time

class GraphFactorization(StaticGraphEmbedding):

def __init__(self, d, max_iter, eta, regu):
def __init__(self, d, max_iter, eta, regu, print_step=10000):
''' Initialize the GraphFactorization class

Args:
d: dimension of the embedding
eta: learning rate of sgd
regu: regularization coefficient of magnitude of weights
max_iter: max iterations in sgd
print_step: #iterations to log the progress (step%print_step)
'''
self._d = d
self._eta = eta
self._regu = regu
self._max_iter = max_iter
self._method_name = 'graph_factor_sgd'
self._print_step = print_step

def get_method_name(self):
return self._method_name
@@ -43,7 +46,7 @@ def get_method_summary(self):

def _get_f_value(self, graph):
f1 = 0
for i, j, w in graph.edges_iter(data='weight', default=1):
for i, j, w in graph.edges(data='weight', default=1):
f1 += (w - np.dot(self._X[i, :], self._X[j, :]))**2
f2 = self._regu*(np.linalg.norm(self._X)**2)
return [f1, f2, f1+f2]
@@ -56,7 +59,7 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
try:
from c_ext import graphFac_ext
except:
print 'Could not import C++ module for Graph Factorization. Reverting to python implementation. Please recompile graphFac_ext from graphFac.cpp using bjam'
print('Could not import C++ module for Graph Factorization. Reverting to python implementation. Please recompile graphFac_ext from graphFac.cpp using bjam')
c_flag = False
if c_flag:
if edge_f:
@@ -65,7 +68,7 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
is_weighted = True
edge_f = 'tempGraph.graph'
t1 = time()
graphFac_ext.learn_embedding(edge_f, "tempGraphGF.emb", True, is_weighted, self._d, self._eta, self._regu, self._max_iter)
graphFac_ext.learn_embedding(edge_f, "tempGraphGF.emb", True, is_weighted, self._d, self._eta, self._regu, self._max_iter)
self._X = graph_util.loadEmbedding('tempGraphGF.emb')
t2 = time()
return self._X, (t2-t1)
@@ -76,11 +79,11 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
self._node_num = graph.number_of_nodes()
self._X = 0.01*np.random.randn(self._node_num, self._d)
for iter_id in range(self._max_iter):
if not iter_id%100:
if not iter_id%self._print_step:
[f1, f2, f] = self._get_f_value(graph)
print '\t\tIter id: %d, Objective value: %g, f1: %g, f2: %g' % (iter_id, f, f1, f2)
print('\t\tIter id: %d, Objective value: %g, f1: %g, f2: %g' % (iter_id, f, f1, f2))
tempFlag = False
for i, j, w in graph.edges_iter(data='weight', default=1):
for i, j, w in graph.edges(data='weight', default=1):
if j <= i:
continue
delPhi = -(w - np.dot(self._X[i, :], self._X[j, :]))*self._X[j, :] + self._regu*self._X[i, :]
@@ -114,11 +117,11 @@ def get_reconstructed_adj(self, X=None, node_l=None):
G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=False)
G = G.to_directed()
res_pre = 'results/testKarate'
print 'Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges())
print ('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
t1 = time()
embedding = GraphFactorization(2, 100000, 1*10**-4, 1.0)
embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
print 'Graph Factorization:\n\tTraining time: %f' % (time() - t1)
print ('Graph Factorization:\n\tTraining time: %f' % (time() - t1))

viz.plot_embedding2D(embedding.get_embedding(), di_graph=G, node_colors=None)
plt.show()
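
A note on the `edges_iter` → `edges` changes above: NetworkX 2 removed the `*_iter` methods, and `edges()` now returns a view that accepts the same `data`/`default` arguments. A small standalone sketch on a toy graph (not GEM code):

```python
import networkx as nx

G = nx.Graph()
G.add_edge(0, 1, weight=0.5)
G.add_edge(1, 2)  # no explicit weight attribute

# NetworkX 2.x: edges() replaces edges_iter() and still supports data/default.
for u, v, w in G.edges(data='weight', default=1):
    print(u, v, w)  # the (1, 2) edge falls back to the default weight of 1
```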
17 changes: 9 additions & 8 deletions gem/embedding/hope.py
@@ -1,6 +1,6 @@
disp_avlbl = True
from os import environ
if 'DISPLAY' not in environ:
import os
if 'DISPLAY' not in os.environ:
disp_avlbl = False
import matplotlib
matplotlib.use('Agg')
@@ -16,8 +16,9 @@

import sys
sys.path.append('./')
sys.path.append(os.path.realpath(__file__))

from static_graph_embedding import StaticGraphEmbedding
from .static_graph_embedding import StaticGraphEmbedding
from gem.utils import graph_util, plot_util
from gem.evaluation import visualize_embedding as viz

@@ -51,15 +52,15 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
M_l = self._beta*A
S = np.dot(np.linalg.inv(M_g), M_l)

u, s, vt = lg.svds(S, k=self._d/2)
u, s, vt = lg.svds(S, k=self._d//2)
X1 = np.dot(u, np.diag(np.sqrt(s)))
X2 = np.dot(vt.T, np.diag(np.sqrt(s)))
t2 = time()
self._X = np.concatenate((X1, X2), axis=1)

p_d_p_t = np.dot(u, np.dot(np.diag(s), vt))
eig_err = np.linalg.norm(p_d_p_t - S)
print 'SVD error (low rank): %f' % eig_err
print('SVD error (low rank): %f' % eig_err)

# p_d_p_t = np.dot(self._X, np.dot(w[1:self._d+1, 1:self._d+1], self._X.T))
# eig_err = np.linalg.norm(p_d_p_t - L_sym)
@@ -70,7 +71,7 @@ def get_embedding(self):
return self._X

def get_edge_weight(self, i, j):
return np.dot(self._X[i, :self._d/2], self._X[j, self._d/2:])
return np.dot(self._X[i, :self._d//2], self._X[j, self._d//2:])

def get_reconstructed_adj(self, X=None, node_l=None):
if X is not None:
@@ -92,11 +93,11 @@ def get_reconstructed_adj(self, X=None, node_l=None):
G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=False)
G = G.to_directed()
res_pre = 'results/testKarate'
print 'Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges())
print('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
t1 = time()
embedding = HOPE(4, 0.01)
embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
print 'HOPE:\n\tTraining time: %f' % (time() - t1)
print('HOPE:\n\tTraining time: %f' % (time() - t1))

viz.plot_embedding2D(embedding.get_embedding()[:, :2], di_graph=G, node_colors=None)
plt.show()
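
The `self._d/2` → `self._d//2` edits matter because `/` is true division under Python 3 and would pass a float `k` to `scipy.sparse.linalg.svds`. A standalone sketch of the distinction on a random matrix (illustrative only):

```python
import numpy as np
from scipy.sparse.linalg import svds

d = 4
print(d / 2, d // 2)  # Python 3: 2.0 (float) vs 2 (int)

S = np.random.rand(10, 10)
u, s, vt = svds(S, k=d // 2)  # k must be an integer, hence the // fix in hope.py
print(u.shape, s.shape, vt.shape)
```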
11 changes: 6 additions & 5 deletions gem/embedding/lap.py
@@ -1,6 +1,6 @@
disp_avlbl = True
from os import environ
if 'DISPLAY' not in environ:
import os
if 'DISPLAY' not in os.environ:
disp_avlbl = False
import matplotlib
matplotlib.use('Agg')
@@ -16,8 +16,9 @@

import sys
sys.path.append('./')
sys.path.append(os.path.realpath(__file__))

from static_graph_embedding import StaticGraphEmbedding
from .static_graph_embedding import StaticGraphEmbedding
from gem.utils import graph_util, plot_util
from gem.evaluation import visualize_embedding as viz

@@ -95,11 +96,11 @@ def get_reconstructed_adj(self, X=None, node_l=None):
G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=False)
G = G.to_directed()
res_pre = 'results/testKarate'
print 'Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges())
print('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
t1 = time()
embedding = LaplacianEigenmaps(2)
embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
print 'Laplacian Eigenmaps:\n\tTraining time: %f' % (time() - t1)
print('Laplacian Eigenmaps:\n\tTraining time: %f' % (time() - t1))

viz.plot_embedding2D(embedding.get_embedding(), di_graph=G, node_colors=None)
plt.show()
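
The `DISPLAY` / `matplotlib.use('Agg')` preamble that each of these modules carries is a headless-plotting guard: when no X display is available, matplotlib is switched to the non-interactive Agg backend before `pyplot` is imported. The same pattern in a standalone sketch:

```python
import os

if 'DISPLAY' not in os.environ:
    # No X server (e.g. a remote shell or CI machine): fall back to the
    # non-interactive Agg backend so figures can still be written to files.
    import matplotlib
    matplotlib.use('Agg')

import matplotlib.pyplot as plt

plt.plot([0, 1], [0, 1])
plt.savefig('line.png')  # works with or without a display
```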
11 changes: 6 additions & 5 deletions gem/embedding/lle.py
@@ -1,6 +1,6 @@
disp_avlbl = True
from os import environ
if 'DISPLAY' not in environ:
import os
if 'DISPLAY' not in os.environ:
disp_avlbl = False
import matplotlib
matplotlib.use('Agg')
@@ -18,8 +18,9 @@

import sys
sys.path.append('./')
sys.path.append(os.path.realpath(__file__))

from static_graph_embedding import StaticGraphEmbedding
from .static_graph_embedding import StaticGraphEmbedding
from gem.utils import graph_util, plot_util
from gem.evaluation import visualize_embedding as viz

@@ -79,11 +80,11 @@ def get_reconstructed_adj(self, X=None, node_l=None):
G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=False)
G = G.to_directed()
res_pre = 'results/testKarate'
print 'Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges())
print('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
t1 = time()
embedding = LocallyLinearEmbedding(2)
embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
print 'Graph Factorization:\n\tTraining time: %f' % (time() - t1)
print('Graph Factorization:\n\tTraining time: %f' % (time() - t1))

viz.plot_embedding2D(embedding.get_embedding(), di_graph=G, node_colors=None)
plt.show()
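
The recurring `from .static_graph_embedding import ...` edits reflect that Python 3 dropped implicit relative imports (PEP 328): inside a package, the import must either carry the leading dot or use the absolute module path. From outside the package, the absolute form below works on both Python 2 and 3 (it assumes GEM has been installed, e.g. with `python setup.py install --user`):

```python
# Inside gem/embedding/*.py the PR uses the explicit relative form:
#   from .static_graph_embedding import StaticGraphEmbedding
# The equivalent absolute import, usable from any script:
from gem.embedding.static_graph_embedding import StaticGraphEmbedding

print(StaticGraphEmbedding)
```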