diff --git a/.gitignore b/.gitignore
index a11e1fb..c9d9ee8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,16 @@
+*.edgelist
+*.idx2node
+intermediate/
+.idea
+marvel_*
+tempGraph.*
+ES_EVENT_LOG_v11.*
+py2test/
+py3test/
+
+# THIS FILE IS GENERATED FROM GEM SETUP.PY
+gem/version.py
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
@@ -101,4 +114,4 @@ gem/intermediate/*
 *.json
 # images
-*.png
\ No newline at end of file
+*.png
diff --git a/README.md b/README.md
index e826956..7af3e95 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@ GEM implements the following graph embedding techniques:
 * [Laplacian Eigenmaps](http://yeolab.weebly.com/uploads/2/5/5/0/25509700/belkin_laplacian_2003.pdf)
 * [Locally Linear Embedding](http://www.robots.ox.ac.uk/~az/lectures/ml/lle.pdf)
 * [Graph Factorization](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40839.pdf)
-* [Higher-Prder Proximity preserved Embedding (HOPE)](http://www.kdd.org/kdd2016/papers/files/rfp0184-ouA.pdf)
+* [Higher-Order Proximity preserved Embedding (HOPE)](http://www.kdd.org/kdd2016/papers/files/rfp0184-ouA.pdf)
 * [Structural Deep Network Embedding (SDNE)](http://www.kdd.org/kdd2016/papers/files/rfp0191-wangAemb.pdf)
 * [node2vec](http://www.kdd.org/kdd2016/papers/files/rfp0218-groverA.pdf)
@@ -28,39 +28,79 @@ The graphs are saved using `nx.write_gpickle` in the networkx format and can be
 * **gem/c_ext**: Python interface for source files in c_src using [Boost.Python](http://www.boost.org/doc/libs/1_64_0/libs/python/doc/html/index.html)
 ## Dependencies
-GEM is tested to work on Python 2.7.
+GEM is tested to work on Python 2.7 and Python 3.6
 The required dependencies are: Numpy >= 1.12.0, SciPy >= 0.19.0, Networkx >= 1.11, Scikit-learn >= 0.18.1. To run SDNE, GEM requires Theano >= 0.9.0 and Keras = 2.0.2.
+In case of Python 3, make sure it was compiled with `./configure --enable-shared`, and that you have `/usr/local/bin/python` in your `LD_LIBRARY_PATH`
+
 ## Install
 The package uses setuptools, which is a common way of installing python modules. To install in your home directory, use:
-
+```bash
     python setup.py install --user
+```
 To install for all users on Unix/Linux:
-
+```bash
     sudo python setup.py install
+```
-## Usage
-Run Graph Factorization on Karate graph and evaluate it on graph reconstruction:
-
-    from gem.embedding.gf import GraphFactorization as gf
-    from gem.evaluation import evaluate_graph_reconstruction as gr
-    from gem.utils import graph_util
-
-    # Instatiate the embedding method with hyperparameters
-    em = gf(2, 100000, 1*10**-4, 1.0)
-
-    # Load graph
-    graph = graph_util.loadGraphFromEdgeListTxt('gem/data/karate.edgelist')
+You also can use `python3` instead of `python`
+## Usage
+Run the methods on Karate graph and evaluate them on graph reconstruction:
+
+```python
+import matplotlib.pyplot as plt
+
+from gem.utils import graph_util, plot_util
+from gem.evaluation import visualize_embedding as viz
+from gem.evaluation import evaluate_graph_reconstruction as gr
+from time import time
+
+from gem.embedding.gf import GraphFactorization
+from gem.embedding.hope import HOPE
+from gem.embedding.lap import LaplacianEigenmaps
+from gem.embedding.lle import LocallyLinearEmbedding
+from gem.embedding.node2vec import node2vec
+from gem.embedding.sdne import SDNE
+
+# File that contains the edges. Format: source target
+# Optionally, you can add weights as third column: source target weight
+edge_f = 'gem/data/karate.edgelist'
+# Specify whether the edges are directed
+isDirected = True
+
+# Load graph
+G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=isDirected)
+G = G.to_directed()
+
+models = []
+# You can comment out the methods you don't want to run
+models.append(GraphFactorization(2, 100000, 1*10**-4, 1.0))
+models.append(HOPE(4, 0.01))
+models.append(LaplacianEigenmaps(2))
+models.append(LocallyLinearEmbedding(2))
+models.append(node2vec(2, 1, 80, 10, 10, 1, 1))
+models.append(SDNE(d=2, beta=5, alpha=1e-5, nu1=1e-6, nu2=1e-6, K=3,n_units=[50, 15,], rho=0.3, n_iter=50, xeta=0.01,n_batch=500,
+    modelfile=['./intermediate/enc_model.json', './intermediate/dec_model.json'],
+    weightfile=['./intermediate/enc_weights.hdf5', './intermediate/dec_weights.hdf5']))
+
+for embedding in models:
+    print ('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
+    t1 = time()
     # Learn embedding - accepts a networkx graph or file with edge list
-    Y, t = em.learn_embedding(graph, edge_f=None, is_weighted=True, no_python=True)
-
+    Y, t = embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
+    print (embedding._method_name+':\n\tTraining time: %f' % (time() - t1))
     # Evaluate on graph reconstruction
-    MAP, prec_curv = gr.evaluateStaticGraphReconstruction(graph, em, Y, None)
+    MAP, prec_curv = gr.evaluateStaticGraphReconstruction(G, embedding, Y, None)
+    # Visualize
+    viz.plot_embedding2D(embedding.get_embedding(), di_graph=G, node_colors=None)
+    plt.show()
+```
+
 ## Cite
     @article{goyal2017graph,
diff --git a/gem/c_exe/readme.txt b/gem/c_exe/readme.txt
index efefa78..fc117d9 100644
--- a/gem/c_exe/readme.txt
+++ b/gem/c_exe/readme.txt
@@ -1,2 +1,3 @@
 1. Recompile from https://github.com/snap-stanford/snap and copy node2vec executable to this folder
-2. To grant executable permission, run: chmod +x ./c_exe/node2vec
\ No newline at end of file
+2. To grant executable permission, run: chmod +x ./c_exe/node2vec
+
diff --git a/gem/embedding/gf.py b/gem/embedding/gf.py
index 82188ed..e4c6ed1 100644
--- a/gem/embedding/gf.py
+++ b/gem/embedding/gf.py
@@ -1,6 +1,6 @@
 disp_avlbl = True
-from os import environ
-if 'DISPLAY' not in environ:
+import os
+if 'DISPLAY' not in os.environ:
     disp_avlbl = False
     import matplotlib
     matplotlib.use('Agg')
@@ -12,15 +12,16 @@
 import sys
 sys.path.append('./')
+sys.path.append(os.path.realpath(__file__))
-from static_graph_embedding import StaticGraphEmbedding
+from .static_graph_embedding import StaticGraphEmbedding
 from gem.utils import graph_util, plot_util
 from gem.evaluation import visualize_embedding as viz
 from time import time
 class GraphFactorization(StaticGraphEmbedding):
-    def __init__(self, d, max_iter, eta, regu):
+    def __init__(self, d, max_iter, eta, regu, print_step=10000):
         '''
         Initialize the GraphFactorization class
         Args:
@@ -28,12 +29,14 @@ def __init__(self, d, max_iter, eta, regu):
             eta: learning rate of sgd
             regu: regularization coefficient of magnitude of weights
             max_iter: max iterations in sgd
+            print_step: #iterations to log the progress (step%print_step)
         '''
         self._d = d
         self._eta = eta
         self._regu = regu
         self._max_iter = max_iter
         self._method_name = 'graph_factor_sgd'
+        self._print_step = print_step
     def get_method_name(self):
         return self._method_name
@@ -43,7 +46,7 @@ def get_method_summary(self):
     def _get_f_value(self, graph):
         f1 = 0
-        for i, j, w in graph.edges_iter(data='weight', default=1):
+        for i, j, w in graph.edges(data='weight', default=1):
             f1 += (w - np.dot(self._X[i, :], self._X[j, :]))**2
         f2 = self._regu*(np.linalg.norm(self._X)**2)
         return [f1, f2, f1+f2]
@@ -56,7 +59,7 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
             try:
                 from c_ext import graphFac_ext
             except:
-                print 'Could not import C++ module for Graph Factorization. Reverting to python implementation. Please recompile graphFac_ext from graphFac.cpp using bjam'
+                print('Could not import C++ module for Graph Factorization. Reverting to python implementation. Please recompile graphFac_ext from graphFac.cpp using bjam')
                 c_flag = False
         if c_flag:
             if edge_f:
@@ -65,7 +68,7 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
             is_weighted = True
             edge_f = 'tempGraph.graph'
             t1 = time()
-            graphFac_ext.learn_embedding(edge_f, "tempGraphGF.emb", True, is_weighted, self._d, self._eta, self._regu, self._max_iter)
+            graphFac_ext.learn_embedding(edge_f, "tempGraphGF.emb", True, is_weighted, self._d, self._eta, self._regu, self._max_iter)
             self._X = graph_util.loadEmbedding('tempGraphGF.emb')
             t2 = time()
             return self._X, (t2-t1)
@@ -76,11 +79,11 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
         self._node_num = graph.number_of_nodes()
         self._X = 0.01*np.random.randn(self._node_num, self._d)
         for iter_id in range(self._max_iter):
-            if not iter_id%100:
+            if not iter_id%self._print_step:
                 [f1, f2, f] = self._get_f_value(graph)
-                print '\t\tIter id: %d, Objective value: %g, f1: %g, f2: %g' % (iter_id, f, f1, f2)
+                print('\t\tIter id: %d, Objective value: %g, f1: %g, f2: %g' % (iter_id, f, f1, f2))
             tempFlag = False
-            for i, j, w in graph.edges_iter(data='weight', default=1):
+            for i, j, w in graph.edges(data='weight', default=1):
                 if j <= i:
                     continue
                 delPhi = -(w - np.dot(self._X[i, :], self._X[j, :]))*self._X[j, :] + self._regu*self._X[i, :]
@@ -114,11 +117,11 @@ def get_reconstructed_adj(self, X=None, node_l=None):
     G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=False)
     G = G.to_directed()
     res_pre = 'results/testKarate'
-    print 'Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges())
+    print ('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
     t1 = time()
     embedding = GraphFactorization(2, 100000, 1*10**-4, 1.0)
     embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
-    print 'Graph Factorization:\n\tTraining time: %f' % (time() - t1)
+    print ('Graph Factorization:\n\tTraining time: %f' % (time() - t1))
     viz.plot_embedding2D(embedding.get_embedding(), di_graph=G, node_colors=None)
     plt.show()
diff --git a/gem/embedding/hope.py b/gem/embedding/hope.py
index 565097d..aa356a0 100644
--- a/gem/embedding/hope.py
+++ b/gem/embedding/hope.py
@@ -1,6 +1,6 @@
 disp_avlbl = True
-from os import environ
-if 'DISPLAY' not in environ:
+import os
+if 'DISPLAY' not in os.environ:
     disp_avlbl = False
     import matplotlib
     matplotlib.use('Agg')
@@ -16,8 +16,9 @@
 import sys
 sys.path.append('./')
+sys.path.append(os.path.realpath(__file__))
-from static_graph_embedding import StaticGraphEmbedding
+from .static_graph_embedding import StaticGraphEmbedding
 from gem.utils import graph_util, plot_util
 from gem.evaluation import visualize_embedding as viz
@@ -51,7 +52,7 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
         M_l = self._beta*A
         S = np.dot(np.linalg.inv(M_g), M_l)
-        u, s, vt = lg.svds(S, k=self._d/2)
+        u, s, vt = lg.svds(S, k=self._d//2)
         X1 = np.dot(u, np.diag(np.sqrt(s)))
         X2 = np.dot(vt.T, np.diag(np.sqrt(s)))
         t2 = time()
@@ -59,7 +60,7 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
         p_d_p_t = np.dot(u, np.dot(np.diag(s), vt))
         eig_err = np.linalg.norm(p_d_p_t - S)
-        print 'SVD error (low rank): %f' % eig_err
+        print('SVD error (low rank): %f' % eig_err)
         # p_d_p_t = np.dot(self._X, np.dot(w[1:self._d+1, 1:self._d+1], self._X.T))
         # eig_err = np.linalg.norm(p_d_p_t - L_sym)
@@ -70,7 +71,7 @@ def get_embedding(self):
         return self._X
     def get_edge_weight(self, i, j):
-        return np.dot(self._X[i, :self._d/2], self._X[j, self._d/2:])
+        return np.dot(self._X[i, :self._d//2], self._X[j, self._d//2:])
     def get_reconstructed_adj(self, X=None, node_l=None):
         if X is not None:
@@ -92,11 +93,11 @@ def get_reconstructed_adj(self, X=None, node_l=None):
     G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=False)
     G = G.to_directed()
     res_pre = 'results/testKarate'
-    print 'Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges())
+    print('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
     t1 = time()
     embedding = HOPE(4, 0.01)
     embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
-    print 'HOPE:\n\tTraining time: %f' % (time() - t1)
+    print('HOPE:\n\tTraining time: %f' % (time() - t1))
     viz.plot_embedding2D(embedding.get_embedding()[:, :2], di_graph=G, node_colors=None)
     plt.show()
diff --git a/gem/embedding/lap.py b/gem/embedding/lap.py
index 1ffd9d1..2818cd7 100644
--- a/gem/embedding/lap.py
+++ b/gem/embedding/lap.py
@@ -1,6 +1,6 @@
 disp_avlbl = True
-from os import environ
-if 'DISPLAY' not in environ:
+import os
+if 'DISPLAY' not in os.environ:
     disp_avlbl = False
     import matplotlib
     matplotlib.use('Agg')
@@ -16,8 +16,9 @@
 import sys
 sys.path.append('./')
+sys.path.append(os.path.realpath(__file__))
-from static_graph_embedding import StaticGraphEmbedding
+from .static_graph_embedding import StaticGraphEmbedding
 from gem.utils import graph_util, plot_util
 from gem.evaluation import visualize_embedding as viz
@@ -95,11 +96,11 @@ def get_reconstructed_adj(self, X=None, node_l=None):
     G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=False)
     G = G.to_directed()
     res_pre = 'results/testKarate'
-    print 'Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges())
+    print('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
     t1 = time()
     embedding = LaplacianEigenmaps(2)
     embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
-    print 'Laplacian Eigenmaps:\n\tTraining time: %f' % (time() - t1)
+    print('Laplacian Eigenmaps:\n\tTraining time: %f' % (time() - t1))
     viz.plot_embedding2D(embedding.get_embedding(), di_graph=G, node_colors=None)
     plt.show()
diff --git a/gem/embedding/lle.py b/gem/embedding/lle.py
index 3b92dd9..06d0cc5 100644
--- a/gem/embedding/lle.py
+++ b/gem/embedding/lle.py
@@ -1,6 +1,6 @@
 disp_avlbl = True
-from os import environ
-if 'DISPLAY' not in environ:
+import os
+if 'DISPLAY' not in os.environ:
     disp_avlbl = False
     import matplotlib
     matplotlib.use('Agg')
@@ -18,8 +18,9 @@
 import sys
 sys.path.append('./')
+sys.path.append(os.path.realpath(__file__))
-from static_graph_embedding import StaticGraphEmbedding
+from .static_graph_embedding import StaticGraphEmbedding
 from gem.utils import graph_util, plot_util
 from gem.evaluation import visualize_embedding as viz
@@ -79,11 +80,11 @@ def get_reconstructed_adj(self, X=None, node_l=None):
     G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=False)
     G = G.to_directed()
     res_pre = 'results/testKarate'
-    print 'Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges())
+    print('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
     t1 = time()
     embedding = LocallyLinearEmbedding(2)
     embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
-    print 'Graph Factorization:\n\tTraining time: %f' % (time() - t1)
+    print('Graph Factorization:\n\tTraining time: %f' % (time() - t1))
     viz.plot_embedding2D(embedding.get_embedding(), di_graph=G, node_colors=None)
     plt.show()
diff --git a/gem/embedding/node2vec.py b/gem/embedding/node2vec.py
index bb27ae8..2f74f03 100644
--- a/gem/embedding/node2vec.py
+++ b/gem/embedding/node2vec.py
@@ -1,6 +1,6 @@
 disp_avlbl = True
-from os import environ
-if 'DISPLAY' not in environ:
+import os
+if 'DISPLAY' not in os.environ:
     disp_avlbl = False
     import matplotlib
     matplotlib.use('Agg')
@@ -16,9 +16,11 @@
 import sys
 sys.path.append('./')
+sys.path.append(os.path.dirname(os.path.realpath(__file__)))
+
 from subprocess import call
-from static_graph_embedding import StaticGraphEmbedding
+from .static_graph_embedding import StaticGraphEmbedding
 from gem.utils import graph_util, plot_util
 from gem.evaluation import visualize_embedding as viz
@@ -43,7 +45,7 @@ def get_method_summary(self):
         return '%s_%d' % (self._method_name, self._d)
     def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=False):
-        args = ["./c_exe/node2vec"]
+        args = ["gem/c_exe/node2vec"]
         if not graph and not edge_f:
             raise Exception('graph/edge_f needed')
         if edge_f:
@@ -64,7 +66,8 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
         t1 = time()
         try:
             call(args)
-        except:
+        except Exception as e:
+            print(str(e))
             raise Exception('./node2vec not found. Please compile snap, place node2vec in the path and grant executable permission')
         self._X = graph_util.loadEmbedding('tempGraph.emb')
         t2 = time()
@@ -96,11 +99,11 @@ def get_reconstructed_adj(self, X=None, node_l=None):
     G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=False)
     G = G.to_directed()
     res_pre = 'results/testKarate'
-    print 'Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges())
+    print('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
     t1 = time()
     embedding = node2vec(2, 1, 80, 10, 10, 1, 1)
     embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
-    print 'node2vec:\n\tTraining time: %f' % (time() - t1)
+    print('node2vec:\n\tTraining time: %f' % (time() - t1))
     viz.plot_embedding2D(embedding.get_embedding(), di_graph=G, node_colors=None)
     plt.show()
diff --git a/gem/embedding/sdne.py b/gem/embedding/sdne.py
index aadd67d..3e15160 100644
--- a/gem/embedding/sdne.py
+++ b/gem/embedding/sdne.py
@@ -1,6 +1,6 @@
 disp_avlbl = True
-from os import environ
-if 'DISPLAY' not in environ:
+import os
+if 'DISPLAY' not in os.environ:
     disp_avlbl = False
     import matplotlib
     matplotlib.use('Agg')
@@ -11,11 +11,12 @@
 import sys
 sys.path.append('./')
+sys.path.append(os.path.realpath(__file__))
-from static_graph_embedding import StaticGraphEmbedding
+from .static_graph_embedding import StaticGraphEmbedding
 from gem.utils import graph_util, plot_util
 from gem.evaluation import visualize_embedding as viz
-from sdne_utils import *
+from .sdne_utils import *
 from keras.layers import Input, Dense, Lambda, merge
 from keras.models import Model, model_from_json
@@ -95,7 +96,7 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
         else:
             S = graph_util.transform_DiGraph_to_adj(graph)
         if not np.allclose(S.T, S):
-            print "SDNE only works for symmetric graphs! Making the graph symmetric"
+            print("SDNE only works for symmetric graphs! Making the graph symmetric")
         t1 = time()
         S = (S + S.T)/2 # enforce S is symmetric
         S -= np.diag(np.diag(S)) # enforce diagonal = 0
@@ -125,9 +126,13 @@ def learn_embedding(self, graph=None, edge_f=None, is_weighted=False, no_python=
         [x_hat1, y1] = self._autoencoder(x1)
         [x_hat2, y2] = self._autoencoder(x2)
         # Outputs
-        x_diff1 = merge([x_hat1, x1], mode=lambda (a,b): a - b, output_shape=lambda L: L[1])
-        x_diff2 = merge([x_hat2, x2], mode=lambda (a,b): a - b, output_shape=lambda L: L[1])
-        y_diff = merge([y2, y1], mode=lambda (a,b): a - b, output_shape=lambda L: L[1])
+        #x_diff1 = merge([x_hat1, x1], mode=lambda (a,b): a - b, output_shape=lambda L: L[1])
+        #x_diff2 = merge([x_hat2, x2], mode=lambda (a,b): a - b, output_shape=lambda L: L[1])
+        #y_diff = merge([y2, y1], mode=lambda (a,b): a - b, output_shape=lambda L: L[1])
+
+        x_diff1 = merge([x_hat1, x1], mode=lambda ab: ab[0] - ab[1], output_shape=lambda L: L[1])
+        x_diff2 = merge([x_hat2, x2], mode=lambda ab: ab[0] - ab[1], output_shape=lambda L: L[1])
+        y_diff = merge([y2, y1], mode=lambda ab: ab[0] - ab[1], output_shape=lambda L: L[1])
         # Objectives
         def weighted_mse_x(y_true, y_pred):
@@ -155,11 +160,12 @@ def weighted_mse_y(y_true, y_pred):
         # InData format: [x1, x2]
         # OutData format: [b1, b2, s12, deg1, deg2]
         data_chunk_size = 100000
+        print("\nnode num: {}\n".format(self._node_num))
         InData = np.zeros((data_chunk_size, 2*self._node_num))
         OutData = np.zeros((data_chunk_size, 2*self._node_num + 3))
         # Train the model
         for epoch_num in range(self._num_iter):
-            print 'EPOCH %d/%d' % (epoch_num, self._num_iter)
+            print('EPOCH %d/%d' % (epoch_num, self._num_iter))
             e = 0
             k = 0
             for i in range(self._node_num):
@@ -201,7 +207,7 @@ def weighted_mse_y(y_true, y_pred):
         return self._Y, (t2-t1)
-    def get_embedding(self, filesuffix):
+    def get_embedding(self, filesuffix=None):
         return self._Y if filesuffix is None else np.loadtxt('embedding_'+filesuffix+'.txt')
@@ -254,11 +260,11 @@ def get_reconst_from_embed(self, embed, node_l=None, filesuffix=None):
     G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=False)
     G = G.to_directed()
     res_pre = 'results/testKarate'
-    print 'Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges())
+    print('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
     t1 = time()
     embedding = SDNE(d=2, beta=5, alpha=1e-5, nu1=1e-6, nu2=1e-6, K=3, n_units=[50, 15,], rho=0.3, n_iter=50, xeta=0.01, n_batch=500, modelfile=['./intermediate/enc_model.json', './intermediate/dec_model.json'], weightfile=['./intermediate/enc_weights.hdf5', './intermediate/dec_weights.hdf5'])
     embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
-    print 'SDNE:\n\tTraining time: %f' % (time() - t1)
+    print('SDNE:\n\tTraining time: %f' % (time() - t1))
     viz.plot_embedding2D(embedding.get_embedding(), di_graph=G, node_colors=None)
-    plt.show()
\ No newline at end of file
+    plt.show()
diff --git a/gem/evaluation/evaluate_graph_reconstruction.py b/gem/evaluation/evaluate_graph_reconstruction.py
index 4245d34..c186a9e 100644
--- a/gem/evaluation/evaluate_graph_reconstruction.py
+++ b/gem/evaluation/evaluate_graph_reconstruction.py
@@ -14,10 +14,10 @@ def evaluateStaticGraphReconstruction(digraph, graph_embedding,
         estimated_adj = graph_embedding.get_reconstructed_adj(X_stat, node_l)
     else:
         estimated_adj = graph_embedding.get_reconstructed_adj(X_stat, file_suffix, node_l)
-
+
     predicted_edge_list = evaluation_util.getEdgeListFromAdjMtx(estimated_adj, is_undirected=is_undirected, edge_pairs=eval_edge_pairs)
     MAP = metrics.computeMAP(predicted_edge_list, digraph)
     prec_curv, _ = metrics.computePrecisionCurve(predicted_edge_list, digraph)
-    return (MAP, prec_curv)
\ No newline at end of file
+    return (MAP, prec_curv)
diff --git a/gem/evaluation/metrics.py b/gem/evaluation/metrics.py
index e04b485..5ab0cc4 100644
--- a/gem/evaluation/metrics.py
+++ b/gem/evaluation/metrics.py
@@ -13,7 +13,7 @@ def computePrecisionCurve(predicted_edge_list, true_digraph, max_k=-1):
     precision_scores = []
     delta_factors = []
     correct_edge = 0
-    for i in xrange(max_k):
+    for i in range(max_k):
         if true_digraph.has_edge(sorted_edges[i][0], sorted_edges[i][1]):
             correct_edge += 1
             delta_factors.append(1.0)
@@ -25,13 +25,13 @@ def computePrecisionCurve(predicted_edge_list, true_digraph, max_k=-1):
 def computeMAP(predicted_edge_list, true_digraph, max_k=-1):
     node_num = true_digraph.number_of_nodes()
     node_edges = []
-    for i in xrange(node_num):
+    for i in range(node_num):
         node_edges.append([])
     for (st, ed, w) in predicted_edge_list:
         node_edges[st].append((st, ed, w))
     node_AP = [0.0] * node_num
     count = 0
-    for i in xrange(node_num):
+    for i in range(node_num):
         if true_digraph.out_degree(i) == 0:
             continue
         count += 1
@@ -75,4 +75,4 @@ def getNodeAnomaly(X_dyn):
     node_anom = np.zeros((n_nodes, T-1))
     for t in range(T-1):
         node_anom[:, t] = np.linalg.norm(X_dyn[t+1][:n_nodes, :] - X_dyn[t][:n_nodes, :], axis = 1)
-    return node_anom
\ No newline at end of file
+    return node_anom
diff --git a/gem/evaluation/visualize_embedding.py b/gem/evaluation/visualize_embedding.py
index 47677a8..9df5997 100644
--- a/gem/evaluation/visualize_embedding.py
+++ b/gem/evaluation/visualize_embedding.py
@@ -5,7 +5,7 @@
 def plot_embedding2D(node_pos, node_colors=None, di_graph=None):
     node_num, embedding_dimension = node_pos.shape
     if(embedding_dimension > 2):
-        print "Embedding dimensiion greater than 2, use tSNE to reduce it to 2"
+        print("Embedding dimension greater than 2, use tSNE to reduce it to 2")
         model = TSNE(n_components=2)
         node_pos = model.fit_transform(node_pos)
@@ -15,7 +15,7 @@ def plot_embedding2D(node_pos, node_colors=None, di_graph=None):
     else:
         # plot using networkx with edge structure
         pos = {}
-        for i in xrange(node_num):
+        for i in range(node_num):
             pos[i] = node_pos[i, :]
         if node_colors:
             nx.draw_networkx_nodes(di_graph, pos, node_color=node_colors, width=0.1, node_size=100, arrows=False, alpha=0.8, font_size=5)
diff --git a/gem/utils/evaluation_util.py b/gem/utils/evaluation_util.py
index ca121c4..aeeeffd 100644
--- a/gem/utils/evaluation_util.py
+++ b/gem/utils/evaluation_util.py
@@ -23,8 +23,8 @@ def getEdgeListFromAdjMtx(adj, threshold=0.0, is_undirected=True, edge_pairs=Non
             if adj[st, ed] >= threshold:
                 result.append((st, ed, adj[st, ed]))
     else:
-        for i in xrange(node_num):
-            for j in xrange(node_num):
+        for i in range(node_num):
+            for j in range(node_num):
                 if(j == i):
                     continue
                 if(is_undirected and i >= j):
@@ -48,4 +48,4 @@ def splitDiGraphToTrainTest(di_graph, train_ratio, is_undirected=True):
             train_digraph.remove_edge(st, ed)
             if(is_undirected):
                 train_digraph.remove_edge(ed, st)
-    return (train_digraph, test_digraph)
\ No newline at end of file
+    return (train_digraph, test_digraph)
diff --git a/gem/utils/graph_util.py b/gem/utils/graph_util.py
index ae1710c..3cf3893 100644
--- a/gem/utils/graph_util.py
+++ b/gem/utils/graph_util.py
@@ -1,4 +1,5 @@
-import cPickle as pickle
+try: import cPickle as pickle
+except: import pickle
 import numpy as np
 import networkx as nx
 import random
@@ -9,7 +10,7 @@
 def transform_DiGraph_to_adj(di_graph):
     n = di_graph.number_of_nodes()
     adj = np.zeros((n ,n))
-    for st, ed, w in di_graph.edges_iter(data='weight', default=1):
+    for st, ed, w in di_graph.edges(data='weight', default=1):
         adj[st, ed] = w
     return adj
@@ -31,7 +32,7 @@ def sample_graph(di_graph, n_sampled_nodes=None):
     node_l_inv = {v: k for k, v in enumerate(node_l)}
     sampled_graph = nx.DiGraph()
     sampled_graph.add_nodes_from(range(n_sampled_nodes))
-    for st, ed, w in di_graph.edges_iter(data='weight', default=1):
+    for st, ed, w in di_graph.edges(data='weight', default=1):
         try:
             v_i = node_l_inv[st]
             v_j = node_l_inv[ed]
@@ -88,26 +89,26 @@ def addNodeAnomalies(di_graphs, p, k):
         # pdb.set_trace()
         di_graphs[t].add_edges_from(itertools.product(list(anomalous_nodes), range(n_nodes)))
         di_graphs[t].add_edges_from(itertools.product(range(n_nodes), list(anomalous_nodes)))
-        print 'Nodes: %d, Edges: %d' % (di_graphs[t].number_of_nodes(), di_graphs[t].number_of_edges())
+        print('Nodes: %d, Edges: %d' % (di_graphs[t].number_of_nodes(), di_graphs[t].number_of_edges()))
     return anomaly_time_steps
 def saveGraphToEdgeListTxt(graph, file_name):
     with open(file_name, 'w') as f:
         f.write('%d\n' % graph.number_of_nodes())
         f.write('%d\n' % graph.number_of_edges())
-        for i, j, w in graph.edges_iter(data='weight', default=1):
+        for i, j, w in graph.edges(data='weight', default=1):
             f.write('%d %d %f\n' % (i, j, w))
 def saveGraphToEdgeListTxtn2v(graph, file_name):
     with open(file_name, 'w') as f:
-        for i, j, w in graph.edges_iter(data='weight', default=1):
+        for i, j, w in graph.edges(data='weight', default=1):
             f.write('%d %d %f\n' % (i, j, w))
 def loadGraphFromEdgeListTxt(file_name, directed=True):
     with open(file_name, 'r') as f:
-        n_nodes = f.readline()
-        f.readline() # Discard the number of edges
+        #n_nodes = f.readline()
+        #f.readline() # Discard the number of edges
         if directed:
             G = nx.DiGraph()
         else:
@@ -183,4 +184,4 @@ def saveDynamicSBmGraph(file_perfix, dynamic_graphs):
             node_infos = {}
             node_infos['community'] = dynamic_graphs[i][1]
             node_infos['perturbation'] = dynamic_graphs[i][2]
-            pickle.dump(node_infos, fp)
\ No newline at end of file
+            pickle.dump(node_infos, fp)
diff --git a/test.py b/test.py
new file mode 100644
index 0000000..6e5998e
--- /dev/null
+++ b/test.py
@@ -0,0 +1,51 @@
+import matplotlib.pyplot as plt
+from time import time
+
+from gem.utils import graph_util, plot_util
+from gem.evaluation import visualize_embedding as viz
+from gem.evaluation import evaluate_graph_reconstruction as gr
+
+from gem.embedding.gf import GraphFactorization
+from gem.embedding.hope import HOPE
+from gem.embedding.lap import LaplacianEigenmaps
+from gem.embedding.lle import LocallyLinearEmbedding
+from gem.embedding.node2vec import node2vec
+from gem.embedding.sdne import SDNE
+
+
+# File that contains the edges. Format: source target
+# Optionally, you can add weights as third column: source target weight
+edge_f = 'gem/data/TEST_50M.edgelist'
+# Specify whether the edges are directed
+isDirected = True
+
+# Load graph
+G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=isDirected)
+G = G.to_directed()
+
+models = []
+# You can comment out the methods you don't want to run
+models.append(GraphFactorization(2, 50000, 1*10**-4, 1.0))
+models.append(HOPE(4, 0.01))
+models.append(LaplacianEigenmaps(2))
+models.append(LocallyLinearEmbedding(2))
+models.append(node2vec(2, 1, 80, 10, 10, 1, 1))
+models.append(SDNE(d=2, beta=5, alpha=1e-5, nu1=1e-6, nu2=1e-6, K=3,n_units=[50, 15,], rho=0.3, n_iter=50, xeta=0.01,n_batch=500,
+    modelfile=['./intermediate/enc_model.json', './intermediate/dec_model.json'],
+    weightfile=['./intermediate/enc_weights.hdf5', './intermediate/dec_weights.hdf5']))
+
+
+for embedding in models:
+    print ('Num nodes: %d, num edges: %d' % (G.number_of_nodes(), G.number_of_edges()))
+    t1 = time()
+    # Learn embedding - accepts a networkx graph or file with edge list
+    Y, t = embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
+    print (embedding._method_name+':\n\tTraining time: %f' % (time() - t1))
+    # Evaluate on graph reconstruction
+    MAP, prec_curv = gr.evaluateStaticGraphReconstruction(G, embedding, Y, None)
+    #---------------------------------------------------------------------------------
+    print(("\tMAP: {} \t precision curve: {}\n\n\n\n"+'-'*100).format(MAP,prec_curv))
+    #---------------------------------------------------------------------------------
+    # Visualize
+    viz.plot_embedding2D(embedding.get_embedding(), di_graph=G, node_colors=None)
+    plt.show()