|
5 | 5 | "id": "94f8164e-537f-41a1-bfc4-ed15c7b00cf8", |
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | | - "# Modularity formula\n", |
| 8 | + "# Modularity\n", |
| 9 | + "## Modularity formula\n", |
9 | 10 | "\n", |
10 | 11 | "Modularity is a quantitative metric used to evaluate the strength of a network's division into modules (or communities). It measures how well the network is partitioned by comparing the density of edges within communities to the expected density of such edges in a randomized network that preserves the original degree distribution. The formula for modularity is given below.\n", |
11 | 12 | "\n", |
|
36 | 37 | }, |
37 | 38 | { |
38 | 39 | "cell_type": "code", |
39 | | - "execution_count": 2, |
| 40 | + "execution_count": 1, |
40 | 41 | "id": "a4766cc0-3157-493f-a072-9fd87ed92519", |
41 | 42 | "metadata": {}, |
42 | 43 | "outputs": [ |
|
100 | 101 | }, |
101 | 102 | { |
102 | 103 | "cell_type": "code", |
103 | | - "execution_count": 4, |
| 104 | + "execution_count": 2, |
104 | 105 | "id": "eccd7d47-1606-4fc8-ac0c-31ad49604f0b", |
105 | 106 | "metadata": {}, |
106 | 107 | "outputs": [ |
|
127 | 128 | "metadata": {}, |
128 | 129 | "source": [ |
129 | 130 | "### Modularity calculation\n", |
130 | | - "### Computation for \"good\" partitioning ($P_{good}$)\n", |
| 131 | + "### Computation for \"good\" partitioning ($P_\\text{good}$)\n", |
131 | 132 | "\n", |
132 | 133 | "This partition correctly identifies the two cliques.\n", |
133 | 134 | "\n", |
|
154 | 155 | "\n", |
155 | 156 | "**Final modularity** ($Q_{good}$):\n", |
156 | 157 | "$Q_{good} = \\frac{1}{2m} \\times (\\text{Total Sum}) = \\frac{1}{14} \\times 5 = \\frac{5}{14} \\approx \\mathbf{0.357}$\n", |
| 158 | + "\n" |
| 159 | + ] |
| 160 | + }, |
| 161 | + { |
| 162 | + "cell_type": "code", |
| 163 | + "execution_count": 3, |
| 164 | + "id": "b6de50ca-146b-4323-9828-80d883d2d8e9", |
| 165 | + "metadata": {}, |
| 166 | + "outputs": [ |
| 167 | + { |
| 168 | + "data": { |
| 169 | + "text/plain": [ |
| 170 | + "0.3571428571428571" |
| 171 | + ] |
| 172 | + }, |
| 173 | + "execution_count": 3, |
| 174 | + "metadata": {}, |
| 175 | + "output_type": "execute_result" |
| 176 | + } |
| 177 | + ], |
| 178 | + "source": [ |
| 179 | + "membership_good = [0, 0, 0, 1, 1, 1]\n", |
| 180 | + "g.modularity(membership_good)" |
| 181 | + ] |
| 182 | + }, |
| 183 | + { |
| 184 | + "cell_type": "markdown", |
| 185 | + "id": "8bdbc1dc-5c26-4b33-b5f4-072fee61fc7f", |
| 186 | + "metadata": {}, |
| 187 | + "source": [ |
157 | 188 | "\n", |
158 | | - "\n", |
159 | | - "### 2. Computation for \"bad\" partitioning ($P_{bad}$)\n", |
| 189 | + "### Computation for \"bad\" partitioning ($P_\\text{bad}$)\n", |
160 | 190 | "\n", |
161 | 191 | "This partition incorrectly splits a clique and merges nodes from both communities.\n", |
162 | 192 | "\n", |
|
186 | 216 | "$Q_{bad} = \\frac{1}{2m} \\times (\\text{Total Sum}) = \\frac{1}{14} \\times (-3) = -\\frac{3}{14} \\approx \\mathbf{-0.214}$" |
187 | 217 | ] |
188 | 218 | }, |
| 219 | + { |
| 220 | + "cell_type": "code", |
| 221 | + "execution_count": 4, |
| 222 | + "id": "894ddb6b-25d5-4060-be09-5758f2d3db45", |
| 223 | + "metadata": {}, |
| 224 | + "outputs": [ |
| 225 | + { |
| 226 | + "data": { |
| 227 | + "text/plain": [ |
| 228 | + "-0.2142857142857143" |
| 229 | + ] |
| 230 | + }, |
| 231 | + "execution_count": 4, |
| 232 | + "metadata": {}, |
| 233 | + "output_type": "execute_result" |
| 234 | + } |
| 235 | + ], |
| 236 | + "source": [ |
| 237 | + "membership_bad = [0, 0, 1, 0, 1, 1]\n", |
| 238 | + "g.modularity(membership_bad)" |
| 239 | + ] |
| 240 | + }, |
189 | 241 | { |
190 | 242 | "cell_type": "markdown", |
191 | 243 | "id": "f3981bc5-f6bf-4b41-a168-cae9c17ec764", |
192 | 244 | "metadata": {}, |
193 | 245 | "source": [ |
194 | 246 | "*Note:* The \"good\" partitioning yields a much higher modularity score than the \"bad\" one ($5/14 \\approx 0.357$ versus $-3/14 \\approx -0.214$). A high modularity score is not always a definitive indicator of a better community partitioning, however, as was previously demonstrated with the Grid Graph [here](test_significance_of_community.ipynb).\n", |
195 | 247 | "\n", |
196 | | - "# Directed Modularity\n", |
| 248 | + "## Directed modularity\n", |
197 | 249 | "\n", |
198 | 250 | "While the classic modularity formula works for undirected networks, a different approach is needed for **directed networks**, where edges have a specific direction (e.g., from node *i* to node *j*). In this context, the direction of an edge is crucial and should not be ignored.\n", |
199 | 251 | "\n", |
|
213 | 265 | "* **Flipping a single edge:** Reversing a single edge (e.g., from A → B to B → A) can change the modularity score. The flip lowers the out-degree of A and the in-degree of B and raises the in-degree of A and the out-degree of B, which alters the null-model term. If A and B are in the same community, that community's total in- and out-degrees are unchanged, so the score stays the same.\n", |
214 | 266 | "* **Flipping all edges:** If you reverse the direction of **every single edge** in the network, the modularity score will **remain the same**. Reversal transposes the adjacency matrix and swaps every node's in-degree and out-degree. Because the sum runs over all ordered pairs $(i, j)$ in the same community, relabeling $i \\leftrightarrow j$ recovers the original expression, so the total is unchanged.\n", |
215 | 267 | "\n", |
216 | | - "# From directed to undirected formula\n", |
| 268 | + "## From directed to undirected formula\n", |
217 | 269 | "Start with the directed formula:\n", |
218 | 270 | "\n", |
219 | 271 | "$$Q = \\frac{1}{m} \\sum_{i,j} \\left[ A_{ij} - \\gamma \\frac{k_i^\\text{out} k_j^\\text{in}}{m} \\right] \\delta(c_i, c_j)$$\n", |
|
232 | 284 | "\n", |
233 | 285 | "\n", |
234 | 286 | "\n", |
235 | | - "# Why the resolution parameter is important\n", |
| 287 | + "## Why the resolution parameter is important\n", |
236 | 288 | "\n", |
237 | 289 | "The resolution parameter addresses a fundamental limitation of the original modularity measure, known as the **\"resolution limit\"**. This is the tendency of the original formula (where $\\gamma=1$) to fail at detecting small communities, especially in large graphs. It often merges smaller, distinct communities into a single larger one to maximize the modularity score.\n", |
238 | 290 | "\n", |
|
242 | 294 | "* $\\gamma < 1$: Decreasing the resolution parameter reduces the penalty. This leads the algorithm to find **fewer and larger communities**, because merging groups costs less under the weaker null-model term.\n", |
243 | 295 | "\n", |
244 | 296 | "In essence, the resolution parameter provides a flexible way to explore the community structure of a network at different scales, moving beyond the limitations of a single, fixed-scale partition.\n", |
245 | | - "# Density-based modularity for undirected graphs\n", |
| 297 | + "\n", |
| 298 | + "## Density-based modularity for undirected graphs\n", |
246 | 299 | "\n", |
247 | 300 | "While modularity is a powerful metric, it suffers from a well-known flaw called the **resolution limit**: maximizing modularity can fail to detect small, tightly knit communities, especially in large networks. Instead of keeping these small groups separate, it often merges them into a single larger one because the merge increases the modularity score.\n", |
248 | 301 | "\n", |
|
270 | 323 | "\n", |
271 | 324 | "In this formulation, the null model assumes **uniform edge probability**, so communities are favored if their **internal density** is higher than the global density.\n" |
272 | 325 | ] |
273 | | - }, |
274 | | - { |
275 | | - "cell_type": "code", |
276 | | - "execution_count": null, |
277 | | - "id": "1b9c3fa0-05b7-461f-acf8-8348f344a2d9", |
278 | | - "metadata": {}, |
279 | | - "outputs": [], |
280 | | - "source": [] |
281 | 326 | } |
282 | 327 | ], |
283 | 328 | "metadata": { |
|
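The hand computations for $Q_\text{good}$ and $Q_\text{bad}$ in the notebook can be cross-checked without igraph. The following is a minimal sketch, not the notebook's own code. It assumes the example graph `g` (its definition is not part of this diff) is two triangles, 0-1-2 and 3-4-5, joined by the single edge 2-3, which is consistent with $2m = 14$ and both reported scores. The names `EDGES` and `modularity` are illustrative.

```python
from fractions import Fraction

# Assumed example graph (not shown in this diff): two triangles joined by
# the bridge 2-3, giving m = 7 edges and 2m = 14.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def modularity(edges, membership, gamma=1):
    """Undirected modularity with resolution parameter gamma.

    Q = (1 / 2m) * sum_ij [A_ij - gamma * k_i * k_j / (2m)] * delta(c_i, c_j)
    """
    m = len(edges)
    n = len(membership)
    degree = [0] * n
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u][v] += 1
        adj[v][u] += 1

    total = Fraction(0)
    for i in range(n):
        for j in range(n):
            # delta(c_i, c_j): only pairs in the same community contribute
            if membership[i] == membership[j]:
                total += adj[i][j] - Fraction(gamma) * degree[i] * degree[j] / (2 * m)
    return total / (2 * m)


q_good = modularity(EDGES, [0, 0, 0, 1, 1, 1])
q_bad = modularity(EDGES, [0, 0, 1, 0, 1, 1])
print(q_good, float(q_good))  # 5/14 0.35714285714285715
print(q_bad, float(q_bad))    # -3/14 -0.21428571428571427
```

Exact fractions make the comparison with the worked values $5/14$ and $-3/14$ immediate, while the `float` conversion matches the rounded figures in the markdown cells.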
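The edge-flipping statements in the directed modularity section can be checked numerically against the directed formula $Q = \frac{1}{m} \sum_{i,j} \left[ A_{ij} - \gamma \frac{k_i^\text{out} k_j^\text{in}}{m} \right] \delta(c_i, c_j)$. The sketch below uses a hypothetical directed graph chosen for illustration (two directed 3-cycles plus the edges 2 → 3 and 0 → 3); it does not come from the notebook, and `directed_modularity` is an illustrative helper, not a library function.

```python
from fractions import Fraction


def directed_modularity(edges, membership, gamma=1):
    """Q = (1/m) * sum_ij [A_ij - gamma * k_i^out * k_j^in / m] * delta(c_i, c_j)."""
    m = len(edges)
    n = len(membership)
    k_out = [0] * n
    k_in = [0] * n
    for u, v in edges:
        k_out[u] += 1
        k_in[v] += 1

    # Observed term: edges whose endpoints share a community.
    total = Fraction(sum(1 for u, v in edges if membership[u] == membership[v]))
    # Null-model term over all ordered pairs in the same community.
    for i in range(n):
        for j in range(n):
            if membership[i] == membership[j]:
                total -= Fraction(gamma) * k_out[i] * k_in[j] / m
    return total / m


# Hypothetical example: two directed 3-cycles plus two edges between them.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3), (0, 3)]
membership = [0, 0, 0, 1, 1, 1]

q = directed_modularity(edges, membership)

# Reverse every edge: the score is unchanged.
q_all_reversed = directed_modularity([(v, u) for u, v in edges], membership)

# Reverse only the inter-community edge 2 -> 3: the score changes.
bridge_flipped = [(3, 2) if e == (2, 3) else e for e in edges]
q_bridge_flipped = directed_modularity(bridge_flipped, membership)

# Reverse only the intra-community edge 0 -> 1: the score is unchanged.
inner_flipped = [(1, 0) if e == (0, 1) else e for e in edges]
q_inner_flipped = directed_modularity(inner_flipped, membership)

print(q, q_all_reversed, q_bridge_flipped, q_inner_flipped)  # 9/32 9/32 1/4 9/32
```

In this example a single flip changes the score only when the edge connects two different communities, because only then do the per-community totals of in- and out-degree shift.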