You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: paper.md
+10-10Lines changed: 10 additions & 10 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -18,21 +18,21 @@ bibliography: paper.bib
18
18
19
19
# Introduction
20
20
21
-
Graph clustering algorithms aim to divide the nodes of a graph into one or more groups or clusters based on some measure of similarity. There are many different ways to define a cluster depending on the goal of the analysis. In graph cluster analysis the usual definition of a cluster is a group of nodes that are relatively densely interconnected but have few connections towards the nodes outside the cluster.
21
+
Graph clustering algorithms aim to divide the nodes of a graph into one or more clusters based on some measure of similarity. There are many different ways to define a cluster depending on the goal of the analysis. In graph cluster analysis the usual definition of a cluster is a group of nodes that are relatively densely interconnected but have few connections towards nodes outside of the cluster.
22
22
23
-
For an overview of graph clustering see [@Schaeffer:2007]. The more developed field within graph clustering is global clustering. Global algorithms are allowed to look at the whole graph while calculating its clusters. Well known global clustering algorithms include [@Blondel:2008], [@Dongen:2000], [@Newman:2004], [@Newman:2006], [@Pons:2006] and [@Raghavan:2007].
23
+
For an overview of graph clustering algorithms see [@Schaeffer:2007]. The more developed field within graph clustering is global clustering. Global algorithms are allowed to look at the whole graph while calculating its clusters. Well known global clustering algorithms include [@Blondel:2008], [@Dongen:2000], [@Newman:2004], [@Newman:2006], [@Pons:2006] and [@Raghavan:2007].
24
24
25
-
On the other hand, local clustering algorithms get a small set of source nodes (typically a single node) as input and calculate the cluster they belong to in the graph. While doing so, local algorithms are only allowed to look at the already visited nodes of the graph and their neighbours.
25
+
Local clustering algorithms get a small set of source nodes (typically a single node) as input and calculate the cluster they belong to in the graph. While doing so, local algorithms are only allowed to look at the already visited nodes of the graph and their neighbours.
26
26
27
27
Graph cluster analysis is used in a wide variety of fields. This project does not target one specific field, instead it aims to be a general tool for graph cluster analysis for cases where global cluster analysis is not applicable or practical for example because of the size of the data set or because a different (local) perspective is required.
28
28
29
29
# Algorithm
30
30
31
-
This Python project implements the Hermina-Janos local clustering algorithm and its hierarchical variation. The algorithms are independent of the used cluster definition, instead they define an [interface](src/localclustering/definitions/base.py) cluster definitions must implement. One such cluster definition, a simple connectivity based one, is available as part of the project and it was used to generate the example result below as well as all other results that can be found in the repository.
31
+
This Python project implements the (hierarchical) Hermina-Janos local clustering algorithm. The algorithm is independent of the used cluster definition, instead it defines an [interface](src/localclustering/definitions/base.py)which cluster definitions must implement. One such cluster definition, a simple connectivity based one, is available as part of the project and it was used to generate the example figure as well as all other results that can be found in the repository.
32
32
33
33

34
34
35
-
The following sections provide a high-level overview of the algorithms and cluster definitions. For more details and analysis, please see the [algorithm description](documents/algorithm.rst) and [IPython notebook](documents/Algorithm%20Analysis%20with%20the%20Spotify%20Related%20Artists%20Graph.ipynb) that are provided as part of the project.
35
+
The following sections provide a high-level overview of the algorithm and cluster definition. For more details and analysis, please see the [algorithm description](documents/algorithm.rst) and [IPython notebook](documents/Algorithm%20Analysis%20with%20the%20Spotify%20Related%20Artists%20Graph.ipynb) that are provided as part of the project.
36
36
37
37
## Local clustering algorithm
38
38
@@ -49,8 +49,8 @@ The hierarchical version of the Hermina-Janos local clustering algorithm extends
49
49
50
50
Similarly to the base algorithm, the hierarchical Hermina-Janos algorithm is also an iterative process with the following two steps:
51
51
52
-
1. Local clustering step: use the Hermina-Janos local clustering algorithm with the current configuration of the used cluster definition to calculate the cluster.
53
-
2. Cluster definition relaxation step: this is a highly cluster definition-dependent step where the algorithm adjusts or relaxes the cluster definition's parameters so in the next iteration the local clustering algorithm will be able to further extend the cluster.
52
+
1. Local clustering step: Use the Hermina-Janos local clustering algorithm with the current configuration of the used cluster definition to calculate the cluster.
53
+
2. Cluster definition relaxation step: This is a highly cluster definition-dependent step where the algorithm adjusts or relaxes the cluster definition's parameters so in the next iteration the local clustering algorithm will be able to further extend the cluster.
54
54
55
55
## Cluster definitions
56
56
@@ -63,9 +63,9 @@ Furthermore, for a cluster definition to be hierarchical, it must be able to adj
63
63
64
64
### Connectivity based cluster definition
65
65
66
-
The connectivity based cluster definition is the default cluster definition implementation in this project that also happens to be a hierarchical one.
66
+
The connectivity based cluster definition is the default (hierarchical) cluster definition implementation in this project.
67
67
68
-
The cluster definition broadly works the following way:
68
+
The cluster definition broadly works in the following way:
69
69
70
70
1. It calculates the *quality difference* the node provides or would provide for the cluster.
71
71
2. It calculates the minimum quality difference - the *threshold* - to compare the quality difference to.
@@ -96,7 +96,7 @@ A component for recording the steps the algorithms have taken is also provided.
96
96
97
97
# Ideas for future work
98
98
99
-
- Reimplementation for parallel computing: Most of the calculations the algorithms make (the only exception being the actual cluster update) can be executed in parallel, that could significantly improve performace.
99
+
- Reimplementation for parallel computing: Most of the calculations the algorithms make (the only exception being the actual cluster update) can be executed in parallel, which could significantly improve performance.
100
100
- New cluster definitions: Only one cluster definition is provided in the project. More cluster definitions can be implemented for example by building on cluster quality metrics such as modularity [@Newman:2004].
101
101
- Analysis of how cluster definitions should be configured for graphs with different characteristics.
102
102
- Result comparison with global clustering algorithms on well-known and -analyzed graphs such as the Zachary karate club [@Zachary:1977].
0 commit comments