Commit 469e6f1

Merge pull request #2 from DannyArends/master
Changes to the paper text
2 parents 1a8d5c6 + 0a169bd commit 469e6f1

1 file changed: 10 additions, 10 deletions

paper.md

Lines changed: 10 additions & 10 deletions
@@ -18,21 +18,21 @@ bibliography: paper.bib
1818

1919
# Introduction
2020

21-
Graph clustering algorithms aim to divide the nodes of a graph into one or more groups or clusters based on some measure of similarity. There are many different ways to define a cluster depending on the goal of the analysis. In graph cluster analysis the usual definition of a cluster is a group of nodes that are relatively densely interconnected but have few connections towards the nodes outside the cluster.
21+
Graph clustering algorithms aim to divide the nodes of a graph into one or more clusters based on some measure of similarity. There are many different ways to define a cluster depending on the goal of the analysis. In graph cluster analysis the usual definition of a cluster is a group of nodes that are relatively densely interconnected but have few connections towards nodes outside of the cluster.
2222

23-
For an overview of graph clustering see [@Schaeffer:2007]. The more developed field within graph clustering is global clustering. Global algorithms are allowed to look at the whole graph while calculating its clusters. Well known global clustering algorithms include [@Blondel:2008], [@Dongen:2000], [@Newman:2004], [@Newman:2006], [@Pons:2006] and [@Raghavan:2007].
23+
For an overview of graph clustering algorithms see [@Schaeffer:2007]. The more developed field within graph clustering is global clustering. Global algorithms are allowed to look at the whole graph while calculating its clusters. Well known global clustering algorithms include [@Blondel:2008], [@Dongen:2000], [@Newman:2004], [@Newman:2006], [@Pons:2006] and [@Raghavan:2007].
2424

25-
On the other hand, local clustering algorithms get a small set of source nodes (typically a single node) as input and calculate the cluster they belong to in the graph. While doing so, local algorithms are only allowed to look at the already visited nodes of the graph and their neighbours.
25+
Local clustering algorithms get a small set of source nodes (typically a single node) as input and calculate the cluster they belong to in the graph. While doing so, local algorithms are only allowed to look at the already visited nodes of the graph and their neighbours.
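The locality constraint in this paragraph can be illustrated with a minimal Python sketch. This is not the Hermina-Janos algorithm itself: the function `local_cluster`, the `neighbours` callback, the toy graph, and the "at least half of the neighbours are inside" rule are all assumptions made for illustration. The point is only that the code reaches the graph exclusively through nodes already in the cluster and the border nodes adjacent to them.

```python
from typing import Callable, Hashable, Iterable, Set

Node = Hashable


def local_cluster(
    sources: Iterable[Node],
    neighbours: Callable[[Node], Set[Node]],
    should_join: Callable[[Node, Set[Node]], bool],
) -> Set[Node]:
    """Grow a cluster from `sources` by repeatedly testing its border nodes."""
    cluster = set(sources)
    changed = True
    while changed:
        changed = False
        # The border is the only part of the graph outside the cluster we inspect.
        border = {n for member in cluster for n in neighbours(member)} - cluster
        for node in sorted(border, key=repr):  # sorted only for deterministic runs
            if should_join(node, cluster):
                cluster.add(node)
                changed = True
    return cluster


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
GRAPH = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}


def majority_inside(node: Node, cluster: Set[Node]) -> bool:
    """Illustrative rule: join when at least half of the node's neighbours are inside."""
    nbrs = GRAPH[node]
    return 2 * len(nbrs & cluster) >= len(nbrs)


result = local_cluster([0], GRAPH.__getitem__, majority_inside)
print(sorted(result))  # [0, 1, 2]
```

Starting from node 0, the sketch stops at the first triangle because node 3 has only one of its three neighbours inside the cluster. The rest of the graph is never enumerated.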
2626

2727
Graph cluster analysis is used in a wide variety of fields. This project does not target one specific field; instead, it aims to be a general tool for graph cluster analysis in cases where global cluster analysis is not applicable or practical, for example because of the size of the data set or because a different (local) perspective is required.
2828

2929
# Algorithm
3030

31-
This Python project implements the Hermina-Janos local clustering algorithm and its hierarchical variation. The algorithms are independent of the used cluster definition, instead they define an [interface](src/localclustering/definitions/base.py) cluster definitions must implement. One such cluster definition, a simple connectivity based one, is available as part of the project and it was used to generate the example result below as well as all other results that can be found in the repository.
31+
This Python project implements the (hierarchical) Hermina-Janos local clustering algorithm. The algorithm is independent of the used cluster definition, instead it defines an [interface](src/localclustering/definitions/base.py) which cluster definitions must implement. One such cluster definition, a simple connectivity based one, is available as part of the project and it was used to generate the example figure as well as all other results that can be found in the repository.
3232

3333
![The cluster of Elvis Presley in Spotify's Related Artists graph.](documents/cluster_example.png)
3434

35-
The following sections provide a high-level overview of the algorithms and cluster definitions. For more details and analysis, please see the [algorithm description](documents/algorithm.rst) and [IPython notebook](documents/Algorithm%20Analysis%20with%20the%20Spotify%20Related%20Artists%20Graph.ipynb) that are provided as part of the project.
35+
The following sections provide a high-level overview of the algorithm and cluster definition. For more details and analysis, please see the [algorithm description](documents/algorithm.rst) and [IPython notebook](documents/Algorithm%20Analysis%20with%20the%20Spotify%20Related%20Artists%20Graph.ipynb) that are provided as part of the project.
3636

3737
## Local clustering algorithm
3838

@@ -49,8 +49,8 @@ The hierarchical version of the Hermina-Janos local clustering algorithm extends
4949

5050
Similarly to the base algorithm, the hierarchical Hermina-Janos algorithm is also an iterative process with the following two steps:
5151

52-
1. Local clustering step: use the Hermina-Janos local clustering algorithm with the current configuration of the used cluster definition to calculate the cluster.
53-
2. Cluster definition relaxation step: this is a highly cluster definition-dependent step where the algorithm adjusts or relaxes the cluster definition's parameters so in the next iteration the local clustering algorithm will be able to further extend the cluster.
52+
1. Local clustering step: Use the Hermina-Janos local clustering algorithm with the current configuration of the used cluster definition to calculate the cluster.
53+
2. Cluster definition relaxation step: This is a highly cluster definition-dependent step where the algorithm adjusts or relaxes the cluster definition's parameters so in the next iteration the local clustering algorithm will be able to further extend the cluster.
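The two-step iteration listed above can be sketched as follows. Everything named here is hypothetical and is not the project's API: `ThresholdDefinition`, its `accepts` and `relax` methods, the `expand` stand-in for the local clustering step, and the threshold values are all assumptions. The sketch only shows the control flow: cluster with the current configuration, record the result, relax the definition, and repeat until no further relaxation is possible.

```python
from typing import Dict, Hashable, List, Sequence, Set

Graph = Dict[Hashable, Set[Hashable]]


class ThresholdDefinition:
    """Hypothetical cluster definition with successively looser thresholds."""

    def __init__(self, graph: Graph, thresholds: Sequence[float]) -> None:
        self.graph = graph
        self.thresholds = list(thresholds)
        self.level = 0

    def accepts(self, node: Hashable, cluster: Set[Hashable]) -> bool:
        # Fraction of the node's neighbours that are already in the cluster.
        nbrs = self.graph[node]
        return len(nbrs & cluster) / len(nbrs) >= self.thresholds[self.level]

    def relax(self) -> bool:
        """Switch to the next, looser threshold; return False when none is left."""
        if self.level + 1 >= len(self.thresholds):
            return False
        self.level += 1
        return True


def expand(graph: Graph, cluster: Set[Hashable], definition: ThresholdDefinition) -> Set[Hashable]:
    """Stand-in for the local clustering step (not the Hermina-Janos algorithm)."""
    cluster = set(cluster)
    changed = True
    while changed:
        changed = False
        border = {n for member in cluster for n in graph[member]} - cluster
        for node in sorted(border, key=repr):
            if definition.accepts(node, cluster):
                cluster.add(node)
                changed = True
    return cluster


def hierarchical_cluster(graph: Graph, source: Hashable, definition: ThresholdDefinition) -> List[Set[Hashable]]:
    """Alternate the clustering step and the relaxation step, keeping each level."""
    cluster = {source}
    levels = []
    while True:
        cluster = expand(graph, cluster, definition)  # step 1: local clustering
        levels.append(set(cluster))
        if not definition.relax():  # step 2: relax the cluster definition
            return levels


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
levels = hierarchical_cluster(graph, 0, ThresholdDefinition(graph, [0.75, 0.5, 0.25]))
print([sorted(level) for level in levels])
```

Each relaxation lets the next clustering step continue from the previous cluster rather than from the source node, so the recorded levels are nested.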
5454

5555
## Cluster definitions
5656

@@ -63,9 +63,9 @@ Furthermore, for a cluster definition to be hierarchical, it must be able to adj
6363

6464
### Connectivity based cluster definition
6565

66-
The connectivity based cluster definition is the default cluster definition implementation in this project that also happens to be a hierarchical one.
66+
The connectivity based cluster definition is the default (hierarchical) cluster definition implementation in this project.
6767

68-
The cluster definition broadly works the following way:
68+
The cluster definition broadly works in the following way:
6969

7070
1. It calculates the *quality difference* the node provides or would provide for the cluster.
7171
2. It calculates the minimum quality difference - the *threshold* - to compare the quality difference to.
@@ -96,7 +96,7 @@ A component for recording the steps the algorithms have taken is also provided.
9696

9797
# Ideas for future work
9898

99-
- Reimplementation for parallel computing: Most of the calculations the algorithms make (the only exception being the actual cluster update) can be executed in parallel, that could significantly improve performace.
99+
- Reimplementation for parallel computing: Most of the calculations the algorithms make (the only exception being the actual cluster update) can be executed in parallel, which could significantly improve performance.
100100
- New cluster definitions: Only one cluster definition is provided in the project. More cluster definitions can be implemented for example by building on cluster quality metrics such as modularity [@Newman:2004].
101101
- Analysis of how cluster definitions should be configured for graphs with different characteristics.
102102
- Result comparison with global clustering algorithms on well-known and well-analyzed graphs such as the Zachary karate club [@Zachary:1977].
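As a rough illustration of the first idea in the list above, the sketch below scores all border nodes concurrently against a frozen snapshot of the cluster and then applies the cluster update in a single sequential step. The function names, the scoring rule, and the toy graph are assumptions made for this example and do not come from the project. Note also that threads in CPython do not run CPU-bound Python code in parallel; a process pool or a compiled scoring function would be needed for a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, FrozenSet, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def score(node: Hashable, cluster: FrozenSet[Hashable], graph: Graph) -> float:
    """Hypothetical score: fraction of the node's neighbours inside the cluster."""
    nbrs = graph[node]
    return len(nbrs & cluster) / len(nbrs)


def parallel_step(graph: Graph, cluster: Set[Hashable], threshold: float, max_workers: int = 4) -> Set[Hashable]:
    """Score border nodes concurrently, then update the cluster sequentially."""
    border = sorted({n for member in cluster for n in graph[member]} - cluster, key=repr)
    snapshot = frozenset(cluster)  # read-only view shared by all workers
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda n: score(n, snapshot, graph), border))
    # The only sequential part: the actual cluster update.
    accepted = {n for n, s in zip(border, scores) if s >= threshold}
    return cluster | accepted


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
step_one = parallel_step(graph, {0}, threshold=0.5)
step_two = parallel_step(graph, step_one, threshold=0.5)
print(sorted(step_one), sorted(step_two))
```

Because every candidate is scored against the same snapshot, one parallel step can accept fewer nodes than a sequential pass that updates the cluster after each acceptance. Here node 2 only joins in the second step.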
