In this article, we discuss several ways to quantify the importance of nodes in a network. We will discuss how a simple game can help study this special property, and how it can help us in cases like reducing fake news.
Suppose there is a wave of fake news spreading on a social media platform. The situation is extremely alarming, and it is very difficult to filter the real from the fake news. The authorities come to you and ask how they should react. How can they control the spread of fake news? Information and news are spread through social media, which is just a network of people who are connected with virtual friendships. You need to decide which nodes affect the spread of information the most! You can rephrase this under the more general question, “Which is the most important node in a network?”.
Well, there is not one single answer. Importance changes when we change the context we are dealing with. In a group of scientists, maybe a person with the most influential publications could be considered the most important? Amongst politicians or actors, maybe a person with the most followers could be considered the most influential? So it indeed seems, that importance depends on what context one is dealing with.
Making importance concrete
In network theory too, people are often intrigued to know, what is the most important node in a network. There are instances when it is needed to rank nodes of a network in order of their importance. Web search engines for example order websites, and often give us the desired results in the first few searches, how does this happen? An extremely logical question in this respect would be: Can we mathematically quantify importance? Can we assign numbers for associating importance to nodes in a network?
This is where centrality measures come into the picture. Suppose we assign a positive number to each node. The number associated with each node should be considered to be dependent on the importance of that particular node with respect to this measure.
For example, in the network above the black node in the middle is the most important when you consider closeness centrality (no worries, we will explain it a little bit). If you remove this point the network will break down in two components.
Let us look at some important examples of centrality measures.
Degree Centrality: This centrality measure counts the number of neighbors of a node in the network. Consider a network of people, with two people linked when they are friends. In this network, degree centrality is the same as the number of friends of a specific person. This centrality is the reason why companies choose celebrities to advertise their product, as celebrities have the largest degree on social media.
Closeness centrality: In this centrality measure, we look at the reciprocal of the sum of distances of a specific node from the rest of the nodes. The larger this quantity, the more central we consider the vertex to be. Intuitively, vertices with a higher average distance from the rest are less central.
PageRank Centrality: This centrality measure is used to rank web pages in a Google search. It is also used in citation networks to establish the importance index of a paper or a journal. You could have a look at this article written by Nelly Litvak for a detailed discussion on PageRank.
Betweenness Centrality: This centrality measures the proportion of shortest paths containing a particular vertex. This means that the vertices which lie in the shortest paths more often are considered to be more important.
Context plays a role
One of the strongest limitations that centrality measures have is that each of these measures works well in a certain context, whereas fails when subjected to a different context. For example, in the network below, the nodes with the highest degrees (i.e. 2 and 4) may not always be the most important ones. Node 3 for example, which has degree equal to two, keeps the two parts of the network (nodes 4, 5, 6, and 9, and 1, 2, 7, and 8) connected with each other. Nodes 2 and 4 are the nodes which can influence the most other nodes in the network. But if we want to identify the nodes that are important for the spread of information between nodes then node 3, which has a low degree but a "central" position, is also important.
This subtlety can also be illustrated as follows. Centrality measures can be classified into two kinds. One that is based on how strong the nodes keep the network together, and the other type is one in which we look at the flow of traffic within the network. A centrality measure in one particular category tends to fail in the other one. High betweenness centrality vertices are contained in most of the shortest paths. Whereas centrality measures like degree centrality only look at the number of neighbours of a particular vertex. So, if a vertex has a high degree but is far away from the rest of the network, it is relatively unnecessary for traffic flow within a network, in comparison to a low-degree vertex which lies in a key position in the network.
In the network above you see that a node with degree centrality (left) does not necessarily also have a high betweenness centrality (right).
One of the questions that network scientists are interested in is how we can compare several centrality measures. There could be several ways to do this. But one particular way is by removing the top percent nodes with respect to each centrality, and then compare the size of the largest connected component (the set of nodes in a network which are connected to each other is called a component), or the total number of connected components.
Closeness centrality
PageRank centrality
For example, in the network above if you remove the node with the highest closeness centrality (top left) then the network breaks down into two components (top right). If you remove the node with the highest PageRank centrality (bottom left) then the network stays connected (bottom right).
For very good centrality measures we expect that removing a percentage of vertices that are “important” according to this measure, will have a great impact on the structure of the remaining network. For example, it can lead to a network which consists of many components which are not connected with each other. In some way, removing important nodes will break down the whole network.
Coming back to the question we started from, which nodes are crucial in a social network when we look at how news spread? Nodes whose removal breaks down the network into many components could be considered important. Because breaking a network into smaller and smaller connected components will reduce the spread. So the centrality measure which performs the best with respect to node removal, is of great importance.
Do you want to experiment with various centrality measures and networks? You can try yourself various cases in the animation below, developed by Martijn Gösgens from the Eindhoven University of Technology. You can choose a network from the Erdrös-Rényi, Preferential Attachment, Geometric, and Configuration Model, a centrality measure (Betweenness, Closeness, Degree, Eigenvector, Game of Thieves (discussed later), Harmonic, and PageRank), and the fraction of most central nodes to be removed with respect to the chosen centrality measure. Then you see what happens to the network. The higher the number/darker the color of a node the more central it is with respect to the chosen centrality measure. Try to figure out what happens!
Game of Thieves
There is another centrality measure based on a game played by thieves on a network, which is called the Game of Thieves. In this game, there are thieves at each node in the network. Assume that every node contains a fixed number of diamonds. Every thief keeps jumping to its neighbouring nodes, jumping to each neighbour with equal probability until it reaches a node with at least one diamond in it. The thief then steals this diamond and takes it back to the node it started from. The configuration, or the diamond distribution, of the network at the end is used to decide how important or central each node is. The more central the node is, the faster it will be drained of diamonds, as it would be attacked by more thieves. The less central nodes will have more diamonds.
It has been shown that Game of Thieves is indeed a very nice way to depict importance in terms of node removal procedures. It is also time efficient, in terms of algorithmic complexity they say that it converges in polylogarithmic times, which is a power of logarithms. In short, it is a very efficient algorithm to compute central nodes in a network.
Giving meaning to the words "important" and "central" depends heavily on the context. In this article, you read about various techniques to give concrete meaning to these words. Next time you want to find the most central node in a network you can choose the measure that suits your needs!