The internet is growing quickly. How can it remain as fast as it is now? We’ll find the answer in the supermarket!
About 6.560.000.000 results (0,51 seconds). That is the first thing you see when you search for `networks’ in Google. Such huge numbers have become so normal, that you sometimes forget how impressive it is. Think about it: how many librarians would you need to find 10 billion books about networks? And how long would it take them to find all this?
And this is only one part of the story, because the number of internet users keeps growing as well, doubling over the last 10 years, up to 4 billion internet users in 2019. And the number keeps growing and growing. With this increase in users, more data-servers are built to satisfy all the users’ requests. For example, in 2016, Google built a new data center in Eemshaven. Total size? More than four soccer fields!
The increase in the amount of people using the internet gives birth to a big challenge, because just like you, every single one of these 4 billion wants the fastest internet possible. Hence, we need to make sure that every user can surf the internet as smoothly as possible. This modern demand gives birth to a big challenge for many scientists and engineers have taken up this task.
And the stakes are high, also for you: would you be willing to wait 10 seconds every time you search for something on Google?
But what is this modern challenge exactly? In order to explain it we need to describe what happens when we are surfing the internet. Amazingly enough, surfing the Internet is very similar to going to the supermarket, it is just a huge digital supermarket that we access from the computer at home.
Let’s go shopping
In a supermarket you pick up a basket or a trolley and you are ready for your shopping. A loaf of bread, peanut butter, milk, bananas, potatoes and a bar of Swiss chocolate. You found all of it and arrive at the checkout of your local supermarket and you want to purchase your groceries. You are presented with a choice: which counter do you choose to do checkout? Maybe you choose the one with the friendly looking cashier, but if you want to be home quickly, you probably choose the one with the shortest line.
In the big digital supermarket called the Internet your basket, or your trolley, is your browser and instead of bread, peanut butter and milk, you “purchase” websites, like Youtube, webmail, your music band’s and your local library’s site. Instead of a cashier who has to scan your products, there is a computer (used by Youtube, your music band and your library) that has to give you access to the website you requested, these computers are usually stored in data centers.
In a data center, many data servers are used to process requests from users. How does this work?
Suppose Anna wants to open an article on The Networks Pages, then this request will be sent to the data center. A server will be chosen to process Anna’s request. The data server can only process one request at a time, so if it’s already working on a different request, Anna’s request will wait until the server is free. So in this digital supermarket, where your basket is full of digital products, that is websites, you need to wait for your request to be served. And what do you see on your screen?
Now why is this similar to what happens in supermarkets? Just like you are waiting in front of a cashier, Anna’s request is waiting for a data server. And just like you want to check out your groceries, Anna wants to open her website.
In the supermarket, you choose yourself which line would be the best one to wait in. Luckily, in data centers you don’t need to choose yourself, the data center does this for you! This gives the following question for the data center: how can we make sure that every request is sent to the best server? This is a very challenging problem! Since we can view the Internet as a huge digital supermarket where the products are the websites we can see how this could work for the ordinary supermarket.
The question is: how can the supermarket help customers to choose the line which they will wait in?
The supermarket comes with a new idea to make checking out for their customers easier. They hire a new employee, Robin, who gives a ticket to every customer before they reach the counters. The customer has no choice anymore: on the ticket is written which queue they will have to join. And because Robin wants to have the best possible performance his ultimate goal is to send each customer to the server with the shortest queue.
In the example of data centers, there are digital Robins, called dispatchers, which before a request can join a server, first tell the request to which server it should go to. And it would make sense of course if the dispatcher would send requests to the shortest queues! Right?
Join the shortest queue?
Unfortunately, it’s not that simple: joining the shortest queue is too difficult. Why? Because it is impossible to find the shortest queue! In large data centers with thousands of servers and millions of requests, it is impossible that the dispatcher will know exactly how many requests are waiting for every server. And if the dispatcher doesn’t know how long the queues are, it also cannot find the shortest one!
But there is hope: there are innovative ways to overcome this growing mess of information.
Round-Robin
The first solution is: Round-Robin. The name does not originate from Robin the supermarket-employee, in fact, the word is older than the oldest supermarket.
In 1731, the following was written about Round-Robin:
“Lastly, mentions the Method used by Sailors when they mutiny, by signing their Names in an orbicular manner, which they call, a round Robin; whence the Phrase,
‘We have him as round as a Robin’.” [1]
Often with these petitions, the one who signed on top of a document would be seen as the leader of the petition, and could be punished for signing it. Sailors used to sign petitions on long round ribbons, so no one’s name would come first and no leader could be identified!
The phrase “we have him as round as a Robin” didn’t take off, but the Round-Robin technique is still very useful nowadays in data centers! A similar circular thing happens: the first request goes to server #1, the second request goes to server #2, third to #3 etc. After every server received a request, the next request will go the first server again, which makes the circle complete. This is the Round-Robin method of dispatching request to the servers.
The great thing that makes this method easy is: you don’t need to know anything about the queue lengths at the servers! Even with thousands of servers, this method is still very easy. And it works relatively well: servers will never receive too many requests in a row. However, it may still be the case that many long requests will be sent to the same server, which will lead to a big queue. Therefore, we will explore an alternative method.
The second method requires the dispatcher to remember queue lengths, and when done correctly, it will work even better than Round-Robin. To see this in action, we will go back to the supermarket once more and focus on the collaboration that is needed between Robin and the cashiers.
Pen and paper
Robin has been taking the job more seriously and is now equipped with pen and paper. Every time Robin sends a customer to a cashier, Robin writes it down, in order to memorize how many customers have been sent to which queues already. The cashiers are a vital part of the new strategy as well: because after a customer, say Anna, has paid for all of her groceries, the cashier will scream “Anna has left the building!” Robin will hear this, and will write this down too. When done correctly, Robin will know exactly how many customers are waiting for each cashier, without having to look at all the queues every time. That saves a lot of time for Robin!
As weird as the previous may sound, the servers in a data center will tell the dispatcher when a request is processed, similarly to the cashier screaming to Robin. This way, the dispatcher can keep track of all queue lengths, without much effort. So the idea from the supermarket works perfectly in a data center as well! In fact, it is much more beneficial since the number of data servers is much bigger than the number of cashiers.
When the size or scale of a data center becomes larger, the method, or algorithm, still performs well. In other words, it is scalable. If the data centers become even larger than they are now, you could decide that the servers should only send information to the dispatcher once every few minutes. That makes the method even more scalable!
Alright, problem solved!?
The two new methods you read about greatly decrease the communication needed in data centers, and they make sure that your internet is not impacted by the millions of users that join the internet daily. Can we lay back now and enjoy our super-fast internet? This problem is only one of many puzzle pieces that make the internet. For the next couple of years this specific piece fits fine, but sooner or later, when the internet is even bigger than it ever has been, we will need to find even better algorithms, and make the puzzle piece fit even better.
[1] The Gentleman's Magazine, Volume 1, Sylvanus Urban, Gent, page 238