Sunday School of Cool Programming: The Birth of an Apilator (Part III)

The Birth of an Apilator is part of a case study in modern client-server programming for the web. Part III focuses on automated session management and on methods for zero-configuration clustering. Get and try Apilator yourself here!

Automated Session Management

For better or worse, the web was designed as a stateless system. While this simplifies many things, it also means that every time a back-end needs to learn something about the client, it has to retrieve a lot of data from permanent storage like a database. This is not only a tedious thing to do, but also a rather slow one. Things become especially complex when you need to serve content based on who the client is. One of the most common tasks in modern web programming is therefore the creation and maintenance of a stateful back-end.
A classic solution to both problems is well known: create a temporary cache to store data (e.g., on disk or in memory) and send the client a small token (known as a session ID, stored in an HTTP cookie) which it has to provide back on each request. The web has gone as far as automating cookie management on the client side… and did little, if anything, for the server side.
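
As an illustration, here is a minimal Java sketch of the server side of this classic scheme: generate an unguessable token and hand it to the client in a cookie. The cookie name "session_id" and the token length are our arbitrary choices, not Apilator's actual ones.

    import java.security.SecureRandom;
    import java.util.Base64;

    // Hypothetical sketch: issue an unguessable session ID and deliver it
    // to the client in an HTTP cookie.
    public class SessionCookie {
        private static final SecureRandom RANDOM = new SecureRandom();

        // Generate the token the client will echo back on every request.
        static String newSessionId() {
            byte[] raw = new byte[24];
            RANDOM.nextBytes(raw);
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        }

        // Build the HTTP header line that hands the token to the client.
        static String setCookieHeader(String sessionId) {
            return "Set-Cookie: session_id=" + sessionId + "; Path=/; HttpOnly";
        }

        public static void main(String[] args) {
            System.out.println(setCookieHeader(newSessionId()));
        }
    }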

Building a stateful back-end on one server is just a part of the task, however. Having multiple API servers work in parallel, share the same information – and do it consistently – is a much more daunting one. It is usually solved by utilising an intermediate caching agent which is accessed by all API servers. While solving one problem, this creates many others: what if the agent goes offline? How do you maintain a replicated cluster reliably? What if a cluster member is restarted? Since API servers need to both read and write data, how do you guarantee the consistency of the cluster? Resolving these problems is often more complex than building the whole API. What is worse, even if a reliable, replicated, write-everywhere cluster can be created, it still has to be accessed over the network. While we like to believe networks nowadays are fast, in fact they are not – especially if you have to make a network transaction every time you serve an API request. Those familiar with the TCP protocol will know that even if you only need to transfer a small amount of data that fits into one network packet, you still need to exchange seven packets to get the data actually delivered – a huge overhead.
When designing Apilator, we decided not to try to fix broken things, but to solve the problem at its root. It is obvious that such a cache will be fastest not merely if it resides on the same server, but if it is part of the API server process itself:

  • UNIX sockets are better than TCP,
  • Inter-process communication is better than UNIX sockets,
  • Same-process data storage is best.

Building a local data storage is not a big deal in Java; having this storage synchronised across servers is more challenging – and we had an even higher goal: zero configuration. Avoiding the need to store access information when replicating data across cache servers gives us the ultimate freedom to add and remove API servers at our convenience, free of any care about IP addresses, configuration, manual cache loading etc.
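
A minimal sketch of such a same-process store might look like the code below; the class and field names are our illustration, not Apilator's actual source.

    import java.io.Serializable;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal sketch of a same-process session cache.
    public class LocalSessionCache {
        public static class SessionObject implements Serializable {
            public final String id;
            public final long created = System.currentTimeMillis(); // used later to resolve conflicts
            public long ttlMillis;
            public SessionObject(String id, long ttlMillis) {
                this.id = id;
                this.ttlMillis = ttlMillis;
            }
        }

        // A ConcurrentHashMap gives thread-safe reads and writes with no
        // network hop at all: lookups are plain in-process memory access.
        private final ConcurrentHashMap<String, SessionObject> store = new ConcurrentHashMap<>();

        public void put(SessionObject s) { store.put(s.id, s); }
        public SessionObject get(String id) { return store.get(id); }
        public void remove(String id) { store.remove(id); }
    }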

To achieve this goal we used an excellent but often forgotten feature that seems designed exactly for this purpose – one host speaking to many other hosts without knowing where they are: multicast. Multicast is a special part of the Internet Protocol which, with some help from the underlying network layers, implements the publish/subscribe workflow on bare metal. Here is what happens:

  1. When an API server starts, it sends a special packet to the network announcing that it wants to join a pre-defined multicast group.
  2. The network switch intercepts this packet and adds the API server to the list of subscribers of this multicast group.
  3. When a packet is sent to this multicast group, the switch delivers it to all API servers that are subscribed.

Because this is implemented inside the Internet Protocol itself, and because it uses UDP as a datagram container, it is extremely efficient and has virtually zero overhead (unlike any higher-level application which implements the same flow).
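
In Java, subscribing to a multicast group takes only a few lines. The sketch below covers steps 1 and 3 of the list above; the group address 239.0.0.1 and port 4445 are arbitrary example values, not Apilator's actual configuration.

    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.MulticastSocket;

    // Sketch: join a multicast group and receive every announcement sent to it.
    public class MulticastListener {
        public static void main(String[] args) throws Exception {
            InetAddress group = InetAddress.getByName("239.0.0.1");
            try (MulticastSocket socket = new MulticastSocket(4445)) {
                socket.joinGroup(group); // step 1: the switch now forwards group traffic to us

                byte[] buf = new byte[1500];
                while (true) {
                    DatagramPacket packet = new DatagramPacket(buf, buf.length);
                    socket.receive(packet); // step 3: delivered to every subscribed server
                    String msg = new String(packet.getData(), 0, packet.getLength());
                    System.out.println("Received: " + msg);
                }
            }
        }
    }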

Using this mechanism we solved the problem of zero configuration: all Apilator servers are members of the same multicast group. When a new piece of data (which we call a session object) has to be stored for a client, it is written to the local in-memory cache. The cache then immediately sends a multicast packet to all other servers announcing that a session object has been created – and includes its own IP address, from which any other Apilator server can retrieve the object using regular TCP. Normally, this is exactly what happens: when an Apilator server receives such a multicast update notification, it uses the provided IP address to fetch the object from the API server that initially created it. Thus each local cache is always in sync with the rest, and it does not matter where the next request lands – cached data will be available locally. The whole update cycle takes less than 10 ms, which guarantees that the relevant cache data will have propagated across all Apilator servers before the next request arrives. When an object is updated or deleted from the cache, information about this is distributed the same way (with the obvious exception that on deletion there is no TCP exchange – each API server just deletes its local copy of the session object).
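
The announcement itself can be as small as one UDP datagram. Here is a sketch, assuming a simple text format (action, session ID, origin IP) that is ours for illustration, not Apilator's actual wire protocol:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;

    // Sketch of the update notification: one UDP datagram reaches every
    // subscribed Apilator server.
    public class CacheAnnouncer {
        public static void announceStore(String sessionId, String ownIp) throws Exception {
            String msg = "STORE " + sessionId + " " + ownIp;
            byte[] data = msg.getBytes();
            InetAddress group = InetAddress.getByName("239.0.0.1");
            try (DatagramSocket socket = new DatagramSocket()) {
                socket.send(new DatagramPacket(data, data.length, group, 4445));
            }
        }
    }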

The whole process is greatly assisted by the fact that Java seamlessly serialises and deserialises the session objects with very little overhead, making the TCP transport very easy.
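
For illustration, here is a bare-bones client-side fetch over TCP using plain Java serialisation; the port and the one-object request/response exchange are assumptions, not Apilator's actual protocol:

    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.net.Socket;

    // Sketch: fetch a session object over TCP from the server that announced it.
    public class SessionFetcher {
        public static Object fetch(String originIp, int port, String sessionId) throws Exception {
            try (Socket socket = new Socket(originIp, port);
                 ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
                 ObjectInputStream in = new ObjectInputStream(socket.getInputStream())) {
                out.writeObject(sessionId); // request: just the session ID
                out.flush();
                return in.readObject();     // response: the deserialised session object
            }
        }
    }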

The described mechanism requires network communication only when a session is created, updated or deleted, which happens much less frequently than network requests from clients. As a result, the API server is faster and the internal network load is lower.
The same tool was employed to solve two other related problems: introducing a new API server to a working cluster, and a session ID missing from the local cache. Obviously, when a new API server starts, its cache is initially empty, so when a client supplies a session ID, it won’t be found in the cache. In this case (sketched in code after the list):

  1. The API server first sends a multicast packet asking whether any other server has the desired object stored.
  2. Only servers which have the object are allowed to respond – also via multicast.
  3. If a response comes back, the session object is retrieved over TCP from the server which answered.
  4. If no response comes back within a pre-set time (e.g., 10 ms), the session is considered expired; a new one is opened, then automatically propagated to all other servers.
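
A sketch of this query-and-wait step follows, with an assumed WHOHAS/IHAVE text format and the 10 ms deadline; note that a real implementation must also skip its own query, which multicast loops back to the sender.

    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.MulticastSocket;
    import java.net.SocketTimeoutException;

    // Sketch: ask the cluster who holds a session object, then wait briefly.
    public class SessionQuery {
        public static String askWhoHas(String sessionId) throws Exception {
            InetAddress group = InetAddress.getByName("239.0.0.1");
            try (MulticastSocket socket = new MulticastSocket(4445)) {
                socket.joinGroup(group);
                socket.setSoTimeout(10); // wait at most 10 ms per receive

                byte[] ask = ("WHOHAS " + sessionId).getBytes();
                socket.send(new DatagramPacket(ask, ask.length, group, 4445));

                long deadline = System.nanoTime() + 10_000_000L; // overall 10 ms budget
                byte[] buf = new byte[1500];
                while (System.nanoTime() < deadline) {
                    try {
                        DatagramPacket reply = new DatagramPacket(buf, buf.length);
                        socket.receive(reply);
                        String msg = new String(reply.getData(), 0, reply.getLength());
                        if (msg.startsWith("IHAVE " + sessionId)) {
                            return msg.split(" ")[2]; // origin IP to fetch from over TCP
                        } // anything else (including our own query) is ignored
                    } catch (SocketTimeoutException e) {
                        break;
                    }
                }
                return null; // nobody answered: treat the session as expired
            }
        }
    }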

Finally, to further improve the situation when an API server joins with an empty cache because it was restarted, we added an on-disk dump of the local cache which is performed periodically. When the API server starts, it checks whether a local dump is available and loads it, thus greatly reducing the need to seek existing sessions on the network.
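
A minimal sketch of such a dump and restore, assuming plain Java serialisation of the cache map; the file location is left to the caller:

    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch of the periodic on-disk dump and its restore on startup.
    public class CacheDump {
        public static void dump(Map<String, ?> cache, Path dumpFile) throws Exception {
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(dumpFile))) {
                out.writeObject(new ConcurrentHashMap<>(cache)); // snapshot, then write
            }
        }

        @SuppressWarnings("unchecked")
        public static ConcurrentHashMap<String, Object> loadIfPresent(Path dumpFile) throws Exception {
            if (!Files.exists(dumpFile)) return new ConcurrentHashMap<>(); // first start: empty cache
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(dumpFile))) {
                return (ConcurrentHashMap<String, Object>) in.readObject();
            }
        }
    }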

To Be or Not to Be

When a session object is created, something has to take care of its removal once it is no longer needed. Since sessions are often used to track users of a particular service, it can be assumed that once a user leaves the service, their session can safely be destroyed. While this is in fact true, it leaves an open gap: because the web is stateless by nature, the user can always terminate communication with the server without the server knowing it – most commonly by simply closing the web browser’s window.

To accommodate this case and to avoid endless storage of unused sessions, we attach a time-to-live (TTL) tag to each session object. When the TTL expires, each local cache removes the session object.
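
A sketch of a periodic expiry sweep, reusing the creation timestamp and TTL fields from the cache sketch earlier; since every node holds its own copy, expiry needs no network traffic at all:

    import java.util.Map;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Sketch of a background sweep that drops expired session objects.
    public class TtlReaper {
        public static void start(Map<String, LocalSessionCache.SessionObject> store) {
            ScheduledExecutorService reaper = Executors.newSingleThreadScheduledExecutor();
            reaper.scheduleAtFixedRate(() -> {
                long now = System.currentTimeMillis();
                // Each node removes its own expired entries independently.
                store.values().removeIf(s -> now - s.created > s.ttlMillis);
            }, 1, 1, TimeUnit.SECONDS); // sweep interval is an arbitrary choice
        }
    }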

This approach requires a proper strategy for setting the TTL. Apilator provides three different ways to do it (sketched in code after the list):

  1. Non-persistent TTL – the session ID is supplied in an HTTP cookie with no TTL set, so the browser removes it once closed. The session object is created with a short TTL which is never changed before expiration.
  2. Persistent absolute TTL – once set, it cannot be modified. This is suitable for services which want to re-ask the user’s credentials after a fixed amount of time has passed since they were last presented.
  3. Persistent rolling TTL – each time the session ID is used, the TTL of the session object is extended by a fixed amount of time. This is suitable for services which want a “remember me” function, but still want to ask for the user’s credentials if the service has not been used for a prolonged period.
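
The three strategies differ only in what happens to the TTL when a session ID is presented. A sketch, with illustrative names and an assumed touch() hook:

    // Sketch of the three TTL strategies as they might apply on each request.
    public class TtlPolicy {
        public enum Kind { NON_PERSISTENT, ABSOLUTE, ROLLING }

        // Called whenever a session ID is presented by the client.
        public static void touch(LocalSessionCache.SessionObject s, Kind kind, long extensionMillis) {
            switch (kind) {
                case NON_PERSISTENT: // short TTL fixed at creation; never changed
                case ABSOLUTE:       // fixed deadline; never modified after creation
                    break;
                case ROLLING:        // each use pushes the expiry further out
                    s.ttlMillis += extensionMillis;
                    break;
            }
        }
    }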

Serving the Client

After putting fully automated session management in place, we decided to take the last step and also fully automate the session communication with the client.

Thus, when Apilator receives a request, it first checks whether a cookie with a session ID is supplied. If there is none, a new session is created (and automatically propagated to the other API servers). A cookie with the new session ID is then returned to the client.
If the client does supply a cookie with a session ID, the session is first sought in the local storage and then on the network. If not found, it is considered expired, and a new session is created, propagated and fed back to the client as described above. If the session is found (either locally or fetched from the network), it is automatically made available to the API endpoint method which processes the request. If the method modifies the session object, the modified object is stored locally (replacing the old one) and propagated to all other API servers.
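
Putting the pieces together, the per-request flow can be sketched as below; the helper classes are the earlier illustrations from this article, and the TCP port and the node’s own IP are placeholders.

    // Sketch of the automated per-request session flow; HTTP plumbing omitted.
    public class SessionMiddleware {
        private final LocalSessionCache cache = new LocalSessionCache();

        public LocalSessionCache.SessionObject resolve(String cookieSessionId) throws Exception {
            if (cookieSessionId != null) {
                // 1. Local cache first: the common, zero-network-cost path.
                LocalSessionCache.SessionObject s = cache.get(cookieSessionId);
                if (s != null) return s;
                // 2. Ask the cluster; fetch over TCP from whoever answers.
                String originIp = SessionQuery.askWhoHas(cookieSessionId);
                if (originIp != null) {
                    s = (LocalSessionCache.SessionObject) SessionFetcher.fetch(originIp, 4446, cookieSessionId);
                    cache.put(s);
                    return s;
                }
            }
            // 3. No cookie, or nobody has it: open a fresh session and announce it.
            LocalSessionCache.SessionObject fresh =
                    new LocalSessionCache.SessionObject(SessionCookie.newSessionId(), 30 * 60 * 1000L);
            cache.put(fresh);
            CacheAnnouncer.announceStore(fresh.id, "192.0.2.1"); // own IP would be detected at runtime
            return fresh;
        }
    }
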
To avoid inconsistency due to network outages, each session object stores the timestamp of its creation. API servers thus only update their cache when they receive a multicast announcement for an object that they either do not have or that is newer than their local copy (see the sketch below).
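
A sketch of that rule, using the creation timestamp carried in the announcement:

    // Sketch of the "newer wins" rule applied when a STORE announcement arrives.
    public class ConflictRule {
        // announcedCreated is the creation timestamp carried in the announcement.
        static boolean shouldFetch(LocalSessionCache.SessionObject local, long announcedCreated) {
            // Fetch only if we hold no copy, or the announced object is newer.
            return local == null || announcedCreated > local.created;
        }
    }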
