Open source Linux clustering
Just 751,075,200 seconds after the PC launch, they've come a long way
By Mark Gibbs, Network World | Network World US | Published: 01:00, 31 May 2005
Today, it is exactly 23 years, nine months, 20 days; or 8,694 days; or 751,161,600 seconds; or 12,519,360 minutes; or 208,656 hours; or 1,242 weeks since the launch of the original IBM PC on 12 August 1981.
A wonderful service on timeanddate.com told us so. This site provides a number of almost useful calculators that determine such timely things as the duration between two dates, or when alternative birthdays (such as when you are 1 billion seconds old) will occur.
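For the curious, the same sort of date arithmetic can be sketched in a few lines of Python using only the standard library. This is our own rough illustration, not how timeanddate.com works. The function names are made up, and the end date of 1 June 2005 is an assumption: it is the date that reproduces the seconds, minutes and hours quoted above.

```python
from datetime import date, timedelta


def elapsed_units(start: date, end: date) -> dict:
    """Express the whole-day span between two dates in several units."""
    days = (end - start).days
    return {
        "days": days,
        "weeks": days / 7,
        "hours": days * 24,
        "minutes": days * 24 * 60,
        "seconds": days * 24 * 60 * 60,
    }


def billion_second_birthday(born: date) -> date:
    """Calendar date on which someone born on `born` turns 1 billion seconds old.

    Adding a timedelta to a date ignores the sub-day remainder, so the
    result is rounded down to a whole day.
    """
    return born + timedelta(seconds=1_000_000_000)


PC_LAUNCH = date(1981, 8, 12)   # launch date given in the article
ASSUMED_END = date(2005, 6, 1)  # assumption: chosen to match the quoted seconds

if __name__ == "__main__":
    for unit, value in elapsed_units(PC_LAUNCH, ASSUMED_END).items():
        print(f"{unit}: {value:,}")
    print("1 billion seconds old on:", billion_second_birthday(PC_LAUNCH))
```

Swapping in your own dates is a matter of changing the two constants.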
Other than providing you with yet another site on which to waste your highly valuable time when you should be doing far more productive things, we bring this up as a fairly thin, albeit not completely uninteresting, way for us to note how far PCs have come in the short time since their launch.
What brought this ooh-ah moment home for us was receiving a fantastic new book titled The Linux Enterprise Cluster by Karl Kopper.
The Linux Enterprise Cluster is a how-to book that explains how to convert two or more PCs into a high-reliability, high-availability cluster based on Linux and inexpensive hardware, using free and mainly open source software. That would have been an unthinkable configuration back when mainframes ruled the earth.
The book starts by exploring what is meant when we talk about a "cluster" and offers the definition of a system that can be used as "a single computing resource" using "a local computing system comprising a set of independent computers and a network interconnecting them."
Key to the whole concept is that a cluster must not have a single point of failure. Should any of the individual computers in the cluster (the "nodes") fail, there must not be a failure of any service provided by the cluster. This means that any node in the cluster can fail and be rebooted without users of the cluster being aware of the events.
This leads to the four basic properties of a cluster, which are all about what we could quite reasonably call "transparency":
Users accessing cluster services don't know that they are using a cluster.
Nodes that comprise the cluster don't need to be aware that they are part of a cluster.
Applications running on nodes don't need to know they are running in a cluster environment.
Servers that are not part of the cluster don't need to know when they are providing services to nodes in a cluster.
The basic architectural elements of a cluster are a load balancer, shared data storage and output devices. The load balancer sits between the nodes and the users and distributes the incoming workload to the node services. The shared data storage must support lock arbitration to ensure exclusive access for each process to items (files, blocks or bytes, as required) in the file system. The final basic architectural element, output devices, covers printers, fax lines, and so on.
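To make the load balancer's job a little more concrete, here is a minimal sketch of one simple way to distribute work: round-robin selection that skips nodes marked as failed. This is our own illustration and not the mechanism the book or the Linux Virtual Server project uses. The class name, method names and node names are all hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out nodes in turn, skipping any that are marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)  # endless iterator over the node list

    def mark_down(self, node):
        """Stop sending work to a failed node."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Resume sending work to a node once it is back."""
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        """Return the next healthy node, or raise if none is left."""
        # One full lap of the ring is enough to visit every node once.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["node1", "node2", "node3"])
    print([balancer.pick() for _ in range(4)])
    balancer.mark_down("node2")  # simulate a node failure
    print([balancer.pick() for _ in range(3)])
```

Because callers only ever ask the balancer for "the next node", a failed node simply drops out of the rotation, which is the transparency property in miniature.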
To manage a cluster, we can have one more optional architectural element, a cluster node manager. The cluster node manager can provide an application licence service, a centralised user database and a performance-monitoring console.
Building a true enterprise-class cluster system is obviously quite a complex and challenging task. The book's approach is to use a number of readily available subsystems. These subsystems include server data synchronisation using the rsync package; failover management using the open source Heartbeat software, which includes Stonith (which stands for "Shoot The Other Node In The Head") to ensure a failed system is really dead; the Linux Virtual Server project kernel patches to enable load balancing; and the Ganglia package for collecting and displaying node and cluster performance statistics.
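The general idea behind failover management can be sketched as a timeout check: each node reports in periodically, and a node that stays silent for too long is presumed dead. The following is only a toy illustration of that idea under our own assumptions. It is not the Heartbeat software's protocol, configuration or API, and the class name, the `deadtime` parameter and the node names are hypothetical.

```python
import time


class HeartbeatMonitor:
    """Track when each node was last heard from and flag the silent ones."""

    def __init__(self, nodes, deadtime=10.0, clock=time.monotonic):
        self.deadtime = deadtime  # seconds of silence before a node is presumed dead
        self.clock = clock        # injectable so the logic can be tested without waiting
        now = clock()
        self.last_seen = {node: now for node in nodes}

    def beat(self, node):
        """Record that a heartbeat has just arrived from `node`."""
        self.last_seen[node] = self.clock()

    def dead_nodes(self):
        """Return, in sorted order, the nodes silent for longer than `deadtime`."""
        now = self.clock()
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.deadtime
        )


if __name__ == "__main__":
    fake_time = [0.0]  # a fake clock we can advance by hand
    monitor = HeartbeatMonitor(
        ["primary", "backup"], deadtime=5.0, clock=lambda: fake_time[0]
    )
    fake_time[0] = 4.0
    monitor.beat("backup")       # only the backup reports in
    fake_time[0] = 6.0
    print(monitor.dead_nodes())  # the primary has now been silent for 6 seconds
```

In a real cluster, a timeout alone cannot tell a dead node from one that is merely cut off, which is why a fencing step such as Stonith follows before another node takes over.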
This book is fascinating, and while it is quite technical in places, it also explains the topics clearly enough for those not quite so familiar with Linux to develop an understanding of what a cluster is.
Over the next week or two, we'll look at some of these subsystems and how they work. Maybe we'll even try to get a test cluster running under VMware. Will the fun never end?