How to handle legacy data in a service-oriented architecture
Making use of that old code.
Service-oriented architecture (SOA) is an increasingly popular basis on which organisations can build new systems. The concepts are now well understood, and there are plenty of tools that make SOA development almost as easy for the programmer as, say, building a traditional desktop application or a browser-based package. There’s a catch, though: after all, if a new technology seems too good to be true, then it usually is.
The catch is that most of us developing new applications do so in an environment full of old ones – and there aren’t very many 1980s and 1990s applications whose programmers foresaw the SOAP/WSDL protocols! We’re therefore stuck with the problem of integrating our new application with legacy (i.e. existing) systems and data.
Aspects of legacy data

When we’re building a new application that has to interface to legacy data, there are three key aspects we have to consider:
- Getting to the data
- Extracting it from the application
- Mapping it from its current form to one we can use
Legacy computers and networks

The first part of the problem is physically getting access to the systems that the legacy data is stored on. This might not be as simple as it sounds: it’s still common to find companies with disparate systems of different makes and models that have never been connected to each other – finance systems on Token Ring networks, for example, or CAD systems on FDDI, where the rest of the company runs on Ethernet. So physically getting the data from A to B can be a non-trivial task.
The easiest situation is when you’re decommissioning the old systems and applications altogether, as you can just do a one-off export of the data via disk, tape or some other storage medium. In this case all you need to do is find some way, no matter how messy, to get the data off onto a medium that can be read by the new system (and if you’re lucky you might even be able to unhook the storage from the old system and plug it into the new one) or, at a push, handled by a company that specialises in translating from wacky media onto modern alternatives.
In most cases, though, the task isn’t a one-off data import: the need is for some kind of connection between old and new that can be used repeatedly. This means you’ll need to provide some kind of network gateway between the new system and the old. If you’re to connect the two, you’ll need to consider two things:
- The physical network types (layers 1 and 2) each can connect to
- The network protocols (layer 3) each can handle
Ideally, you’ll have an easy answer. So if you’re in an Ethernet network and the old finance server is based on Token Ring but is actually a NetWare system on an x86 server, the answer is simply to install an Ethernet card in the server (or, for that matter, a Token Ring card in a system in the new network). Sometimes, though, there won’t be a common denominator; the answer will be to install some kind of gateway device between the two worlds.
What type of gateway device you have depends on whether the two ends are incompatible only on physical network capabilities (i.e. you can’t get them both connected to Token Ring, PCNet, LocalTalk, packet radio, 100VG-AnyLAN, or whatever) or whether they also can’t talk each other’s layer 3 protocol. If, say, both systems can talk IPX but there’s no way to physically connect them, you’ll just need to buy a router (or make one – Linux is great for legacy network support and lends itself to this task superbly) to handle the layer 1/2 conversion. If you can’t find a layer 3 protocol they have in common, though, you’ll have to acquire or devise a more complex gateway device that adds a layer 3 translation function as well.
Legacy applications

Once you’ve got the two systems communicating with each other, you’re usually more than halfway toward solving the technology problems. The remaining step is to get access to the data now that you’ve got access to the machine.
You have two options for getting access to the data. One is to have direct access to the files on the disks, via some kind of filesharing protocol (SMB, FTP, AppleShare, NFS, …) and to delve into this data in its raw form – in which case you can leap straight to “data mapping”, below. The other is to go through the legacy application itself, generally through some kind of API or import/export process, or perhaps through some portion of the application.
As an example of this concept of using “some portion” of the application, imagine your legacy application was some kind of stock control system that had a front end of some kind and a database back end. It might be possible to forget about the front end portion and make direct queries to the back-end database from your new application.
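A minimal sketch of that idea, assuming the legacy back end can be reached through a standard database interface (here Python’s sqlite3 stands in for whatever driver the real system would need, and the stock table and its columns are invented for illustration):

```python
import sqlite3

def fetch_stock_level(conn, part_number):
    """Query the legacy stock-control database directly,
    bypassing the application's front end entirely."""
    cur = conn.execute(
        "SELECT quantity FROM stock WHERE part_number = ?",
        (part_number,),
    )
    row = cur.fetchone()
    return None if row is None else row[0]

# Stand-in for the legacy back end: an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (part_number TEXT, quantity INTEGER)")
conn.execute("INSERT INTO stock VALUES ('WIDGET-42', 17)")
```

The point is that the new application talks SQL to the back end and never touches the legacy front end at all.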
Data mapping

Once you have physical access to the data, the final step is technologically the simplest, although it may well be the most labour- and testing-intensive: devising a means of mapping the data (i.e. transforming it) between the legacy format and the new format. This mapping process may be uni-directional (i.e. the new system only reads from, or only writes to the legacy one) or it may have to be bi-directional (so you have to devise old-to-new and new-to-old mappings). You’re bound to have the usual data mapping gotchas such as different date formats, strings on machine A that are too long to fit into the equivalent on machine B, and so on. The tricky bit is that it’s common to find legacy data structures that were not properly normalised when they were built (so, for instance, instead of a purchase record holding a reference to a customer record, the system instead holds the entire customer details in each purchase record) and so the mapping process may be non-trivial.
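To make that concrete, here’s a sketch of a uni-directional (old-to-new) mapping in Python; the field names, date format and length limit are all invented for illustration. It converts a legacy date format, truncates an over-long string, and splits the embedded customer details out of a denormalised purchase record:

```python
from datetime import datetime

MAX_NAME_LEN = 20  # illustrative field-length limit on the new system

def map_purchase(legacy):
    """Map one denormalised legacy purchase record (a dict) into
    separate customer and purchase records for the new system."""
    # Legacy dates are DD/MM/YYYY; the new system wants ISO 8601.
    date = datetime.strptime(legacy["purchase_date"], "%d/%m/%Y")

    # The legacy record embeds full customer details in every
    # purchase; split them out into a record of their own.
    customer = {
        "id": legacy["customer_id"],
        # Truncate names too long for the new schema.
        "name": legacy["customer_name"][:MAX_NAME_LEN],
    }
    purchase = {
        "customer_id": legacy["customer_id"],  # a reference, not a copy
        "date": date.strftime("%Y-%m-%d"),
        "amount": legacy["amount"],
    }
    return customer, purchase
```

A bi-directional mapping would need the inverse function as well, including re-embedding the customer details into any record written back to the legacy side.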
Expect, then, the data mapping exercise to be the longest, most tedious and most error-prone aspect of the integration process, and arrange your project plan and testing regime accordingly.
It’s worth mentioning at this point that there’s no guarantee that what the organisation wants to do is in fact possible. The legacy integration aspect of the project should be a two-way street; that is, rather than saying: “We need to pull stock levels out of system X on demand”, the plan should say: “Investigate whether there is a way to pull stock levels out of system X on demand”. Sometimes you just can’t do what you’d like to do.
Let’s take the example of a records management system in an academic establishment which was to be linked into some new data processing software; the only option available was to do an automated data export based on a schedule, and to FTP it to a repository from which it could be retrieved. The initial wish was for the new software to query the old system interactively, but this wasn’t technically possible within the budget available, so a compromise had to be reached. Such compromises are common in real-life developments, so it’s important to be aware that not everything is possible with the resources available.
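In Python, that scheduled export-and-drop compromise might look something like the sketch below, run from cron or similar; the CSV layout, host name, credentials and file name are all placeholders, and a real legacy system would dictate its own export format:

```python
import csv
import ftplib
from pathlib import Path

def write_export(records, path):
    """Write the scheduled data export as CSV (layout illustrative)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["student_id", "status"])
        writer.writerows(records)

def upload_export(path, host, user, password):
    """Push the export to the FTP repository the new software polls.
    Host and credentials here are placeholders."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user=user, password=password)
        with open(path, "rb") as f:
            ftp.storbinary(f"STOR {Path(path).name}", f)
```

The new software never talks to the legacy system at all; it simply picks up the latest file from the repository on its own schedule.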
Applying this to SOA

What we’ve said so far sounds more like a general discussion of legacy integration concepts. And indeed that’s precisely what it is – SOA is merely one way to construct an application infrastructure, and as such it shares all the pitfalls of legacy integration that developers of more traditionally architected applications face.
The benefit with SOA, though, is that the legacy integration aspect can be dealt with as just another service (or, more likely, a set of services) in the architecture. Just as you’ll have customer record interrogation services, order processing services, and so on, so you’ll add legacy integration services that deal with the low-level aspects of fetching data from, and sending data to, your legacy systems and applications. These services will present their interfaces to the applications in the organisation in the same way as any other: by providing abstract interfaces that the developers of the business logic can reference without any knowledge (and, more importantly, without the need for any knowledge) of precisely what’s going on underneath.
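That abstraction can be sketched in code as follows; the interface, its methods and both backends are invented for illustration. The business-logic developer programs against StockService and never sees whether the answer came from the new system’s own database or a legacy gateway:

```python
from abc import ABC, abstractmethod

class StockService(ABC):
    """Abstract interface the business logic codes against."""
    @abstractmethod
    def stock_level(self, part_number):
        ...

class ModernStockService(StockService):
    """Backed by the new system's own data (stubbed as a dict here)."""
    def __init__(self, levels):
        self._levels = levels
    def stock_level(self, part_number):
        return self._levels.get(part_number, 0)

class LegacyStockService(StockService):
    """Backed by a legacy gateway; a stub stands in for the real
    fetch-and-map machinery described above."""
    def __init__(self, gateway):
        self._gateway = gateway
    def stock_level(self, part_number):
        raw = self._gateway.query(part_number)  # legacy fetch
        return int(raw)                         # legacy-to-new mapping

def reorder_needed(service, part_number, threshold=10):
    """Business logic: works identically against either backend."""
    return service.stock_level(part_number) < threshold
```

Swapping the legacy backend out for a modern one later means replacing one service implementation, with no change to the business logic at all.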
The benefit of using SOA in a legacy integration implementation, though, comes from what we discussed first of all: getting physical access to the system it’s on. If you’re writing a desktop application with legacy integration, you have to provide each desktop with a means of connecting to the legacy system – which may not even be possible if the legacy system can’t talk directly to the corporate network. If you’re writing a client-server application, you have to provide the server with a means of connecting to the legacy system, which either puts an extra on-board load onto the server or adds a time penalty by forcing the server to make requests from an external gateway device. With SOA, though, the legacy interface is just another service, perhaps even with the work shared among several physical servers, which puts the legacy access closer to the consuming application without adding to the hardware or software complexity of the client systems at all.
We’ll deal with how to go about implementing this abstraction phase in another feature.