Case Study: Weta Digital - extraordinary rendition
Triumphantly awash in a sea of digital data
Weta Digital is an extraordinary company. It was formed 13 years ago in, of all places, New Zealand, by film director Peter Jackson and others to provide visual effects to movies. After having its physical effects part split off, one of its first projects was to provide the special effects (SFX) for the film Heavenly Creatures. This needed one computer, a film scanner and a film recorder. Since then the Lord of the Rings trilogy has happened, followed by King Kong, and Weta D now has up to 600 people working on projects using a so-called Renderwall supercomputer with 4,400 processors backed by a BlueArc Titan super-NAS system.
Its storage history has seen an accelerated evolution as it coped with millions of files and the need to deliver data to thousands of processors and workstations and to store the results of their work. This history can conveniently be divided into periods where particular suppliers were prominent: Foundry; Silicon Graphics; Network Appliance; and BlueArc. The events throw light on storage technology evolution with particular emphasis on tape format choices, nearline and offline storage and a choice of commodity storage and networking followed by a move into specialised filer hardware.
Phase 1 - Foundry
When Weta realised in 1998 that it was going to work on the Lord of the Rings (LOTR) trilogy, it set about looking for a computing infrastructure that would serve its purposes. The raw outline was obvious: video workstations, rendering servers and an online storage facility that could cope with the data volume and delivery needs.
In 2001 it had only one rack of 32 processors. In 2002 its 'renderwall' housed 392 processors. (This rendering wall combines computer-animated and live action elements into digital movie files.) There were also over 400 graphics workstations. For the second film of the LOTR trilogy, The Two Towers, the then-CTO Jon Labrie expected the company to use about 1,200 processors. In the third film the complex SFX scenes, like the battle of Pelennor Fields with its thousands of hellish SFX Orc warriors, needed 3,200 processors, rendering data at teraflop speeds using a 10Gbit/s Ethernet infrastructure.
An incredible 1,000 processors were added in the final ten weeks of processing. All these processors were IBM blade servers by the way.
In November, 2005, the period of King Kong, Weta appeared four times in the list of the world's top 500 supercomputers:-
Rank  System                                                          Processors
109   IBM BladeCenter HS20 Cluster, Xeon EM64T 3.6 GHz, Gig-Ethernet  1,000
323   IBM BladeCenter HS20 Cluster, Xeon EM64T 3.6 GHz, Gig-Ethernet  512
335   IBM BladeCenter Cluster, Xeon 2.8 GHz, Gig-Ethernet             1,176
338   IBM BladeCenter Cluster, Xeon 2.8 GHz, Gig-Ethernet             1,080
That was a total of 3,768 processors which, with workstation CPUs, sent the processor total past 4,000. Most of them ran Linux, although Windows, Mac OS and Irix were also used.
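The 3,768 figure can be checked directly from the four list entries. The short Python sketch below does the sum; the variable names are illustrative only, and since the article does not give a workstation CPU count, the last figure is simply the minimum needed to pass 4,000.

```python
# Processor counts for Weta's four November 2005 Top500 entries,
# keyed by list rank, as quoted in the article.
top500_entries = {109: 1000, 323: 512, 335: 1176, 338: 1080}

cluster_total = sum(top500_entries.values())
print(cluster_total)  # 3768

# Smallest number of workstation CPUs that would push the total past 4,000.
# The article does not state the real workstation CPU count.
min_workstation_cpus = 4000 - cluster_total + 1
print(min_workstation_cpus)  # 233
```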
The storage statistics are equally staggering. The third LOTR film needed 60TB of disk, 72TB of nearline storage and half a petabyte of tape storage. How did it come to this?
What network infrastructure?
In 2002 Labrie described how Unix workstations and servers needed a network technology to link them to storage: "The amount of material we have to move over the network every day is pretty extraordinary," - up to a terabyte a day is created by Weta D workers. The choices came down to two: fast Ethernet or Fibre Channel. This of course also meant a decision about storage: files over Ethernet or blocks over a SAN fabric.
Labrie said: "A couple of things concerned me about Fibre Channel. First, the cost was a bit prohibitive, and it wasn’t going to be a seamless, plug-and-play sort of environment." So he decided to choose Gigabit Ethernet using BigIron and FastIron switches from Foundry.
"The decision to go with Gigabit Ethernet was important,” he said. “If I’d gone with something more esoteric, like Fibre Channel, I’d have been far more limited in my choices of expanding my machine room infrastructure… And I wouldn’t have been able to take advantage of the natural and healthy competition between switch vendors."
Foundry says its BigIron Layer 3 switches provide 'a non-blocking architecture to support massive scaleability.'
With the three LOTR films shot back-to-back the need for scalability was acute. "We needed the network architecture to hit the ground running and grow with us for the next two years," Labrie said at the time. "Foundry has the architecture and performance we need in its core and edge devices." There was the obvious bandwidth expansion route into 10 gigabit Ethernet as one aspect of that architecture.
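To put the terabyte-a-day figure in context, here is a rough back-of-envelope calculation in Python of how long one terabyte takes to cross a single link. It assumes a decimal terabyte (10^12 bytes) and a link running at its full nominal rate with no protocol overhead, neither of which the article specifies; the function name is illustrative.

```python
def transfer_hours(terabytes: float, gbit_per_s: float) -> float:
    """Hours to move `terabytes` (decimal TB) at a sustained line rate.

    Assumes an idealised link: full nominal rate, no protocol overhead.
    """
    bits = terabytes * 1e12 * 8           # bytes -> bits
    seconds = bits / (gbit_per_s * 1e9)   # Gbit/s -> bit/s
    return seconds / 3600

# Compare Gigabit Ethernet with the 10 gigabit expansion route.
for rate in (1, 10):
    print(f"{rate} Gbit/s: {transfer_hours(1, rate):.2f} hours per TB")
```

On these assumptions a terabyte occupies one Gigabit Ethernet link for a little over two hours, and a 10 gigabit link for about a tenth of that. Real throughput would be lower, and Weta's traffic was spread over many links and switches.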
Another aspect of the scalability need was this: The third LOTR film, The Return of the King, had nearly 50 percent more SFX shots than The Two Towers. It was composed of more data than the first two films combined.
[Part 2 of this case study can be read here.]