| | | | | Michael Osterman, goes on to say, "Organizations are |
| | | | | not meeting their targets for messaging system |
| The year 2008 began with a dire prediction from | | | | availability," and adds that the average e-mail system |
| Subodh Bapat, a vice president in the eco-computing | | | | experiences about 70 minutes of downtime during a |
| team at Sun Microsystems, when he declared, "You'll | | | | typical month, which translates to 99.84% uptime. To |
| see a massive failure in a year." He went on to say, | | | | this he poses the question, "Is this good enough?" ¹¹ |
| "We are going to see a data center failure of that | | | | |
| scale," referring to the worm that took down 5% of | | | | Ceryx Inc., a Hosted Microsoft Exchange provider with |
| worldwide UNIX boxes in 1988.¹ | | | | data center facilities in Canada and the United States, |
| This time he isn't citing security lapses as the root | | | | doesn't think so. They were the first in the industry to |
| cause but rather failure caused by the massive | | | | offer a real 100% SLA based on their multi-data center |
| computing power required to run today's applications. | | | | architecture and software design. Customers' data is |
| Though certainly an extreme position, the past year | | | | replicated in real-time and resides in both data centers |
| has seen a rash of data center failures that brings into | | | | – more than 500 miles apart – so that even in |
| question how reliable single data centers are for the | | | | the event of catastrophic failure, the primary system |
| delivery of mission-critical applications. | | | | would fail over with almost no impact to the end-user. |
| Vulnerabilities ranging from the most common, like | | | | "We operate on the premise that even the best data |
| natural disasters and infrastructure failure (data center | | | | center can and will experience failure due to |
| power outage, burst pipes, construction work | | | | circumstances beyond anyone's control," says Dr. |
| damaging fibre lines) to hardware failure, storage or | | | | David Penny, CIO at Ceryx. "We focus our R&D |
| database failure and common software problems, | | | | on keeping the application highly available and rely on |
| have been causing regular disruptions to businesses | | | | our replication technology to mitigate the vulnerabilities |
| and come with a high price tag. | | | | that exist on the data center level. And then we make |
| Recent events in the news support the fact that even | | | | the operating and capital investments necessary to |
| with good planning, resourcing and design, some of the | | | | execute daily." |
| most sophisticated facilities can still experience | | | | For the past 4 years Penny and his team have |
| catastrophic failure. | | | | worked with Enterprise Messaging systems, like Lotus |
| Last summer, the state-of-the-art 365 Data Center, in | | | | Notes and Microsoft Exchange, developing technology |
| San Francisco – built with more than $125 million - | | | | to deliver high availability. Since 2004 they have been |
| was offline for hours due to a power grid outage by | | | | providing a geo-replicated Microsoft® Exchange |
| Pacific Gas & Electric that put a significant portion | | | | 2003 service to medium and large-sized companies |
| of San Francisco in the dark. Subsequently, the backup | | | | who see the cost and performance benefits of the |
| generators at the facility also experienced failure and | | | | Ceryx solution. |
| had to be manually started.² | | | | Most recently Dr. Penny and his team have been |
| "When researching data centers, new facilities often | | | | working with Geographic Clustering in Server 2008 |
| boast N+2 levels of redundancy," says Roger Smith, | | | | and native Microsoft Exchange 2007 CCR (Cluster |
| V.P. of Operations at Ceryx Inc. "However, as these | | | | Continuous Replication) technology. What this allows |
| same facilities fill up and age, that often becomes N+1, | | | | for is clustering over a wide area network. Traditional |
| or in some areas no redundancy at all." | | | | clusters, which rely on the same RAID system in order |
| According to Sun Microsystems executives, the typical | | | | to continue to function properly, are susceptible to |
| life span of a data center is only about 10 to 12 years | | | | logical corruption and certain physical corruptions that |
| and many data centers - built at the beginning of the | | | | can propagate across an entire RAID array causing |
| dot-com era – now need to be rebuilt. | | | | complete failure. Geo-Clustering eliminates the reliance |
| "As the person who is accountable for uptime I have | | | | of redundant servers on the same set of disks |
| to balance which applications are considered critical by | | | | thereby eliminating a very common single point of |
| upper-management and clearly communicate the cost | | | | failure. |
| and investment required to provide high-availability," | | | | "Even with WAN replication we need to ensure that |
| says Roger Smith. "When you present the facts, it | | | | the corruption itself isn't replicated," says Dr. Penny. |
| becomes clear to everyone that an in-house data | | | | For this they are utilizing log-shipping with delayed |
| center couldn't possibly provide the levels of | | | | application rather than block-level replication, thereby |
| redundancy required and even on a co-location level | | | | avoiding the replication of corruptions caused by |
| we would need redundancy." | | | | application defects. By monitoring performance on the |
| In many cases, no contingency plan could avoid the | | | | primary system closely they can stop bad changes |
| issues that plague individual data centers. On July 14th | | | | from being committed to the secondary system. |
| of this year, the Peer 1 data center in downtown | | | | Beyond the physical vulnerabilities of a single data |
| Vancouver – one of the largest facilities in Canada | | | | center, Ceryx is protected against a number of other |
| – was offline for almost an entire day. An | | | | vulnerabilities anyone using a single data center is |
| underground fire caused massive power outages | | | | exposed to. "When negotiating our contract, our |
| throughout downtown Vancouver. While backup | | | | provider knows how easy it is for us to move |
| generators at Peer 1 started without issue, the | | | | facilities," says Roger Smith. "The data is already |
| water-based cooling system failed as firefighters – | | | | replicated and we don't need to physically migrate |
| in their attempt to douse the fire - depleted the water | | | | servers. Migration to a new facility can occur without |
| pressure required to keep the cooling systems | | | | any impact to our customers. We can't be held |
| operational. This caused the backup generators to | | | | hostage to a bad contract or radical increases in |
| overheat and any failover to UPS was limited to a | | | | pricing or continued poor performance." |
| short battery life.³ | | | | Ceryx also has a lot of flexibility where routing is |
| In a similar event this summer, The Planet, a prominent | | | | concerned and should a backbone be down or |
| hosting provider in Houston, experienced a major | | | | congested, Ceryx, with front-end servers operating at |
| explosion in their data center, taking more than 9000 | | | | both facilities, has the flexibility to route traffic through a |
| customer servers offline for several days. Backup | | | | separate facility and bypass potential network |
| generators worked perfectly, but again the fire | | | | congestion that can plague operators running out of a |
| department would not allow the facility to resume | | | | single data center. |
| power until it was deemed safe. In some cases | | | | While there are a number of solutions in the market |
| servers were physically migrated to a new facility. | | | | that provide continuity through an interim e-mail system |
| In the aftermath of this disaster the Planet was | | | | in the event of downtime, the Ceryx system is |
| applauded for their response to the crisis, allocating | | | | different in that it doesn't require the user to even |
| every resource they could to address the problem and | | | | change settings when the e-mail system fails over to |
| proactively communicating status reports and issuing | | | | the secondary facility. Moreover, things like e-mail |
| SLA credits. | | | | history, sent items and calendar entries all remain intact. |
| Google, whose Enterprise App customers | | | | In this respect the Ceryx solution is not a continuity |
| experienced multiple outages on August 6th, 11th and | | | | solution but rather a high-availability solution that |
| 15th of this year, took a more reactive stance, | | | | provides layers of redundancy, from the software |
| promising to build a communication dashboard and | | | | level up to the facility level. |
| issuing a blanket credit for all customers, regardless of | | | | |
| whether they were impacted by the outage. | | | | Hosted archiving solutions – a good plan for any |
| The real question remains, what is the cost of data | | | | company facing regulatory and legal compliance - also |
| center failure and the resulting downtime for | | | | provide a layer of assurance and access to e-mail |
| organizations? Is it covered by SLA credits? Most | | | | records, should the primary facility suffer complete |
| SLA credits reflect the cost of the services rendered | | | | failure. However, these solutions will not provide |
| and almost never provide for business losses. | | | | business continuity or availability. |
| At the Continuity Insights Management Conference in | | | | Moreover, if the primary e-mail provider experiences |
| 2006, Agility Recovery Solutions stated that 78% of | | | | failure due to data corruption, the data being archived |
| businesses who suffer a catastrophe without a | | | | may be corrupt as well. Large data stores, even at the |
| contingency plan are out of business within 2 years. | | | | mailbox level, lead to corruption and the current trend |
| And 90% of companies unable to resume business | | | | of Hosted Exchange vendors selling e-mail accounts |
| operations within 5 days of a disaster are out of | | | | with massive storage allowances is introducing a |
| business within 1 year. | | | | higher probability of data corruption and subsequent |
| Clearly some applications are considered more critical | | | | failure. A good archiving strategy can be used to keep |
| and have more visibility than others. Large companies | | | | mailbox sizes manageable and subsequently reduce |
| feel the impact immediately when their ERP, CRM is | | | | the likelihood of corruption. |
| still plagued by a prime-time outage more than two | | | | So while extremely valuable in today's world of |
| years ago caused by a failure with an Oracle | | | | mission-critical e-mail, archiving to an external hosted |
| Database Cluster ?), Business Intelligence or E-mail | | | | facility should not be mistaken for a multi-data center |
| systems become unavailable. | | | | strategy. Instead, archiving is a good backup plan but |
| However, with the proliferation of mobile devices and | | | | will not provide the protection businesses today need |
| 'everywhere access', e-mail clearly stands out as the | | | | against the inevitable vulnerabilities that exist with a |
| premier mission-critical application of today. Systems | | | | single-data center strategy. |
| like Lotus Notes® and Microsoft® Exchange | | | | These vulnerabilities are typically covered in the fine |
| maintain a living record of a company's existence, | | | | print of a facility's SLAs, under the term "Force |
| storing every activity, process and thought an | | | | Majeure"; a phrase often translated as an "Act of |
| organization and its employees have. It's no surprise | | | | God" or the literal French translation, "Superior Force" |
| public companies are now required to maintain a | | | | and is included as a clause to excuse interruptions in |
| record of e-mail activity for compliance purposes. | | | | services caused by extraordinary circumstances |
| While the vast majority of businesses rely on e-mail | | | | beyond the control of the provider. Circumstances that |
| every day to send contracts, proposals, quotes and the | | | | - as demonstrated over the past year - are becoming |
| majority of correspondence, most e-mail systems | | | | more and more common. |
| have not yet reached the point of reliability that phone | | | | Michael Osterman concludes, in his presentation on the |
| service provides (99.999% or 5.2 minutes of downtime | | | | Importance of E-mail Continuity, that the only solution to |
| per year). | | | | the inevitable problems that plague mission-critical |
| According to Osterman Research, most North | | | | service delivery is with a geo-replicated, multi-data |
| American businesses experience more than one | | | | center solution, like the one being offered by Ceryx. |
| e-mail outage every month -- and many indicate that | | | | |
| they could lose more than $100,000 as the result of a | | | | Footnotes: |
| single major e-mail outage.¹ | | | | ¹ CNET News: |
| Osterman also found that the average business | | | | |
| experiences nearly seven hours of e-mail downtime | | | | ² Data Center Knowledge: |
| every year and that outages can bring many workers | | | | |
| to a virtual standstill, who on average are 25% less | | | | ³ Data Center Knowledge: |
| productive during e-mail downtime. | | | | |
| "Forget the fact my billing rate gets impacted if I can't | | | | ⁴ Data Center Knowledge: |
| access my email system," says a partner at a major | | | | |
| North American law firm who prefers to remain | | | | ⁵ Center Networks: |
| anonymous. "My company image gets tarnished | | | | |
| immeasurably when I am working on a multi-million | | | | ⁶ CIO WebBlog: |
| dollar, highly-confidential deal and I have to send out a | | | | ⁷ London Chamber of Commerce Study, 2006 |
| set of documents using my Hotmail account because | | | | ⁸ The Importance of Messaging in the Enterprise: A |
| my email system is down. Somebody gets fired for | | | | survey of email application continuity, |
| that." | | | | Applicationcontinuity. |
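
The availability figures quoted in the article (about 70 minutes of e-mail downtime in a typical month, and "five nines" for phone service) can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes a 30-day month and a 365-day year, neither of which the article specifies.

```python
# Illustrative sketch: convert between downtime and uptime percentage.
# Assumptions (not stated in the article): 30-day month, 365-day year.

MINUTES_PER_DAY = 24 * 60


def uptime_percent(downtime_minutes, period_days):
    """Uptime percentage for a given amount of downtime in a period."""
    total_minutes = period_days * MINUTES_PER_DAY
    return 100.0 * (1 - downtime_minutes / total_minutes)


def allowed_downtime_minutes(uptime_pct, period_days):
    """Minutes of downtime permitted by an uptime target over a period."""
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (1 - uptime_pct / 100.0)


# About 70 minutes of downtime in a 30-day month.
monthly_uptime = uptime_percent(70, 30)

# A 99.999% ("five nines") target over a 365-day year.
five_nines_minutes = allowed_downtime_minutes(99.999, 365)

print(f"Uptime with 70 min/month of downtime: {monthly_uptime:.2f}%")
print(f"Downtime allowed at 99.999%: {five_nines_minutes:.2f} min/year")
```

Under these assumptions, 70 minutes a month works out to roughly 99.84% uptime, matching the figure attributed to Osterman, and a 99.999% target allows a little over five minutes of downtime a year, close to the figure the article quotes for phone service.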