Bandwidth communism

Promoters of remote outsourced datacenters, such as Nicholas Carr, usually ignore the availability and the cost of bandwidth. Think of Netflix and all the conflicts it is fighting with local cable Internet providers. We can't assume that free Internet connectivity is adequate for all business purposes. Such an assumption is correctly called "bandwidth communism".

Yes, fiber optics changed the WAN landscape, making remote services more viable and the Internet tremendously more powerful. But the devil is in the details. For example, file sharing over the WAN for a large company is still a bad idea, as public bandwidth is insufficient and a private WAN is costly. Also, any enterprise betting on 24x7 availability of public bandwidth for vital corporate services looks slightly suicidal because of the "tragedy of the commons" problem, which has already demonstrated itself in the crackdowns by large ISPs on P2P enthusiasts. All in all, this "grand utility computing" vision ("bandwidth communism") is problematic because somebody needs to pay for all this expensive infrastructure.

Fiber networks increased both Internet and LAN bandwidth substantially and revitalized distributed computing. But there is a big difference between distributing over the LAN and over the WAN. The latter is a much tougher case. For all the tremendous progress of the Internet, available bandwidth does not increase as quickly as computing power. Nowhere close, and it never has. If anything, due to increased scrutiny and "shaping" by ISPs (they are not charities and need to be profitable), bandwidth "per user" may recently have started to decrease, as resource hogs such as YouTube and video-on-demand services become more and more prominent. The ability of video streams and P2P services to clog the Net at the most inopportune moment is now a well-established fact and a real curse for university networks.

For I/O-intensive tasks, unless you pay for quality of service, the "in the cloud" computing model stands on very shaky ground. Reliable 24x7 bandwidth cannot be free for all users in all circumstances and for all protocols. A substantial amount of traffic to and from a remote datacenter is not that easy to transmit reliably and with minimal latency over public channels at rush hour. But buying private links to remote datacenters can be extremely expensive: for mid-size companies it is usually as expensive as keeping everything in house. For multinationals it is more expensive, so only "other considerations" (such as "jurisdiction over the data") can sustain the wave of centralization into large remote datacenters. For many multinationals SOX was the last straw that made moving datacenters out of the USA desirable, costs be damned. Now the shadow of the NSA keeps this scare alive and well. The cost of private high-speed links limits the cost efficiency of the "in the cloud" approach for any service where disruptions or low bandwidth at certain times of the day can lead to substantial monetary losses. For critical business services such as ERP, a public data channel can be too brittle.

But it is fair to say that the situation differs from service to service. For SMTP mail outsourcers like Google/Postini, this problem is not relevant due to the properties of the SMTP protocol: store-and-forward delivery tolerates latency and retries, so regular Internet connectivity is enough. The same is true for DNS service providers, webmail and instant messaging. CRM is also pretty close. But for ERP, file sharing and WAN-based backup the situation is very different: providing high-speed networking services over the WAN is a very challenging engineering task, to say the least. The cost of bandwidth also puts natural limits on service providers' growth, as local networks are usually much cheaper and much faster. Achieving 1 Gbit/s on the LAN is extremely cheap (even laptops now have 1 Gbit adapters), while it is quite expensive over the WAN (a back-of-envelope comparison is sketched after the list below). There are also other limiting factors:

  1. The possibility of building a local LAN backbone with speeds higher than 1 Gbit/s. 10 Gbit/s backbones are becoming more and more common.
     
  2. Limited bandwidth at the provider's point of connection to the Internet. Every provider is connected to the Internet via a pipe, and that pipe is only so big. For example, OC-1 and OC-3 top out at 51.84 Mbit/s and 155.52 Mbit/s respectively. Funnily enough, the upper speed of OC-3 (which is pretty expensive) is only slightly higher than 100 Mbit/s, which long ago became the lowest common denominator for LANs. Large service providers typically use OC-48, with speeds up to 2488.32 Mbit/s, roughly 2.5 times the speed of gigabit Ethernet. 10 Gigabit Ethernet is the fastest commonly available network standard for LANs. It is still an emerging technology, with only about one million ports shipped in 2007, but it is quickly growing in importance. It might eventually be used, in modified form, for WANs too. In any case, since WAN bandwidth is limited and shared between multiple customers, a spike in the activity of one customer might negatively affect others. Networking problems at the provider level affect all of its customers, and the recovery period might lead to additional spikes of activity.
     
  3. Local vs. remote storage of data. Recent enterprise-level hard drives (such as the Cheetah 15K) have transfer rates of up to 164 MB/s (megabytes, not megabits). From the speed and cost point of view, the ability to keep data and programs local is a big technological advantage. For I/O-intensive applications, the only viable role for remote providers might be synchronization with local data ;-). An example of this approach is Microsoft's Live Mesh.
     
  4. Interdependence of customers at the transport level. This is just another incarnation of the "tragedy of the commons" problem. Bandwidth hogs such as gaming, P2P, music and video enthusiasts do not care a dime about your SLA and can put a company that relies on public links at a disadvantage at any time of day, whenever something new and exciting like an HD movie is released. Also, providers are not willing to sacrifice their revenue to accommodate "free riders": as soon as usage of bandwidth cuts into profits it is punished, and no amount of rhetoric about "Internet freedom" and "Net neutrality" can change that. That means enterprise customers relying on public bandwidth can suffer from providers' efforts to manage free riding. In other words, a corporation that has moved services to the cloud competes with various bandwidth hogs who do not want to give up any ground and are ready to go quite far to satisfy their real or perceived needs. My university experience suggests that corporate users can suffer from Internet clogging in the form of sluggish download speeds, slow response times and frustration with I/O-intensive services that become much less useful and/or enjoyable. See for example Time Warner Cable Vs. The Horde.
     
  5. Competition for resources at the remote datacenter level. For any successful service, providing all the necessary bandwidth is costly and cuts into margins. Recently Amazon faced a situation in which the bandwidth required by its Elastic Compute Cloud (EC2) and Simple Storage Service (S3) proved to be higher than that used by all of Amazon.com's global websites combined. You can read between the lines how that affects profitability:

Adoption of Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3) continues to grow. As an indicator of adoption, bandwidth utilized by these services in fourth quarter 2007 was even greater than bandwidth utilized in the same period by all of Amazon.com’s global websites combined.
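To make the comparison from the list above concrete, here is the back-of-envelope sketch promised earlier. The link speeds are the nominal figures quoted in item 2; the 10 GB payload, the 10 Mbit/s "rush hour share" and the disk figure from item 3 are used as illustrative assumptions, not measurements:

    # Back-of-envelope transfer-time comparison for a hypothetical 10 GB payload.
    # Link speeds are the nominal figures cited in item 2 above; real throughput
    # is lower due to protocol overhead and sharing, so treat these as best-case.

    PAYLOAD_GB = 10                       # assumed size of a backup or file-sharing job
    PAYLOAD_BITS = PAYLOAD_GB * 8 * 10**9

    links_mbit = {
        "Local 15K disk (164 MB/s)": 164 * 8.0,   # ~1312 Mbit/s, for comparison
        "Gigabit Ethernet (LAN)":    1000.0,
        "10 Gigabit Ethernet (LAN)": 10000.0,
        "OC-1 (WAN)":                51.84,
        "OC-3 (WAN)":                155.52,
        "OC-48 (WAN)":               2488.32,
        "Shared cable/DSL slice":    10.0,        # arbitrary rush-hour per-customer share
    }

    for name, mbit in links_mbit.items():
        seconds = PAYLOAD_BITS / (mbit * 10**6)
        print(f"{name:27s}: {seconds:8.0f} s ({seconds / 60:7.1f} min)")

Even with these optimistic figures, a job that takes about a minute on the LAN or the local disk takes many minutes on an expensive OC-3 and hours on a shared consumer-grade slice.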

Web service providers that offer customers unlimited bandwidth are banking on the fact that the majority of their customers will not use much of it. This is essentially a marketing trick. As soon as you exceed a fraction of what is promised, they may well kick you out. People who tried to implement software, mp3 or video sharing services on low-cost ISP accounts realized that very soon. See for example the references I collected under "Unlimited bandwidth myth". Net neutrality does not mean the tragedy of the commons is not applicable. As Bernardo A. Huberman and Rajan M. Lukose noted:

Because the Internet is a public good and its numerous users are not charged in proportion to their use, it appears rational for individuals to consume bandwidth greedily while thinking that their actions have little effect on the overall performance of the Internet. Because every individual can reason this way, the whole Internet's performance can degrade considerably, which makes everyone worse off. An analysis of the congestions created by such dilemmas predicts that they are intermittent in nature with definite statistical properties leading to short-lived spikes in congestion. Internet latencies were measured over a wide range of conditions and locations and were found to confirm these predictions, thus providing a possible microscopic mechanism for the observed intermittent congestions of the Internet.
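A toy simulation makes this mechanism concrete. All figures (link capacity, number of users, burst probability) are invented for illustration; this is a sketch of the dilemma, not a model of any real network:

    import random

    # Toy model of the dilemma: many users share one pipe, nobody pays per use,
    # so bursts are uncoordinated.  All figures are invented for illustration.

    LINK_MBIT = 155.52        # an OC-3-sized shared provider link
    USERS = 200               # customers behind that link
    BURST_MBIT = 8.0          # demand of a user while streaming or downloading
    BURST_PROBABILITY = 0.15  # chance a given user bursts in a given interval

    random.seed(42)
    for t in range(10):
        active = sum(random.random() < BURST_PROBABILITY for _ in range(USERS))
        demand = active * BURST_MBIT
        # When aggregate demand exceeds capacity, everyone's share shrinks.
        per_user = min(BURST_MBIT, LINK_MBIT / active) if active else 0.0
        print(f"t={t}: {active:3d} active, demand {demand:6.1f} Mbit/s, "
              f"per-user share {per_user:4.1f} Mbit/s")

Whenever enough users happen to burst at the same time, everyone's share collapses, which is exactly the kind of intermittent, short-lived congestion spike described above.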

So a company that tries to implement Web-based streaming of, say, a corporate video conference via the cloud is in for nasty surprises unless it pays an arm and a leg for dedicated lines to its headquarters and other major locations (which makes the whole idea much less attractive in comparison with a local datacenter). The ability to stream video of any considerable quality in real time between two (or more!) arbitrary points in the network is not really something that can easily be done over the current Internet.
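For a feel of the numbers involved, here is a rough sizing of such a conference relayed through a remote provider. The per-stream bitrate, number of sites and overhead factor are assumptions, not measurements:

    # Rough sizing of a multi-site video conference relayed through a remote
    # provider.  Bitrate, site count and overhead factor are assumed.

    STREAM_MBIT = 2.0   # one decent-quality video stream (assumption)
    SITES = 8           # offices participating (assumption)
    HEADROOM = 1.5      # allowance for overhead, retransmits, other traffic

    uplink_per_site = STREAM_MBIT * HEADROOM
    downlink_per_site = STREAM_MBIT * (SITES - 1) * HEADROOM
    provider_aggregate = STREAM_MBIT * SITES * (SITES - 1) * HEADROOM

    print(f"Uplink per site:    {uplink_per_site:6.1f} Mbit/s")
    print(f"Downlink per site:  {downlink_per_site:6.1f} Mbit/s")
    print(f"Provider aggregate: {provider_aggregate:6.1f} Mbit/s, sustained and real-time")

Even with these modest assumptions the provider-side aggregate exceeds an OC-3, and unlike a nightly backup this traffic is real-time: it cannot be shifted to off-peak hours or retried later.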

The main point is that reliable WAN connectivity costs a lot of money and is difficult to achieve. This problem is unavoidable if your major components are "in the cloud" (on the WAN). Also, on the "free Internet" enterprises are starting to compete for bandwidth with streaming media (films over the Internet). The latter has proved to be a huge resource hog, and the quality of a typical Internet connection now fluctuates widely during the day. That means that in order to achieve respectable quality of service for bandwidth-intensive applications, enterprises need to buy dedicated WAN connections. That is a very expensive habit, to say the least. In a move typical for multinationals, relocating, say, an SAP R/3 instance from the USA to Europe (just from one continent to another) while keeping reasonable latency for requests coming from the USA is not that easy and definitely not cheap. The cost of a high-bandwidth transatlantic connection is the major part of the additional costs and eats all the savings from centralization. The same is true of any WAN connection: reliable high-bandwidth WAN connections are expensive. Moreover, their reliability needs to be carefully monitored, and that also costs money, as anybody who has been responsible for a company's WAN SLA can attest.
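The latency side of such a relocation is easy to underestimate. A minimal sketch, assuming a 90 ms transatlantic round trip and a chatty per-screen client-server dialog (both figures are invented for illustration):

    # Why moving an interactive application across the Atlantic hurts: response
    # time grows with every client-server round trip.  RTT values and round-trip
    # counts are assumptions for illustration.

    SERVER_TIME_MS = 200   # time the application spends per screen, network aside

    for label, rtt_ms in (("local datacenter ", 1),     # LAN round trip
                          ("transatlantic WAN", 90)):   # assumed US <-> Europe RTT
        for round_trips in (5, 20, 50):                 # how "chatty" each screen is
            total = SERVER_TIME_MS + round_trips * rtt_ms
            print(f"{label}  {round_trips:2d} round trips -> {total:5d} ms per screen")

A dialog that feels instant against a local datacenter turns into multi-second waits per screen once every round trip crosses the ocean, and no amount of raw bandwidth fixes that.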

When the cloud evaporates: performance issues and dealing with WAN and provider outages

The public Internet is unsuitable for handling large volumes of transactions with stringent performance criteria. That means it is dangerous to put databases at "in the cloud" providers: the more successful those providers are (or if there are simply too many P2P and/or multiplayer videogame enthusiasts in the same subnet), the slower your WAN connection will be ("tragedy of the commons").

Moreover, in comparison with the LAN, WAN-based provision of software services is a more complex system and as such is less reliable, especially at the bottlenecks: the service provider's "entry points" and the associated infrastructure (DNS, routers, switches, etc.). With a WAN outage the situation can become a lot worse than when spreadsheets or MS Word documents suddenly become inaccessible on the local server due to a LAN outage. In the local case you can still copy them to a USB stick directly from the server and work with the local copy until connectivity is restored, because your departmental file server is just a few dozen yards away and a friendly administrator can probably help you get to the data. In the WAN case there is no way to walk a USB stick over to the server or use some other shortcut to dodge the effects of network downtime: if the WAN connection is down, you are really down. Generally not only can you do nothing about the outage, but its effects may be amplified by the fact that many customers are affected at once. All you will get is a message like this:

The service is experiencing technical difficulties. We apologize for the inconvenience. Please try again later .
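Why such messages are more frequent for composite WAN services is simple arithmetic: the service is reachable only when every link in the chain is up, so availabilities multiply. The component figures below are assumptions, not measurements:

    # A chained WAN service is reachable only when every component on the path is
    # up, so availabilities multiply.  The figures below are assumptions.

    components = {
        "corporate Internet uplink": 0.999,
        "ISP backbone / peering":    0.999,
        "DNS resolution":            0.9995,
        "provider entry point":      0.999,
        "provider application":      0.995,
    }

    availability = 1.0
    for name, a in components.items():
        availability *= a

    downtime_hours = (1.0 - availability) * 365 * 24
    print(f"End-to-end availability: {availability:.4f} "
          f"(~{downtime_hours:.0f} hours of downtime per year)")

Five individually respectable components already add up to roughly three days of downtime a year, none of which the customer can do anything about.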

In some cases the effect of such an external outage on the organization can be severe enough that the head of the person who enthusiastically recommended moving the company "into the cloud" rolls, regardless of his position, real or faked IQ, and technical competence. Recently both Gmail and Amazon services experienced multiple outages. As Brad Stone noted in the NYT:

There is plenty of disappointment to go around these days. Such technology stalwarts as Yahoo, Amazon.com and Research in Motion, the company behind the BlackBerry, have all suffered embarrassing technical problems in the last few months.

About a month ago, a sudden surge of visitors to Mr. Payne’s site began asking about the normally impervious Amazon. That site was ultimately down for several hours over two business days, and Amazon, by some estimates, lost more than a million dollars an hour in sales.

The Web, like any technology or medium, has always been susceptible to unforeseen hiccups. Particularly in the early days of the Web, sites like eBay and Schwab.com regularly went dark.

But since fewer people used the Internet back then, the stakes were much lower. Now the Web is an irreplaceable part of daily life, and Internet companies have plans to make us even more dependent on it.

Companies like Google want us to store not just e-mail online but also spreadsheets, photo albums, sales data and nearly every other piece of personal and professional information. That data is supposed to be more accessible than information tucked away in the office computer or filing cabinet.

The problem is that this ideal requires Web services to be available around the clock — and even the Internet’s biggest companies sometimes have trouble making that happen.

Last holiday season, Yahoo’s system for Internet retailers, Yahoo Merchant Solutions, went dark for 14 hours, taking down thousands of e-commerce companies on one of the busiest shopping days of the year. In February, certain Amazon services that power the sites of many Web start-up companies had a day of intermittent failures, knocking many of those companies offline.

The causes of these problems range widely: it might be system upgrades with unintended consequences, human error (oops, wrong button) or even just old-fashioned electrical failures. Last month, an electrical explosion in a Houston data center of the Planet, a Web hosting company, knocked thousands of Web businesses off the Internet for up to five days.

“It was prolonged torture,” said Grant Burhans, a Web entrepreneur from Florida whose telecommunications- and real-estate-related Web sites were down for four days, costing him thousands of dollars in lost business.

I was actually surprised how many posts each "in the cloud" service outage generates and how significant the losses reported by some users were. In addition to the Official Gmail Blog, one of the best places to track Gmail outages proved to be Twitter; there is also a site, http://downforeveryoneorjustme.com/, which provides a free check for frustrated users of Gmail and other "in the cloud" services. Some reactions are pretty funny:

As any self-respecting obsessive business e-mail checker can tell you, each outage is a real shock and falls at the most inopportune moment. In reality most email outages do not make users less productive; they just deprive them of their favorite tool for wasting their own and other people's time, and for procrastination ;-)