Scalability is a word you hear so much in IT departments that it is sometimes on the verge of being a "buzzword", you know, like "synergy". The word has its roots in High Performance Computing and became widely used in enterprise environments when big distributed systems started to be used to solve complex problems or to serve a great number of users (hundreds of thousands, or even millions in the case of big websites).
I studied parallel computing in the mid-90s and, at the time, I remember our teachers saying "maybe one day, everything you are learning will be used outside of the realm of High Performance Computing". That was highly prophetic: back then the Internet was mainly a research tool, computers with more than one processor were laboratory machines, and the very notion of having two CPUs in a laptop would have been mind-boggling.
And the most important thing to know when it comes to parallel computing is that some problems cannot be parallelized. For instance, iterative calculations can be very tricky because iteration n+1 needs the result of iteration n, so the steps have to run one after another.
On the other hand, processing of 2D images is generally easy to parallelize, since you can cut the image into portions and have each portion processed by a different CPU. I am oversimplifying things, but this is such an important notion that a guy named Gene Amdahl came up with what is now known as Amdahl's law.
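To make the contrast concrete, here is a minimal Python sketch (my own toy code, nothing from the original text): the first function is inherently serial because every step needs the previous result, while the second brightens independent portions of an image and can be farmed out to several cores.

```python
from multiprocessing import Pool

# Inherently serial: step n+1 needs the result of step n,
# so extra CPUs cannot help.
def iterate(x0, steps):
    x = x0
    for _ in range(steps):
        x = 0.5 * x + 1.0      # each value depends on the previous one
    return x

# Embarrassingly parallel: each chunk of rows can be
# brightened independently on a different core.
def brighten(rows):
    return [[min(255, pixel + 10) for pixel in row] for row in rows]

if __name__ == "__main__":
    print(iterate(1.0, 10_000))                      # must run sequentially whatever the hardware
    image = [[100] * 640 for _ in range(480)]        # toy 640x480 grayscale image
    chunks = [image[i * 120:(i + 1) * 120] for i in range(4)]  # four portions of 120 rows
    with Pool(4) as pool:
        result_chunks = pool.map(brighten, chunks)   # one portion per worker process
```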
Let me quote Wikipedia here: "it [Amdahl's law] is used to find the maximum expected improvement to an overall system when only part of the system is improved". In other words, if you take a program and 25% of it cannot be parallelized, then you will not be able to make it more than 4 times faster, no matter how many CPUs you throw at it.
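In formula form, if s is the serial fraction and N the number of processors, the speedup is bounded by 1 / (s + (1 - s)/N), which approaches 1/s as N grows; with s = 0.25 the ceiling is 4. A tiny sketch of the arithmetic (my own toy code):

```python
def amdahl_speedup(serial_fraction, n_cpus):
    """Maximum speedup predicted by Amdahl's law."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

for n in (1, 2, 4, 16, 256, 1_000_000):
    print(f"{n:>9} CPUs -> {amdahl_speedup(0.25, n):.2f}x")
# Even with a million CPUs the speedup stays just under 4x,
# because 25% of the work remains stubbornly sequential.
```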
This is the exact same problem that everybody is now experiencing on their home computers equipped with several cores: some programs will just use one core while the other cores do nothing. In some cases it might be because the programmer is lazy or has not learned how to parallelize code; in other cases it is simply because the problem cannot be parallelized. I always find it amusing to hear or read people ranting about their favorite program not taking advantage of their shiny new 4-core machines.
Well, it is because it can be very hard to parallelize some portions of code, and a lot of people have spent their academic lives working on these issues. In the field of HPC, the way to measure scalability is to measure the speedup, or "how much the execution time of my program is reduced as I throw more processors at it".
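In practice that is just a ratio of two wall-clock times per run. A small sketch with made-up timings (the numbers below are invented purely for illustration):

```python
# Hypothetical wall-clock times, in seconds, for the same job
# run on 1, 2, 4, and 8 processors (invented numbers).
timings = {1: 1000.0, 2: 520.0, 4: 275.0, 8: 160.0}

t1 = timings[1]
for n, tn in sorted(timings.items()):
    speedup = t1 / tn            # how much faster than the single-CPU run
    efficiency = speedup / n     # how well each extra CPU is actually used
    print(f"{n} CPUs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```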
In 1995, something called PVM (Parallel Virtual Machine) was all the rage, since it allowed scientists to spread calculations over networked machines, and these machines could be inexpensive workstations. Of course, Amdahl's law still applied, and it was only worth it if you could parallelize the application you were working on. Since then, other projects like MPI and OpenMP have been developed with the same goal in mind. It is quite remarkable how this line of research, although not directly connected to it, converged in time with the opening of the Internet to a wide audience.
The first example that comes to mind is the arrival of load-balancer appliances in the very late 90s to spread web server load over several machines, thus increasing throughput. Until then, a web server often ran on a single machine sitting on someone's desk. But when the Internet user population came to be numbered in hundreds of thousands instead of a few thousand, this way of doing things did not cut it anymore. So programs, and more often specialized appliances, were built to spread the load over more than one web server. This means that if 100 users tried to access your website simultaneously, 50 would be directed to web server 1 and 50 to web server 2. This is not that different a concept from what people had been doing in High Performance Computing with PVM, MPI, etc. And luckily for us, serving static content is very easy to parallelize: there is no interdependency and no single bottleneck.
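The core idea of such an appliance is tiny. A hedged round-robin sketch in Python (the server names are made up), which is roughly what splits 100 incoming users evenly across two web servers holding the same static content:

```python
from itertools import cycle

# Two hypothetical identical web servers serving the same static content.
backends = cycle(["webserver1.example.com", "webserver2.example.com"])

def route(request_id):
    """Send each incoming request to the next backend in turn."""
    return next(backends)

# 100 simultaneous users: 50 land on each server.
assignments = [route(i) for i in range(100)]
print(assignments.count("webserver1.example.com"))  # 50
print(assignments.count("webserver2.example.com"))  # 50
```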
I will stop the comparison between the ultra-specialized HPC world and its enterprise counterpart here, but I just wanted to show that these two worlds can sometimes benefit from looking over each other's shoulders.
Nowadays, scalability can have multiple meanings, but it often boils down to this: if I throw more distributed resources at an IT system, will it be able to serve more customers (throughput) in an acceptable time (latency)? Or, what does it take to increase the capacity of my system?
Scalability in an enterprise environment is indeed about how to handle the growing usage of a given IT system. Back in prehistoric ages, circa 1990, a new generation of computers arrived every 18 months like clockwork, offered twice the processing speed, and most programs benefited from it automatically since they were all mono-threaded. But nowadays, most IT systems are made of different components, each with its own scalability issues.
Take a typical 3-tier web environment composed of these tiers:
- a web tier serving HTTP requests and static content;
- an application tier running the business logic;
- a database tier holding the persistent data.
The scalability of the whole system depends on the scalability of each tier. In other words, if one tier is a bottleneck, increasing the capacity of the other tiers will not increase your overall capacity. This might seem obvious, but what is often not obvious is which tier actually is the bottleneck!
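A back-of-the-envelope way to see it, assuming made-up per-tier capacities: the whole chain can never serve more requests per second than its weakest tier, so adding capacity anywhere else buys nothing.

```python
# Hypothetical capacity of each tier, in requests per second (invented numbers).
capacity = {"web": 4000, "application": 1200, "database": 800}

bottleneck = min(capacity, key=capacity.get)
print(f"System capacity: {capacity[bottleneck]} req/s, limited by the {bottleneck} tier")

# Doubling the web tier changes nothing; the database is still the ceiling.
capacity["web"] *= 2
print(f"After doubling the web tier: {min(capacity.values())} req/s")
```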
The good news is that this is not exactly a new problem, since it pretty much falls under Amdahl's law. So what you need to ask yourself is: which tier (or which shared resource inside a tier) plays the part of the non-parallelizable portion that limits the whole system, and how far can each of the other tiers really scale?
In the end, it comes back to finding the bottleneck in the overall system and removing it, which might be easy (e.g., serving static content is very parallelizable) or extremely difficult (e.g., lots of threads waiting for a single resource to become available). Note that IT systems should be built with scalability in mind, which would avoid any detective work when the time comes to increase capacity, but alas this is not always the case.
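For the hard case, a minimal illustration (my own toy example) of "lots of threads waiting on a single resource": no matter how many threads you start, they all queue up on one lock, so adding threads adds no throughput.

```python
import threading

counter_lock = threading.Lock()   # the single shared resource
counter = 0

def worker(iterations):
    global counter
    for _ in range(iterations):
        with counter_lock:        # every thread waits here in turn
            counter += 1          # the "work" is entirely serialized

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 800000, but no faster than a single thread would have been
```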
[Oct 23, 2017] my.safaribooksonline.com

"The increase of capacity and quantity of resources of any system does not affect the efficiency of its operation, since all new resources, and even some of the old ones, would be wasted on eliminating the internal problems (errors) that arise as a result of that very increase in resources." One only has to look at the space science sphere right now.