In the traditional sense that we will use here, virtualization is the simulation of the hardware upon which other software runs.
This simulated hardware environment is called a virtual machine (VM). The classic form of virtualization, known as operating system
virtualization, provides the ability to run multiple instances of an OS on the same physical computer under the direction of a special
layer of software called the hypervisor. There are several forms of virtualization, distinguished primarily by the hypervisor architecture.
Each such virtual instance (or guest) OS thinks that it is running on real hardware with full access to the address space, but in reality
it is operating in a separate VM container which maps this address space into a segment of the address space of the physical computer;
this operation is called address translation. The guest OS can be unmodified (so-called heavy-weight virtualization) or specifically
recompiled for the hypervisor API (para-virtualization). In light-weight virtualization, a single OS instance presents itself
as multiple personalities (called jails or zones), allowing a high level of isolation of applications from each other at a very low overhead.
There is an entirely different type of virtualization, often called application virtualization. The latter provides
a virtual instruction set and a virtual implementation of the application programming interface (API) that a running application expects
to use, allowing compilers to be written that compile into this virtual instruction set. This permits applications
developed for one platform to run on another without modifying the application itself. The Java Virtual Machine (JVM) and, in
a more limited way, Microsoft .NET are two prominent examples of this type of virtualization. This type acts as an intermediary
between the application code, the operating system (OS) API, and the instruction set of the computer. We will not discuss it here.
Virtualization was pioneered by IBM in the early 1960s with its groundbreaking VM/CMS. It is still superior to many existing VMs, as
it handles virtual memory management for hosted OSes (on a hosted OS, the virtual memory management layer should be disabled, as it provides
nothing but additional overhead). IBM mainframe hardware was also the first virtualization-friendly hardware (IBM
and HP virtualization):
Contrary to what many PC VMware techies believe, virtualization technology did not start with VMware back in 1999. It was pioneered
by IBM more than 40 years ago. It all started with the IBM mainframe back in the 1960s, with CP-40, an operating system which was
geared for the System/360 Mainframe. In 1967, the first hypervisor was developed and the second version of IBM's hypervisor (CP-67)
was developed in 1968, which enabled memory sharing across virtual machines, providing each user his or her own memory space. A hypervisor
is a type of software that allows multiple operating systems to share a single hardware host. This version was used for consolidation
of physical hardware and to more quickly deploy environments, such as development environments. In the 1970s, IBM continued to improve
on their technology, allowing you to run MVS, along with other operating systems, including UNIX on the VM/370. In 1997, some of
the same folks who were involved in creating virtualization on the mainframe were transitioned towards creating a hypervisor on IBM's
midrange platform.
One critical element that IBM's hypervisor has is the fact that virtualization is part of the system's
firmware itself, unlike other hypervisor-based solutions. This is because of the very tight integration between the
OS, the hardware, and the hypervisor, which is the systems software that sits between the OS and hardware that provides for the virtualization.
In 2001, after a four-year period of design and development, IBM released its hypervisor for its midrange
UNIX systems, allowing for logical partitioning. Advanced Power Virtualization (APV) shipped in 2004, which was IBM's first real
virtualization solution and allowed for sharing of resources. It was rebranded in 2008 to PowerVM.
As Intel CPUs became dominant in the enterprise, virtualization technologies invented for other CPUs were gradually reinvented for Intel.
In 1998 VMware built VMware Workstation, which ran on a regular Intel CPU despite the fact that at that time Intel CPUs did not directly
support virtualization extensions. The first mass deployment of virtualization on the Intel platform was not on servers but for "legacy
desktop applications" for Windows 98, when organizations started moving to Windows 2000 and then Windows XP.
Advantages of virtualization
Let's discuss some of the advantages of virtualization:
Server consolidation: It is well understood that virtualization helps in saving power
and having a smaller energy footprint. Server consolidation with virtualization will also reduce the overall footprint of the entire
data center. Virtualization reduces the number of physical or bare metal servers, reducing networking stack components and other
physical components, such as racks. Ultimately, this leads to reduced floor space, power savings, and so on. This saves money and also
helps with energy utilization. Does it also ensure increased hardware utilization? Yes, it does. We can provision virtual
machines with the exact amount of CPU, memory, and storage resources that they need, and this will in turn make sure that hardware
utilization is increased.
Service isolation: Suppose no virtualization exists; in this scenario, what's the solution
to achieve service isolation? Isn't it that we need to run one application per physical server? Yes, this can make sure that we achieve
service isolation; however, will it not cause physical server sprawl, underutilized servers, and increased costs? Without any doubt,
I can say that it does. Server virtualization helps with application isolation and also removes application compatibility issues by
consolidating many of these virtual machines across fewer physical servers. In short, the service isolation technique brings the
advantage of simplified administration of services.
Faster server provisioning: Provisioning a bare metal system will consume some time,
even if we have some automated process in the path. But in the case of virtualization, you can spawn a virtual machine from prebuilt
images (templates) or from snapshots. It's that quick, as you can imagine. Also, you really don't have to worry about physical resource
configuration, such as the network stack, which is a burden in physical or bare metal server provisioning.
Disaster recovery: Disaster recovery becomes really easy when you have a virtualized
data center. Virtualization allows you to take up-to-date snapshots of virtual machines. These snapshots can be quickly redeployed
so you can quickly return to a state where everything was working fine. Also, virtualization offers features such as online and offline VM
migration techniques so that you can always move those virtual machines elsewhere in your data center. This flexibility assists with
a better disaster recovery plan that's easier to enact and has a higher success rate.
Dynamic load balancing: Well, this depends on the policies you set. As server workloads
vary, virtualization provides the ability for virtual machines, which are overutilizing the resources of a server, to be moved (live
migration) to underutilized servers, based on the policies you set. Most of the virtualization solutions come with such policies
for the user. This dynamic load balancing creates efficient utilization of server resources.
Faster development and test environment: Think of this: you want a test environment, but only in a temporary manner.
It's really difficult to justify deploying it on physical servers, isn't it? Such a short-lived environment would hardly be worth
dedicated hardware. But it's really easy to set up a development or test environment with virtualization.
Using a guest operating system/VM enables rapid deployment by isolating the application in a known and controlled environment. It
also eliminates lots of unknown factors, such as mixed libraries, caused by numerous installs. In a development or test environment
especially, we can expect severe crashes due to the experiments happening on the setup, which then require hours of reinstallation
if we are on physical or bare metal servers. In the case of VMs, however, it's simply a matter of copying a virtual image and trying again.
Improved system reliability and security: A virtualization solution adds a layer of abstraction
between the virtual machine and the underlying physical hardware. It's common for data on your physical hard disk to get corrupted
due to some reason and affect the entire server. However, if the data is stored in a virtual machine hard disk, the physical hard disk
in the host system stays intact, and replacing a virtual hard disk is not something to worry about. In another instance, virtualization
can prevent system crashes due to memory corruption caused by software such as device drivers. The admin has the privilege to
configure virtual machines in an independent and isolated environment. This sandbox deployment of virtual machines can give more
security to the infrastructure because the admin has the flexibility to choose the configuration that is best suited for this setup.
If the admin decides that a particular VM doesn't need access to the Internet or to other production networks, the virtual machine
can be easily configured behind a network hop with a completely isolated network configuration, restricting its access to the rest
of the world. This helps reduce risks caused by the infection of a single system that then affects numerous production computers
or virtual machines.
OS independence or a reduced hardware vendor lock-in: Virtualization is all about creating
an abstraction layer between the underlying hardware and presenting a virtual hardware to the guest operating systems running on
top of the stack. Virtualization eliminates hardware vendor lock-in, doesn't it? That is to say, with virtualization the setup
no longer has to be tied down to one particular vendor/platform/server, especially since the virtual machines don't really care about
the hardware they run on. Thus, data center admins have a lot more flexibility when it comes to the server equipment they can choose from. In
short, the advantage of virtualization technology is its hardware independence and encapsulation. These features enhance availability
and business continuity. One of the nice things about virtualization is the abstraction between software and hardware.
A complex instruction set computer (CISC, /ˈsɪsk/) is a computer in which single instructions can execute several low-level operations
(such as a load from memory, an arithmetic operation, and a memory store) or are capable of multi-step operations or addressing
modes within single instructions. The term was retroactively coined in contrast to reduced instruction set computer (RISC)[1][2]
and has therefore become something of an umbrella term for everything that is not RISC, from large and complex mainframe
computers to simplistic microcontrollers where memory load and store operations are not separated from arithmetic instructions.
A modern RISC processor can therefore be much more complex than, say, a modern microcontroller using a CISC-labeled instruction set,
especially in the complexity of its electronic circuits, but also in the number of instructions or the complexity of their encoding patterns.
The only typical differentiating characteristic is that most RISC designs use uniform instruction length for almost all instructions,
and employ strictly separate load/store instructions.
Examples of instruction set architectures that have been retroactively labeled CISC are System/360 through z/Architecture,
the PDP-11 and VAX architectures, the Data General Nova and many others. Well-known microprocessors and microcontrollers that have
also been labeled CISC in many academic publications include the Motorola 6800, 6809 and 68000 families; the Intel 8080, iAPX 432
and x86 family; the Zilog Z80, Z8 and Z8000 families; the National Semiconductor 32016 and NS320xx line; the MOS Technology 6502 family;
the Intel 8051 family; and others.
Some designs have been regarded as borderline cases by some writers. For instance, the Microchip Technology PIC has been labeled
RISC in some circles and CISC in others. The 6502 and 6809 have both been described as "RISC-like", although they have complex
addressing modes as well as arithmetic instructions that operate on memory, contrary to the RISC principles.
A central processing unit (CPU), also called a central processor or main processor, is the electronic circuitry within a computer
that carries out the instructions of a computer program by performing the basic arithmetic, logic, controlling, and input/output (I/O)
operations specified by the instructions. The computer industry has used the term "central processing unit" at least since the early 1960s.[1]
Traditionally, the term "CPU" refers to a processor, more specifically to its processing unit and control unit (CU), distinguishing
these core elements of a computer from external components such as main memory and I/O circuitry.[2]
The form, design, and implementation of CPUs have changed over the course of their history, but their fundamental operation remains
almost unchanged. Principal components of a CPU include the arithmetic logic unit (ALU) that performs arithmetic and logic operations,
processor registers that supply operands to the ALU and store the results of ALU operations, and a control unit that orchestrates
the fetching (from memory) and execution of instructions by directing the coordinated operations of the ALU, registers and other components.
Most modern CPUs are microprocessors, meaning they are contained on a single integrated circuit (IC) chip. An IC that contains a CPU
may also contain memory, peripheral interfaces, and other components of a computer; such integrated devices are variously called
microcontrollers or systems on a chip (SoC). Some computers employ a multi-core processor, which is a single chip containing two or more
CPUs called "cores"; in that context, one can speak of such single chips as "sockets".[3]
Array processors or vector processors have multiple processors that operate in parallel, with no unit considered central. There also
exists the concept of virtual CPUs, which are an abstraction of dynamically aggregated computational resources.[4]
The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions
that is called a program. The instructions to be executed are kept in some kind of computer memory. Nearly all CPUs follow the fetch,
decode and execute steps in their operation, which are collectively known as the instruction cycle.
After the execution of an instruction, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence
instruction because of the incremented value in the program counter. If a jump instruction was executed, the program counter will be
modified to contain the address of the instruction that was jumped to and program execution continues normally. In more complex CPUs,
multiple instructions can be fetched, decoded and executed simultaneously. This section describes what is generally referred to as the
"classic RISC pipeline", which is quite common among the simple CPUs used in many electronic devices (often called microcontrollers).
It largely ignores the important role of CPU cache, and therefore the access stage of the pipeline.
Some instructions manipulate the program counter rather than producing result data directly; such instructions are generally called
"jumps" and facilitate program behavior like loops, conditional program execution (through the use of a conditional jump), and the
existence of functions.[c]
In some processors, some other instructions change the state of bits in a "flags" register. These flags can be used to influence how
a program behaves, since they often indicate the outcome of various operations. For example, in such processors a "compare" instruction
evaluates two values and sets or clears bits in the flags register to indicate which one is greater or whether they are equal; one of
these flags could then be used by a later jump instruction to determine program flow.
The first step, fetch, involves retrieving an instruction (which is represented by a number or sequence of numbers) from program memory.
The instruction's location (address) in program memory is determined by a program counter (PC), which stores a number that identifies
the address of the next instruction to be fetched. After an instruction is fetched, the PC is incremented by the length of the instruction
so that it will contain the address of the next instruction in the sequence.[d]
Often, the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the
instruction to be returned. This issue is largely addressed in modern processors by caches and pipeline architectures (see below).
The instruction that the CPU fetches from memory determines what the CPU will do. In the decode step, performed by the circuitry
known as the instruction decoder, the instruction is converted into signals that control other parts of the CPU.
The way in which the instruction is interpreted is defined by the CPU's instruction set architecture (ISA).[e]
Often, one group of bits (that is, a "field") within the instruction, called the opcode, indicates which operation is to be performed,
while the remaining fields usually provide supplemental information required for the operation, such as the operands. Those operands
may be specified as a constant value (called an immediate value), or as the location of a value that may be a processor register or
a memory address, as determined by some addressing mode.
In some CPU designs the instruction decoder is implemented as a hardwired, unchangeable circuit. In others, a
microprogram is used to translate
instructions into sets of CPU configuration signals that are applied sequentially over multiple clock pulses. In some cases the memory
that stores the microprogram is rewritable, making it possible to change the way in which the CPU decodes instructions.
After the fetch and decode steps, the execute step is performed. Depending on the CPU architecture, this may consist of a single
action or a sequence of actions. During each action, various parts of the CPU are electrically connected so they can perform all
or part of the desired operation and then the action is completed, typically in response to a clock pulse. Very often the results
are written to an internal CPU register for quick access by subsequent instructions. In other cases results may be written to slower,
but less expensive and higher capacity main memory.
For example, if an addition instruction is to be executed, the arithmetic logic unit (ALU) inputs
are connected to a pair of operand sources (numbers to be summed), the ALU is configured to perform an addition operation so that
are connected to a pair of operand sources (numbers to be summed), the ALU is configured to perform an addition operation so that
the sum of its operand inputs will appear at its output, and the ALU output is connected to storage (e.g., a register or memory)
that will receive the sum. When the clock pulse occurs, the sum will be transferred to storage and, if the resulting sum is too large
(i.e., it is larger than the ALU's output word size), an arithmetic overflow flag will be set.
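The whole fetch-decode-execute cycle, including the overflow flag just described, can be made concrete with a toy simulator. The C sketch below models a made-up machine (all opcodes, field widths and names are invented for illustration and belong to no real ISA): each iteration fetches a 16-bit instruction word, decodes its opcode and operand fields, and executes it against an 8-bit accumulator, setting an overflow flag when an addition no longer fits in the word size.

    #include <stdio.h>
    #include <stdint.h>

    /* Toy machine: 16-bit instruction word = 8-bit opcode + 8-bit operand.
     * Opcodes are invented for illustration only. */
    enum { OP_HALT  = 0,   /* stop and print machine state                 */
           OP_LOADI = 1,   /* load immediate operand into the accumulator  */
           OP_ADDI  = 2,   /* add immediate, set overflow flag on carry-out */
           OP_JMP   = 3 }; /* jump: set the program counter to the operand */

    int main(void) {
        uint16_t program[] = {            /* program memory                     */
            (OP_LOADI << 8) | 200,        /* acc = 200                          */
            (OP_ADDI  << 8) | 100,        /* acc += 100 -> exceeds 8-bit range  */
            (OP_HALT  << 8) | 0
        };
        uint8_t pc = 0, acc = 0, overflow = 0;

        for (;;) {
            uint16_t instr   = program[pc];     /* fetch                         */
            uint8_t  opcode  = instr >> 8;      /* decode: opcode field          */
            uint8_t  operand = instr & 0xFF;    /* decode: operand field         */
            pc++;                               /* point to next-in-sequence instruction */

            switch (opcode) {                   /* execute                       */
            case OP_LOADI: acc = operand; break;
            case OP_ADDI: {
                uint16_t sum = (uint16_t)acc + operand;
                overflow = sum > 0xFF;          /* result wider than the word size */
                acc = (uint8_t)sum;
                break;
            }
            case OP_JMP:  pc = operand; break;
            case OP_HALT:
                printf("acc=%u overflow=%u\n", acc, overflow);
                return 0;
            }
        }
    }

Running it prints acc=44 overflow=1, since 200 + 100 wraps around the 8-bit accumulator exactly as the flag mechanism above describes.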
Block diagram of a basic uniprocessor-CPU computer. Black lines indicate data flow, whereas red lines indicate control flow; arrows
indicate flow directions.
Hardwired into a CPU's circuitry is a set of basic operations it can perform, called an
instruction set. Such operations
may involve, for example, adding or subtracting two numbers, comparing two numbers, or jumping to a different part of a program.
Each basic operation is represented by a particular combination of bits,
known as the machine language opcode; while executing instructions
in a machine language program, the CPU decides which operation to perform by "decoding" the opcode. A complete machine language instruction
consists of an opcode and, in many cases, additional bits that specify arguments for the operation (for example, the numbers to be
summed in the case of an addition operation). Going up the complexity scale, a machine language program is a collection of machine
language instructions that the CPU executes.
The actual mathematical operation for each instruction is performed by a combinational logic circuit within the CPU's processor known
as the arithmetic logic unit or ALU. In general, a CPU executes an instruction by fetching it from memory, using its ALU to perform
an operation, and then storing the result to memory. Besides the instructions for integer mathematics and logic operations, various
other machine instructions exist, such as those for loading data from memory and storing it back, branching operations, and mathematical
operations on floating-point numbers performed by the CPU's floating-point unit (FPU).[59]
The control unit of the
CPU contains circuitry that uses electrical signals to direct the entire computer system to carry out stored program instructions.
The control unit does not execute program instructions; rather, it directs other parts of the system to do so. The control unit communicates
with both the ALU and memory.
Symbolic representation of an ALU and its input and output signals
The arithmetic logic unit (ALU) is a digital circuit within the processor that performs integer arithmetic and bitwise logic operations.
The inputs to the ALU are the data words to be operated on (called operands), status information from previous operations, and a code
from the control unit indicating which operation to perform. Depending on the instruction being executed, the operands may come from
internal CPU registers or external memory, or they may be constants generated by the ALU itself.
When all input signals have settled and propagated through the ALU circuitry, the result of the performed operation appears at
the ALU's outputs. The result consists of both a data word, which may be stored in a register or memory, and status information that
is typically stored in a special, internal CPU register reserved for this purpose.
Most high-end microprocessors (in desktop, laptop, and server computers) have a memory management unit (MMU), translating logical
addresses into physical RAM addresses, providing memory protection and paging abilities, useful for virtual memory. Simpler processors,
especially microcontrollers, usually don't include an MMU.
Most CPUs are synchronous circuits, which means they employ a clock signal to pace their sequential operations. The clock signal is
produced by an external oscillator circuit that generates a consistent number of pulses each second in the form of a periodic square
wave. The frequency of the clock pulses determines the rate at which a CPU executes instructions and, consequently, the faster the
clock, the more instructions the CPU will execute each second.
To ensure proper operation of the CPU, the clock period is longer than the maximum time needed for all signals to propagate (move)
through the CPU. In setting the clock period to a value well above the worst-case
propagation delay, it is possible to design
the entire CPU and the way it moves data around the "edges" of the rising and falling clock signal. This has the advantage of simplifying
the CPU significantly, both from a design perspective and a component-count perspective. However, it also carries the disadvantage
that the entire CPU must wait on its slowest elements, even though some portions of it are much faster. This limitation has largely
been compensated for by various methods of increasing CPU parallelism (see below).
However, architectural improvements alone do not solve all of the drawbacks of globally synchronous CPUs. For example, a clock
signal is subject to the delays of any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult
to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical
clock signals to be provided to avoid delaying a single signal significantly enough to cause the CPU to malfunction. Another major
issue, as clock rates increase dramatically, is the amount of heat that is dissipated by the CPU. The constantly changing clock causes
many components to switch regardless of whether they are being used at that time. In general, a component that is switching uses more
energy than an element in a static state. Therefore, as clock rate increases, so does energy consumption, causing the CPU to require
more heat dissipation in the form of CPU cooling solutions.
One method of dealing with the switching of unneeded components is called clock gating, which involves turning off the clock signal
to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not
see common usage outside of very low-power designs. One notable recent CPU design that uses extensive clock gating is the IBM
PowerPC-based Xenon used in the Xbox 360; that way, the power requirements of the Xbox 360 are greatly reduced.[60]
Another method of addressing some of the problems with a global clock signal is the removal of the clock signal altogether. While
removing the global clock signal makes the design process considerably more complex in many ways, asynchronous (or clockless) designs
carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. While somewhat
uncommon, entire asynchronous CPUs have been built without using a global clock signal. Two notable examples of this are the ARM
compliant AMULET and the MIPS R3000 compatible MiniMIPS.
Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous, such
as using asynchronous ALUs in conjunction
with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous
designs can perform at a comparable or better level than their synchronous counterparts, it is evident that they do at least excel
in simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them very
suitable for embedded computers.[61]
Every CPU represents numerical values in a specific way. For example, some early digital computers represented numbers as familiar
decimal (base 10) numeral system values, and others have employed more unusual representations such as ternary (base three). Nearly
all modern CPUs represent numbers in binary form, with each digit being represented by some two-valued physical quantity such as a
"high" or "low" voltage.[f]
A six-bit word containing the binary encoded representation of decimal value 40. Most modern CPUs employ word sizes that are a power
of two, for example 8, 16, 32 or 64 bits.
Related to numeric representation is the size and precision of integer numbers that a CPU can represent. In the case of a binary CPU,
this is measured by the number of bits (significant digits of a binary encoded integer) that the CPU can process in one operation,
which is commonly called word size, bit width, data path width, integer precision, or integer size. A CPU's integer size determines
the range of integer values it can directly operate on.[g] For example, an 8-bit CPU can directly manipulate integers represented by
eight bits, which have a range of 256 (2^8) discrete integer values.
Integer range can also affect the number of memory locations the CPU can directly address (an address is an integer value representing
a specific memory location). For example, if a binary CPU uses 32 bits to represent a memory address then it can directly address
2^32 memory locations. To circumvent this limitation and for various other reasons, some CPUs use mechanisms (such as bank switching)
that allow additional memory to be addressed.
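A few lines of C make these two points concrete (purely illustrative arithmetic, nothing implementation-specific): an 8-bit quantity can hold only 2^8 = 256 distinct values and wraps around past 255, while a 32-bit address can name 2^32 locations, that is, 4 GiB of individually addressable bytes.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint8_t small = 255;          /* largest value an 8-bit register can hold      */
        small = small + 1;            /* wraps to 0: only 2^8 = 256 distinct values    */
        printf("8-bit 255 + 1 = %u\n", small);

        uint64_t locations = 1ULL << 32;   /* a 32-bit address selects 2^32 locations  */
        printf("32-bit address space: %llu locations (%llu GiB of bytes)\n",
               (unsigned long long)locations,
               (unsigned long long)(locations >> 30));
        return 0;
    }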
CPUs with larger word sizes require more circuitry and consequently are physically larger, cost more and consume more power (and
therefore generate more heat). As a result, smaller 4- or 8-bit
microcontrollers are commonly used in modern
applications even though CPUs with much larger word sizes (such as 16, 32, 64, even 128-bit) are available. When higher performance
is required, however, the benefits of a larger word size (larger data ranges and address spaces) may outweigh the disadvantages.
A CPU can have internal data paths shorter than the word size to reduce size and cost. For example, even though the IBM System/360
instruction set was a 32-bit instruction set, the System/360 Model 30 and Model 40 had 8-bit data paths in the arithmetic logical unit,
so that a 32-bit add required four cycles, one for each 8 bits of the operands, and, even though the Motorola 68000 series instruction
set was a 32-bit instruction set, the Motorola 68000 and Motorola 68010 had 16-bit data paths in the arithmetic logical unit, so that
a 32-bit add required two cycles.
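The Model 30 behaviour can be mimicked in software: the illustrative C sketch below adds two 32-bit values through a pretend 8-bit-wide ALU, one byte per step with an explicit carry, which is essentially why such a machine needed four cycles per 32-bit add.

    #include <stdio.h>
    #include <stdint.h>

    /* Add two 32-bit numbers 8 bits at a time, propagating a carry,
     * mimicking a machine whose ALU data path is only 8 bits wide. */
    static uint32_t add32_via_8bit_alu(uint32_t a, uint32_t b) {
        uint32_t result = 0;
        unsigned carry = 0;
        for (int byte = 0; byte < 4; byte++) {          /* four "cycles" */
            unsigned pa  = (a >> (8 * byte)) & 0xFF;
            unsigned pb  = (b >> (8 * byte)) & 0xFF;
            unsigned sum = pa + pb + carry;             /* 8-bit add with carry-in */
            carry = sum >> 8;                           /* carry-out to the next byte */
            result |= (uint32_t)(sum & 0xFF) << (8 * byte);
        }
        return result;
    }

    int main(void) {
        uint32_t a = 0x01FFFFFF, b = 0x00000001;
        printf("%#x + %#x = %#x\n", a, b, add32_via_8bit_alu(a, b));  /* 0x2000000 */
        return 0;
    }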
To gain some of the advantages afforded by both lower and higher bit lengths, many instruction sets have different bit widths for
integer and floating-point data, allowing CPUs implementing that instruction set to have different bit widths for different portions
of the device. For example, the IBM System/360 instruction set was primarily 32 bit, but supported 64-bit floating point values to
facilitate greater accuracy and range in floating point numbers.[29] The System/360 Model 65 had an 8-bit adder for decimal and
fixed-point binary arithmetic and a 60-bit adder for floating-point arithmetic.[62] Many later CPU designs use similar mixed bit width,
especially when the processor is meant for general-purpose usage where a reasonable balance of integer and floating point capability
is required.
Model of a subscalar CPU, in which it takes fifteen clock cycles to complete three instructions
The description of the basic operation of a CPU offered in the previous section describes the simplest form that a CPU can take.
This type of CPU, usually referred to as subscalar, operates on and executes one instruction on one or two pieces of data at a time,
that is, less than one instruction per clock cycle (IPC < 1).
This process gives rise to an inherent inefficiency in subscalar CPUs. Since only one instruction is executed at a time, the entire
CPU must wait for that instruction to complete before proceeding to the next instruction. As a result, the subscalar CPU gets "hung
up" on instructions which take more than one clock cycle to complete execution. Even adding a second
execution unit (see below) does not improve performance
much; rather than one pathway being hung up, now two pathways are hung up and the number of unused transistors is increased. This
design, wherein the CPU's execution resources can operate on only one instruction at a time, can only possibly reach scalar
performance (one instruction per clock cycle, IPC = 1). However, the performance is nearly always subscalar (less than one instruction
per clock cycle, IPC < 1).
Attempts to achieve scalar and better performance have resulted in a variety of design methodologies that cause the CPU to behave
less linearly and more in parallel. When referring to parallelism in CPUs, two terms are generally used to classify these design
techniques:
instruction-level parallelism (ILP), which seeks to increase the rate at which instructions are executed within a CPU (that is, to
increase the use of on-die execution resources), and task-level parallelism (TLP), which aims to increase the number of threads or
processes that a CPU can execute simultaneously.
Each methodology differs both in the ways in which they are implemented, as well as the relative effectiveness they afford in
increasing the CPU's performance for an application.[h]
Basic five-stage pipeline. In the best case scenario, this pipeline can sustain a completion rate of one instruction per clock cycle.
One of the simplest methods used to accomplish increased parallelism is to begin the first steps of instruction fetching and decoding
before the prior instruction finishes executing. This is the simplest form of a technique known as
instruction pipelining, and is
used in almost all modern general-purpose CPUs. Pipelining allows more than one instruction to be executed at any given time by breaking
down the execution pathway into discrete stages. This separation can be compared to an assembly line, in which an instruction is
made more complete at each stage until it exits the execution pipeline and is retired.
Pipelining does, however, introduce the possibility for a situation where the result of the previous operation is needed to complete
the next operation; a condition often termed data dependency conflict. To cope with this, additional care must be taken to check
for these sorts of conditions and delay a portion of the
instruction pipeline
if this occurs. Naturally, accomplishing this requires additional circuitry, so pipelined processors are more complex than subscalar
ones (though not very significantly so). A pipelined processor can become very nearly scalar, inhibited only by pipeline stalls (an
instruction spending more than one clock cycle in a stage).
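The gain from pipelining follows from simple cycle counting: an unpipelined machine spends roughly N x S cycles on N instructions with S stages, while an ideal pipeline spends S + (N - 1) cycles (fill the pipeline once, then retire one instruction per cycle, stalls ignored). A tiny, purely illustrative calculation in C:

    #include <stdio.h>

    int main(void) {
        const long stages = 5, instructions = 1000;

        long unpipelined = instructions * stages;        /* each instruction occupies the whole datapath */
        long pipelined   = stages + (instructions - 1);  /* fill once, then one completion per cycle     */

        printf("unpipelined: %ld cycles\n", unpipelined);                /* 5000 */
        printf("pipelined (ideal, no stalls): %ld cycles\n", pipelined); /* 1004 */
        printf("speedup: %.2fx\n", (double)unpipelined / pipelined);
        return 0;
    }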
A simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per clock cycle
can be completed.
Further improvement upon the idea of instruction pipelining led to the development of a method that decreases the idle time of CPU
components even further. Designs that are said to be superscalar include a long instruction pipeline and multiple identical execution
units, such as load-store units, arithmetic-logic units, floating-point units and address generation units.[63]
In a superscalar pipeline, multiple instructions are read and passed to a dispatcher, which decides whether or not the instructions
can be executed in parallel (simultaneously). If so they are dispatched to available execution units, resulting in the ability for
several instructions to be executed simultaneously. In general, the more instructions a superscalar CPU is able to dispatch simultaneously
to waiting execution units, the more instructions will be completed in a given cycle.
Most of the difficulty in the design of a superscalar CPU architecture lies in creating an effective dispatcher. The dispatcher
needs to be able to quickly and correctly determine whether instructions can be executed in parallel, as well as dispatch them in
such a way as to keep as many execution units busy as possible. This requires that the instruction pipeline is filled as often as
possible and gives rise to the need in superscalar architectures for significant amounts of CPU cache. It also makes hazard-avoiding
techniques like branch prediction, speculative execution, register renaming, out-of-order execution and transactional memory crucial
to maintaining high levels of performance. By attempting to predict which branch (or path) a conditional instruction will take, the
CPU can minimize the number of times that the entire pipeline must wait until a conditional instruction is completed. Speculative
execution often provides modest performance increases by executing portions of code that may not be needed after a conditional
operation completes. Out-of-order execution somewhat rearranges the order in which instructions are executed to reduce delays due to
data dependencies.
Also, in the case of single instruction stream, multiple data stream (a case when a lot of data of the same type has to be processed),
modern processors can disable parts of the pipeline so that when a single instruction is executed many times, the CPU skips the fetch
and decode phases and thus greatly increases performance on certain occasions, especially in highly monotonous program engines such
as video creation software and photo processing.
In the case where a portion of the CPU is superscalar and part is not, the part which is not suffers a performance penalty due to
scheduling stalls. The Intel P5 Pentium had two superscalar ALUs which could accept one instruction per clock cycle each, but its FPU
could not accept one instruction per clock cycle. Thus the P5 was integer superscalar but not floating point superscalar. Intel's
successor to the P5 architecture, P6, added superscalar capabilities to its floating point features, and therefore afforded a
significant increase in floating point instruction performance.
Both simple pipelining and superscalar design increase a CPU's ILP by allowing a single processor to complete execution of instructions
at rates surpassing one instruction per clock cycle.[i]
Most modern CPU designs are at least somewhat superscalar, and nearly all general purpose CPUs designed in the last decade are superscalar.
In later years some of the emphasis in designing high-ILP computers has been moved out of the CPU's hardware and into its software
interface, or ISA. The strategy of the very long instruction word (VLIW) causes some ILP to become implied directly by the software,
reducing the amount of work the CPU must perform to boost ILP and thereby reducing the design's complexity.
One technology used for this purpose was multiprocessing (MP).[66] The initial flavor of this technology is known as symmetric
multiprocessing (SMP), where a small number of CPUs share a coherent view of their memory system. In this scheme, each CPU has
additional hardware to maintain a constantly up-to-date view of memory. By avoiding stale views of memory, the CPUs can cooperate
on the same program and programs can migrate from one CPU to another. To increase the number of cooperating CPUs beyond a handful,
schemes such as non-uniform memory access (NUMA) and directory-based coherence protocols were introduced in the 1990s. SMP systems
are limited to a small number of CPUs while NUMA systems have been built with thousands of processors. Initially, multiprocessing
was built using multiple discrete CPUs and boards to implement the interconnect between the processors. When the processors and
their interconnect are all implemented on a single chip, the technology is known as chip-level multiprocessing (CMP) and the single
chip as a multi-core processor.
It was later recognized that finer-grain parallelism existed within a single program. A single program might have several threads
(or functions) that could be executed separately or in parallel. Some of the earliest examples of this technology implemented
input/output processing such as direct memory access as a separate thread from the computation thread. A more general approach to
this technology was introduced in the 1970s when systems were designed to run multiple computation threads in parallel. This
technology is known as multi-threading (MT). This approach is considered more cost-effective than multiprocessing, as only a small
number of components within a CPU is replicated to support MT as opposed to the entire CPU in the case of MP. In MT, the execution
units and the memory system including the caches are shared among multiple threads. The downside of MT is that the hardware support
for multithreading is more visible to software than that of MP, and thus supervisor software like operating systems have to undergo
larger changes to support MT. One type of MT that was implemented is known as temporal multithreading, where one thread is executed
until it is stalled waiting for data to return from external memory. In this scheme, the CPU would then quickly context switch to
another thread which is ready to run, the switch often done in one CPU clock cycle, as in the UltraSPARC T1. Another type of MT is
simultaneous multithreading, where instructions from multiple threads are executed in parallel within one CPU clock cycle.
For several decades from the 1970s to early 2000s, the focus in designing high performance general purpose CPUs was largely on
achieving high ILP through technologies such as pipelining, caches, superscalar execution, out-of-order execution, etc. This trend
culminated in large, power-hungry CPUs such as the Intel Pentium
4. By the early 2000s, CPU designers were thwarted from achieving higher performance from ILP techniques due to the growing disparity
between CPU operating frequencies and main memory operating frequencies as well as escalating CPU power dissipation owing to more
esoteric ILP techniques.
CPU designers then borrowed ideas from commercial computing markets such as transaction processing, where the aggregate performance
of multiple programs, also known as throughput computing, was more important than the performance of a single thread or process.
This reversal of emphasis is evidenced by the proliferation of dual and more core processor designs and, notably, Intel's newer
designs resembling its less superscalar P6 architecture. Late designs in several processor families exhibit CMP, including the
x86-64 Opteron and Athlon 64 X2, the SPARC UltraSPARC T1, IBM POWER4 and POWER5, as well as several video game console CPUs like
the Xbox 360's triple-core PowerPC design, and the PlayStation 3's 7-core Cell microprocessor.
A less common but increasingly important paradigm of processors (and indeed, computing in general) deals with data parallelism.
The processors discussed earlier are all referred to as some type of scalar device.[j] As the name implies, vector processors deal
with multiple pieces of data in the context of one instruction. This contrasts with scalar processors, which deal with one piece of
data for every instruction. Using Flynn's taxonomy, these two schemes of dealing with data are generally referred to as single
instruction stream, multiple data stream (SIMD) and single instruction stream, single data stream (SISD), respectively. The great
utility in creating processors that deal with vectors of data lies in optimizing tasks that tend to require the same operation
(for example, a sum or a dot product) to be performed on a large set of data. Some classic examples of these types of tasks include
multimedia applications (images, video and sound), as well as many types of scientific and engineering tasks. Whereas a scalar
processor must complete the entire process of fetching, decoding and executing each instruction and value in a set of data, a vector
processor can perform a single operation on a comparatively large set of data with one instruction. This is only possible when the
application tends to require many steps which apply one operation to a large set of data.
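As a concrete illustration, the C snippet below sums two float arrays first one element at a time (the scalar, SISD case) and then four elements per instruction using the SSE intrinsics available in x86 compilers; treat it as a sketch rather than tuned code, and note that a real build needs an SSE-capable target (for example, -msse).

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

    #define N 8

    int main(void) {
        float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[N] = {10, 20, 30, 40, 50, 60, 70, 80};
        float scalar_out[N], simd_out[N];

        /* Scalar (SISD): one addition per loop iteration. */
        for (int i = 0; i < N; i++)
            scalar_out[i] = a[i] + b[i];

        /* SIMD: one SSE instruction adds four packed floats at a time. */
        for (int i = 0; i < N; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);
            __m128 vb = _mm_loadu_ps(&b[i]);
            _mm_storeu_ps(&simd_out[i], _mm_add_ps(va, vb));
        }

        for (int i = 0; i < N; i++)
            printf("%g %g\n", scalar_out[i], simd_out[i]);
        return 0;
    }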
Most early vector processors, such as the Cray-1, were associated almost exclusively with scientific research and cryptography
applications. However, as multimedia has largely shifted to digital media, the need for some form of SIMD in general-purpose
processors has become significant. Shortly after inclusion of floating-point units started to become commonplace in general-purpose
processors, specifications for and implementations of SIMD execution units also began to appear for general-purpose processors.
Some of these early SIMD specifications - like HP's Multimedia Acceleration eXtensions (MAX) and Intel's MMX - were integer-only.
This proved to be a significant impediment for some software developers, since many of the applications that benefit from SIMD
primarily deal with floating-point numbers.
Progressively, developers refined and remade these early designs into some of the common modern SIMD specifications, which are
usually associated with one ISA. Some notable modern examples include Intel's SSE and the PowerPC-related AltiVec (also known as VMX).[k]
Cloud computing can involve subdividing CPU operation into virtual central processing units (vCPUs).
A host is the virtual equivalent of a physical machine, on which a virtual system is operating. When there are several physical
machines operating in tandem and managed as a whole, the grouped computing and memory resources form a cluster. In some systems,
it is possible to dynamically add to and remove from a cluster. Resources available at the host and cluster level can be partitioned
out into resource pools with fine granularity.
Emulators
Emulation is the oldest type of virtual machine.
Gates and Allen set out to write some software that would make it possible for hobbyists to create their own programs on the Altair.
Specifically, they decided to write an interpreter for the programming language known as BASIC that would run on the Altair's Intel
8080 microprocessor. It would become the first commercial native high-level programming language for a microprocessor. In other words,
it would launch the personal computer software industry.
They wrote a letter to MITS, the fledgling Albuquerque company that made the Altair, claiming that they had created a BASIC language
interpreter that could run on the 8080. "We are interested in selling copies of this software to hobbyists through you." In reality,
they did not yet have any software. But they knew they could scramble and write it if MITS expressed interest.
When they did not hear back, they decided to call. Gates suggested that Allen place the call, because he was older. "No, you should
do it; you're better at this kind of thing," Allen argued. They came up with a compromise: Gates would call, disguising his squeaky
voice, but he would use the name Paul Allen, because they knew it would be Allen who would fly out to Albuquerque if they got lucky.
"I had my beard going and at least looked like an adult, while Bill still could pass for a high school sophomore," recalled Allen.
When the founder of MITS, Ed Roberts, answered the phone, Gates put on a deep voice and said, "This is Paul Allen in Boston. We've
got a BASIC for the Altair that's just about finished, and we'd like to come out and show it to you." Roberts replied that he had
gotten many such calls. The first person to walk through his door in Albuquerque with a working BASIC would get the contract. Gates
turned to Allen and exulted, "God, we gotta get going on this!"
Because they did not have an Altair to work on, Allen had to emulate one on the PDP-10 mainframe at the Aiken Lab. So they
bought a manual for the 8080 microprocessor and within weeks Allen had the simulator and other development tools ready.
Meanwhile, Gates was furiously writing the BASIC interpreter code on yellow legal pads. "I can still see him alternately pacing
and rocking for long periods before jotting on a yellow legal pad, his fingers stained from a rainbow of felt-tip pens," Allen recalled.
"Once my simulator was in place and he was able to use the PDP-10, Bill moved to a terminal and peered at his legal pad as he rocked.
Then he'd type a flurry of code with those strange hand positions of his, and repeat. He could go like that for hours at a stretch."
One night they were having dinner at Currier House, sitting at the table with the other math geeks, and they began complaining
about facing the tedious task of writing the floating-point math routines, which would give the program the ability to deal with
both very small and very large numbers in scientific notation. A curly-haired kid from Milwaukee named Monte Davidoff piped up, "I've
written those types of routines." It was the benefit of being at Harvard. Gates and Allen began peppering him with questions about
his capacity to handle floating-point code. Satisfied they knew what he was talking about, they brought him to Gates's room and negotiated
a fee of $400 for his work. He became the third member of the team, and would eventually earn a lot more.
Gates ignored the exam cramming he was supposed to be doing and even stopped playing poker. For eight weeks, he, Allen, and Davidoff
holed up day and night at the Aiken lab making history. Occasionally they would break for dinner at Harvard House of Pizza or at
Aku Aku, an ersatz Polynesian restaurant. In the wee hours of the morning, Gates would sometimes fall asleep at the terminal. "He'd
be in the middle of a line of code when he'd gradually tilt forward until his nose touched the keyboard," Allen said. "After dozing
an hour or two, he'd open his eyes, squint at the screen, blink twice, and resume precisely where he'd left off-a prodigious feat
of concentration."
They would scribble away at their notepads, competing to see who could execute a subroutine in the fewest lines. "I can do it
in nine," one would shout. Another would shoot back, "Well, I can do it in five!" As Allen noted, "We knew that each byte saved would
leave that much more room for users to add to their applications." The goal was to get the program into less than the 4K of memory
that an enhanced Altair would have, so there would be a little room left over for the consumer to use. (A 16GB smartphone has four
million times that memory.) At night they would fan out the printouts onto the floor and search for ways to make it more elegant
and compact. By late February 1975, after eight weeks of intense coding, they got it down, brilliantly, into 3.2K. "It wasn't a question
of whether I could write the program, but rather a question of whether I could squeeze it into under 4K and make it super fast,"
said Gates. "It was the coolest program I ever wrote." Gates checked it for errors one last time, then commanded the Aiken lab's
PDP-10 to spew out a punch-tape of it so Allen could take it down to Albuquerque.
The idea of protection rings
Before we discuss virtualization further and dive into the next type of virtualization (hypervisor-based/software virtualization),
it would be useful to be aware of some jargon in computer science. That being said, let's start with something called "protection rings".
In computer science, various hierarchical protection domains/privileged rings exist. These are the mechanisms that protect data or faults
based on the security enforced when accessing the resources in a computer system. These protection domains contribute to the security
of a computer system.
As shown in the preceding figure, the protection rings are numbered from the most privileged to the least privileged. Ring 0 is the
level with the most privileges and it interacts directly with physical hardware, such as the CPU and memory.
The resources, such as memory, I/O ports, and CPU instructions, are protected via these privileged rings. Rings 1 and 2 are mostly
unused. Most general-purpose systems use only two rings, even if the hardware they run on provides more CPU modes (https://en.m.wikipedia.org/wiki/CPU_modes)
than that. The two main CPU modes are the kernel mode and the user mode. From an operating system's point of view, Ring 0 is called the
kernel mode/supervisor mode and Ring 3 is the user mode. As you might have assumed, applications run in Ring 3.
Operating systems such as Linux and Windows use the supervisor/kernel and user modes. User-mode code can do almost nothing to the outside
world without calling on the kernel or without its help, due to its restricted access to memory, CPU, and I/O ports. The kernel runs
in privileged mode, which means that it runs in ring 0. To perform specialized functions, user-mode code (all the applications
run in ring 3) must perform a system call (https://en.m.wikipedia.org/wiki/System_call)
to the supervisor mode or even to the kernel space, where trusted code of the operating system will perform the needed task and return
execution back to the user space. In short, the operating system runs in ring 0 in a normal environment. It needs the most privileged
level to do resource management and provide access to the hardware.
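On Linux this ring transition can be seen in a couple of lines of C: the user-mode program below cannot touch kernel data structures directly, so it asks the kernel (ring 0) for its process ID via a system call, shown both through the usual glibc wrapper and through the raw syscall() interface.

    #define _GNU_SOURCE        /* ensure syscall() is declared on glibc */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>   /* SYS_getpid */

    int main(void) {
        /* Both lines end up trapping from user mode (ring 3) into the
         * kernel (ring 0), which performs the work and returns the result. */
        printf("getpid() via libc wrapper : %ld\n", (long)getpid());
        printf("getpid() via raw syscall(): %ld\n", (long)syscall(SYS_getpid));
        return 0;
    }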
The following image explains this:
Full virtualization
In full virtualization, privileged instructions are emulated to overcome the limitations arising from the guest operating system
running in Ring 1 and the VMM running in Ring 0. Full virtualization was implemented in first-generation x86 VMMs. It relies on
techniques such as binary translation (https://en.wikipedia.org/wiki/Binary_translation)
to trap and virtualize the execution of certain sensitive and non-virtualizable instructions. That is to say, in binary translation,
some system calls are interpreted and dynamically rewritten. The following diagram depicts how the guest OS accesses the host computer
hardware through Ring 1 for privileged instructions and how unprivileged instructions are executed without the involvement of Ring 1:
With this approach, the critical instructions are discovered (statically or dynamically at runtime) and replaced with traps into the
VMM that are emulated in software. Binary translation can incur a large performance overhead in comparison to a virtual machine
running on natively virtualized architectures.
However, as shown in the preceding image, when we use full virtualization we can use unmodified guest operating systems. This means
that we don't have to alter the guest kernel to run on a VMM. When the guest kernel executes privileged operations, the VMM provides
the CPU emulation to handle and modify the protected CPU operations, but as mentioned earlier, this causes performance overhead
compared to the other mode of virtualization, called paravirtualization.
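To make the trap-and-emulate idea behind binary translation more tangible, here is a deliberately simplified sketch that operates on a made-up byte-code rather than real x86 instructions (every opcode and function name below is invented for illustration): a translator scans the guest code buffer, rewrites each sensitive opcode into a trap opcode, and the trap handler emulates the operation inside the VMM while keeping the virtual CPU state.

    #include <stdio.h>
    #include <stdint.h>

    /* Toy guest byte-code (all values invented for illustration). */
    enum { G_NOP = 0x00, G_ADD = 0x01,
           G_CLI = 0xFA,      /* "sensitive": disable interrupts            */
           G_TRAP = 0xCC };   /* replacement: trap into the VMM             */

    static int guest_interrupts_enabled = 1;  /* virtual CPU state kept by the VMM */

    /* VMM-side emulation of the sensitive instruction. */
    static void vmm_emulate_cli(void) {
        guest_interrupts_enabled = 0;
        printf("VMM: emulated CLI, guest interrupt flag cleared\n");
    }

    /* "Binary translation": rewrite sensitive opcodes before execution. */
    static void translate(uint8_t *code, size_t len) {
        for (size_t i = 0; i < len; i++)
            if (code[i] == G_CLI)
                code[i] = G_TRAP;
    }

    /* Dispatch loop standing in for the CPU running translated guest code. */
    static void run(const uint8_t *code, size_t len) {
        for (size_t i = 0; i < len; i++) {
            switch (code[i]) {
            case G_NOP:  break;
            case G_ADD:  printf("guest: add executed natively\n"); break;
            case G_TRAP: vmm_emulate_cli(); break;   /* control enters the VMM */
            }
        }
    }

    int main(void) {
        uint8_t guest_code[] = { G_ADD, G_CLI, G_NOP };
        translate(guest_code, sizeof guest_code);
        run(guest_code, sizeof guest_code);
        return 0;
    }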
Paravirtualization
In paravirtualization, the guest operating system needs to be modified in order to allow those instructions to access Ring 0. In other
words, the operating system needs to be modified to communicate between the VMM/hypervisor and the guest through the "backend"
(hypercall) path. We can also call the VMM a hypervisor.
Paravirtualization (https://en.wikipedia.org/wiki/Paravirtualization)
is a technique in which the hypervisor provides an API and the OS of the guest virtual machine calls that API, which requires guest
operating system modifications. Privileged instruction calls are exchanged with the API functions provided by the VMM. In this case,
the modified guest operating system can run in ring 0.
As you can see, under this technique the guest kernel is modified to run on the VMM. In other terms, the guest kernel knows that
it's been virtualized. The privileged instructions/operations that are supposed to run in ring 0 have been replaced with calls known
as hypercalls, which talk to the VMM. The hypercalls invoke the VMM to perform the task on behalf of the guest kernel. As the guest
kernel has the ability to communicate directly with the VMM via hypercalls, this technique results in greater performance compared to
full virtualization. However, this requires a specialized guest kernel that is aware of the paravirtualization technique and comes
with the needed software support.
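The shape of a hypercall interface can be sketched in the same toy style; the hypercall numbers (HC_CONSOLE_WRITE, HC_SET_TIMER) and the hypervisor_entry() function below are invented here, and real interfaces such as Xen's differ in detail. The point is only that a paravirtualized guest kernel replaces a privileged operation with an explicit call into the hypervisor, which does the work on the guest's behalf.

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical hypercall numbers (invented for illustration). */
    enum { HC_CONSOLE_WRITE = 1, HC_SET_TIMER = 2 };

    /* Stand-in for the hypervisor's hypercall entry point. In a real system
     * this would be reached via a trapping instruction, not a plain C call. */
    static long hypervisor_entry(int nr, const void *arg, long len) {
        switch (nr) {
        case HC_CONSOLE_WRITE:
            printf("[hypervisor] console: %.*s", (int)len, (const char *)arg);
            return len;
        case HC_SET_TIMER:
            printf("[hypervisor] timer armed\n");
            return 0;
        default:
            return -1;
        }
    }

    /* What a paravirtualized guest kernel would do instead of touching
     * console hardware or timer registers directly. */
    static void guest_kernel_log(const char *msg) {
        hypervisor_entry(HC_CONSOLE_WRITE, msg, (long)strlen(msg));
    }

    int main(void) {
        guest_kernel_log("guest kernel booted via hypercalls\n");
        hypervisor_entry(HC_SET_TIMER, NULL, 0);
        return 0;
    }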
Hardware-assisted virtualization
This is the idea of obtaining the benefits of paravirtualization without the need to recompile the code. Intel now supports the
paravirtualization approach for 386-based OSes on all 64-bit CPUs. In paravirtualization the kernel is modified to replace privileged
instructions with calls to the hypervisor, which requires access to the kernel source code and a special compilation of the kernel;
this is an inconvenience. There can also be a compromise approach in which binaries are modified on the fly in the process of execution.
Docker represents lightweight virtualization; in this case the kernel for the VM is the same as that of the host machine.
This is how any 386-CPU-based OS can run on 64-bit Intel hardware. It can be called domain-based virtualization and was pioneered
by Sun Microsystems.
Intel and AMD realized that full virtualization and paravirtualization are the major challenges of virtualization on the x86 architecture
(as the scope of this book is limited to x86 architecture, we will mainly discuss the evolution of this architecture here) due to the
performance overhead and complexity in designing and maintaining the solution. Intel and AMD independently created new processor extensions
of the x86 architecture, called Intel VT-x and AMD-V respectively. On the Itanium architecture, hardware-assisted virtualization is
known as VT-i. Hardware assisted virtualization is a platform virtualization method designed to efficiently use full virtualization
with the hardware capabilities. Various vendors call this technology by different names, including accelerated virtualization, hardware
virtual machine, and native virtualization.
For better support of virtualization, Intel and AMD introduced Virtualization Technology (VT) and Secure Virtual Machine (SVM),
respectively, as extensions of the IA-32 instruction set.
These extensions allow the VMM/hypervisor to run a guest OS that expects to run in kernel mode, in lower privileged rings. Hardware
assisted virtualization not only proposes new instructions, but also introduces a new privileged access level, called ring -1, where
the hypervisor/VMM can run. Hence, guest virtual machines can run in ring 0.
With hardware-assisted virtualization, the operating system has direct access to resources without any emulation or OS modification.
The hypervisor or VMM can now run at the newly introduced privilege level, Ring -1, with the guest operating systems running on Ring
0.
Also, with hardware-assisted virtualization, the VMM/hypervisor has less work to do compared to the other techniques
mentioned, which reduces the performance overhead.
In simple terms, this virtualization-aware hardware provides the support to build the VMM and also ensures the isolation of
a guest operating system. This helps to achieve better performance and avoid the complexity of designing a virtualization solution.
Modern virtualization techniques make use of this feature to provide virtualization. One example is KVM, which we are going to discuss
in detail in the scope of this book.
For example, 32-bit SUSE can be run this way under VMware on 64-bit servers.
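As a quick practical check (a minimal sketch, assuming a Linux host), you can verify whether the CPU exposes these extensions and, where KVM is used, whether its kernel modules are loaded:
egrep -c '(vmx|svm)' /proc/cpuinfo    # vmx = Intel VT-x, svm = AMD-V; 0 usually means no support or it is disabled in the BIOS/firmware
lsmod | grep kvm                      # on a KVM host, the kvm plus kvm_intel or kvm_amd modules should be listed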
Type 1 and Type 2 hypervisors
As its name suggests, the VMM or hypervisor is a piece of software that is responsible for monitoring and controlling virtual machines
or guest operating systems. The hypervisor/VMM is responsible for ensuring different virtualization management tasks, such as providing
virtual hardware, VM life cycle management, migrating of VMs, allocating resources in real time, defining policies for virtual machine
management, and so on. The VMM/hypervisor is also responsible for efficiently controlling physical platform resources, such as memory
translation and I/O mapping. One of the main advantages of virtualization software is its capability to run multiple guest operating
systems on the same physical system or hardware. These guests can all run the same operating system or different ones. For example,
there can be multiple Linux guest systems running as guests on the same physical system. The VMM is responsible for allocating the resources
requested by these guest operating systems. The system hardware, such as the processor, memory, and so on, has to be allocated to these
guest operating systems according to their configuration, and the VMM takes care of this task. Because of this, the VMM is a critical component
in a virtualization environment.
Depending on the location of the VMM/hypervisor and where it's placed, it is categorized
either as type 1 or type 2.
Hypervisors are mainly categorized as either Type 1 or Type 2 hypervisors, based on where they reside in the system or, in other
terms, whether an underlying operating system is present in the system or not. But there is no clear or standard definition of Type
1 and Type 2 hypervisors. If the VMM/hypervisor runs directly on top of the hardware, it is generally considered to be a Type 1 hypervisor.
If there is an operating system present, and the VMM/hypervisor operates as a separate layer, it is considered a Type 2 hypervisor.
Once again, this concept is open to debate and there is no standard definition for it.
A Type 1 hypervisor directly interacts with the system hardware; it does not need any host operating system. You can directly install
it on a bare metal system and make it ready to host virtual
machines. Type 1 hypervisors are also called
Bare Metal, Embedded, or Native Hypervisors.
oVirt-node is an example of a Type 1 Linux hypervisor. The following figure provides an illustration of the Type 1 hypervisor design
concept:
Here are the advantages of Type 1 hypervisors:
Easy to install and configure
Small in size, optimized to give most of the physical resources to the hosted guest (virtual machines)
Generates less overhead, as it comes with only the applications needed to run virtual machines
More secure, because problems in one guest system do not affect
the other guest systems running on the hypervisor
However, a type 1 hypervisor doesn't favor customization. Generally, you will not be allowed
to install any third party applications or drivers on it.
On the other hand, a Type 2 hypervisor resides on top of the operating system, allowing you to do numerous customizations. Type 2
hypervisors are also known as hosted hypervisors. Type 2 hypervisors are dependent on the host operating system for their operations.
The main advantage of Type 2 hypervisors is the wide range of hardware support, because the underlying host OS is controlling hardware
access. The following figure provides an illustration of the Type 2 hypervisor design concept:
Deciding on the type of hypervisor to use mainly depends on the infrastructure where you are going to deploy virtualization.
Also, there is a common perception that Type 1 hypervisors perform better than Type 2 hypervisors, as they are placed directly on top of
the hardware; that said, it does not make much sense to evaluate performance without a formal definition of Type 1 and Type 2 hypervisors.
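If you want to see what hypervisor, if any, a given system is running under, one hedged option (assuming a systemd-based Linux distribution) is the systemd-detect-virt utility:
systemd-detect-virt    # prints the detected technology, e.g. kvm, vmware, xen, microsoft, or "none" on bare metal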
Major players on Intel architecture
There were four major players among virtualization solutions for Intel-based computers: Xen, VMware, Microsoft Virtual PC, and Docker.
VMware got the most traction, partially due to extremely aggressive marketing and also because it was the first reliable solution
for old Intel CPUs. It became visible in the enterprise server space around 2005-2006. Its market share is mainly on Windows. From the
beginning it was heavy-weight virtualization. For enterprises which needed a way to run old applications when moving to new PCs, VMware
provided an alternative to giving the user a second desktop: a user can run Windows XP in a virtual machine while running Windows 8 or 10
as the main desktop. Microsoft now supports the latter solution with its own free VM.
Microsoft Virtual PC can be run on Windows 10 Professional directly and is used in the Microsoft cloud.
Xen can perform both full virtualization and paravirtualization as approaches
to this problem, and can run on bare metal as well. Xen derivatives drive the Amazon cloud.
VMware Workstation Player is an ideal utility for running a single virtual machine on a Windows or Linux PC. Organizations use Workstation Player
to deliver managed corporate desktops, while students and educators use it for learning and training.
The free version is available for non-commercial, personal and home use. We also encourage students and non-profit organizations
to benefit from this offering. Commercial organizations require commercial licenses to use Workstation Player.
Need a more advanced virtualization solution? Check out Workstation Pro.
For system administrators, programmers and consultants, the VMware desktop product provided an opportunity to run Linux on the same PC as Windows.
This was very convenient for various demos, and such a configuration became a holy grail for all types of consultants, who became major promoters
of VMware and ensured its quick penetration at the enterprise level. It quickly became a common solution for training, as it
permits providing each student with a set of virtual desktops and servers that would be too costly to provide as physical hardware.
The other important area is experimentation: you can create a set of virtual machines in no time, without the usual bureaucratic overhead
typical of large organizations.
A more problematic area is the use of virtualization for server consolidation. VMware found its niche here in consolidating "fake"
servers -- servers that run applications with almost no users and no load. For servers with heavy computational loads, blades provide a much
more solid alternative with similar capabilities and cost.
Still, VMware was a huge success in the server space, though it is difficult to say how much of it is due to the advantages of virtualization
and how much is due to the technical incompetence of corporate IT, which simply follows the current fashion.
The bridge between RAM and CPU becomes the bottleneck
I was actually surprised that VMware got so much traction with the exorbitant, extortion-level prices they charge -- pricing that is
designed almost perfectly to channel all the savings to VMware itself instead of the organization deploying the VMware hypervisor.
For me, blades were always a simpler and more promising server consolidation solution with a better price/performance ratio. So when
companies look at virtualization as a way to cut costs, they might be looking at the wrong solution. First of all, you cannot defy gravity
with virtualization: you still have a single channel of access to RAM, and with several OSes running concurrently
the bridge between RAM and CPU becomes a bottleneck. Only if the application is mostly idle (for example, it hosts a
low-traffic website) does it make sense to consolidate it. So the idea works when you consolidate small and not very loaded servers
into fewer, larger, more heavily loaded physical servers. It also works perfectly well for development and quality-assurance servers, which by
definition are mainly circulating air. For everything else your mileage may vary. For example, why on earth would I put Oracle on a virtual
machine? To benefit from the ability to migrate to another server? That's a fake benefit, as it almost never happens in real life without
an Oracle version upgrade. To provide a more uniform environment for all my Oracle installations? Is that worth the trouble with disk I/O that
I will get?
So it is very important to avoid excessive zeal in implementing virtualization in an enterprise environment, and to calculate the five-year
total cost of ownership difference between the various options before jumping into the water. If overdone, server consolidation via virtualization
can bring a whole new set of complications. And, other things being equal, one should consider cheaper alternatives to VMware such as Xen, especially
for Linux servers, as again the truth about VMware is that the lion's share of the savings goes to VMware, not to the company that implements
it.
In short, there is no free lunch. If used in moderation, and with Xen instead of VMware to avoid excessive licensing costs, this new
techno-fashion can help to get rid of "low load" servers as well as cut maintenance costs by replacing some servers running specific applications
with "virtual appliances". Also, provisioning becomes really fast, which is extremely important in research and lab environments: one
can get a server to experiment on in 5-10 minutes instead of 5-10 days :-). That is a win-win situation, beneficial both for the environment
and for the enterprise. At the same time, an Intel server that costs $35K will never be able to replace seven reasonably loaded low-end servers
costing $5K each. And using separate servers, you do not need to worry whether they are too loaded or whether peak loads for different servers
happen at the same time. The main competition here is blade servers. For example, the cost of a VMware server license is approximately $5K, with an
annual maintenance cost of $500. If we can run just four virtual instances under it and the server costs, say, $20K, while a small 1U
server capable of running one instance costs $5K (no savings on hardware due to higher margins on medium servers), you lose approximately
$1K a year per instance in comparison with using physical servers or blades. Advantages due to better maintainability are marginal (if
we assume the 1U servers are identical and use kickstart and, say, Acronis images for OS restore), stability is lower, and behavior under
simultaneous peaks is highly problematic. In other words, virtualization is far from being a free lunch.
At the same time, heavy reliance on virtualized servers for production applications, as well as the task of managing and provisioning
them, is a fairly new area in this "brave new" virtualized IT world, and it increases the importance of monitoring applications and enterprise
schedulers. In large enterprises that means additional value provided by already installed HP Operations Manager, Tivoli and other ESM
applications. Virtualization has also changed configuration management, capacity management, provisioning, patch management, backups,
and software licensing. It is inherently favorable toward open source software and OS solutions, where you do not pay for each core
or physical CPU on the server.
Types of virtualization
Virtualization is the simulation of the software and/or hardware upon which guest operating systems run. This simulated environment
is called a virtual machine (VM). Each instance of an OS and its applications runs in a separate VM called a guest operating
system. Those VMs are managed by the hypervisor. There are several forms of virtualization, distinguished by the architecture
of the hypervisor.
In full virtualization (or emulation), an unchanged version of the OS runs on top of virtual hardware. The hypervisor
provides the same hardware interfaces as those provided by the physical platform. Full virtualization is often used to enable
the use of applications which run only on an older version of an OS that does not have drivers for the current hardware. It is by
definition very inefficient, but there are some tricks to increase efficiency. The first of them is
dynamic recompilation, which compiles
blocks of machine instructions the first time they execute, replacing privileged instructions with calls to the hypervisor, and then executes
this translated code. It can be called dynamic paravirtualization (see below). VMware is based on this approach and calls it "binary
translation" or BT. The translated code is stored in spare memory, typically at the end of the address space, which segmentation
mechanisms can protect and make invisible. That's why VMware operates dramatically faster than emulators, running at more than 80%
of the native speed of the OS on the given hardware. In one study VMware claims a slowdown over native ranging from 0 to 6% for the VMware
ESX Server. This is probably PR, but in general the overhead is not prohibitive, and that's why VMware is so popular. You should generally
expect a 10-20% performance hit with this type of hypervisor.
Paravirtualization runs a specially
compiled, "hypervisor friendly" version of the OS kernel. In such a kernel all calls to privileged instructions are replaced with calls to
the hypervisor. Additional changes, like disabling the virtual memory mechanism, can be made, as memory management should generally be relegated
to the hypervisor level. This essentially converts the guest OS into an application, as it deprives it of all direct access to
the hardware layer. Paravirtualization requires the guest operating system kernel and drivers to be explicitly ported to the special
para-API provided by the particular hypervisor. You can't run an unmodified OS on a paravirtualization-based hypervisor.
As some of the work is done during recompilation, this is usually a more efficient way to run guest OSes than full virtualization. Xen is
the most prominent example of a paravirtualization solution on the Intel platform. By the nature of paravirtualization, Xen with Linux is more efficient
than VMware can be (you can't legally recompile Windows even if you have access to the source code).
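As a small illustration (assuming a Linux guest and a Xen host with the xl toolstack), you can check from inside a guest whether it is running under Xen, and list the domains from the control domain (dom0):
cat /sys/hypervisor/type    # prints "xen" inside a Xen guest; the file is absent on bare metal
xl list                     # run in dom0: shows each domain with its memory, VCPUs and state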
A paravirtualized kernel can provide faster access to resources such as hard drives and networks. Different types of paravirtualization
are offered by different hypervisor systems. IBM was the pioneer in this type of virtualization, creating VM/370 in 1972. POWER
servers from IBM running AIX also implement paravirtualization. VM/370 actually did more than a typical paravirtualization hypervisor
on Intel (like Xen): in the VM/CMS environment guest OSes delegate all virtual memory management to the hypervisor level. The latter
is very important, as it provides tremendous savings in memory (common segments of different guests can be loaded only once) and better
efficiency. In the case of CMS, multitasking is also delegated to the hypervisor level. But generally it is up to the designer of a paravirtualization
solution to decide whether it handles memory allocation or not.
The performance hit can be in single digits, and that makes running two guests on a single server ("dual guest virtualization",
see above) very attractive: you cut the number of servers in half while getting almost native performance from each guest OS. For
that reason the "dual guest virtualization" that we mentioned above is widely used for AIX servers and is one of the major attractions
of AIX.
Light-weight virtualization presupposes not only that all guests are running on the same CPU, but that all guests are the
same OS of exactly the same version. It provides mainly isolation of applications (processes). Docker and
Solaris zones are the most prominent examples of this type of virtualization,
and they are the most efficient virtualization solutions available on the market, with minimal overhead. Historically
this type of virtualization was pioneered by the FreeBSD jails concept. In this case, running multiple similar applications on
several guests can get some boost in performance (instead of a penalty), as memory allocation can take into account the existence of identical
applications. No amount of VMware PR about overcommitting memory (a term that means that the hypervisor provides its own layer of virtual
memory management, making the OS-based layer redundant) can hide the fact that the VMware solution sucks in comparison with Solaris zones
if we are talking about running applications in a virtual environment that consists of copies of the same OS.
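As a hedged sketch of what this looks like in practice (assuming a Solaris 10 or later host for zones, and a Linux host with Docker installed), listing the isolated environments is a single command in each case:
zoneadm list -cv    # Solaris: shows the global zone plus all configured and running zones
docker ps           # Linux: shows running containers, all sharing the host kernel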
Full virtualization has some negative security implications. Virtualization adds layers of technology, which can increase the security
management burden by necessitating additional security controls. Also, combining many systems onto a single physical computer can cause
a larger impact if a security compromise occurs, which is especially grave if it occurs at the VM level (access to the VM console). Further, some virtualization
systems make it easy to share information between the systems; this convenience can turn out to be an attack vector if it is not carefully
controlled. In some cases, virtualized environments are quite dynamic, which makes creating and maintaining the necessary security boundaries
more complex.
There are two types of hypervisors:
Bare metal hypervisor, the hypervisor which runs directly on the underlying hardware, without a host OS; the hypervisor
can even be built into the computer's firmware.
Hosted hypervisor, the hypervisor runs on top of the host OS; the host OS can be almost any common operating system such
as Windows, Linux, or MacOS. Hosted virtualization architectures usually have a virtualization application running in the
guest OS that provides utilities to control the virtualization while in the guest OS, such as the ability to share files with the
host OS. Hosted virtualization architectures also allow users to run applications such as web browsers and email clients alongside
the hosted virtualization application, unlike bare metal architectures, which can only run applications within virtualized guests.
In both bare metal and hosted virtualization, each guest OS appears to have its own hardware, like a regular computer. This includes:
Virtualized Networking
Virtualized storage
But in reality it is difficult to virtualize storage and networking, so some additional overhead is inevitable. Some hypervisors also
provide direct memory access (DMA) to high-speed storage controllers and Ethernet controllers, if such features are supported by the
hardware CPU on which the hypervisor is running. DMA access from guest OSs can significantly increase the
speed of disk and network access, although this type of acceleration prevents some useful virtualization features such
as snapshots and moving guest OSs while they are running.
Virtualized Networking
Hypervisors usually provide networking capabilities to the individual guest OSs, enabling them to communicate with one another while
simultaneously limiting access to the external physical network. The network interfaces that the guest OSs see may be a virtual Ethernet
controller, a physical Ethernet controller, or both. Typical hypervisors offer three primary forms of network access (a short sketch of how to inspect such networks follows the list):
Network Bridging. The guest OS is given direct access to the host's network interface cards (NIC) independent of the host
OS.
Network Address Translation (NAT). The guest OS is given a virtual NIC that is connected to a simulated NAT inside the
hypervisor. As in a traditional NAT, all outbound network traffic is sent through the virtual NIC to the host OS for forwarding,
usually to a physical NIC on the host system.
Host Only Networking. The guest OS is given a virtual NIC that does not directly route to a physical NIC. In this scenario,
guest OSs can be configured to communicate with one another and, potentially, with the host OS.
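As a brief, hedged illustration (assuming a Linux host with libvirt and Docker installed; the network name "appnet" and the nginx image are used only as examples), inspecting and creating such virtual networks can look like this:
virsh net-list --all                             # libvirt: the stock "default" network is a NAT network behind the host
docker network create --driver bridge appnet     # Docker: create an isolated bridge network
docker run -d --network appnet nginx             # attach a container to that network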
When a number of guest OSes exist on a single host, the hypervisor can provide a virtual network for these guest OSs. The hypervisor
may implement virtual switches, hubs, and other network devices. Using a hypervisor's networking for communications between guests on
a single host has the advantage of greatly increased speed because the packets never hit physical networking devices. Internal host-only
networking can be done in many ways by the hypervisor. In some systems, the internal network looks like a virtual switch. Others use
virtual LAN (VLAN) standards to allow better control of how the guest systems are connected. Most hypervisors also provide internal
network address and port translation (NAPT) that acts like a virtual router with NAT.
Networks that are internal to a hypervisor's networking structure can pose an operational disadvantage, however. Many networks rely
on tools that watch traffic as it flows across routers and switches; these tools cannot view traffic as it moves in a hypervisor's network.
There are some hypervisors that allow network monitoring, but this capability is generally not as robust as the tools that many organizations
have come to expect for significant monitoring of physical networks. Some hypervisors provide APIs that allow a privileged VM to have
full visibility to the network traffic. Unfortunately, these APIs may also provide additional ways for attackers to attempt to monitor
network communications. Another concern with network monitoring through a hypervisor is the potential for performance degradation or
denial of service conditions to occur for the hypervisor because of high volumes of traffic.
Virtualized Storage
Hypervisor systems have many ways of simulating disk storage for guest OSs. All hypervisors, at a minimum, provide virtual hard drives
mapped to files, while some of them also have more advanced virtual storage options. In addition, most hypervisors can use advanced
storage interfaces on the host system, such as network-attached storage (NAS) and storage area networks (SAN) to present different storage
options to the guest OSs.
All hypervisors can present the guest OSs with virtual hard drives through the use of disk images. A disk image is a file on
the host that looks to the guest OS like an entire disk drive. Whatever the guest OS writes onto the virtual hard drive goes into the
disk image. With hosted virtualization, the disk image appears in the host OS as a file or a folder, and it can be handled like other
files and folders. As the speed of read access is important, this is a natural area of application for SSD disks.
Most virtualization systems also allow a guest OS to access physical hard drives as if they were connected to the guest OS directly.
This is different from using disk images: a disk image is a virtual representation of a real drive. The main advantage of using physical
hard drives is that, unless an SSD is used, accessing them is faster than accessing disk images.
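As a minimal sketch (assuming a Linux host with the QEMU tools installed; the file name and size are arbitrary), a disk image can be created and inspected with qemu-img:
qemu-img create -f qcow2 guest-disk.qcow2 20G    # create a 20 GB qcow2 image that grows on demand
qemu-img info guest-disk.qcow2                   # show format, virtual size and actual space consumed on the host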
Typically, virtual systems in enterprise environments use SAN storage; that's probably why EMC bought VMware. This is an active area
of development in the virtualization market, as it permits migration of a guest OS from one physical server (more loaded or less powerful)
to another (less loaded and/or more powerful) if one of the virtual images experiences a bottleneck.
Guest OS Images
A full virtualization hypervisor encapsulates all of the components of a guest OS, including its applications and the virtual resources
they use, into a single logical entity. An image is a file or a directory that contains, at a minimum, this encapsulated information.
Images are stored on hard drives, and can be transferred to other systems the same way that any file can (note, however, that images
are often many gigabytes in size). Some virtualization systems use a virtualization image metadata standard called the Open Virtualization
Format (OVF)
that supports interoperability for image metadata and components across virtualization
solutions. A snapshot is a record of the state of a running image, generally captured as the differences between an image and
the current state. For example, a snapshot would record changes within virtual storage, virtual memory, network connections, and other
state-related data. Snapshots allow the guest OS to be suspended and subsequently resumed without having to shut down or reboot the
guest OS. Many, but not all, virtualization systems can take snapshots.
On some hypervisors, snapshots of the guest OS can even be resumed on a different host. While a number of issues may be introduced
to handle real-time migration, including the transfer delay and any differences that may exist between the two physical servers (e.g.,
IP address, number of processors or hard disk space), most live-migration solutions provide mechanisms to resolve these issues.
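As a hedged example (assuming a libvirt/KVM setup; the domain name "guest1" and the snapshot name are hypothetical), taking and listing snapshots looks like this:
virsh snapshot-create-as guest1 before-upgrade    # record the current state of the guest under a name
virsh snapshot-list guest1                        # list all snapshots recorded for that guest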
By heavy-weight virtualization we will understand full hardware virtualization as exemplified by
VMware. It helps that CPU vendors are now paying huge attention
to this type of virtualization, as they can no longer increase the CPU frequency and are forced onto the path of increasing the number
of cores. Intel's latest CPUs, which are now dominant in the server space, are a classic example of this trend. The upper limit of cores
per CPU now exceeds 16, which means you can have 32 or even 48 cores on a single two-socket server. It is clear that Intel is
betting on the virtualization trend. Sun UltraSparc T1/T2/T3 is another example of a CPU line with a large number
of cores.
All new Intel CPUs are "virtualization-friendly" and, with the exception of the cheapest models, contain instructions and hardware capabilities
that make heavy-weight virtualization more efficient. First of all this is related to the capability of "zero address relocation": the availability
of a special register which is added to each address calculation performed by regular instructions and thus provides the illusion of multiple "zero
addresses" to the programs.
VMware is the most popular representative of this approach to the design of hypervisors,
and recently it was greatly helped by Intel and AMD, who incorporated virtualization extensions into their CPUs. VMware started to
gain popularity before the latest Intel CPUs with virtualization instruction set extensions, and demonstrated that it is possible to
implement this approach reasonably efficiently even without hardware support. VMware officially supports a dozen different types
of guests: it can run Linux (Red Hat and SUSE), Solaris and Windows as virtual instances (guests) on one physical server.
Para-virtualization is a variant of native virtualization, where the VMM (hypervisor) emulates only part of the hardware and provides
a special API requiring OS modifications. The most popular representative of this approach is
Xen, with AIX a distant second:
With Xen virtualization, a thin software layer known as the Xen hypervisor is inserted between the server's hardware and the operating
system. This provides an abstraction layer that allows each physical server to run one or more "virtual servers," effectively decoupling
the operating system and its applications from the underlying physical server.
IBM LPARs for AIX are currently the king of the hill in this area because of their higher stability in comparison with alternatives. IBM
actually pioneered this class of VM machines in the late 1960s with the release of the famous VM/CMS. Until recently, Power5-based servers
with AIX 5.3 and LPARs were the most battle-tested and reliable virtualized environments based on paravirtualization.
Xen is the king of the paravirtualization hill in the Intel space. Work on Xen was initially supported by UK EPSRC grant GR/S01894, Intel
Research, HP Labs and Microsoft Research (yes, despite naive Linux zealots whining, Microsoft did contribute code to Linux ;-). Other
things being equal, it provides higher speed and less overhead than native virtualization.
NetBSD was the first to implement Xen support. Currently the key platform for
Xen is Linux, with Novell supporting it in the production version of SUSE.
Xen is now resold commercially by Oracle and several other companies. XenSource, the company created for the commercialization of Xen
technology, was bought by Citrix.
The main advantage of Xen is that, like VMware, it supports live relocation capability. It is also a more cost-effective solution than
VMware, which is definitely overpriced.
The main problem is that para-virtualization requires OS kernel modifications so that the kernel is aware of the environment it is running in and passes
control to the hypervisor when executing privileged instructions. Therefore it is not suitable for running legacy OSes or for
running Microsoft Windows (although Xen can run it on newer Intel CPU series).
Para-virtualization improves speed in comparison with heavy-weight virtualization (much less context switching), but does little
beyond that. It is unclear how much faster a para-virtualized instance of an OS is in comparison with heavy-weight virtualization on "virtualization-friendly"
CPUs. The Xen page claims that:
Xen offers near-native performance for virtual servers with up to 10 times less overhead than proprietary
offerings, and benchmarked overhead of well under 5% in most cases compared to 35% or higher overhead rates for other
virtualization technologies.
It's unclear whether this difference was measured on old Intel CPUs or the new 5xxx series that supports virtualization extensions. I suspect the
difference on newer CPUs should be smaller.
I would like to stress again that the level of OS modification is very basic, and the important idea of factoring out common functions
like virtual memory management, which was implemented in the classic VM/CMS, is not utilized. Therefore all the redundant processing typical
of heavy-weight virtualization is present in a para-virtualization environment.
Note: Xen 3.0 and above support both para-virtualization and full (heavy-weight) virtualization, leveraging the hardware
support built into the Intel VT-x and AMD Pacifica processors. According to the
XenSource Products - Xen 3.0 page:
With the 3.0 release, Xen extends its feature leadership with functionality required to virtualize the servers found in today's enterprise
data centers. New features include:
Support for up to 32-way SMP guest
Intel® VT-x and AMD Pacifica hardware virtualization support
PAE support for 32 bit servers with over 4 GB memory
x86/64 support for both AMD64 and EM64T
One very interesting application of paravirtualization is so-called virtual appliances. This is a whole new area that we discuss
on a separate page.
Another very interesting application of paravirtualization is "cloud" environments like the Amazon Elastic cloud.
All in all, paravirtualization along with light-weight virtualization (BSD jails and Solaris zones) looks like the most promising type
of virtualization.
This type of virtualization was pioneered in FreeBSD (jails) and was further developed by Sun and introduced in Solaris 10 as the concept
of Zones. There are various experimental add-ons of this type for
Linux, but none has gained any prominence.
Solaris 10 11/06 and later are capable of cloning a Zone as well as relocating it to another box, through a feature called Attach/Detach.
The key advantage is that you have a single instance of the OS, so the price that you paid in the case of heavy-weight virtualization is waived.
That means that light-weight virtualization is the most efficient resource-wise. It also has great security value. Memory can become
a bottleneck here, as all memory accesses are channeled via a single controller. Also, it is now possible to run Linux applications
in zones on x86 servers (branded zones).
Zones are a really revolutionary and underappreciated development, which was hurt greatly by inept Sun management and the subsequent acquisition
by Oracle. As noted above, the single OS instance makes light-weight virtualization the most efficient approach resource-wise and has great
security value; and while memory can become a bottleneck, since all memory accesses are channeled via a single controller, you have a single
virtual memory system for all zones -- a great advantage that permits reusing memory for similar processes.
IBM's "lightweight" product would be "Workload manager"
for AIX which is an older (2001 ???)and less elegant technology then BSD Jails and Solaris zones:
Current UNIX offerings for partitioning and workload management have clear architectural differences. Partitioning creates isolation
between multiple applications running on a single server, hosting multiple instances of the operating system. Workload management
supplies effective management of multiple, diverse workloads to efficiently share a single copy of the operating system and a common
pool of resources
IBM's lightweight virtualization in versions of AIX before 6 operated under a different paradigm, with the closest thing to a zone being
a "class". The system administrator (root) can delegate the administration of the subclasses of each superclass to a superclass administrator
(a non-root user). Unlike zones, classes can be nested:
The central concept of WLM is the class. A class is a collection of processes (jobs) that has a single
set of resource limits applied to it. WLM assigns processes to the various classes and controls the allocation of
system resources among the different classes. For this purpose, WLM uses class assignment rules and per-class
resource shares and limits set by the system administrator. The resource entitlements and limits are enforced at
the class level. This is a way of defining classes of service and regulating the resource utilization of each class of applications
to prevent applications with very different resource utilization patterns from interfering with each
other when they are sharing a single server.
In AIX 6 IBM adopted Solaris style light-weight virtualization.
Blade servers are an increasingly important part of the enterprise datacenters, with consistent double-digit growth which is outpacing
the overall server market. IDC estimated that 500,000 blade servers were sold in 2005, or 7% of the total market, with customers spending
$2.1 billion.
While blades are not virtualization in a pure technical sense, a rack with blades (a bladesystem) possesses some additional management
capabilities that are similar to a virtualized system and that are not present in a stand-alone set of 1U servers. Blades usually have a shared
I/O channel to NAS. They also have shared remote management capabilities (iLO on HP blades).
They can be viewed as a "hardware factorization" approach to server construction, which is not that different from virtualization.
The first shot in this direction is the new generation of bladesystems, like the IBM BladeCenter H system, which has offered I/O virtualization
since February 2006, and the HP BladeSystem c-Class. A bladesystem saves up to 30% power in comparison
with rack-mounted 1U servers with identical CPU and memory configurations.
Sun also offers blades, but it is a minor player in this area. It offers the pretty interesting and innovative
Sun Blade 8000 Modular System, which targets a higher end than usual
blade servers. Here is how CNET described the key idea behind the server in the article
Sun defends big blade
server 'Size matters':
Sun co-founder
Andy Bechtolsheim, the company's top x86 server designer and a respected computer engineer, shed light on his technical reasoning
for the move.
"It's not that our blade is too large. It's that the others are too small," he said.
Today's dual-core processors will be followed by models with four, eight and 16 cores, Bechtolsheim said. "There are two megatrends
in servers: miniaturization and multicore--quad-core, octo-core, hexadeci-core. You definitely want bigger blades with more memory
and more input-output."
When blade server leaders IBM and HP introduced their second-generation blade chassis earlier this year, both chose larger products.
IBM's grew 3.5 inches taller, while HP's grew 7 inches taller. But opinions vary on whether Bechtolsheim's prediction of even larger
systems will come true.
"You're going to have bigger chassis," said IDC analyst John Humphries, because blade server applications
are expanding from lower-end tasks such as e-mail to higher-end tasks such as databases. On the more cautious side
is Illuminata analyst Gordon Haff, who said that with IBM and HP just at the beginning of a new blade chassis generation, "I don't
see them rushing to add additional chassis any time soon."
Business reasons as well as technology reasons led Sun to re-enter the blade server arena with big blades rather than more conventional
smaller models that sell in higher volumes, said the Santa Clara, Calif.-based company's top server executive, John Fowler. "We believe
there is a market for a high-end capabilities. And sometimes you go to where the competition isn't," Fowler said.
As a result of such factorization, more and more functions move to the blade enclosure. Power consumption improves dramatically,
as blades typically use low-power CPUs and all blades typically share the same power supply, which, in the case of a full or nearly
full rack, permits the power supply to work with much greater efficiency (twice or more that of a typical server). That cuts
air conditioning costs too. Also, newer blades monitor air flow and adjust fans accordingly. As a result, the energy bill can be half that of the
same number of 1U servers.
Blades generally solve the problem of memory bandwidth typical of most types of virtualization except domain-based. Think of
them as predefined (fixed) partitions with a fixed number of CPUs and a fixed amount of memory. Dynamic swapping of images between blades
is possible. Some I/O can be local, as a blade typically carries 2 (half-size blades) or 4 (full-size blades) 2.5" disks. With solid
state drives being a reliable and fast, albeit expensive, alternative to traditional rotating hard drives, and with memory cards like ioDrive,
local disk speed can be as good as or better than on a large server with, say, sixteen 15K RPM hard drives. That permits offloading OS-related
I/O from application-related I/O.
POWER systems can run partitions with electrical isolation, but this is not widely used. Still, that fits the "super-heavyweight"
category as defined above.
Mainframes. The classic VM implementation is IBM's still-famous
VM/CMS. That was the first successful
commercial virtualization product, created when almost nobody even thought about virtualization.
Medium-weight (para-virtualization). IBM sells servers with Xen preinstalled. Power 5 and 6
logical partitions (LPARs) fit the para-virtualization category and are extremely popular with AIX users (the technology originated
in the classic IBM VM on the System/3xx architecture, which was the first successful "heavy-virtualization" implementation). See
P5 virtualization.
IBM is weak in the light-weight virtualization pioneered by FreeBSD and Solaris and missed the train. They will catch up with
AIX 6.
IBM blade servers are slightly behind HP blades in factorization but not by much.
Sun now competes in all five categories, but its presence in the heavyweight category is simply symbolic, as this capability
was made available only recently. Sun blade enclosures can mix and match both UltraSparc and Opteron-based blades.
Microsoft competes mainly in two categories (heavy-weight virtualization and para-virtualization)
FreeBSD in just one category (light-weight virtualization) but it pioneered this category.
Linux vendors compete mainly in one category (para-virtualization using Xen).
Conclusions
There is no free lunch, and virtualization is not a panacea. It increases the complexity of the environment and puts severe stress on the
single server that hosts multiple virtual machine instances. Failure of this server leads to failure of all instances. The same
is true of a failure of the hypervisor.
All in all, paravirtualization along with light-weight virtualization (BSD jails and Solaris zones) looks like the most promising type
of virtualization.
The natural habitats of virtualization are:
Development, test and stage servers,
Demos
Virtual appliances (only paravirtualization and light-weight virtualization)
Support for legacy versions of an OS on new hardware
Almost idle servers that serve various enterprise consoles and similar low-CPU-intensity applications (for example, specialized
internal Web servers and e-commerce servers).
Startup IT infrastructure before startup achieves some level of maturity (as such infrastructure is highly dynamic and mistakes
in physical server acquisition/allocation are costly).
Other highly dynamic setups where ability to move guests to a different higher performance server can be of critical importance.
At the same time, virtualization opens new capabilities for running multiple instances of the same application, for example a Web server,
and some types of virtualization, like paravirtualization and light-weight virtualization (zones), can do it more, not less, efficiently
than a similar single physical server with multiple web servers running on different ports.
Sometimes it makes sense to run a single instance of a virtual machine on a server to get such advantages as on-the-fly relocation
of instances, virtual image manipulation capabilities, etc. With technologies like Xen, which claims less than 5% overhead, that approach
becomes feasible. "Binary servers" -- servers that host just two applications -- also look very promising, as in this case you can still
buy low-cost servers and, in the case of Xen, do not need to pay for the hypervisor.
Migration of rack-mounted servers to blade servers is probably the safest approach to server consolidation. Managers without experience
of working in a partitioned environment shouldn't underestimate what their administrators need to learn and the set of new problems that
virtualization creates. One good piece of advice is "Make sure you put the training dollars in."
There are also other problems. A lot of software vendors won't certify applications as virtual-environment compatible, for example
VMware compatible. In such cases, running the application in a virtual environment means that you need to assume the risks and cannot count
on vendor tech support to resolve your issues.
All in all, virtualization is mainly played now in the desktop and low-end server space. It makes sense to proceed slowly, testing the
water before jumping in. Those that have adopted virtualization have, on average, only about 20% of their environment virtualized, according
to IDC. VMware's pricing structure is a little bit ridiculous and nullifies hardware savings, if any. Their maintenance costs are even
worse. That means that alternative solutions like Xen 3 or Microsoft should be considered on the Intel side, and IBM and Sun on the Unix side.
As vendor consolidation is ahead, if you don't have a clear benefit from virtualization today, you can wait or limit yourself to "sure
bets" like development, testing and staging servers. The next version of Windows Server will put serious pressure on VMware in a year
or so. Xen is also making progress, with IBM support behind it. With those competitive pressures, VMware could become significantly less
expensive in the future.
VMs are also touted as a solution to the computer security problem. It's pretty obvious that they can improve security. After all,
if you're running your browser on one VM and your mailer on another, a security failure in one shouldn't affect the other. If one virtual
machine is compromised you can just discard it and create a fresh image. There is some merit to that argument, and in many situations
it's a good configuration to use. But at the same time, the transient nature of virtual machines introduces new security and compliance
challenges not addressed by traditional systems management processes and tools. For example, virtual images are more portable, and the possibility
of stealing whole OS images and running them on a different VM is very real. New security risks inherent in virtualized environments
need to be understood and mitigated.
"(Virtual machines) offer the ability to partition the resources of a large machine between a large
number of users in such a way that those users can't interfere with one another. Each user gets a virtual machine running a separate
operating system with a certain amount of resources assigned to it. Getting more memory, disks, or processors is a matter of changing
a configuration, which is far easier than buying and physically installing the equivalent hardware."
And FreeBSD and Solaris users have their lightweight VM built into the OS. Actually, FreeBSD jails, Solaris 10 zones and
Xen are probably the most democratic light-weight VMs. To counter the threat from free VMs, VMware now produces a free version too. VMware Player is able to run virtual machines made in VMware
Workstation. There are many free OS images on the website, most of them community made. There are also freeware tools for creating VMs,
and for mounting, manipulating and converting VMware disks and floppies, so it is possible to create, run and maintain virtual machines for
free (even for commercial use).
Here is how this class of virtual machines is described in
Wikipedia
Conventional emulators like Bochs emulate
the microprocessor, executing each guest CPU instruction
by calling a software subroutine on the host machine that simulates the function of that CPU instruction. This abstraction allows
the guest machine to run on host machines with a different type of microprocessor, but is also very slow.
An improvement on this approach is
dynamically recompiling blocks of
machine instructions the first time they are executed, and later using the translated code directly when the code runs a second time.
This approach is taken by Microsoft's
Virtual PC for
Mac OS X.
VMware Workstation takes an even more optimized approach and uses the CPU to run code directly when
this is possible. This is the case for user mode and
virtual 8086 mode code on x86. When direct
execution is not possible, code is rewritten dynamically. This is the case for kernel-level and
real mode code. In VMware's case, the translated code is put
into a spare area of memory, typically at the end of the address space, which can then be protected and made invisible using the
segmentation mechanisms. For these reasons, VMware is dramatically faster than emulators, running at more than 80% of the speed that
the virtual guest OS would run on hardware. VMware boasts an overhead as small as 3%–6% for computationally intensive applications.
Although VMware virtual machines run in user mode, VMware Workstation itself requires installing various
drivers in the host operating system, notably in order to dynamically switch the
GDT and the
IDT tables.
One final note: it is often erroneously believed that virtualization products like VMware or Virtual
PC replace offending instructions or simply run kernel code in user mode. Neither of these approaches can work on x86.
Replacing instructions means that if the code reads itself it will be surprised not to find the expected content; it is not possible
to protect code against reading and at the same time allow normal execution; replacing in place is complicated. Running the code
unmodified in user mode is not possible either, as most instructions which just read the machine state do not cause an exception
and will betray the real state of the program, and certain instructions silently change behavior in user mode. A rewrite is always
necessary; a simulation of the current program counter
in the original location is performed when necessary and notably hardware code
breakpoints are remapped.
The Xen open source virtual machine partitioning project is picking
up momentum since acquiring the backing of venture capitalists at the end of 2004. Now, server makers and Linux operating system providers
are starting to line up to support the project, contribute code, and make it a feature of their systems at some point in the future.
Work on Xen has been supported by UK EPSRC grant GR/S01894, Intel Research, HP Labs and Microsoft Research. Novell and Advanced Micro
Devices also back Xen. See also
While everybody seemed to get interested in the open source Xen virtual machine partitioning hypervisor
just when XenSource incorporated and made its plans clear for
the Linux platform, the NetBSD variant of the BSD Unix platform
has been Xen-compatible for over a year now, and will be as fully embracing the technology as Linux is expected to.
Xen has really taken off since Dec, 2004, when the leaders of the Xen project formed a corporation
to sell and support Xen and they immediately secured $6 million from venture capitalists Kleiner Perkins Caufield & Byers and Sevin
Rosen Funds.
Xen is headed up by Ian Pratt, a senior faculty member at the University of Cambridge in the United
Kingdom, who is the chief technology officer at XenSource, the company that has been created to commercialize Xen. Pratt told me
in December that he had basically been told to start a company to support Xen because some big financial institutions on Wall Street
and in the City (that's London's version of Wall Street for the Americans reading this who may not have heard the term) insisted
that he do so because they loved what Xen was doing.
Seven years ago, Ian Pratt joined the senior faculty at the University of Cambridge in the United
Kingdom, and after being on the staff for two years, he came up with a schematic for a futuristic, distributed computing platform
for wide area network computing called Xenoserver. The idea behind the Xenoserver project is one that now sounds familiar, at least
in concept, but sounded pretty sci-fi seven years ago: hundreds of millions of virtual machines running on tens of millions of servers,
connected by the Internet, and delivering virtualized computing resources on a utility basis where people are charged for the computing
they use. The Xenoserver project consisted of the Xen virtual machine monitor and hypervisor abstraction layer, which allows multiple
operating systems to logically share the hardware on a single physical server, the Xenoserver Open Platform for connecting virtual
machines to distributed storage and networks, and the Xenoboot remote boot and management system for controlling servers and their
virtual machines over the Internet.
Work on the Xen hypervisor began in 1999 at Cambridge, where Pratt was irreverently called the "XenMaster"
by project staff and students. During that first year, Pratt and his project team identified how to do secure partitioning on 32-bit
X86 servers using a hypervisor and worked out a means for shuttling active virtual machine partitions around a network of machines.
This is more or less what VMware does with its ESX Server partitioning
software and its VMotion add-on to that product. About 18 months ago, after years of coding the hypervisor in C and the interface
in Python, the Xen portion of the Xenoserver project was released as Xen 1.0. According to Pratt, it had tens of thousands
of downloads. This provided the open source developers working on Xen with a lot of feedback, which was used to create Xen 2.0, which
started shipping last year. With the 2.0 release, the Xen project added the Live Migration feature for moving virtual machines between
physical machines, and then added some tweaks to make the code more robust.
Xen and VMware's GSX Server and ESX Server have a major architectural difference. VMware's hypervisor
layer completely abstracts the X86 system, which means any operating system supported on X86 processors can be loaded into a virtual
machine partition. This, said Pratt, puts tremendous overhead on the systems. Xen was designed from the get-go with an architecture
focused on running virtual machines in a lean and mean fashion, and Xen does this by having versions of open source operating systems
tweaked to run on the Xen hypervisor. That is why Xen 2.0 only supports Linux 2.4, Linux 2.6, FreeBSD 4.9 and 5.2, and NetBSD 2.0
at the moment; special tweaks of NetBSD and Plan 9 are in the works, and with Solaris 10 soon to be open-source, that will be available
as well. With Xen 1.0, Pratt had access to the source code to Windows XP from Microsoft, which allowed the Xen team to put Windows
XP inside Xen partitions. With the future "Pacifica" hardware virtualization features in single-core and dual-core Opterons and
Intel creating a version of its "Vanderpool" virtualization hardware
features in Xeon and Itanium processors also being made for Pentium 4 processors (this is called "Silvervale" for some reason), both
Xen and VMware partitioning software will have hardware-assisted virtual machine partitioning. While no one is saying this because
they cannot reveal how Pacifica or Vanderpool actually work, these technologies may do most of the X86 abstraction work, and therefore
should allow standard, compiled operating system kernels run inside Xen or VMware partitions. That means Microsoft can't stop Windows
from being supported inside Xen over the long haul.
Thor Lancelot Simon, one of the key developers and administrators at the NetBSD Foundation that controls
the development of NetBSD, reminded everyone that NetBSD has been supporting the Xen 1.2 hypervisor and monitor within a variant
of the NetBSD kernel (that's NetBSD/xen instead of NetBSD/i386) since March of last year. Moreover, the foundation's own servers
are all equipped with Xen, which allows programmers to work in isolated partitions with dedicated resources and not stomp all over
each other as they are coding and compiling. "We aren't naive enough to think that any system has perfect security; but Xen helps
us isolate critical systems from each other, and at the same time helps keep our systems physically compact and easy to manage,"
he said. "When you combine virtualization with Xen with NetBSD's small size, code quality, permissive license, and comprehensive
set of security features, it's pretty clear you have a winning combination, which is why we run it on our own systems." NetBSD contributor
Manuel Bouyer has done a lot of work to integrate the Xen 2.0 hypervisor and monitor into the NetBSD-current branch, and he said
he would be making changes to the NetBSD/i386 release that would all integrate /xen kernels into it and will allow Xen partitions
to run in privileged and unprivileged mode.
The Xen 3.0 hypervisor and monitor is expected some time in late 2005 early 2006, with support for
64-bit Xeon and Opteron processors. XenSource's Pratt told me recently that Xen 4.0 is due to be released in the second half of 2005,
and it will have better tools for provisioning and managing partitions. It is unclear how the NetBSD project will absorb these changes,
but NetBSD 3.0 is expected around the middle of 2005. The project says that they plan to try to get one big release of NetBSD out
the door once a year going forward.
The error message tells you that your current user can't access the docker engine, because you're lacking permissions to access
the unix socket to communicate with the engine.
As a temporary solution, you can use sudo to run the failed command as root.
However, it is recommended to fix the issue by adding the current user to the docker group:
Run this command in your favourite shell and then completely log out of your account and log back in (if in doubt, reboot!):
sudo usermod -a -G docker $USER
After doing that, you should be able to run the command without any issues. Run docker run hello-world as a normal
user in order to check if it works. Reboot if the issue still persists.
Logging out and logging back in is required because the group change will not have an effect unless your session is closed.
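To confirm the change took effect after logging back in, you can check your group membership and then retry the test container; these are standard commands, nothing here is specific to a particular distribution:
id -nG $USER            # the list should now include "docker"
docker run hello-world  # should now work without sudo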
Control Docker Service
Now that you have Docker installed onto your machine, start the Docker service in case it is not started automatically after the installation.
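On a systemd-based distribution, which is assumed here, starting the service (and optionally enabling it at boot) looks roughly like this:
# systemctl start docker
# systemctl enable docker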
Once the service is started, verify your installation by running the following command.
# docker run -it centos echo Hello-World
Let's see what happens when we run the " docker run " command. Docker starts a container with the centos base image. Since we are running this centos container for the first time, the output will look like the one below.
Docker looks for the centos image locally; as it is not found, it starts downloading the centos image from the Docker registry. Once the image has been downloaded, it starts the container and echoes " Hello-World " in the console, which you can see at the end of the output.
Containers and Virtual Machines are often seen as conflicting technology, however, this is
often a misunderstanding.
Virtual Machines are a way to take a physical server and provide a fully functional
operating environment that shares those physical resources with other virtual machines. A
Container is generally used to isolate a running process within a single host to ensure that
the isolated processes cannot interact with other processes within that same system. In fact
containers are closer to BSD Jails and chroot'ed processes than full virtual
machines.
What Docker provides on top of containers
Docker itself is not a container runtime environment; in fact Docker is actually container
technology agnostic with efforts planned for Docker to support Solaris Zones and BSD Jails . What Docker provides is a
method of managing, packaging, and deploying containers. While these types of functions may exist to some degree for virtual machines, they traditionally have not existed for most container solutions, and the ones that did exist were not as easy to use or as fully featured as Docker.
Now that we know what Docker is, let's start learning how Docker works by first installing
Docker and deploying a public pre-built container.
Starting with Installation
As Docker is not installed by default, step 1 will be to install the Docker package; since our example system is running Ubuntu 14.04, we will do this using the Apt package manager.
# apt-get install docker.io
Reading package lists... Done Building dependency tree Reading state information... Done The following extra packages will be installed: aufs-tools cgroup-lite git git-man liberror-perl Suggested packages: btrfs-tools debootstrap lxc rinse git-daemon-run git-daemon-sysvinit git-doc git-el git-email git-gui gitk gitweb git-arch git-bzr git-cvs git-mediawiki git-svn The following NEW packages will be installed: aufs-tools cgroup-lite docker.io git git-man liberror-perl 0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded. Need to get 7,553 kB of archives. After this operation, 46.6 MB of additional disk space will be used. Do you want to continue? [Y/n] y
To check if any containers are running we can execute the docker command using the
ps option.
# docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
The ps function of the docker command works similarly to the Linux
ps command. It will show available Docker containers and their current status. Since
we have not started any Docker containers yet, the command shows no running
containers.
Deploying a pre-built nginx Docker container
One of my favorite features of Docker is the ability to deploy a pre-built container in the
same way you would deploy a package with yum or apt-get . To explain this
better let's deploy a pre-built container running the nginx web server. We can do this by
executing the docker command again, however, this time with the run
option.
The run function of the docker command tells Docker to find a specified
Docker image and start a container running that image. By default, Docker containers run in the
foreground, meaning when you execute docker run your shell will be bound to the
container's console and the process running within the container. In order to launch this
Docker container in the background I included the -d (detach) flag.
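The command used here was simply the following; its output is examined in the Docker Images section below:
# docker run -d nginx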
By executing docker ps again we can see the nginx container running.
# docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES f6d31ab01fc9 nginx:latest nginx -g 'daemon off 4 seconds ago Up 3 seconds 443/tcp, 80/tcp desperate_lalande
In the above output we can see the running container desperate_lalande and that
this container has been built from the nginx:latest image.
Docker Images
Images are one of Docker's key features and are similar to virtual machine images. Like virtual machine images, a Docker image is a container that has been saved and packaged. Docker
however, doesn't just stop with the ability to create images. Docker also includes the ability
to distribute those images via Docker repositories which are a similar concept to package
repositories. This is what gives Docker the ability to deploy an image like you would deploy a
package with yum . To get a better understanding of how this works let's look back at
the output of the docker run execution.
# docker run -d nginx Unable to find image 'nginx' locally
The first message we see is that docker could not find an image named nginx
locally. The reason we see this message is that when we executed docker run we told Docker to start up a container based on an image named nginx . Since Docker is
starting a container based on a specified image it needs to first find that image. Before
checking any remote repository Docker first checks locally to see if there is a local image
with the specified name.
Since this system is brand new there is no Docker image with the name nginx , which means
Docker will need to download it from a Docker repository.
This is exactly what the second part of the output is showing us. By default, Docker uses
the Docker Hub repository, which is a
repository service that Docker (the company) runs.
Like GitHub, Docker Hub is free for public repositories but requires a subscription for
private repositories. It is possible, however, to deploy your own Docker repository; in fact it is as easy as docker run registry . For this article we will not be deploying a custom
registry service.
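As an aside (we will not use it in this walkthrough), the docker run registry shortcut works because the official registry image listens on port 5000 by default, so a minimal local registry could be started roughly like this:
# docker run -d -p 5000:5000 registry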
Stopping and Removing the Container
Before moving on to building a custom Docker container let's first clean up our Docker
environment. We will do this by stopping the container from earlier and removing it.
To start a container we executed docker with the run option; in order to stop this same container we simply need to execute docker with the kill option, specifying the container name.
# docker kill desperate_lalande desperate_lalande
If we execute docker ps again we will see that the container is no longer
running.
# docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
However, at this point we have only stopped the container; while it may no longer be running, it still exists. By default, docker ps will only show running containers; if we add the -a (all) flag, it will show all containers, running or not.
# docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES f6d31ab01fc9 5c82215b03d1 nginx -g 'daemon off 4 weeks ago Exited (-1) About a minute ago desperate_lalande
In order to fully remove the container we can use the docker command with the
rm option.
# docker rm desperate_lalande desperate_lalande
While this container has been removed, we still have an nginx image available. If we were to run docker run -d nginx again, the container would be started without having to fetch the nginx image again, because Docker already has a saved copy on our local system.
To see a full list of local images we can simply run the docker command with the
images option.
# docker images REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE nginx latest 9fab4090484a 5 days ago 132.8 MB
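If we also wanted to remove the locally saved image (we will keep it here, since the custom image below builds on nginx), the docker command has an rmi option for removing images:
# docker rmi nginx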
Building our own custom image
At this point we have used a few basic Docker commands to start, stop and remove a common
pre-built image. In order to "Dockerize" this blog however, we are going to have to build our
own Docker image and that means creating a Dockerfile .
With most virtual machine environments if you wish to create an image of a machine you need
to first create a new virtual machine, install the OS, install the application and then finally
convert it to a template or image. With Docker however, these steps are automated via a
Dockerfile. A Dockerfile is a way of providing build instructions to Docker for the creation of
a custom image. In this section we are going to build a custom Dockerfile that can be used to
deploy this blog.
Understanding the Application
Before we can jump into creating a Dockerfile we first need to understand what is required
to deploy this blog.
The blog itself is actually static HTML pages generated by a custom static site generator
that I wrote, named hamerkop . The generator is very simple and more about getting the job done
for this blog specifically. All the code and source files for this blog are available via a
public GitHub repository. In
order to deploy this blog we simply need to grab the contents of the GitHub repository, install
Python along with some Python modules and execute the hamerkop application. To serve
the generated content we will use nginx ; which means we will also need nginx to be
installed.
So far this should be a pretty simple Dockerfile, but it will show us quite a bit of the
Dockerfile Syntax
. To get started we can clone the GitHub repository and create a Dockerfile with our favorite
editor; vi in my case.
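Sketched out, with a placeholder for the repository URL (the real URL lives in the public GitHub repository mentioned above), the setup steps look roughly like this:
# git clone <blog-repository-url> /root/blog
# cd /root/blog
# vi Dockerfile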
The first instruction of a Dockerfile is the FROM instruction. This is used to
specify an existing Docker image to use as our base image. This basically provides us with a
way to inherit another Docker image. In this case we will be starting with the same nginx image
we were using before.
If we wanted to start with a blank slate we could use the Ubuntu Docker image by specifying ubuntu:latest .
## Dockerfile that generates an instance of http://bencane.com
FROM nginx:latest
MAINTAINER Benjamin Cane <[email protected]>
In addition to the FROM instruction, I also included a MAINTAINER
instruction which is used to show the Author of the Dockerfile.
As Docker supports using # as a comment marker, I will be using this syntax quite a
bit to explain the sections of this Dockerfile.
Running a test build
Since we inherited the nginx Docker image our current Dockerfile also inherited all the
instructions within the
Dockerfile used to build that nginx image. What this means is even at this point we are
able to build a Docker image from this Dockerfile and run a container from that image. The
resulting image will essentially be the same as the nginx image but we will run through a build
of this Dockerfile now and a few more times as we go to help explain the Docker build
process.
In order to start the build from a Dockerfile we can simply execute the docker
command with the build option.
# docker build -t blog /root/blog Sending build context to Docker daemon 23.6 MB Sending build context to Docker daemon Step 0 : FROM nginx:latest ---> 9fab4090484a Step 1 : MAINTAINER Benjamin Cane <[email protected]> ---> Running in c97f36450343 ---> 60a44f78d194 Removing intermediate container c97f36450343 Successfully built 60a44f78d194
In the above example I used the -t (tag) flag to "tag" the image as "blog". This
essentially allows us to name the image; without specifying a tag, the image would only be
callable via an Image ID that Docker assigns. In this case the Image ID is
60a44f78d194 which we can see from the docker command's build success
message.
In addition to the -t flag, I also specified the directory /root/blog .
This directory is the "build directory", which is the directory that contains the Dockerfile
and any other files necessary to build this container.
Now that we have run through a successful build, let's start customizing this
image.
Using RUN to execute apt-get
The static site generator used to generate the HTML pages is written in Python and because
of this the first custom task we should perform within this Dockerfile is to install
Python . To install the Python package we will use the Apt package manager. This means we will
need to specify within the Dockerfile that apt-get update and apt-get
install python-dev are executed; we can do this with the RUN instruction.
## Dockerfile that generates an instance of http://bencane.com
FROM nginx:latest
MAINTAINER Benjamin Cane <[email protected]>

## Install python and pip
RUN apt-get update
RUN apt-get install -y python-dev python-pip
In the above we are simply using the RUN instruction to tell Docker that when it
builds this image it will need to execute the specified apt-get commands. The
interesting part of this is that these commands are only executed within the context of this
container. What this means is even though python-dev and python-pip are being
installed within the container, they are not being installed on the host itself. Or to put it more simply: within the container the pip command will execute; outside the container, the pip command does not exist.
It is also important to note that the Docker build process does not accept user input during
the build. This means that any commands being executed by the RUN instruction must
complete without user input. This adds a bit of complexity to the build process as many
applications require user input during installation. For our example, none of the commands
executed by RUN require user input.
Installing Python modules
With Python installed we now need to install some Python modules. To do this outside of
Docker, we would generally use the pip command and reference a file within the blog's
Git repository named requirements.txt . In an earlier step we used the git
command to "clone" the blog's GitHub repository to the /root/blog directory; this
directory also happens to be the directory in which we created the Dockerfile . This
is important as it means the contents of the Git repository are accessible to Docker during the
build process.
When executing a build, Docker will set the context of the build to the specified "build
directory". This means that any files within that directory and below can be used during the
build process, files outside of that directory (outside of the build context), are
inaccessible.
In order to install the required Python modules we will need to copy the
requirements.txt file from the build directory into the container. We can do this
using the COPY instruction within the Dockerfile .
## Dockerfile that generates an instance of http://bencane.com
FROM nginx:latest
MAINTAINER Benjamin Cane <[email protected]>

## Install python and pip
RUN apt-get update
RUN apt-get install -y python-dev python-pip

## Create a directory for required files
RUN mkdir -p /build/

## Add requirements file and run pip
COPY requirements.txt /build/
RUN pip install -r /build/requirements.txt
Within the Dockerfile we added 3 instructions. The first instruction uses
RUN to create a /build/ directory within the container. This directory will
be used to copy any application files needed to generate the static HTML pages. The second
instruction is the COPY instruction which copies the requirements.txt file
from the "build directory" ( /root/blog ) into the /build directory within
the container. The third uses the RUN instruction to execute the pip command, installing all the modules specified within the requirements.txt file.
COPY is an important instruction to understand when building custom images. Without specifically copying the file within the Dockerfile, this Docker image would not contain the requirements.txt file. With Docker containers everything is isolated; unless something is specifically added or executed within a Dockerfile, a container is not likely to include required dependencies.
Re-running a build
Now that we have a few customization tasks for Docker to perform, let's run another build of the blog image.
# docker build -t blog /root/blog Sending build context to Docker daemon 19.52 MB Sending build context to Docker daemon Step 0 : FROM nginx:latest ---> 9fab4090484a Step 1 : MAINTAINER Benjamin Cane <[email protected]> ---> Using cache ---> 8e0f1899d1eb Step 2 : RUN apt-get update ---> Using cache ---> 78b36ef1a1a2 Step 3 : RUN apt-get install -y python-dev python-pip ---> Using cache ---> ef4f9382658a Step 4 : RUN mkdir -p /build/ ---> Running in bde05cf1e8fe ---> f4b66e09fa61 Removing intermediate container bde05cf1e8fe Step 5 : COPY requirements.txt /build/ ---> cef11c3fb97c Removing intermediate container 9aa8ff43f4b0 Step 6 : RUN pip install -r /build/requirements.txt ---> Running in c50b15ddd8b1 Downloading/unpacking jinja2 (from -r /build/requirements.txt (line 1)) Downloading/unpacking PyYaml (from -r /build/requirements.txt (line 2)) <truncated to reduce noise> Successfully installed jinja2 PyYaml mistune markdown MarkupSafe Cleaning up... ---> abab55c20962 Removing intermediate container c50b15ddd8b1 Successfully built abab55c20962
From the above build output we can see the build was successful, but we can also see another
interesting message; ---> Using cache . What this message is telling us is that
Docker was able to use its build cache during the build of this image.
Docker build
cache
When Docker is building an image, it doesn't just build a single image; it actually builds
multiple images throughout the build processes. In fact we can see from the above output that
after each "Step" Docker is creating a new image.
The last line from the above snippet is actually Docker informing us of the creation of a new image; it does this by printing the Image ID , cef11c3fb97c . The useful thing
about this approach is that Docker is able to use these images as cache during subsequent
builds of the blog image. This is useful because it allows Docker to speed up the build process
for new builds of the same container. If we look at the example above we can actually see that
rather than installing the python-dev and python-pip packages again, Docker
was able to use a cached image. However, since Docker was unable to find a build that executed
the mkdir command, each subsequent step was executed.
The Docker build cache is a bit of a gift and a curse; the reason for this is that the
decision to use cache or to rerun the instruction is made within a very narrow scope. For
example, if there was a change to the requirements.txt file Docker would detect this
change during the build and start fresh from that point forward. It does this because it can
view the contents of the requirements.txt file. The execution of the apt-get
commands however, are another story. If the Apt repository that provides the Python packages
were to contain a newer version of the python-pip package; Docker would not be able to
detect the change and would simply use the build cache. This means that an older package may be
installed. While this may not be a major issue for the python-pip package it could be
a problem if the installation was caching a package with a known vulnerability.
For this reason it is useful to periodically rebuild the image without using Docker's cache.
To do this you can simply specify --no-cache=True when executing a Docker
build.
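For example, reusing the tag and build directory from the earlier builds:
# docker build --no-cache=True -t blog /root/blog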
Deploying the rest of the blog
With the Python packages and modules installed this leaves us at the point of copying the
required application files and running the hamerkop application. To do this we will
simply use more COPY and RUN instructions.
## Dockerfile that generates an instance of http://bencane.com
FROM nginx:latest
MAINTAINER Benjamin Cane <[email protected]>

## Install python and pip
RUN apt-get update
RUN apt-get install -y python-dev python-pip

## Create a directory for required files
RUN mkdir -p /build/

## Add requirements file and run pip
COPY requirements.txt /build/
RUN pip install -r /build/requirements.txt

## Add blog code and required files
COPY static /build/static
COPY templates /build/templates
COPY hamerkop /build/
COPY config.yml /build/
COPY articles /build/articles

## Run Generator
RUN /build/hamerkop -c /build/config.yml
Now that we have the rest of the build instructions, let's run through another build and
verify that the image builds successfully.
# docker build -t blog /root/blog/ Sending build context to Docker daemon 19.52 MB Sending build context to Docker daemon Step 0 : FROM nginx:latest ---> 9fab4090484a Step 1 : MAINTAINER Benjamin Cane <[email protected]> ---> Using cache ---> 8e0f1899d1eb Step 2 : RUN apt-get update ---> Using cache ---> 78b36ef1a1a2 Step 3 : RUN apt-get install -y python-dev python-pip ---> Using cache ---> ef4f9382658a Step 4 : RUN mkdir -p /build/ ---> Using cache ---> f4b66e09fa61 Step 5 : COPY requirements.txt /build/ ---> Using cache ---> cef11c3fb97c Step 6 : RUN pip install -r /build/requirements.txt ---> Using cache ---> abab55c20962 Step 7 : COPY static /build/static ---> 15cb91531038 Removing intermediate container d478b42b7906 Step 8 : COPY templates /build/templates ---> ecded5d1a52e Removing intermediate container ac2390607e9f Step 9 : COPY hamerkop /build/ ---> 59efd1ca1771 Removing intermediate container b5fbf7e817b7 Step 10 : COPY config.yml /build/ ---> bfa3db6c05b7 Removing intermediate container 1aebef300933 Step 11 : COPY articles /build/articles ---> 6b61cc9dde27 Removing intermediate container be78d0eb1213 Step 12 : RUN /build/hamerkop -c /build/config.yml ---> Running in fbc0b5e574c5 Successfully created file /usr/share/nginx/html//2011/06/25/checking-the-number-of-lwp-threads-in-linux Successfully created file /usr/share/nginx/html//2011/06/checking-the-number-of-lwp-threads-in-linux <truncated to reduce noise> Successfully created file /usr/share/nginx/html//archive.html Successfully created file /usr/share/nginx/html//sitemap.xml ---> 3b25263113e1 Removing intermediate container fbc0b5e574c5 Successfully built 3b25263113e1
Running a custom container
With a successful build we can now start our custom container by running the docker
command with the run option, similar to how we started the nginx container
earlier.
# docker run -d -p 80:80 --name=blog blog 5f6c7a2217dcdc0da8af05225c4d1294e3e6bb28a41ea898a1c63fb821989ba1
Once again the -d (detach) flag was used to tell Docker to run the container in the
background. However, there are also two new flags. The first new flag is --name ,
which is used to give the container a user specified name. In the earlier example we did not
specify a name and because of that Docker randomly generated one. The second new flag is
-p , this flag allows users to map a port from the host machine to a port within the
container.
The base nginx image we used exposes port 80 for the HTTP service. By default, ports bound
within a Docker container are not bound on the host system as a whole. In order for external
systems to access ports exposed within a container the ports must be mapped from a host port to
a container port using the -p flag. The command above maps port 80 from the host, to
port 80 within the container. If we wished to map port 8080 from the host, to port 80 within
the container we could do so by specifying the ports in the following syntax -p
8080:80 .
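Spelled out as a full command, that variant would look like this (assuming the first blog container has been removed, since container names must be unique):
# docker run -d -p 8080:80 --name=blog blog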
Judging from the original docker run command, our container appears to have started successfully; we can verify this by executing docker ps .
# docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES d264c7ef92bd blog:latest nginx -g 'daemon off 3 seconds ago Up 3 seconds 443/tcp, 0.0.0.0:80->80/tcp blog
Wrapping up
At this point we now have a running custom Docker container. While we touched on a few
Dockerfile instructions within this article we have yet to discuss all the instructions. For a
full list of Dockerfile instructions you can check out Docker's reference page , which explains
the instructions very well.
Another good resource is their Dockerfile Best
Practices page which contains quite a few best practices for building custom Dockerfiles.
Some of these tips are very useful such as strategically ordering the commands within the
Dockerfile. In the above examples our Dockerfile has the COPY instruction for the
articles directory as the last COPY instruction. The reason for this is that
the articles directory will change quite often. It's best to put instructions that
will change often at the lowest point possible within the Dockerfile to optimize the steps that can
be cached.
In this article we covered how to start a pre-built container and how to build, then deploy
a custom container. While there is quite a bit to learn about Docker this article should give
you a good idea on how to get started. Of course, as always if you think there is anything that
should be added drop it in the comments below.
If you are using the Docker package supplied by Red Hat / CentOS, the package name is docker . You can check the installed package by executing:
rpm -q docker
If you are using the Docker package supplied by Red Hat / CentOS, the dockerroot group is automatically added to
the system. You will need to edit (or create) /etc/docker/daemon.json to include the following:
{
"group": "dockerroot"
}
Restart Docker after editing or creating the file. After restarting Docker, you can check the group ownership of the Docker socket ( /var/run/docker.sock ), which should show dockerroot as the group.
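For example, listing the socket shows its owner and group:
ls -l /var/run/docker.sock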
It is better to stop Docker and start it again after changing daemon.json. A restart does not always work as intended, and the socket can remain incorrectly owned.
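On a systemd-based system, which is the usual case for Red Hat / CentOS 7, that means roughly:
sudo systemctl stop docker
sudo systemctl start docker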
Is there a way I can download a Docker image/container using, for example, Firefox, and not using the built-in docker pull ?
I am blocked by the company firewall and proxy, and I can't get a hole through it.
My problem is that I cannot use Docker to get images, that is, Docker save/pull and other Docker supplied functions since it
is blocked by a firewall.
I cannot get access to the Docker hub. I get an x509: Certificate signed by unknown authority error. My company is using zScaler as a man-in-the-middle firewall – Ephreal
Jun 19 '16 at 10:38
Thank you; didn't know you could save an image into a tar ball. I will try this. –
Ephreal
Dec 15 '16 at 15:43
I just had to deal with this issue myself - downloading an image using a machine that has Internet access but no Docker client, for use on another restricted machine that has the Docker client but no Internet access. I posted my question to the DevOps Stack Exchange site :
With help from the Docker Community I was able to find a resolution to my problem. What follows is my solution.
So it turns out that the Moby Project has a shell script on the
Moby GitHub account which can download images from
Docker Hub in a format that can be imported into Docker:
In practice I would have to first copy the data from the Internet client (which does not have Docker installed) to
the target/destination machine (which does have Docker installed):
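As a rough sketch of the whole flow (the script name, directory, and image below are illustrative examples based on my understanding of the Moby script, not commands quoted from the original post):
# On the machine with Internet access (no Docker required):
./download-frozen-image-v2.sh ./nginx-image nginx:latest
tar -cC ./nginx-image . > nginx-image.tar
# Copy nginx-image.tar to the machine that has Docker but no Internet access, then:
docker load -i nginx-image.tar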
Docker is an application that makes it simple and easy to run application processes in a
container, which is like a virtual machine, only more portable, more resource-friendly, and
more dependent on the host operating system. For a detailed introduction to the different
components of a Docker container, check out
The Docker Ecosystem: An Introduction to Common Components .
There are two methods for installing Docker on CentOS 7. One method involves installing it
on an existing installation of the operating system. The other involves spinning up a server
with a tool called
Docker Machine that auto-installs Docker on it.
In this tutorial, you'll learn how to install and use it on an existing installation of
CentOS 7.
Prerequisites
64-bit CentOS 7 Droplet
Non-root user with sudo privileges. The Initial Setup Guide for CentOS 7 explains how to set this up.
Note: Docker requires a 64-bit version of CentOS 7 as well as a kernel version equal to or
greater than 3.10. The default 64-bit CentOS 7 Droplet meets these requirements.
All the commands in this tutorial should be run as a non-root user. If root access is
required for the command, it will be preceded by sudo . Initial
Setup Guide for CentOS 7 explains how to add users and give them sudo access.
Step 1
-- Installing Docker
The Docker installation package available in the official CentOS 7 repository may not be the
latest version. To get the latest and greatest version, install Docker from the official Docker
repository. This section shows you how to do just that.
But first, let's update the package database:
sudo yum check-update
Now run this command. It will add the official Docker repository, download the latest
version of Docker, and install it:
curl -fsSL https://get.docker.com/ | sh
After installation has completed, start the Docker daemon:
sudo systemctl start docker
Verify that it's running:
sudo systemctl status docker
The output should be similar to the following, showing that the service is active and
running:
Output ● docker.service - Docker Application Container Engine Loaded: loaded
(/lib/systemd/system/docker.service; enabled; vendor preset: enabled) Active: active (running)
since Sun 2016-05-01 06:53:52 CDT; 1 weeks 3 days ago Docs: https://docs.docker.com Main PID:
749 (docker)
Lastly, make sure it starts at every server reboot:
sudo systemctl enable docker
Installing Docker now gives you not just the Docker service (daemon) but also the
docker command line utility, or the Docker client. We'll explore how to use the
docker command later in this tutorial.
Step 2 -- Executing Docker Command
Without Sudo (Optional)
By default, running the docker command requires root privileges -- that is, you
have to prefix the command with sudo . It can also be run by a user in the docker
group, which is automatically created during the installation of Docker. If you attempt to run
the docker command without prefixing it with sudo or without being in
the docker group, you'll get an output like this:
Output docker: Cannot connect to the
Docker daemon. Is the docker daemon running on this host?. See 'docker run --help'.
If you want to avoid typing sudo whenever you run the docker
command, add your username to the docker group:
sudo usermod -aG docker $(whoami)
You will need to log out of the Droplet and back in as the same user to enable this
change.
If you need to add a user to the docker group that you're not logged in as,
declare that username explicitly using:
sudo usermod -aG docker username
The rest of this article assumes you are running the docker command as a user
in the docker user group. If you choose not to, please prepend the commands with
sudo .
Step 3 -- Using the Docker Command
With Docker installed and working, now's the time to become familiar with the command line
utility. Using docker consists of passing it a chain of options and subcommands
followed by arguments. The syntax takes this form:
docker [option] [command] [arguments]
To view all available subcommands, type:
docker
As of Docker 1.11.1, the complete list of available subcommands includes:
Output attach
Attach to a running container build Build an image from a Dockerfile commit Create a new image
from a container's changes cp Copy files/folders between a container and the local filesystem
create Create a new container diff Inspect changes on a container's filesystem events Get real
time events from the server exec Run a command in a running container export Export a
container's filesystem as a tar archive history Show the history of an image images List images
import Import the contents from a tarball to create a filesystem image info Display system-wide
information inspect Return low-level information on a container or image kill Kill a running
container load Load an image from a tar archive or STDIN login Log in to a Docker registry
logout Log out from a Docker registry logs Fetch the logs of a container network Manage Docker
networks pause Pause all processes within a container port List port mappings or a specific
mapping for the CONTAINER ps List containers pull Pull an image or a repository from a registry
push Push an image or a repository to a registry rename Rename a container restart Restart a
container rm Remove one or more containers rmi Remove one or more images run Run a command in a
new container save Save one or more images to a tar archive search Search the Docker Hub for
images start Start one or more stopped containers stats Display a live stream of container(s)
resource usage statistics stop Stop a running container tag Tag an image into a repository top
Display the running processes of a container unpause Unpause all processes within a container
update Update configuration of one or more containers version Show the Docker version
information volume Manage Docker volumes wait Block until a container stops, then print its
exit code
To view the switches available to a specific command, type:
docker docker-subcommand --help
To view system-wide information, use:
docker info
Step 4 -- Working with Docker Images
Docker containers are run from Docker images. By default, it pulls these images from Docker
Hub, a Docker registry managed by Docker, the company behind the Docker project. Anybody can
build and host their Docker images on Docker Hub, so most applications and Linux distributions
you'll need to run Docker containers have images that are hosted on Docker Hub.
To check whether you can access and download images from Docker Hub, type:
docker run hello-world
The output, which should include the following, should indicate that Docker is working correctly:
Output Hello from Docker. This message shows that your installation appears to be
working correctly. ...
You can search for images available on Docker Hub by using the docker command
with the search subcommand. For example, to search for the CentOS image, type:
docker search centos
The script will crawl Docker Hub and return a listing of all images whose names match the
search string. In this case, the output will be similar to this:
Output NAME DESCRIPTION
STARS OFFICIAL AUTOMATED centos The official build of CentOS. 2224 [OK] jdeathe/centos-ssh
CentOS-6 6.7 x86_64 / CentOS-7 7.2.1511 x8... 22 [OK] jdeathe/centos-ssh-apache-php CentOS-6
6.7 x86_64 / Apache / PHP / PHP M... 17 [OK] million12/centos-supervisor Base CentOS-7 with
supervisord launcher, h... 11 [OK] nimmis/java-centos This is docker images of CentOS 7 with
dif... 10 [OK] torusware/speedus-centos Always updated official CentOS docker imag... 8 [OK]
nickistre/centos-lamp LAMP on centos setup 3 [OK] ...
In the OFFICIAL column, OK indicates an image built and supported by the company behind the
project. Once you've identified the image that you would like to use, you can download it to
your computer using the pull subcommand, like so:
docker pull centos
After an image has been downloaded, you may then run a container using the downloaded image
with the run subcommand. If an image has not been downloaded when
docker is executed with the run subcommand, the Docker client will
first download the image, then run a container using it:
docker run centos
To see the images that have been downloaded to your computer, type:
docker images
The output should look similar to the following:
Output
REPOSITORY TAG IMAGE ID CREATED SIZE
centos latest 778a53015523 5 weeks ago 196.7 MB
hello-world latest 94df4f0ce8a4 2 weeks ago 967 B
As you'll see later in this tutorial, images that you use to run containers can be modified
and used to generate new images, which may then be uploaded ( pushed is the technical
term) to Docker Hub or other Docker registries.
Step 5 -- Running a Docker Container
The hello-world container you ran in the previous step is an example of a
container that runs and exits, after emitting a test message. Containers, however, can be much
more useful than that, and they can be interactive. After all, they are similar to virtual
machines, only more resource-friendly.
As an example, let's run a container using the latest image of CentOS. The combination of
the -i and -t switches gives you interactive shell access into the container:
docker run -it centos
Your command prompt should change to reflect the fact that you're now working inside the
container and should take this form:
Output [root@59839a1b7de2 /]#
Important: Note the container id in the command prompt. In the above example, it is
59839a1b7de2 .
Now you may run any command inside the container. For example, let's install MariaDB server
in the running container. No need to prefix any command with sudo , because you're
operating inside the container with root privileges:
yum install mariadb-server
Step 6 -- Committing Changes in a Container to a Docker Image
When you start up a Docker image, you can create, modify, and delete files just like you can
with a virtual machine. The changes that you make will only apply to that container. You can
start and stop it, but once you destroy it with the docker rm command, the changes
will be lost for good.
This section shows you how to save the state of a container as a new Docker image.
After installing MariaDB server inside the CentOS container, you now have a container
running off an image, but the container is different from the image you used to create it.
To save the state of the container as a new image, first exit from it:
exit
Then commit the changes to a new Docker image instance using the following command. The -m
switch is for the commit message that helps you and others know what changes you made, while -a
is used to specify the author. The container ID is the one you noted earlier in the tutorial
when you started the interactive docker session. Unless you created additional repositories on
Docker Hub, the repository is usually your Docker Hub username:
docker commit -m "What did you do to the image" -a "Author Name" container-id repository
/ new_image_name
For example:
docker commit -m "added mariadb-server" -a "Sunday Ogwu-Chinuwa" 59839a1b7de2
finid/centos-mariadb
Note: When you commit an image, the new image is saved locally, that is, on your
computer. Later in this tutorial, you'll learn how to push an image to a Docker registry like
Docker Hub so that it may be accessed and used by you and others.
After that operation has completed, listing the Docker images now on your computer should
show the new image, as well as the old one that it was derived from:
docker images
The output should be of this sort:
Output REPOSITORY TAG IMAGE ID CREATED SIZE
finid/centos-mariadb latest 23390430ec73 6 seconds ago 424.6 MB centos latest 778a53015523 5
weeks ago 196.7 MB hello-world latest 94df4f0ce8a4 2 weeks ago 967 B
In the above example, centos-mariadb is the new image, which was derived from the existing
CentOS image from Docker Hub. The size difference reflects the changes that were made. And in
this example, the change was that MariaDB server was installed. So next time you need to run a
container using CentOS with MariaDB server pre-installed, you can just use the new image.
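For example, using the image name from the commit above:
docker run -it finid/centos-mariadb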
Images may also be built from what's called a Dockerfile. But that's a very involved process
that's well outside the scope of this article. We'll explore that in a future
article.
Step 7 -- Listing Docker Containers
After using Docker for a while, you'll have many active (running) and inactive containers on
your computer. To view the active ones, use:
docker ps
You will see output similar to the following:
Output CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES f7c79cc556dd centos "/bin/bash" 3 hours ago Up 3 hours silly_spence
To view all containers -- active and inactive, pass it the -a switch:
docker ps -a
To view the latest container you created, pass it the -l switch:
docker ps -l
Stopping a running or active container is as simple as typing:
docker stop container-id
The container-id can be found in the output from the docker ps
command.
Step 8 -- Pushing Docker Images to a Docker Repository
The next logical step after creating a new image from an existing image is to share it with
a select few of your friends, the whole world on Docker Hub, or another Docker registry that you
have access to. To push an image to Docker Hub or any other Docker registry, you must have an
account there.
This section shows you how to push a Docker image to Docker Hub.
To create an account on Docker Hub, register at Docker Hub . Afterwards, to push your image, first log into
Docker Hub. You'll be prompted to authenticate:
docker login -u docker-registry-username
If you specified the correct password, authentication should succeed. Then you may push your
own image using:
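Reusing the placeholders from the login and commit steps earlier (substitute your own Docker Hub username and image name):
docker push docker-registry-username/new_image_name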
It will take some time to complete, and when completed, the output will be of this
sort:
Output The push refers to a repository [docker.io/finid/centos-mariadb] 670194edfaf5:
Pushed 5f70bf18a086: Mounted from library/centos 6a6c96337be1: Mounted from library/centos ...
After pushing an image to a registry, it should be listed on your account's dashboard, like the one shown in the image below.
"... If you're running Debian 8 Jessie, you can install Docker 1.6.2, through backports. This version was released on May 14, 2015. That's 3 years old, but Debian Jessie is fairly old as well. ..."
Last week, a new version of docker.io, the Docker package provided by Debian, was uploaded to
Debian Unstable. Quickly afterwards, the package moved to
Debian Testing. This is good news for Debian users, as
before that the package was more or less abandoned in "unstable", and the future was uncertain.
The most striking fact about this change: it's the first time in two years that docker.io has migrated to "testing".
Another interesting fact is that, version-wise, the package is moving from 1.13.1 from early 2017 to version 18.03
from March 2018: that's a one-year leap forward.
Let me give you a very rough summary of how things came to be. I personally started to work on that early in 2018. I joined the
Debian Go Packaging Team and I started to work on the many, many Docker dependencies that needed to be updated in order
to update the Docker package itself. I could get some of this work uploaded to Debian, but ultimately I was a bit stuck on how to
solve the circular dependencies that plague the Docker package. This is where another Debian Developer, Dmitry Smirnov, jumped in.
We discussed the current status and issues, and then he basically did all the work, from updating the package to tackling all the long-standing open bugs.
That's the short story; let me now give you some more details.
The Docker package in Debian
To better understand why this update of the docker.io package is such good news, let's have a quick look at what Debian currently offers:
rmadison -u debian docker.io
If you're running Debian 8 Jessie, you can install Docker 1.6.2, through backports. This version was released on May 14, 2015.
That's 3 years old, but Debian Jessie is fairly old as well.
If you're running Debian 9 Stretch (ie. Debian stable), then you have no install candidate. No-thing. The current Debian doesn't
provide any package for Docker. That's a bit sad.
What's even more sad is that for quite a while, looking into Debian unstable didn't look promising either. There used to be a
package there, but it had bugs that prevented it from migrating to Debian testing. This package was stuck at version 1.13.1 , released on Feb 8, 2017. Looking at the git history, there was not much happening.
As for the reason for this sad state of things, I can only guess. Packaging Docker is tedious work, mainly due to a very big dependency tree. After handling all these dependencies, there are other issues to tackle, some related to Go packaging itself, and others due to the Docker release process and development workflow. In the end, it's quite difficult to find the right approach to package Docker, and it's easy to make mistakes that cost hours of work. I made this kind of mistake. More than once.
So packaging Docker is not for the faint of heart, and maybe it's too much of a burden for one developer alone. There was a
docker-maint mailing list, which suggests an attempt to coordinate the effort; however, this list was already dead by the time I found it. It looks like the people involved walked away.
Another explanation for the disinterest in the Docker package could be that Docker itself already provides a Debian package on
docker.com. One can always fall back to this solution, so why bother with the extra work of doing a proper Debian package?
That's what the next part is about!
Docker.io vs Docker-ce
You have two options to install Docker on Debian: you can get the package from docker.com (this package is named docker-ce
), or you can get it from the Debian repositories (this package is named docker.io ). You can rebuild both of these
packages from source: for docker-ce you can fetch the source code with git (it includes the packaging files), and for
docker.io you can just get the source package with apt , like for every other Debian package.
So what's the difference between these two packages?
No suspense, straight answer: what differs is the build process, and mostly, the way dependencies are handled.
Docker is written in Go, and Golang comes with some tooling that allows applications to keep a local copy of their dependencies
in their source tree. In Go-talk, this is called vendoring . Docker makes heavy use of that (like many other Go applications),
which means that the code is more or less self-contained. You can build Docker without having to solve external dependencies, as
everything needed is already in-tree.
That's how the docker-ce package provided by Docker is built, and that's what makes the packaging files for this
package trivial. You can look at these files at
https://github.com/docker/docker-ce/tree/master/components/packaging/deb
. So everything is in-tree, there are almost no external build dependencies, and hence it's really easy for Docker to provide a new package for 'docker-ce' every month.
On the other hand, the docker.io package provided by Debian takes a completely different approach: Docker is built
against the libraries that are packaged in Debian, instead of using the local copies that are present in the Docker source tree.
So if Docker is using libABC version 1.0, then it has a build dependency on libABC . You can have a look at the
current build dependencies at https://salsa.debian.org/docker-team/docker/blob/master/debian/control
.
There are more than 100 dependencies there, and that's one reason why the Debian package is quite time-consuming to maintain.
To give you a rough estimation, in order to get the current "stable" release of Docker to Debian "unstable", it took up to 40 uploads
of related packages to stabilize the dependency tree.
It's quite an effort. And once again, why bother? For this part I'll quote Dmitry as he puts it better than me:
> Debian cares about reusable libraries, and packaging them individually allows to
> build software from tested components, as Golang runs no tests for vendored
> libraries. It is a mind blowing argument given that perhaps there is more code
> in "vendor" than in the source tree.
>
> Private vendoring have all disadvantages of static linking
,
> making it impossible to provide meaningful security support. On top of that, it
> is easy to lose control of vendored tree; it is difficult to track changes in
> vendored dependencies and there is no incentive to upgrade vendored components.
That's about it, whether it matters is up to you and your use-case. But it's definitely something you should know about if you
want to make an informed decision on which package you're about to install and use.
To finish with this article, I'd like to give more details on the packaging of docker.io , and what was done to get this
new version in Debian.
Under the hood of the docker.io package
Let's have a brief overview of the difficulties we had to tackle while packaging this new version of Docker.
The most outstanding one is circular dependencies. It's especially present in the top-level dependencies of Docker: docker/swarmkit
, docker/libnetwork , containerd ... All of these are Docker build dependencies, and all of these depend
on Docker to build. Good luck with that ;)
To solve this issue, the new docker.io package leverages MUT (Multiple Upstream Tarball) to have these different components
downloaded and built all at once, instead of being packaged separately. In this particular case it definitely makes sense, as we're
really talking about different parts of Docker. Even if they live in different git repositories, these components are not standalone
libraries, and there's absolutely no good reason to package them separately.
Another issue with Docker is "micro-packaging", ie. wasting time packaging small git repositories that, in the end, are only used
by one application (Docker in our case). This issue is quite interesting, really. Let me try to explain.
Golang makes it extremely easy to split a codebase among several git repositories. It's so easy that some projects (Docker in
our case) do it extensively, as part of their daily workflow. And in the end, at first glance you can't really say whether a dependency of Docker is really a standalone project (which would require proper packaging), or just a part of the Docker codebase that happens to live in a different git repository. In this second case, there's really no reason to package it independently of Docker.
As a packager, if you're not a bit careful, you can easily fall into this trap and start packaging every single dependency without thinking: that's "micro-packaging". It's bad in the sense that it increases the maintenance cost in the long run, and doesn't bring
any benefit. As I said before, docker.io has currently 100+ dependencies, and probably a few of them fall in this category.
While working on this new version of docker.io , we decided to stop packaging such dependencies. The guideline is that
if a dependency has no semantic versioning , and no consumer other than Docker,
then it's not a library, it's just a part of Docker codebase.
Even though some tools like dh-make-golang
make it very easy to package simple Go packages, it doesn't mean that everything should be packaged. Understanding that, and taking
a bit of time to think before packaging, is the key to successful Go packaging!
Last words
I could go on for a while about the technical details, as there's a lot to say, but let's not bore you to death, so that's it. I hope by now you understand that:
There's now an up-to-date docker.io package in Debian.
docker.io and docker-ce both give you a Docker binary, but through a very different build process.
Maintaining the 'docker.io' package is not an easy task.
If you care about having a Docker package in Debian, feel free to try it out, and feel free to join the maintenance effort!
Let's finish with a few credits. I've been working on that topic, albeit sparingly, for the last 4 months, thanks to the support
of Collabora . As for Dmitry Smirnov, the work he did on the docker.io
package represents a three weeks, full-time effort, which was sponsored by Libre
Solutions Pty Ltd .
I'd like to thank the Debian Go Packaging Team for their support,
and also the reviewers of this article, namely Dmitry Smirnov and Héctor Orón Martínez.
Last but not least, I will attend DebConf18 in Taiwan, where I will give a talk on this topic. There is also a BoF on Go Packaging planned.
"... It isn't a full Virtual Machine, so it avoids that overhead and inefficiency, but it does isolate your applications from "update and die" problems, most of the time. "Docker" is a big one. ..."
Sidebar on Containers: The basic idea is to isolate a bit of production application from all
the rest of the system and make sure it has a consistent environment. So you package up your
DNS server with the needed files and systems config and what-all and stick it in a container
that runs under a host operating system.
It isn't a full Virtual Machine, so it avoids that overhead and inefficiency, but it
does isolate your applications from "update and die" problems, most of the time. "Docker" is a
big one.
Lately Red Hat et al. have been pushing for a strongly systemD-dependent Kubernetes instead.
The need to rapidly toss a VM into production and bring up a 'container' application on it
drove (IMHO) much of the push to move all sorts of stuff into systemD to make booting very fast
(even if it then doesn't work reliably /snarc;)
Much of the commercial world has moved to putting things in Docker or other container
systems.
On BSD their equivalent is called "jails" as it keeps each application instance isolated
from the system and from other applications. On "my Cray" we used a precursor tech of change
root "chroot" to isolate things for security; but I got off that train before it reached the
"jails" and "docker" station.
The main benefit of Docker is that it automatically solves the problems with versioning and
cross-platform deployment, as the images can be easily recombined to form any version and can
run in any environment where Docker is installed. "Run anywhere" meme...
James Lee, former Software Engineer at Google (2013-2016):
There are many benefits of Docker. First, I will mention the benefits of Docker and then tell you about its future. The content mentioned here is from my recent article on Docker.
Docker Benefits:
Docker is an open-source project based on Linux containers. It uses features of the Linux kernel, such as namespaces and control groups, to create containers. But are containers new? No, Google has been using them for years! They have their own container technology. There are some other container technologies like Solaris Zones, LXC, etc. These container technologies were already around before Docker came into existence. So why Docker? What difference did it make? Why is it on the rise? Ok, I will tell you why!
Number 1: Docker offers ease of use
Taking advantage of containers wasn't an easy task with earlier technologies. Docker has made it easy for everyone: developers, system admins, architects, and more. Portable applications are easy to build and test. Anyone can package an application on their laptop and then run it unmodified on any public or private cloud, or on bare metal. The slogan is, "build once, run anywhere"!
Number 2: Docker offers speed
Being lightweight, the containers are fast. They also consume fewer resources. One can
easily run a Docker container in seconds. On the other hand, virtual machines usually take
longer as they go through the whole process of booting up the complete virtual operating
system, every time!
Number 3: The Docker Hub
Docker offers an ecosystem known as the Docker Hub. You can consider it as an app store for
Docker images. It contains many public images created by the community. These images are ready
to use. You can easily search the images as per your requirements.
Number 4: Docker gives modularity and scalability
It is possible to break down the application functionality into individual containers.
Docker gives this freedom! It is easy to link containers together and create your application
with Docker. One can easily scale and update components independently in the future.
The Future
A lot of people come and ask me, "Will Docker eat up virtual machines?" I don't think so! Docker is gaining a lot of momentum, but this won't affect virtual machines, because virtual machines are better under certain circumstances. For example, if there is a requirement to run multiple applications on multiple servers, then virtual machines are a better choice. On the contrary, if there is a requirement to run multiple copies of a single application, Docker is a better choice.
Docker containers could create a problem when it comes to security because containers share
the same kernel. The barriers between containers are quite thin. But I do believe that security
and management improve with experience and exposure. Docker certainly has a great future! I
hope that this Docker tutorial has helped you understand the basics of containers, VMs, and Docker. But Docker in itself is an ocean. It isn't possible to study Docker in just one
article. For an in-depth study of Docker, I recommend this Docker course.
"Docker is both a daemon (a process running in the background) and a client command. It's
like a virtual machine but it's different in important ways. First, there's less duplication.
With each extra VM you run, you duplicate the virtualization of CPU and memory and quickly run
out resources when running locally. Docker is great at setting up a local development
environment because it easily adds the running process without duplicating the virtualized
resource. Second, it's more modular. Docker makes it easy to run multiple versions or instances
of the same program without configuration headaches and port collisions. Try that in a VM!
With Docker, developers can focus on writing code without worrying about the system on which
their code will run. Applications become truly portable. You can repeatably run your
application on any other machine running Docker with confidence. For operations staff, Docker
is lightweight, easily allowing the running and management of applications with different
requirements side by side in isolated containers. This flexibility can increase resource use
per server and may reduce the number of systems needed because of its lower overhead, which in
turn reduces cost.
Docker has made Linux containerization technology easy to use.
There are a dozen reasons to use Docker. I'll focus here on three: consistency, speed and isolation. By consistency, I mean that Docker provides a consistent environment for your application from development all the way through production – you run from the same starting point every time. By speed, I mean you can rapidly run a new process on a server. Because the image is preconfigured and installed with the process you want to run, it takes the challenge of running a process out of the equation. By isolation, I mean that by default each Docker container that's running is isolated from the network, the file system and other running processes.
A fourth reason is Docker's layered file system. Starting from a base image, every change you make to a container or image becomes a new layer in the file system. As a result, file system layers are cached, reducing the number of repetitive steps during the Docker build process and reducing the time it takes to upload and download similar images. It also allows you to save the container state if, for example, you need to troubleshoot why a container is failing. The file system layers are like Git, but at the file system level: each Docker image is a particular combination of layers in the same way that each Git branch is a particular combination of commits."
Docker is the most popular image format and toolset for Linux-based container development and deployment. If you're using containers, you're most likely familiar with Docker's container-specific tools, which enable you to create and deploy container images to a cloud-based container hosting environment.
This can work great for brand-new environments, but it can be a challenge to mix container
tooling with the systems and tools you need to manage your traditional IT environments. And, if
you're deploying your containers locally, you still need to manage the underlying
infrastructure and environment.
Portability: let's suppose that in the case of Linux you have your own customized Nginx container. You can run that Nginx container anywhere – in a cloud, in a data center, or even on your own laptop – as long as you have a Docker engine running on a Linux OS.
Rollback: you can simply run your previous build image and all changes will automatically be rolled back (see the sketch after this list).
Image simplicity: every image has a tree hierarchy, and all child images depend on their parent image. For example, if there is a vulnerability in a Docker container, you can identify and patch the parent image, and when you rebuild the children, the vulnerability is automatically removed from the child images as well.
Container registry: you can store all images in a central location, where you can apply ACLs, run vulnerability scanning, and sign images.
Runtime: even if you want to run a thousand containers, you can start them all within a few seconds.
Isolation: you can run hundreds of processes on one OS, all isolated from each other.
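A minimal sketch of the rollback point above, assuming a hypothetical registry and image tags (myregistry.example.com/nginx-custom is only an illustration, not a real image):
$ docker run -d --name web myregistry.example.com/nginx-custom:2.1   # current build
$ docker stop web && docker rm web
$ docker run -d --name web myregistry.example.com/nginx-custom:2.0   # roll back to the previous build image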
Ethen, Web Designer (2015-present), answered Aug 30, 2018:
Docker is an open platform for developers, bringing them a large number of open source projects, including open source Docker tools and a management framework, along with more than 85,000 Dockerized applications. Docker today is considered to be something more than just an application platform, and the container ecosystem is growing so quickly, with so many Docker tools available on the web, that just trying to understand the available options can feel like an overwhelming undertaking.
From my personal experience, I think people just want to containerize everything without looking at how the architectural considerations change, which basically ruins the technology. For example, how will someone benefit from creating fat container images the size of a VM when the basic advantage of Docker is shipping lightweight images?
Among growing container trends, here's an important one: as containers go, so goes container orchestration. That's because most organizations quickly realize that managing containers in production can get complicated in a hurry. Orchestration solves that problem, and while there are multiple options, Kubernetes has become the de facto leader.
Kubernetes' star appeal does lead to some misunderstandings and outright myths, though. We
asked a range of IT leaders and container experts to identify the biggest misconceptions about
Kubernetes – and the realities behind each of them – to help people who are just
getting going with the technology. Here are five important ones to know before you get your
hands dirty.
Misunderstanding #1: Kubernetes is only for public cloud
Reality: Kubernetes is commonly referred to as a cloud-native technology, and for good reason. The project, which was first developed by a team at Google, currently calls the Cloud Native Computing Foundation home. (Red Hat, one of the first companies to work with Google on Kubernetes, has become the second-leading contributor to the Kubernetes upstream project.)
"Kubernetes is cloud-native in the sense that it has been designed to take advantage of
cloud computing architecture [and] to support scale and resilience for distributed
applications," says Raghu Kishore Vempati, principal systems engineer at Aricent .
"Kubernetes can run on different platforms, be it a personal laptop, VM, rack of bare-metal
servers, public/private cloud environment, et cetera," Vempati says.
Notes Red Hat technology evangelist Gordon Haff , "You can cluster together
groups of hosts running Linux containers, and Kubernetes helps you easily and efficiently
manage those clusters. These clusters can span hosts across public, private, and hybrid clouds
."
Misunderstanding #2: Kubernetes is a finished product
Reality: Kubernetes isn't really a product at all, much less a finished one.
"Kubernetes is an open source project, not a product," says Murli Thirumale, co-founder and
CEO at Portworx . (Portworx co-founder and
VP of product management Eric Han was the first Kubernetes product manager while at
Google.)
New users should understand a fundamental reality here: The Kubernetes ecosystem moves very
quickly. It's even been dubbed the fastest-moving
project in open source history.
"Take your eyes off of it for only one moment, and everything changes," Frank Reno, senior
technical product manager at Sumo
Logic . "It is a fast-paced, highly active community that develops Kubernetes and the
related projects. As it changes, it also changes the way you need to look at and develop
things. It's all for the better, but still, much to keep up on."
Misunderstanding #3: Kubernetes is simple to run out of the box
"For those new to Kubernetes there's often an 'aha' moment as they realize it's not that
easy to do right."
Reality: It may be "easy" to get it up and running on a local machine, but it can quickly
get more complicated from there. "For those new to Kubernetes, there's often an 'aha' moment as
they realize it's not that easy to do right," says Amir Jerbi, co-founder and CTO at Aqua Security .
Jerbi notes that this is a key reason for the growth of commercial Kubernetes platforms on
top of the open source project, as well as managed services and consultancies. "Setting up and
managing K8s correctly requires time, knowledge, and skills, and the skill gap should not be
underestimated," Jerbi says.
Some organizations are still going to learn that the hard way, drawn in by the considerable potential of Kubernetes and the table-stakes necessity of using a container management or orchestration tool for running containers at scale in a production environment.
"Kubernetes is a very popular and very powerful platform," says Wei Lien Dang, VP of
products at StackRox . "Given the DIY
mindset that comes along with open source software, users often think they should be working
directly in the Kubernetes system itself. But this understanding is misguided."
Dang points to needs such as supporting high availability and resilience. Both, he says,
become easier when using abstraction layers on top of the core Kubernetes platform, such as a
UX layer to enable various end users to get the most value out of the technology.
"One of the major benefits of open source software is that it can be downloaded and used
with no license cost – but very often, making this community software usable in a
corporate environment will require a significant investment in technical effort to integrate
[or] bundle with other technologies," says Andy Kennedy, managing director at Tier 2 Consulting . "For example, in order
to provide a full set of orchestrated services, Kubernetes relies on other services provided by
open source projects, such as registry, security, telemetry, networking, and automation."
Complete container application platforms, such as Red Hat OpenShift
, eliminate the need to assemble those pieces yourself.
This gets back to the difference between the Kubernetes project and the maturing Kubernetes platforms built on
top of that project.
"Do-it-yourself Kubernetes can work with some dedicated resources, but consider a more
productized and supported [platform]," says Portworx's Thirumale. "These will help you go to
production faster." Misunderstanding #4: Kubernetes is an all-encompassing framework for
building and deploying applications
Reality: "By itself, Kubernetes does not provide any primitives for applications such as
databases, middleware, storage, [and so forth]," says Aricent's Vempati.
Developers still need to include the necessary services and components for their respective
applications, Vempati notes, yet some people overlook this.
"Kubernetes is a platform for managing containerized workloads and services with independent
and composable processes," Vempati says. "How the applications and services are orchestrated on
the platform is for the developers to define."
You can't just "lift and shift" a monolithic app into Kubernetes and say, boom, we have a
microservices architecture.
In a similar vein, some folks simply misunderstand what Kubernetes does in a more
fundamental way. Jared Sikander, CTO at NetEnrich, encounters a key misconception in the marketplace that Kubernetes "provides containerization and microservices." That's a misnomer.
It's a tool for deploying and managing containers and containerized microservices. You can't
just "lift and shift" a monolithic app into Kubernetes and say, boom, we have a microservices
architecture now.
"In reality, you have to refactor your applications into microservices," Sikander says.
"Kubernetes provides the platform to deploy and scale your microservices."
Misunderstanding #5: Kubernetes inherently secures your containers
Reality: Container
security is one of the brave new worlds in the broader threat landscape. (That's evident in
the growing number of container security firms, such as Aqua, StackRox, and others.)
Kubernetes does have critical capabilities for managing the security of your containers, but
keep in mind it is not in and of itself a security platform, per se.
"Kubernetes has a lot of powerful controls built in for network policy enforcement, for
example, but accessing them natively in Kubernetes means working in a YAML file," says Dang
from StackRox. This also gets back to leveraging the right tools or abstraction layers on top
of Kubernetes to make its security-oriented features more consumable.
It's also a matter of rethinking your old security playbook for containers and for hybrid
cloud and multi-cloud environments in general.
"As enterprises increasingly flock to Kubernetes, too many organizations are still making
the dangerous mistake of relying on their previously used security measures – which
really aren't suited to protecting Kubernetes and containerized environments," says Gary Duan,
CTO at NeuVector . "While traditional
firewalls and endpoint security are postured to defend against external threats, malicious
threats to containers often grow and expand laterally via internal traffic, where more
traditional tools have zero visibility."
Security, like other considerations with containers and Kubernetes, is also a very different
animal when you're ready to move into production.
In part
two of this series, we clear up some of the misconceptions about running Kubernetes in a
production environment versus experimenting with it in a test or dev environment. The
differences can be significant.
Yes, you can. Years before Docker made containers a household term (if you live in a data
center, that is), the LXC project
developed the concept of running a kind of virtual operating system, sharing the same kernel,
but contained within defined groups of processes.
Docker built on LXC, and today there are plenty of platforms that leverage the work of LXC
both directly and indirectly. Most of these platforms make creating and maintaining containers
sublimely simple, and for large deployments, it makes sense to use such specialized services.
However, not everyone's managing a large deployment or has access to big services to learn
about containerization. The good news is that you can create, use, and learn containers with
nothing more than a PC running Linux and this article. This article will help you understand
containers by looking at LXC, how it works, why it works, and how to troubleshoot when
something goes wrong.
If you're looking for a quick-start guide to LXC, refer to the excellent Linux Containers website.
Installing LXC
If it's not already installed, you can install LXC with your package manager.
On Fedora or similar, enter:
$ sudo dnf install lxc lxc-templates lxc-doc
On Debian, Ubuntu, and similar, enter:
$ sudo apt install lxc
Creating a network bridge
Most containers assume a network will be available, and most container tools expect the user
to be able to create virtual network devices. The most basic unit required for containers is
the network bridge, which is more or less the software equivalent of a network switch. A
network switch is a little like a smart Y-adapter used to split a headphone jack so two people
can hear the same thing with separate headsets, except instead of an audio signal, a network
switch bridges network data.
You can create your own software network bridge so your host computer and your container OS
can both send and receive different network data over a single network device (either your
Ethernet port or your wireless card). This is an important concept that often gets lost once
you graduate from manually generating containers, because no matter the size of your
deployment, it's highly unlikely you have a dedicated physical network card for each container
you run. It's vital to understand that containers talk to virtual network devices, so you know
where to start troubleshooting if a container loses its network connection.
To create a network bridge on your machine, you must have the appropriate permissions. For
this article, use the sudo command to operate with root privileges. (However, LXC docs provide
a configuration to grant users permission to do this without using sudo .)
$ sudo ip link add br0 type bridge
Verify that the imaginary network interface has been created:
$ sudo ip addr show br0
7: br0: <BROADCAST,MULTICAST> mtu 1500 qdisc
noop state DOWN group default qlen 1000
link/ether 26:fa:21:5f:cf:99 brd ff:ff:ff:ff:ff:ff
Since br0 is seen as a network interface, it requires its own IP address. Choose a valid
local IP address that doesn't conflict with any existing IP address on your network and assign
it to the br0 device:
$ sudo ip addr add 192.168.168.168 dev br0
And finally, ensure that br0 is up and running:
$ sudo ip link set br0 up
Setting the container config
The config file for an LXC container can be as complex as it needs to be to define a
container's place in your network and the host system, but for this example the config is
simple. Create a file in your favorite text editor and define a name for the container and the
network's required settings:
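The contents of the file were not reproduced here; a minimal sketch, using the legacy LXC 2.x key names and the addresses that appear later in this article (the container name, MAC address, and IP values are only illustrative):
lxc.utsname = mycontainer
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.hwaddr = 4a:49:43:49:79:bd
lxc.network.ipv4 = 192.168.168.167/24
lxc.network.ipv6 = 2003:db8:1:0:214:1234:fe0b:3596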
Save this file in your home directory as mycontainer.conf .
The lxc.utsname is arbitrary. You can call your container whatever you like; it's the name
you'll use when starting and stopping it.
The network type is set to veth , which is a kind of virtual Ethernet patch cable. The idea
is that the veth connection goes from the container to the bridge device, which is defined by
the lxc.network.link property, set to br0 . The IP address for the container is in the same
network as the bridge device but unique to avoid collisions.
With the exception of the veth network type and the up network flag, you invent all the
values in the config file. The list of properties is available from man lxc.container.conf .
(If it's missing on your system, check your package manager for separate LXC documentation
packages.) There are several example config files in /usr/share/doc/lxc/examples , which you
should review later.
Launching a container shell
At this point, you're two-thirds of the way to an operable container: you have the network
infrastructure, and you've installed the imaginary network cards in an imaginary PC. All you
need now is to install an operating system.
However, even at this stage, you can see LXC at work by launching a shell within a container
space.
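The exact command was omitted here; a sketch, assuming the mycontainer.conf file created above and lxc-execute's --rcfile option:
$ sudo lxc-execute --name mycontainer --rcfile ~/mycontainer.conf /bin/bash
#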
In this very bare container, look at your network configuration. It should look familiar,
yet unique, to you.
# /usr/sbin/ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state [...]
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
[...]
22: eth0@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> [...] qlen 1000
link/ether 4a:49:43:49:79:bd brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.168.167/24 brd 192.168.168.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 2003:db8:1:0:214:1234:fe0b:3596/64 scope global
valid_lft forever preferred_lft forever
[...]
Your container is aware of its fake network infrastructure and of a familiar-yet-unique
kernel.
# uname -av
Linux opensourcedotcom 4.18.13-100.fc27.x86_64 #1 SMP Wed Oct 10 18:34:01 UTC 2018 x86_64
x86_64 x86_64 GNU/Linux
Use the exit command to leave the container:
# exit
Installing the container operating system
Building out a fully containerized environment is a lot more complex than the networking and
config steps, so you can borrow a container template from LXC. If you don't have any templates,
look for a separate LXC template package in your software repository.
The default LXC templates are available in /usr/share/lxc/templates .
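The creation command itself was not reproduced here; a sketch, assuming the Slackware template referenced below (any template in that directory can be substituted):
$ sudo lxc-create --name slackware --template slackware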
Watching a template being executed is almost as educational as building one from scratch;
it's very verbose, and you can see that lxc-create sets the "root" of the container to
/var/lib/lxc/slackware/rootfs and several packages are being downloaded and installed to that
directory.
Reading through the template files gives you an even better idea of what's involved: LXC
sets up a minimal device tree, common spool files, a file systems table (fstab), init files,
and so on. It also prevents some services that make no sense in a container (like udev for
hardware detection) from starting. Since the templates cover a wide spectrum of typical Linux
configurations, if you intend to design your own, it's wise to base your work on a template
closest to what you want to set up; otherwise, you're sure to make errors of omission (if
nothing else) that the LXC project has already stumbled over and accounted for.
Once you've installed the minimal operating system environment, you can start your
container.
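The start command was omitted here; a sketch, assuming the container created above from the Slackware template:
$ sudo lxc-start --name slackware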
You have started the container, but you have not attached to it. (Unlike the previous basic
example, you're not just running a shell this time, but a containerized operating system.)
Attach to it by name.
$ sudo lxc-attach --name slackware
#
Check that the IP address of your environment matches the one in your config file.
# /usr/sbin/ip addr show | grep eth
34: eth0@if35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 [...] qlen 1000
link/ether 4a:49:43:49:79:bd brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.168.167/24 brd 192.168.168.255 scope global eth0
In real life, LXC makes it easy to create and run safe and secure containers. Containers
have come a long way since the introduction of LXC in 2008, so use its developers' expertise to
your advantage.
While the LXC instructions on linuxcontainers.org make the process
simple, this tour of the manual side of things should help you understand what's going on
behind the scenes.
The term "containers" is heavily overused. Also, depending on the context, it can mean
different things to different people.
Traditional Linux containers are really just ordinary processes on a Linux system. These
groups of processes are isolated from other groups of processes using resource constraints
(control groups [cgroups]), Linux security constraints (Unix permissions, capabilities,
SELinux, AppArmor, seccomp, etc.), and namespaces (PID, network, mount, etc.).
If you boot a modern Linux system and take a look at any process with cat /proc/PID/cgroup, you see that the process is in a cgroup. If you look at /proc/PID/status, you see capabilities. If you look at /proc/self/attr/current, you see SELinux labels. If you look at /proc/PID/ns, you see the list of namespaces the process is in. So, if you define
a container as a process with resource constraints, Linux security constraints, and namespaces,
by definition every process on a Linux system is in a container. This is why we often say
Linux is
containers, containers are Linux . Container runtimes are tools that modify these resource
constraints, security, and namespaces and launch the container.
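As a quick sketch of the inspection commands mentioned above, run against your current shell (the output differs per system, and the SELinux file is only populated on SELinux-enabled hosts):
$ cat /proc/self/cgroup          # cgroup membership
$ grep Cap /proc/self/status     # capability sets
$ cat /proc/self/attr/current    # SELinux label
$ ls -l /proc/self/ns            # namespaces this process belongs to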
Docker introduced the concept of a container image , which is a standard TAR file that
combines:
Rootfs (container root filesystem): A directory on the system that looks like the
standard root ( / ) of the operating system. For example, a directory with
/usr , /var , /home , etc.
JSON file (container configuration): Specifies how to run the rootfs; for example, what
command or entrypoint to run in the rootfs when the container starts; environment variables
to set for the container; the container's working directory ; and a few other settings.
Docker " tar 's up" the rootfs and the JSON file to create the base image .
This enables you to install additional content on the rootfs, create a new JSON file, and
tar the difference between the original image and the new image with the updated
JSON file. This creates a layered image .
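A small sketch of what that layering looks like on disk, assuming Docker is installed and using the public fedora image as an example: docker save writes the image out as a tarball containing one tar per layer plus the JSON configuration files.
$ docker pull fedora:latest
$ docker save fedora:latest | tar -tvf - | head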
Tools used to create container images are called container image builders . Sometimes
container engines perform this task, but several standalone tools are available that can build
container images.
Docker took these container images ( tarballs ) and moved them to a web service from which
they could be pulled, developed a protocol to pull them, and called the web service a container
registry .
Container engines are programs that can pull container images from container registries and
reassemble them onto container storage . Container engines also launch container runtimes (see
below).
Linux container internals. Illustration by Scott McCarty. CC BY-SA 4.0
Container storage is usually a copy-on-write (COW) layered filesystem. When you pull down a
container image from a container registry, you first need to untar the rootfs and place it on
disk. If you have multiple layers that make up your image, each layer is downloaded and stored
on a different layer on the COW filesystem. The COW filesystem allows each layer to be stored
separately, which maximizes sharing for layered images. Container engines often support
multiple types of container storage, including overlay , devicemapper
, btrfs , aufs , and zfs .
After the container engine downloads the container image to container storage, it needs to
create a container runtime configuration. The runtime configuration combines input from the
caller/user along with the content of the container image specification. For example, the
caller might want to specify modifications to a running container's security, add additional
environment variables, or mount volumes to the container.
The layout of the container runtime configuration and the exploded rootfs have also been
standardized by the OCI standards body as the OCI Runtime Specification .
Finally, the container engine launches a container runtime that reads the container runtime
specification; modifies the Linux cgroups, Linux security constraints, and namespaces; and
launches the container command to create the container's PID 1 . At this point, the container
engine can relay stdin / stdout back to the caller and control the
container (e.g., stop, start, attach).
Note that many new container runtimes are being introduced to use different parts of Linux
to isolate containers. People can now run containers using KVM separation (think mini virtual
machines) or they can use other hypervisor strategies (like intercepting all system calls from
processes in containers). Since we have a standard runtime specification, these tools can all
be launched by the same container engines. Even Windows can use the OCI Runtime Specification
for launching Windows containers.
At a much higher level are container orchestrators. Container orchestrators are tools used
to coordinate the execution of containers on multiple different nodes. Container orchestrators
talk to container engines to manage containers. Orchestrators tell the container engines to
start containers and wire their networks together. Orchestrators can monitor the containers and
launch additional containers as the load increases.
About the author: Daniel J. Walsh has worked in the computer security field for almost 30 years. He joined Red Hat in August 2001 and has led the RHEL Docker enablement team since August 2013, after working on container technology for several years. He has led the SELinux project, concentrating on the application space and policy development, helped develop sVirt (Secure Virtualization), and created the SELinux Sandbox, the Xguest user, and the Secure Kiosk. Previously, he worked at Netect/Bindview.
Students with any interest in Information Technology or Computer Science
are going to be joining a world dominated by Cloud Computing . And of course the major
cloud service providers (CSP) would all love to see the young people embrace their cloud
platform to host the next big thing like Facebook, Instagram or SnapChat. The top three CSP all
have free offerings for students, hoping to win their minds and hearts.
But before you jump right in to cloud computing, the novice student might want to start with
some basic fundamentals of computer programming at one of the many free online resources,
including Khan Academy.
Microsoft is offering free Azure services for students. There are two different offerings.
The first is targeted at high school students ages 13+ and the second is geared towards college
students 18+.
Microsoft
Azure for Students Starter Offer is for high school students who are interested in building applications in the cloud. While there are not as many free services or credits as are offered at the college level, there is certainly enough available for free to get some real hands-on experience with cutting-edge technology for the self-starter. How cool would it be for your high school to start a Cloud Computing Club, or to integrate this offering into some of the IT classes students may already be taking?
Azure for Students
is targeted at the college level student and has many more features available for free. Any
student in computer science or information technology should definitely get some hands on
experience with these cutting edge cloud technologies and this is the perfect way to do it with
no additional out of pocket expense.
A good way to get introduced to the Azure Cloud is to start with some free online training
courses Microsoft delivers in partnership with Pluralsight.
AWS Educate. Not to be outdone, AWS also offers some free cloud services to students and educators. These come in the form of free cloud credits, which, if managed properly, can go a long way. AWS also delivers an educational program that can be combined with an AP class in Computer Science if your high school wants to participate.
Google Cloud Platform (GCP) also has education grants available for
computer science majors at accredited universities. These seem to be the most restrictive of
the three as they are available for Computer Science Majors only at accredited
universities.
GCP does also offer training, but from what I can find, there are no free training offerings. If you want some hands-on training, you will have to register for some classes. The plus side is that these classes all seem to be instructor led, either online or in an actual classroom. The downside is that I don't think many 13-year-olds are going to shell out money to start developing on GCP when there are free training opportunities available on AWS or Azure.
For the ambitious young student, the resources are certainly there for you to be the next
Doogie Howser of Cloud
Computing.
Virtualization and containers are hot topics in today's IT industry. In this article we will
list the necessary tools to manage and configure both in Linux systems.
For many decades, virtualization has helped IT professionals to reduce operational costs and
increase energy savings. A virtual machine (or VM for short) is an emulated computer system
that runs on top of another system known as host.
VMs have limited access to the host's hardware resources (CPU, memory, storage, network
interfaces, USB devices, and so forth). The operating system running on the virtual machine is
often referred to as the guest operating system.
CPU Extensions
Before we proceed, we need to check if the virtualization extensions are enabled on our
CPU(s). To do that, use the following command, where vmx and svm are the virtualization flags
on Intel and AMD processors, respectively:
# grep --color -E 'vmx|svm' /proc/cpuinfo
No output means the extensions are either not available or not enabled in the BIOS . While
you may continue without them, performance will be negatively impacted.
Install
Virtualization Tools in Linux
To begin, let's install the necessary tools. In CentOS you will need the following
packages:
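The exact package list and installation command did not survive here; a sketch of a typical CentOS setup (package names can vary slightly by release, and the ISO path is only an example):
# yum install qemu-kvm qemu-img libvirt libvirt-client virt-install virt-viewer
# systemctl start libvirtd
# systemctl enable libvirtd
# virt-install --name centos7vm --memory 2048 --vcpus 2 --disk size=10 --cdrom /path/to/CentOS-7.iso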
Depending on the computing resources available on the host, the above command may take some
time to bring up the virtualization viewer. This tool will enable you to perform the
installation as if you were doing it on a bare metal machine.
How to Manage Virtual
Machines in Linux
After you have created a virtual machine, here are some commands you can use to manage
it:
List all VMs:
# virsh list --all
Get info about a VM (centos7vm in this case):
# virsh dominfo centos7vm
Edit the settings of centos7vm in your default text editor:
# virsh edit centos7vm
Enable or disable autostart to have the virtual machine boot (or not) when the host
does:
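The autostart commands were omitted here; with virsh they look like this (centos7vm is the example domain used above):
# virsh autostart centos7vm
# virsh autostart --disable centos7vm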
I just do an actual install to a flash drive. Format as ext4, reboot to the live media, and
turn off journaling to save wear on the flash drive. Set /tmp, /var/log, /var/spool, and a few
other frequently written directories to tmpfs; again to reduce wear on the flash drive. Turn
off swap. I have been using a Linux on a flash drive for years and with prelink, ulatencyd, and
preload, it runs as well as from a hard drive. I suppose the proper way would be to use an
overlay filesystem and a persistence file but this worked for me. Just boot to USB. Another way
would be to install to an external USB drive and put the boot loader on the external drive.
How to install and set up LXC (Linux Containers) on Fedora Linux 26
Posted July 13, 2017, in: Fedora Linux, Linux, Linux Containers (LXC)
How do I install, create and manage LXC (Linux Containers – an operating system-level virtualization) on a Fedora Linux version 26 server?
LXC is an acronym for Linux Containers. It is nothing but an operating system-level virtualization
technology for running multiple isolated Linux distros (systems containers) on a single Linux host.
This tutorial shows you how to install and manage LXC containers on Fedora Linux server.
LXC is often described as a lightweight virtualization technology. You can think of LXC as a chroot jail on steroids. There is no guest operating system involved, and you can only run Linux distros with LXC. You cannot run MS-Windows or *BSD or any other operating system with LXC. You can run CentOS, Fedora, Ubuntu, Debian, Gentoo or any other Linux distro using LXC. Traditional virtualization such as KVM/Xen/VMware and paravirtualization needs a full operating system image for each instance; with traditional virtualization you can run any operating system.
Installation
Type the following dnf command to install lxc and related packages on Fedora 26: $ sudo dnf install lxc lxc-templates lxc-extra debootstrap libvirt perl gpg
Sample outputs: Fig.01: LXC Installation on Fedora 26
Start and enable needed services
First start virtualization daemon named libvirtd and lxc using the systemctl command: $ sudo systemctl start libvirtd.service
$ sudo systemctl start lxc.service
$ sudo systemctl enable lxc.service
Sample outputs:
Created symlink /etc/systemd/system/multi-user.target.wants/lxc.service → /usr/lib/systemd/system/lxc.service.
Verify that services are running: $ sudo systemctl status libvirtd.service
Sample outputs:
And: $ sudo systemctl status lxc.service
Sample outputs:
● lxc.service - LXC Container Initialization and Autoboot Code
Loaded: loaded (/usr/lib/systemd/system/lxc.service; enabled; vendor preset: disabled)
Active: active (exited) since Thu 2017-07-13 07:25:34 UTC; 1min 3s ago
Docs: man:lxc-autostart
man:lxc
Main PID: 3830 (code=exited, status=0/SUCCESS)
CPU: 9ms
Jul 13 07:25:34 nixcraft-f26 systemd[1]: Starting LXC Container Initialization and Autoboot Code...
Jul 13 07:25:34 nixcraft-f26 systemd[1]: Started LXC Container Initialization and Autoboot Code.
LXC networking
To view configured networking interface for lxc, run: $ sudo brctl show
Sample outputs:
bridge name bridge id STP enabled interfaces
virbr0 8000.525400293323 yes virbr0-nic
You must set default bridge to virbr0 in the file /etc/lxc/default.conf: $ sudo vi /etc/lxc/default.conf
Sample config (replace lxcbr0 with virbr0 for lxc.network.link):
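The sample config itself was not reproduced here; a typical /etc/lxc/default.conf for this setup would look roughly like the following (the hwaddr template is the usual LXC convention for generating random MAC addresses):
lxc.network.type = veth
lxc.network.link = virbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx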
Save and close the file. To see DHCP range used by containers, enter: $ sudo systemctl status libvirtd.service | grep range
Sample outputs:
Jul 13 07:25:31 nixcraft-f26 dnsmasq-dhcp[3760]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
To check the current kernel for lxc support, enter: $ lxc-checkconfig
Sample outputs:
Kernel configuration not found at /proc/config.gz; searching...
Kernel configuration found at /boot/config-4.11.9-300.fc26.x86_64
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled
--- Control groups ---
Cgroup: enabled
Cgroup clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled
--- Misc ---
Veth pair device: enabled
Macvlan: enabled
Vlan: enabled
Bridges: enabled
Advanced netfilter: enabled
CONFIG_NF_NAT_IPV4: enabled
CONFIG_NF_NAT_IPV6: enabled
CONFIG_IP_NF_TARGET_MASQUERADE: enabled
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled
FUSE (for use with lxcfs): enabled
--- Checkpoint/Restore ---
checkpoint restore: enabled
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: enabled
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities: enabled
Note: Before booting a new kernel, you can check its configuration:
usage: CONFIG=/path/to/config /usr/bin/lxc-checkconfig
How can I create an Ubuntu Linux container?
Type the following command to create Ubuntu 16.04 LTS container: $ sudo lxc-create -t download -n ubuntu-c1 -- -d ubuntu -r xenial -a amd64
Sample outputs:
Setting up the GPG keyring
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs
---
You just created an Ubuntu container (release=xenial, arch=amd64, variant=default)
To enable sshd, run: apt-get install openssh-server
For security reason, container images ship without user accounts
and without a root password.
Use lxc-attach or chroot directly into the rootfs to set a root password
or create user accounts.
Set the root account password (container images ship without user accounts and with the root account locked), run: $ sudo chroot /var/lib/lxc/ubuntu-c1/rootfs/ passwd
Sample outputs:
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
To start the container, run: $ sudo lxc-start -n ubuntu-c1
To log in to the container named ubuntu-c1, use the user and password set earlier: $ lxc-console -n ubuntu-c1
Sample outputs: Fig.02: Launch a console for the specified container
You can now install packages and configure your server. For example, to enable sshd, run the apt-get command / apt command: ubuntu@ubuntu-c1:~$ sudo apt-get install openssh-server
To exit from lxc-console, type Ctrl+a q to end the console session and return to the host.
How do I create a Debian Linux container?
Type the following command to create Debian 9 ("stretch") container: $ sudo lxc-create -t download -n debian-c1 -- -d debian -r stretch -a amd64
Sample outputs:
Setting up the GPG keyring
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs
---
You just created a Debian container (release=stretch, arch=amd64, variant=default)
To enable sshd, run: apt-get install openssh-server
For security reason, container images ship without user accounts
and without a root password.
Use lxc-attach or chroot directly into the rootfs to set a root password
or create user accounts.
Set up the root account password, run: $ sudo chroot /var/lib/lxc/debian-c1/rootfs/ passwd
Start the container and log in to it for management purposes, run: $ sudo lxc-start -n debian-c1
$ lxc-console -n debian-c1
How do I create a CentOS Linux container?
Type the following command to create CentOS 7 container: $ sudo lxc-create -t download -n centos-c1 -- -d centos -r 7 -a amd64
Sample outputs:
Setting up the GPG keyring
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs
---
You just created a CentOS container (release=7, arch=amd64, variant=default)
To enable sshd, run: yum install openssh-server
For security reason, container images ship without user accounts
and without a root password.
Use lxc-attach or chroot directly into the rootfs to set a root password
or create user accounts.
Set the root account password and start the container: $ sudo chroot /var/lib/lxc/centos-c1/rootfs/ passwd
$ sudo lxc-start -n centos-c1
$ lxc-console -n centos-c1
How do I create a Fedora Linux container?
Type the following command to create Fedora 25 container: $ sudo lxc-create -t download -n fedora-c1 -- -d fedora -r 25 -a amd64
Sample outputs:
Setting up the GPG keyring
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs
---
You just created a Fedora container (release=25, arch=amd64, variant=default)
To enable sshd, run: dnf install openssh-server
For security reason, container images ship without user accounts
and without a root password.
Use lxc-attach or chroot directly into the rootfs to set a root password
or create user accounts.
Set the root account password and start the container: $ sudo chroot /var/lib/lxc/fedora-c1/rootfs/ passwd
$ sudo lxc-start -n fedora-c1
$ lxc-console -n fedora-c1
How do I create a CentOS 6 Linux container and store it in btrfs?
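The creation command for this case was not included in the text; a sketch, assuming /var/lib/lxc lives on a btrfs filesystem and using lxc-create's -B (backing store) option together with the same download template used above:
$ sudo lxc-create -B btrfs -t download -n centos6-c1 -- -d centos -r 6 -a amd64
The lxc-top commands below show how to monitor running containers.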
To display containers, updating every second, sorted by memory use: $ lxc-top --delay 1 --sort m
To display containers, updating every second, sorted by cpu use: $ lxc-top --delay 1 --sort c
To display containers, updating every second, sorted by block I/O use: $ lxc-top --delay 1 --sort b
Sample outputs: Fig.03: Shows container statistics with lxc-top
How do I destroy/delete a container?
The syntax is: $ sudo lxc-destroy -n {container}
$ sudo lxc-stop -n fedora-c2
$ sudo lxc-destroy -n fedora-c2
If a container is running, stop it first and destroy it: $ sudo lxc-destroy -f -n fedora-c2
How do I create, list, and restore container snapshots?
The syntax is as follows for each snapshot operation. Please note that you must use a snapshot-aware file system such as BTRFS/ZFS or LVM.
Create snapshot for a container
$ sudo lxc-snapshot -n {container} -c "comment for snapshot"
$ sudo lxc-snapshot -n centos-c1 -c "13/July/17 before applying patches"
List snapshot for a container
$ sudo lxc-snapshot -n centos-c1 -L -C
Restore snapshot for a container
$ sudo lxc-snapshot -n centos-c1 -r snap0
Destroy/Delete snapshot for a container
$ sudo lxc-snapshot -n centos-c1 -d snap0
Posted by: Vivek Gite
The author is the creator of nixCraft and a seasoned sysadmin and a trainer for the Linux operating
system/Unix shell scripting. He has worked with global clients and in various industries, including
IT, education, defense and space research, and the nonprofit sector.
What's new in this release (see below for details):
- TCP and UDP connection support in WebServices.
- Various shader improvements for Direct3D 11.
- Improved support for high DPI settings.
- Partial reimplementation of the GLU library.
- Support for recent versions of OSMesa.
- Window management improvements on macOS.
- Direct3D command stream runs asynchronously.
- Better serial and parallel ports autodetection.
- Still more fixes for high DPI settings.
- System tray notifications on macOS.
- Various bug fixes.
... improved support for Warhammer 40,000: Dawn of War III, which will be ported to Linux and SteamOS platforms by Feral Interactive on June 8. Wine 2.9 is here to introduce support for tessellation shaders in Direct3D, binary mode support in WebServices, RegEdit UI improvements, and clipboard changes detected through Xfixes.
...
The Wine 2.9 source tarball can be downloaded
right now from our website if you fancy compiling it on your favorite GNU/Linux distribution, but please try to keep in mind
that this is a pre-release version not suitable for production use. We recommend installing the stable Wine branch if you want to
have a reliable and bug-free experience.
Wine 2.9 will also be installable from the software repos of your operating system in the coming days.
Shuttleworth said, "LXD crushes traditional virtualisation for common enterprise environments,
where density and raw performance are the primary concerns. Canonical is taking containers to the
level of a full hypervisor, with guarantees of CPU, RAM, I/O and latency backed by silicon and
the latest Ubuntu kernels."
So what is crushing? According to Shuttleworth, LXD runs guest machines 14.5 times more densely and with 57 percent less latency than KVM. So, for example, you can run 37 KVM Ubuntu VMs on a 16GB Intel server, or an amazing 536 LXD Ubuntu containers on the same hardware.
Shuttleworth also stated that LXD was far faster than KVM. For example, all 536 guests started with LXD in far less time than it took KVM to launch its 37 guests. "On average," he claimed, "LXD guests started in 1.5 seconds, while KVM guests took 25 seconds to start."
As for latency, Shuttleworth boasted that, "Without the overhead of emulating a VM, LXD avoids the scheduling latencies and other performance hazards. Using a sample 0MQ [a popular Linux high-performance asynchronous messaging library] workload, LXD guests had 57 percent less latency than KVM guests."
Thus, LXD should cut more than half of the latency for such latency-sensitive workloads as voice
or video transcode. This makes LXD an important potential tool in the move to network function
virtualisation (NFV) in telecommunications and media, and the convergence of cloud and high
performance computing.
Indeed, Shuttleworth claimed that with LXD the Ubuntu containers ran at speeds so close to
bare-metal that they couldn't see any performance difference. Now, that's impressive!
Virtualization has swept through the data center in recent years, enabling IT transformation and
serving as the secret sauce behind cloud computing. Now it's time to examine what's next for
virtualization as the data center options mature and virtualization spreads to desktops,
networks, and beyond.
LXD, however, as Shuttleworth pointed out, is not a replacement for KVM or other hypervisor
technologies such as Xen. Indeed, it can't replace them. In addition, LXD is not trying to
displace Docker as a container technology.
No website about Xen can be considered complete without an opinion on this topic. KVM got included in the Linux kernel and is considered the right solution by most distributions and top Linux developers, including Linus Torvalds himself. This made many people think Xen is somehow inferior or on the way to decline. The truth is, these solutions differ both in the underlying technology and in common applications.
How Xen works
Xen not only didn't make it into the main tree of the Linux kernel; it doesn't even run on Linux, although it looks like it does. It's a bare-metal hypervisor (or type 1 hypervisor) - a piece of software that runs directly on hardware. If you install a Xen package on your normal Linux distribution, after rebooting you will see Xen messages first. It will then boot your existing system into a first, specially privileged virtual machine called dom0.
This makes the process quite complex. If you start experimenting with Xen and at the first attempt make your machine unbootable, don't worry - it has happened to many people, including yours truly. You can also download XenServer - a commercial but free distribution of Xen that comes with a simple-to-use installer, a specially tailored, minimal Linux system in dom0, and enterprise-class management tools. I'll write some more about the differences between XenServer and "community" Xen in a few days.
It also means you won't be able to manipulate VMs using ordinary Linux tools, e.g. stop them with kill and monitor with top. However,
Xen comes with some great management software and even greater 3rd-party apps are available (be careful, some of them don't work
with Xen Server). They can fully utilize interesting features of Xen, like storing snapshots of VMs and live-migration between physical
servers.
Xen is also special for its use of a technology called paravirtualization. In short, it means that the guest operating system knows it is running on a virtualized system. There is an obvious downside: it needs to be specially modified, although with open source OSes that's not much of an issue. But there's also one very important advantage: speed. Xen delivers almost native performance. Other virtualization platforms use this approach in a very limited way, usually in the form of a driver package that you install on guest systems. This improves the speed compared to a completely non-paravirtualized system, but is still far from what can be achieved with Xen.
How KVM works
KVM runs inside a Linux system, not above it - it's called a type 2, or hosted, hypervisor. This has several significant implications. From a technical point of view, it makes KVM easier to deploy and manage, with no need for special boot-time support; but it also makes it harder to deliver good performance. From a political point of view, Linux developers view it as superior to Xen because it's part of the system, not an outside piece of software.
KVM requires a CPU with hardware virtualization support. Most new server, desktop and laptop processors from Intel and AMD work with KVM. Older CPUs or low-power units for netbooks, PDAs and the like lack this feature. Hardware-assisted virtualization makes it possible to run an unmodified operating system at an adequate speed. Xen can do this too, although the feature is mostly used to run Windows or other proprietary guests. Even with hardware support, pure virtualization is still much slower than paravirtualization.
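A quick, hedged way to check whether a given host is ready for KVM (virt-host-validate ships with libvirt and may not be present on a minimal system):
$ lsmod | grep kvm          # kvm_intel or kvm_amd should be loaded
$ sudo virt-host-validate   # checks CPU flags, /dev/kvm, cgroup support, IOMMU, etc.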
Rest of the world
Some VMware server platforms and Microsoft Hyper-V are bare-metal hypervisors, like Xen. VMware's desktop solutions (Player, Workstation)
are hosted, as well as QEMU, VirtualBox, Microsoft Virtual PC and pretty much everything else. None of them employ a full paravirtualization,
although they sometimes offer drivers improving the performance of guest systems.
KVM only runs on machines with hardware virtualization support. Some enterprise platforms have this requirement too. VirtualBox
and desktop versions of VMware work on CPUs lacking virtualization support, but the performance is greatly reduced.
What should you choose?
For the server, grid or cloud
If you want to run Linux, BSD or Solaris guests, nothing beats the paravirtualized performance of Xen. For Windows and other proprietary
operating systems, there's not much difference between the platforms. Performance and features are similar.
In the beginning KVM lacked live migration and good tools. Nowadays most open source VM management applications (like virt-manager
on the screenshot) support both Xen and KVM. Live migration was added in 2007. The whole system is considered stable, although some
people still have reservations and think it's not mature enough. Out of the box support in leading Linux distributions is definitely
a good point.
VMware is the most widespread solution - as they proudly admit, it's used by all companies in the Fortune 100. Its main disadvantage is poor support from the open source community. If the free management software from VMware is not enough for you, you usually have no choice but to buy a commercial solution - and they don't come cheap. Expect to pay several thousand dollars per server or even per CPU.
My subjective choice would be: 1 - Xen, 2 - KVM, 3 - VMware ESXi.
For the personal computer
While Xen is my first choice for the server, it would be very far on the list of "best desktop virtualization platforms". One
reason is poor support for power management. It slowly improves, but still I wouldn't install Xen on my laptop. Also the installation
method is more suitable for server platforms, but inconvenient for the desktop.
KVM falls somewhere in the middle. As a hosted hypervisor, it's easier to run, and your Linux distribution probably already supports it. Yet it lacks some of the user-friendliness of true desktop solutions, and if your CPU doesn't have virtualization extensions, you're out of luck.
VMware Player (free of charge, but not open source) is extremely easy to use when you want to run VMs prepared by somebody else (hence the name Player - nothing to do with games). Creating a new machine requires editing a configuration file or using external software (e.g. a web-based VM creator). What I really like is the convenient hardware management (see screenshot) - just one click to decide whether your USB drive belongs to the host or the guest operating system, another to mount an ISO image as the guest's DVD-ROM. Another feature is easy file sharing between guest and host. Player's bigger brother is VMware Workstation (about $180). It comes with the ability to create new VMs as well as some other additions. Due to the number of features it is slightly harder to use, but still very user-friendly.
VMware offers special drivers for guest operating systems. They are bundled with Workstation, for Player they have to be downloaded
separately (or you can borrow them from Workstation, even demo download - license allows it). They are especially useful if you want
to run Windows guest, even on older CPUs without hardware assist it's quite responsive.
VirtualBox comes close to VMware. It also has the desktop look&feel and runs on non-hardware-assisted platforms. Bundled guest
additions improve performance of virtualized systems. Sharing files and hardware is easy - but not that easy. Overall, in both speed
and features, it comes second.
My subjective choice: 1 - VMware Player or Workstation, 2 - VirtualBox, 3 - KVM
EDIT: I later found out that the new version of VirtualBox is superior to VMware Player.
[Mar 15, 2011] Hype and virtue
by Timothy Roscoe, Kevin Elphinstone, Gernot Heiser
In this paper, we question whether hypervisors are really acting as a disruptive force in OS research,
instead arguing that they have so far changed very little at a technical level. Essentially, we have retained the
conventional Unix-like OS interface and added a new ABI based on PC hardware which is highly unsuitable for most purposes.
Despite commercial excitement, focus on hypervisor design may be leading OS research astray. However, adopting a different approach
to virtualization and recognizing its value to academic research holds the prospect of opening up kernel research to new directions.
The best way is probably to exclude the memory allocation subsystem of guest systems, presenting them with unlimited linear memory space (effectively converting them to DOS from the point of view of memory allocation ;-) and handle all memory allocation in the hypervisor... That was done in VM/CMS many years ago. Those guys are reinventing the wheel, as often happens when an old technology becomes revitalized due to hardware advances.
Because KVM virtual machines are regular processes, the standard memory conservation techniques apply. But unlike regular processes,
KVM guests contain a nested operating system, which impacts memory overcommitment in two key ways. KVM guests can have greater memory
overcommitment potential than regular processes. This is due to a large difference between minimum and maximum guest memory requirements
caused by swings in utilization.
Capitalizing on this variability is central to the appeal of virtualization, but it is not always easy. While the host is managing
the memory allocated to a KVM guest, the guest kernel is simultaneously managing the same memory. Lacking any form of collaboration
between the host and guest, neither the host nor the guest memory manager is able to make optimal decisions regarding caching and
swapping, which can lead to less efficient use of memory and degraded performance.
Linux provides additional mechanisms to address memory overcommitment specific to virtualization.
Memory ballooning is a technique in which the host instructs a cooperative guest to release some of its assigned
memory so that it can be used for another purpose. This technique can help refocus memory pressure from the host onto a guest.
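As a hedged illustration (not taken from the article), the sketch below uses the libvirt Python bindings to change a running guest's
balloon target; the domain name "guest1" and the 512 MiB target are made-up example values.

    # Sketch using the libvirt Python bindings (libvirt-python); assumes the
    # guest runs the virtio balloon driver and is defined as domain "guest1".
    import libvirt

    TARGET_KIB = 512 * 1024                        # example target: 512 MiB, expressed in KiB

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('guest1')              # hypothetical domain name
    print('balloon before:', dom.memoryStats().get('actual'))
    # Ask the cooperative guest to release memory down to the new target.
    dom.setMemoryFlags(TARGET_KIB, libvirt.VIR_DOMAIN_AFFECT_LIVE)
    print('balloon after :', dom.memoryStats().get('actual'))
    conn.close()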
Kernel Same-page Merging (KSM) uses a kernel thread that scans previously identified memory ranges for identical
pages, merges them together, and frees the duplicates. Systems that run a large number of homogeneous virtual machines benefit
most from this form of memory sharing.
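A minimal sketch of driving KSM through its sysfs control files follows (requires root; the tuning values are arbitrary examples).
Note that ksmd only merges pages that applications such as QEMU have marked as mergeable via madvise().

    # Sketch of the KSM sysfs interface on a Linux host (run as root).
    KSM = '/sys/kernel/mm/ksm'

    def enable_ksm(pages_to_scan=100, sleep_millisecs=200):
        with open(KSM + '/pages_to_scan', 'w') as f:
            f.write(str(pages_to_scan))       # pages examined per scan cycle (example value)
        with open(KSM + '/sleep_millisecs', 'w') as f:
            f.write(str(sleep_millisecs))     # pause between scan cycles (example value)
        with open(KSM + '/run', 'w') as f:
            f.write('1')                      # start the ksmd scanner thread

    def ksm_stats():
        stats = {}
        for name in ('pages_shared', 'pages_sharing', 'full_scans'):
            with open(KSM + '/' + name) as f:
                stats[name] = int(f.read())   # pages_sharing / pages_shared gives a rough merge ratio
        return stats

    if __name__ == '__main__':
        enable_ksm()
        print(ksm_stats())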
Other resource management features, such as cgroups, also have applications in memory overcommitment: they can dynamically
shuffle resources among virtual machines.
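For example, here is a rough sketch of capping one guest's host-side memory with the cgroup v1 memory controller (on cgroup v2 hosts
the relevant files are memory.max and cgroup.procs instead); the group name and PID below are made up for illustration.

    # Sketch for a cgroup v1 host: create a memory cgroup for one guest, set a
    # hard limit, and move the guest's QEMU process into it (requires root).
    import os

    CG = '/sys/fs/cgroup/memory/vm-guest1'         # hypothetical cgroup for one guest

    def cap_guest_memory(qemu_pid, limit_bytes):
        os.makedirs(CG, exist_ok=True)
        with open(CG + '/memory.limit_in_bytes', 'w') as f:
            f.write(str(limit_bytes))              # hard limit enforced by the host kernel
        with open(CG + '/tasks', 'w') as f:
            f.write(str(qemu_pid))                 # attach the guest's QEMU process to the group

    # Example (assumed PID): limit guest process 4242 to 1 GiB of host memory.
    # cap_guest_memory(4242, 1024 * 1024 * 1024)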
Learn how to use the open source Clonezilla Live cloning software to convert your physical server to a virtual one. Specifically,
see how to perform a physical-to-virtual system migration using an image-based method.
As mentioned, IBM has one virtualization type on their midrange systems, PowerVM, formerly referred to as Advanced Power Virtualization.
IBM uses a type-1 hypervisor for its logical partitioning and virtualization, similar in some respects to Sun Microsystems' LDOMs
and VMware's ESX server. Type-1 hypervisors run directly on a host's hardware, acting as a hardware control and guest operating system
monitor - an evolution of IBM's classic original hypervisor, VM/CMS. Generally speaking, they are more efficient, more tightly
integrated with the hardware, better performing, and more reliable than other types of hypervisors. Figure 1 illustrates some of the
fundamental differences between the different types of partitioning and hypervisor-based virtualization solutions. IBM LPARs and
HP vPars fall into the first example -- hardware partitioning (through their logical partitioning products), while HP also offers
physical partitioning through nPars.
IBM's solution, sometimes referred to as para-virtualization, embeds the hypervisor within the hardware platform. The fundamental
difference with IBM is that there is one roadmap, strategy, and hypervisor, all integrated around one hardware platform: IBM Power
Systems. Because of this clear focus, IBM can enhance and innovate, without trying to mix and match many different partitioning and
virtualization models around different hardware types. Further, they can integrate their virtualization into the firmware, where
HP simply cannot or chooses not to.