Adapted from the Wikipedia article IBM General Parallel File System and the IBM Redbook Implementing the IBM General Parallel File System (GPFS) in a Cross Platform Environment
The IBM General Parallel File System (GPFS) is a high-performance shared-disk file management solution that provides fast, reliable access to data from multiple nodes in a cluster environment. It was initially designed for AIX on RS/6000 systems (1998).
GPFS allows applications on multiple nodes to share file data. It stripes data across multiple disks for higher performance. GPFS is based on a shared disk model which provides lower overhead access to disks not directly attached to the application nodes and uses a distributed locking protocol to provide full data coherence for access from any node.
It offers many of the standard POSIX file system interfaces allowing most applications to execute without modification or recompiling. These capabilities are available while allowing high speed access to the same data from all nodes of the cluster and providing full data coherence for operations occurring on the various nodes. GPFS attempts to continue operation across various node and component failures assuming that sufficient resources exist to continue.
GPFS can exploit IP over InfiniBand and Remote Direct Memory Access (RDMA) to provide access to the file system, which is especially useful in large GPFS clusters. As a cluster file system, GPFS provides a global namespace, shared file system access among GPFS clusters, simultaneous file access from multiple nodes, high recoverability and data availability through replication, the ability to make changes while a file system is mounted, and simplified administration even in large environments.
The same file can be accessed concurrently from multiple nodes. GPFS is designed to provide high availability through advanced clustering technologies, dynamic file system management and data replication. GPFS can continue to provide data access even when the cluster experiences storage or node malfunctions. GPFS scalability and performance are designed to meet the needs of data intensive applications such as engineering design, digital media, data mining, relational databases, financial analytics, seismic data processing, scientific research and scalable file serving.
GPFS has several unique differentiation points compared to other file systems.
In addition to providing filesystem storage capabilities, GPFS provides tools for management and administration of the GPFS cluster and allows for shared access to file systems from remote GPFS clusters.
GPFS is proprietary software licensed by IBM. It is neither free nor open source. It is offered as part of the IBM System Cluster 1350. The most recent release of GPFS is 3.5.
GPFS began as the Tiger Shark file system, a research project at IBM's Almaden Research Center, as early as 1993. Tiger Shark was initially designed to support high-throughput multimedia applications. This design turned out to be well suited to scientific computing.
Another ancestor of GPFS is IBM's Vesta filesystem, developed as a research project at IBM's Thomas J. Watson Research Center between 1992 and 1995. Vesta introduced the concept of file partitioning to accommodate the needs of parallel applications that run on high-performance multicomputers with parallel I/O subsystems. With partitioning, a file is not a sequence of bytes, but rather multiple disjoint sequences that may be accessed in parallel. The partitioning is such that it abstracts away the number and type of I/O nodes hosting the filesystem, and it allows a variety of logical partitioned views of files, regardless of the physical distribution of data within the I/O nodes. The disjoint sequences are arranged to correspond to individual processes of a parallel application, allowing for improved scalability.
Vesta was commercialized as the PIOFS filesystem around 1994 and was succeeded by GPFS around 1998. [8] The main difference between the older and newer filesystems was that GPFS replaced the specialized interface offered by Vesta/PIOFS with the standard Unix API: all the features to support high performance parallel I/O were hidden from users and implemented under the hood.[3][8] Today, GPFS is used by many of the top 500 supercomputers listed on the Top 500 Supercomputing Sites web site. Since inception GPFS has been successfully deployed for many commercial applications including: digital media, grid analytics and scalable file service.
In 2010 IBM released a version of GPFS that included a capability known as GPFS-SNC, where SNC stands for Shared Nothing Cluster. This allows GPFS to be used as a filesystem for locally attached disks on a cluster of network-connected servers, rather than requiring disks to be shared over a SAN with dedicated servers. GPFS-SNC is suitable for workloads with high data locality.
GPFS provides high performance by allowing data to be accessed over multiple computers at once. Most existing file systems are designed for a single server environment, and adding more file servers does not improve performance. GPFS provides higher input/output performance by "striping" blocks of data from individual files over multiple disks, and reading and writing these blocks in parallel. Other features provided by GPFS include high availability, support for heterogeneous clusters, disaster recovery, security, DMAPI, HSM and ILM.
According to Schmuck and Haskin, a file that is written to the filesystem is broken up into blocks of a configured size, less than 1 megabyte each. These blocks are distributed across multiple filesystem nodes, so that a single file is fully distributed across the disk array. This results in high reading and writing speeds for a single file, as the combined bandwidth of the many physical drives is high. By itself, this striping would make the filesystem vulnerable to disk failures: any one disk failing would be enough to lose data. To prevent data loss, the filesystem nodes have RAID controllers, so multiple copies of each block are written to the physical disks on the individual nodes. It is also possible to opt out of RAID-replicated blocks and instead store two copies of each block on different filesystem nodes.
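As an illustrative sketch (not part of the adapted text): the number of data and metadata replicas is normally chosen when the file system is created, for example with the mmcrfs replication options. The device name, descriptor file and values below are only an example and assume disks in at least two failure groups,
mmcrfs gpfs1 -F /etc/diskdef.txt -T /gpfs -m 2 -M 2 -r 2 -R 2
Here -m and -r set the default number of metadata and data replicas, while -M and -R set the maximums, which in these GPFS versions cannot be raised after creation.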
Other features of the filesystem include storage pools, filesets, and policy-driven file management, described below.
It is interesting to compare this with Hadoop's HDFS filesystem, which is designed to store similar or greater quantities of data on commodity hardware — that is, datacenters without RAID disks and a Storage Area Network (SAN).
Storage pools allow for the grouping of disks within a file system. Tiers of storage can be created by grouping disks based on performance, locality or reliability characteristics. For example, one pool could be high performance fibre channel disks and another more economical SATA storage.
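As a hedged illustration only (the device and pool names are made up, and the field layout is assumed to be the colon-delimited NSD descriptor DiskName:ServerList::DiskUsage:FailureGroup:DesiredName:StoragePool used later in the installation notes), a disk is typically assigned to a pool via the last field of its descriptor,
/dev/sdc:gpfs1,gpfs2::dataOnly:1::satapool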
A fileset is a sub-tree of the file system namespace and provides a way to partition the namespace into smaller, more manageable units. Filesets provide an administrative boundary that can be used to set quotas and be specified in a policy to control initial data placement or data migration. Data in a single fileset can reside in one or more storage pools. Where the file data resides and how it is migrated is based on a set of rules in a user defined policy.
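A hedged sketch of basic fileset administration (the fileset name 'projects' and the junction path are made up; 'gpfs1' is the file system device name used in the installation notes below),
mmcrfileset gpfs1 projects
mmlinkfileset gpfs1 projects -J /gpfs/projects
mmlsfileset gpfs1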
There are two types of user defined policies in GPFS: File placement and File management. File placement policies direct file data as files are created to the appropriate storage pool. File placement rules are determined by attributes such as file name, the user name or the fileset. File management policies allow the file's data to be moved or replicated or files deleted. File management policies can be used to move data from one pool to another without changing the file's location in the directory structure. File management policies are determined by file attributes such as last access time, path name or size of the file.
The GPFS policy processing engine is scalable and can be run on many nodes at once. This allows management policies to be applied to a single file system with billions of files and complete in a few hours.
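A hedged sketch of what such rules can look like, written in GPFS's SQL-like policy language (the pool names and the 30-day threshold are made up for illustration); placement rules are installed with mmchpolicy, while migration and deletion rules are evaluated by mmapplypolicy,
RULE 'place_logs' SET POOL 'satapool' WHERE UPPER(NAME) LIKE '%.LOG'
RULE 'default' SET POOL 'system'
RULE 'migrate_cold' MIGRATE FROM POOL 'system' TO POOL 'satapool' WHERE CURRENT_TIMESTAMP - ACCESS_TIME > INTERVAL '30' DAYS
saved, for example, to /etc/gpfs-policy.txt and applied with,
mmchpolicy gpfs1 /etc/gpfs-policy.txt
mmapplypolicy gpfs1 -P /etc/gpfs-policy.txt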
Introduction
We're installing a two-node GPFS cluster: gpfs1 and gpfs2. Both are RHEL5 systems with access to a shared disk seen as '/dev/sdb'. We're not using the GPFS client/server feature; both nodes access the shared disk directly as NSD servers.
Installation
On each node, make sure you've got those packages installed,
rpm -q libstdc++ \
  compat-libstdc++-296 \
  compat-libstdc++-33 \
  libXp \
  imake \
  gcc-c++ \
  kernel \
  kernel-headers \
  kernel-devel \
  xorg-x11-xauth
On each node, make sure the hostnames resolve and that root can log in by SSH to every node, including the node itself,
cat /etc/hosts
ssh-keygen -t dsa -P ''
copy/paste the public keys from each node,
cat .ssh/id_dsa.pub
into the same authorized_keys2 file on all the nodes,
vi ~/.ssh/authorized_keys2
check that each node can connect to every node, including itself,
ssh gpfs1
ssh gpfs2
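For reference, a minimal sketch of /etc/hosts (the IP addresses here are placeholders; use your own),
192.168.1.101   gpfs1
192.168.1.102   gpfs2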
On each node, extract and install IBM Java,
./gpfs_install-3.2.1-0_i386 --text-only
rpm -ivh /usr/lpp/mmfs/3.2/ibm-java2-i386-jre-5.0-4.0.i386.rpm
extract again and install the GPFS RPMs,
./gpfs_install-3.2.1-0_i386 --text-only
rpm -ivh /usr/lpp/mmfs/3.2/gpfs*.rpm
On each node, get the latest GPFS update (http://www14.software.ibm.com/webapp/set2/sas/f/gpfs/download/home.html) and install it,
mkdir /usr/lpp/mmfs/3.2.1-13
tar xvzf gpfs-3.2.1-13.i386.update.tar.gz -C /usr/lpp/mmfs/3.2.1-13
rpm -Uvh /usr/lpp/mmfs/3.2.1-13/*.rpm
On each node, prepare the portability layer build,
#mv /etc/redhat-release /etc/redhat-release.dist
#echo 'Red Hat Enterprise Linux Server release 5.3 (Tikanga)' > /etc/redhat-release
cd /usr/lpp/mmfs/src
export SHARKCLONEROOT=/usr/lpp/mmfs/src
rm config/site.mcr
make Autoconfig
check these values in the configuration,
grep ^LINUX_DISTRIBUTION config/site.mcr
grep 'define LINUX_DISTRIBUTION_LEVEL' config/site.mcr
grep 'define LINUX_KERNEL_VERSION' config/site.mcr
Note. "2061899" for kernel "2.6.18-128.1.10.el5"
On each node, build it,
make clean
make World
make InstallImages
On each node, edit the PATH,
vi ~/.bashrc
add this line,
PATH=$PATH:/usr/lpp/mmfs/bin
apply,
source ~/.bashrc
On some node, create the cluster,
mmcrcluster -N gpfs1:quorum,gpfs2:quorum -p gpfs1 -s gpfs2 -r /usr/bin/ssh -R /usr/bin/scp
Note. gpfs1 as primary configuration server, gpfs2 as secondary
On some node, start the cluster on all the nodes,
mmstartup -a
On some node, create the NSD,
vi /etc/diskdef.txt
like,
/dev/sdb:gpfs1,gpfs2::::
apply,
mmcrnsd -F /etc/diskdef.txt
On some node, create the filesystem,
mmcrfs gpfs1 -F /etc/diskdef.txt -A yes -T /gpfs
Note. '-A yes' for automount
Note. check for changes in '/etc/fstab'
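For example, a quick check (the exact entry depends on the 'gpfs1' device name chosen above),
grep gpfs /etc/fstab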
On some node, mount /gpfs on all the nodes,
mmmount /gpfs -a
On some node, check you've got access to the GUI,
/etc/init.d/gpfsgui start
Note. if you need to change the default ports, edit these files and change "80" and "443" to the ports you want,
#vi /usr/lpp/mmfs/gui/conf/config.properties
#vi /usr/lpp/mmfs/gui/conf/webcontainer.properties
wait a few seconds (Java is starting...) and go to the node's GUI URL,
https://gpfs2/ibm/console/
On each node, you can now disable the GUI to save some RAM,
/etc/init.d/gpfsgui stop
chkconfig gpfsgui off
and make sure gpfs is enabled everywhere,
chkconfig --list | grep gpfs
Note. also make sure the shared disk shows up at boot.
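For example, after a reboot the following should still show the shared disk and its NSD mapping (assuming the disk is still seen as /dev/sdb),
cat /proc/partitions | grep sdb
mmlsnsd -m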
Usage
For troubleshooting, watch the logs here,
tail -F /var/log/messages | grep 'mmfs:'
On some node, to start the cluster and mount the file system on all the nodes,
mmstartup -a
mmmount /gpfs -a
Note. "mmshutdown" to stop the cluster.
Show cluster information,
mmlscluster
#mmlsconfig
#mmlsnode
#mmlsmgr
Show file systems and mounts,
#mmlsnsd
#mmlsdisk gpfs1
mmlsmount all
show file system options,
mmlsfs gpfs1 -a
To disable automount,
mmchfs gpfs1 -A no
to re-enable automount,
mmchfs gpfs1 -A yes
References
Install and configure General Parallel File System (GPFS) on xSeries : http://www.ibm.com/developerworks/eserver/library/es-gpfs/
General Parallel File System (GPFS) : http://www.ibm.com/developerworks/wikis/display/hpccentral/General+Parallel+File+System+(GPFS)
Managing File Systems : http://www.ibm.com/developerworks/wikis/display/hpccentral/Managing+File+Systems
GPFS V3.1 Problem Determination Guide : http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfs31/bl1pdg1117.html
GPFS : http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS
GPFS V3.2 and GPFS V3.1 FAQs : http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html
Mar 21, 2006 | IBM
A file system describes the way information is stored on a hard disk; examples include ext2, ext3, ReiserFS, and JFS. The General Parallel File System (GPFS) is another type of file system, available for clustered environments. GPFS is designed for better throughput and high fault tolerance.
This article discusses a simple case of GPFS implementation. To keep things easy, you'll use machines with two hard disks -- the first hard disk is used for a Linux® installation and the second is left "as is" (in raw format).