Softpanorama

May the source be with you, but remember the KISS principle ;-)
Bigger doesn't imply better. Bigger often is a sign of obesity, of lost control, of overcomplexity, of cancerous cells

Working with Archives and Compressed Files


Extracted from Professor Nikolai Bezroukov unpublished lecture notes.

Copyright 2010-2018, Dr. Nikolai Bezroukov. This is a fragment of the copyrighted unpublished work. All rights reserved.

The most common archive format in Linux is the compressed tarball. To create such an archive you need to know the ins and outs of the tar command, which was originally designed to stream files to tape. By default tar does not compress at all: you request gzip, bzip2 or xz compression explicitly with the -z, -j or -J option. You should be able to perform three important tasks: create an archive, list its contents, and extract files from it.

There are four major types of compressed archives in Linux

Tar archiver

Tar is a very old zero-compression archiver: it first appeared in Version 7 of AT&T UNIX (1979). It is not easy to replace it with another archiver that has a zero-compression option (for example zip), because over time it acquired some unique capabilities and became the de facto standard for uncompressed archives.

Unlike most archivers, tar can be used as the head or tail of a pipeline. This makes it very convenient for moving directory hierarchies between servers while preserving all attributes as well as symbolic and hard links.
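A minimal sketch of tar at both ends of a pipeline, run locally between two scratch directories so it is safe to try. The directory layout and file names are made up for illustration; between servers the reading side would run remotely, as the comment shows (the host name there is hypothetical).

```shell
#!/bin/sh
# Scratch source and destination trees (hypothetical names).
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/app/conf"
echo "port=8080" > "$src/app/conf/app.cfg"
ln -s conf/app.cfg "$src/app/current.cfg"          # symbolic link
ln "$src/app/conf/app.cfg" "$src/app/hardlink.cfg" # hard link

# Writer: "-f -" sends the archive to stdout.
# Reader: "-f -" reads it from stdin, -p preserves permissions.
tar -C "$src" -cf - . | tar -C "$dst" -xpf -

# Between servers the reader would run on the remote side, for example:
#   tar -C /srv/app -cf - . | ssh user@remotehost 'tar -C /srv/app -xpf -'

link_target=$(readlink "$dst/app/current.cfg")
copied=$(cat "$dst/app/conf/app.cfg")
echo "symlink -> $link_target, content: $copied"
```

The -C option on each side keeps the stored names relative, so the tree can be dropped into any destination directory.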

An archive created by tar is usually called a tarball. Historically a tarball was written to magnetic tape, but now it is usually a disk file. The default tape device (for example /dev/rmt0) is seldom used today; the most common practice is to archive into a file, which is often additionally processed by gzip or, for text files, xz (option -J).

The tar command can specify a list of files or directories and can include name substitution characters. The basic form is

tar keystring options -f tarball filenames...

The keystring is a string of characters starting with one function letter (c, r, t, u, or x) followed by zero or more function modifiers (letters or digits), depending on the function letter used. The tarball is specified after option -f; omitting -f is the most common mistake of novice tar users. You can also specify all of them as ordinary options:

tar options files_to_include

Three examples:
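A minimal sketch of the three most common invocations (create, list, extract). It works in a scratch directory with made-up file names, so nothing here reflects a real system layout.

```shell
#!/bin/sh
# Hypothetical scratch area so the example does not touch real data.
work=$(mktemp -d)
mkdir -p "$work/project/docs"
echo "hello" > "$work/project/docs/readme.txt"

# 1. Create an archive of the project directory
#    (-C keeps the stored member names relative to $work).
tar -cf "$work/project.tar" -C "$work" project

# 2. List the archive contents without extracting anything.
listing=$(tar -tf "$work/project.tar")
echo "$listing"

# 3. Extract the archive into a separate directory.
mkdir "$work/restore"
tar -xf "$work/project.tar" -C "$work/restore"
restored=$(cat "$work/restore/project/docs/readme.txt")
echo "$restored"
```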

Main options:

  1. -A, --catenate, --concatenate append tar files to an archive
  2. -c, --create create a new archive
  3. -d, --diff, --compare find differences between archive and file system
  4. --delete delete from the archive (not on mag tapes!)
  5. -r, --append append files to the end of an archive
  6. -t, --list list the contents of an archive
  7. --test-label test the archive volume label and exit
  8. -u, --update only append files newer than copy in archive
  9. -x, --extract, --get extract files from an archive
  10. -C, --directory=DIR change to directory DIR
  11. -f, --file=ARCHIVE use archive file or device ARCHIVE
  12. -p, --preserve-permissions extract information about file permissions (default for superuser)
  13. -v, --verbose verbosely list files processed

There are also three options that specify the compression program used:

  1. -z, --gzip filter the archive through gzip
  2. -j, --bzip2 filter the archive through bzip2
  3. -J, --xz filter the archive through xz

Creating Archives with tar

To create an archive, use the command:

tar -cvf archivename.tar directory_or_list_of_files_or_shell_pattern

Always use option -v to see what is happening. To put files in an archive, you need at least read permission on those files. For example, to archive your home directory and put the resulting archive in /tmp you can use the command

tar -cvf /tmp/joeuser.tar /home/joeuser

Notice the options that are used. Their order is important: f should come last in the group, because it takes a parameter that specifies where to put the archive being created.

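Similarly, you can create an archive of /etc. Here the option -C makes tar change to /etc before adding files, so member names are stored relative to that directory (./hosts rather than etc/hosts). Note that the archive name must follow -f immediately:

tar -cvf /root/etc181006.tar -C /etc .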
cd /etc && tar cvf /root/etc181006.tar .

TIPS:

Originally, tar did not use the dash (-) in front of its options. Modern tar implementations use that dash, as do all other Linux programs, but they still allow the old usage without a dash for backward compatibility.

While managing archives with tar, it is also possible to add a file to an existing archive or to update an archive. Both operations work only on uncompressed archives. To add a file to an archive, use the -r option. For example:

tar -rvf /root/etc181006.tar /root/*.cfg /root/.bash*

To update a currently existing archive file, you can use the -u option: 

tar -uvf /root/etc181006.tar /root/*.cfg /root/.bash*

This writes newer versions of the specified files to the end of the archive. Older copies are not deleted, so the archive grows in size; when the archive is extracted, the later copy overwrites the earlier one.
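A small sketch of how -r and -u behave, using made-up file names in a scratch directory. Modification times are set explicitly with touch so that the "newer than the archived copy" comparison does not depend on how fast the commands run.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
echo "v1" > a.cfg
echo "v1" > b.cfg
touch -t 202001010000 a.cfg b.cfg
tar -cf demo.tar a.cfg

# -r appends unconditionally: b.cfg becomes the second member.
tar -rf demo.tar b.cfg

# Change a.cfg and make it newer than the archived copy.
echo "v2" > a.cfg
touch -t 202101010000 a.cfg

# -u appends only files newer than the copy already in the archive,
# so a.cfg is added again and the unchanged b.cfg is skipped.
tar -uf demo.tar a.cfg b.cfg

total=$(tar -tf demo.tar | wc -l)
copies_of_a=$(tar -tf demo.tar | grep -c '^a\.cfg$')
echo "members: $total, copies of a.cfg: $copies_of_a"
```

The old copy of a.cfg is still in the archive, which is why the archive only ever grows with these two operations.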

Extracting files from a tarball

Before extracting a file, it is good to check whether it exists in the archive and what its attributes are. The option -t can be used to find out. Type, for instance,

tar -tvf /root/etc181006.tar 

to see the contents of the tar archive. 

TIPS:

To extract the contents of an archive, use tar -xvf /archivename. This extracts all the content of the archive into the current directory. That means that if you are in /root when typing tar -xvf /root/homes.tar, and the archive contains a directory home, after extracting you’ll have a new directory /root/home that contains the entire contents of the archive. This might not be what you wanted to accomplish.

It is also possible to extract a single file from the archive. If the file is in the root directory of the archive, it is put into the current directory. If it is in one of the subdirectories, those subdirectories are created:

tar -xvf /archivename.tar exact_path_in_listing_of_tar_archive_for_that_file

There are two solutions to put the extracted contents right where you want to have them:
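Two common ways to do this are to change to the target directory before extracting, or to name the target directory with -C. A minimal sketch of both, using scratch directories and a made-up archive:

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/src/etc" "$work/target1" "$work/target2"
echo "127.0.0.1 localhost" > "$work/src/etc/hosts"
tar -cf "$work/etc.tar" -C "$work/src" etc

# Way 1: change to the target directory first
# (the subshell keeps the cd from affecting the rest of the session).
( cd "$work/target1" && tar -xf "$work/etc.tar" )

# Way 2: stay where you are and name the target directory with -C.
tar -xf "$work/etc.tar" -C "$work/target2"

ls "$work/target1/etc" "$work/target2/etc"
```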

TIPS

If, for instance, the listing of your archive etc181012.tar shows the file hosts as ./hosts and you want to extract it, use

tar -xvf /root/etc181012.tar ./hosts

This is a very useful capability if you mangled the file and want to recover the previous version. Of course, you need a fresh archive for that. That's why it makes sense to create an archive of /etc each morning when you log in to the system, before making any changes.

As /etc is the nerve center of the Linux OS and contains all system-related configuration files, either directly or in its subdirectories, a good practice is to create an archive of /etc on your first login each day. You can do so by putting the appropriate command in your .bash_profile. You can also use Git to maintain the files in /etc. There is a package called etckeeper that automatically adds changed files to a Git repository, but you still need a tarball to recover that repository in case of SNAFU.

A "configuration file" is defined as a local file used to control the operation of a program; it must be static and cannot be an executable binary. Normally no binaries are located in /etc. Because so much configuration lives there, it's a good idea to back up this directory regularly: it will save you a lot of reconfiguration if you reinstall or lose your current installation.
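One possible shape for such a .bash_profile entry is sketched below. The function name daily_etc_backup and the backup directory in the final comment are assumptions made for illustration, not part of any standard tool.

```shell
# Hypothetical helper for ~/.bash_profile: archive a directory once per day.
# Usage: daily_etc_backup SOURCE_DIR BACKUP_DIR
daily_etc_backup() {
    src=$1
    dest=$2
    stamp=$(date +%y%m%d)
    archive="$dest/etc$stamp.tar.gz"
    # Skip if today's archive already exists, so that later logins do not
    # overwrite the copy taken before any changes were made.
    [ -e "$archive" ] && return 0
    mkdir -p "$dest" || return 1
    tar -czf "$archive" -C "$src" .
}

# In a real profile the call might look like this (paths are assumptions):
#   daily_etc_backup /etc /root/Archive
```

Because of the existence check, only the first login of the day creates an archive; every later login leaves it untouched.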

Using Compression

For large files, compression should be done with a parallel version of the compression program, and this is typically done via a pipe. For small archives, such as the software you develop, you can use the corresponding tar option.

tar -cvf - /etc | pbzip2 > /root/etc181006.tar.bz2

On the /etc directory bzip2 provides ~12% better compression than gzip, while xz provides ~30% better compression. So with bzip2 you save roughly one archive's worth of space every ten days if you run it daily from cron. If the files that constitute the archive are very large (say around 500MB and larger), it makes sense to compress them individually first and only then create a tar archive without compression.

Using pigz or pbzip2 is a must in this case, especially the latter, as it provides a better compression ratio than gzip while having adequate (although slightly slower than pigz) speed.

For small archives, compression is specified via one of the tar options: -z for gzip, -j for bzip2 and -J for xz. The result is the same as piping the archive through the corresponding program. For example

tar -cvzf  /root/etc181006.tgz /etc
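To illustrate that the tar option and the pipeline are two spellings of the same thing, here is a minimal sketch with gzip on a scratch directory (file names are made up). Both archives list identically.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/data"
echo "sample" > "$work/data/file.txt"

# Compression requested through tar's own option...
tar -czf "$work/a.tgz" -C "$work" data

# ...and the same thing spelled out as a pipeline.
tar -cf - -C "$work" data | gzip > "$work/b.tgz"

list_a=$(tar -tzf "$work/a.tgz")
list_b=$(tar -tzf "$work/b.tgz")
echo "$list_a"
```

The pipeline form is the one to use when you want to substitute a parallel compressor such as pigz for gzip.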

Please note that a compressed tarball is an atomic entity: tar cannot reach an individual file without decompressing the stream that precedes it. That's a big difference from the zip archive format, which is more like a virtual file system and allows you to extract individual files without decompressing the whole archive.

To decompress files that have been compressed with gzip or bzip2, you can use the gunzip and bunzip2 utilities. Parallel versions are not much faster in this case: decompression speed is approximately the same. Of course, if the archive was compressed better there is less data to read, so decompression might be faster, and in this sense xz might beat the alternatives as it provides the highest compression ratio.

Here are the compressed sizes of the default content of the /etc directory in RHEL 7 using those three compression programs:

[root@test01 Archive]# ll
total 24600
-rw-r--r--. 1 root root 8580949 Oct  7 17:56 etc181006.tbz
-rw-r--r--. 1 root root 9795626 Oct  7 17:47 etc181006.tgz
-rw-r--r--. 1 root root 6808940 Oct  7 18:04 etc181006.txz
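The percentages quoted earlier can be recomputed from this listing. The sketch below hard-codes the three sizes shown above and uses awk for the floating-point arithmetic; it gives about 12.4% for bzip2 and 30.5% for xz, both relative to gzip.

```shell
#!/bin/sh
# Sizes in bytes, copied from the listing above.
gz=9795626   # etc181006.tgz
bz=8580949   # etc181006.tbz
xz=6808940   # etc181006.txz

# Saving relative to the gzip archive, in percent with one decimal.
bz_saving=$(awk -v a="$gz" -v b="$bz" 'BEGIN { printf "%.1f", (a - b) * 100 / a }')
xz_saving=$(awk -v a="$gz" -v b="$xz" 'BEGIN { printf "%.1f", (a - b) * 100 / a }')

echo "bzip2 saves ${bz_saving}% and xz saves ${xz_saving}% relative to gzip"
```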


Using tar to backup etc directory and home partition

One of the important tips for aspiring Linux system administrators is to create a backup of the /etc directory each morning on the first login. This can be done automatically once you know how to work with the tar command to manage archives.

It is also important to back up the home directory. In both cases the simplest way to accomplish those tasks is to use tar.

  1. Open a root shell on your server. By logging in, the home directory of user root will become the current directory, so all relative filenames used in this exercise refer to /root/.
  2. Type export timestamp=`date +"%y%m%d"`
  3. Type tar -cvf etc$timestamp.tar /etc to archive the contents of the /etc directory.
  4. Type file etc$timestamp.tar and read the information that is provided by the command. It should identify the file as a tar archive.
  5. Type pigz etc$timestamp.tar. This compresses the archive in place and renames it to etc$timestamp.tar.gz.
  6. Type tar tvf etc$timestamp.tar.gz. Notice that the tar command has no issues reading from a gzip-compressed file. Also notice that all filenames in the archive are relative (the leading / has been removed).
  7. Type tar xvf etc$timestamp.tar.gz etc/hosts.
  8. Type ls -R. Notice that a subdirectory etc has been created in the current directory. In this subdirectory, the file hosts has been restored.
  9. Type gunzip etc$timestamp.tar.gz. This decompresses the compressed file but does not change anything else with regard to the tar command.
  10. Type tar xvf etc$timestamp.tar -C /tmp etc/passwd. This extracts the password file to /tmp/etc/passwd.
  11. Type tar cjvf /root/home$timestamp.tar.bz2 /home. This creates a dated compressed archive of the /home directory in the home directory of user root.
  12. Type rm /root/etc$timestamp* /root/home$timestamp* to remove the archives created in this exercise.
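The core of the exercise can also be rehearsed without root access. The sketch below is an adaptation under stated assumptions: it uses a made-up miniature etc tree in a scratch directory instead of the real /etc, gzip stands in for pigz, and -z is given explicitly when reading so that it does not rely on automatic detection of the compression format.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Made-up miniature tree standing in for /etc.
mkdir -p fakeroot/etc
echo "127.0.0.1 localhost" > fakeroot/etc/hosts
echo "root:x:0:0:root:/root:/bin/sh" > fakeroot/etc/passwd

timestamp=$(date +%y%m%d)
tar -cf "etc$timestamp.tar" -C fakeroot etc     # steps 2-3: dated archive
gzip "etc$timestamp.tar"                        # step 5: gzip in place of pigz
tar -tzf "etc$timestamp.tar.gz"                 # step 6: list compressed archive
tar -xzf "etc$timestamp.tar.gz" etc/hosts       # step 7: extract a single file
gunzip "etc$timestamp.tar.gz"                   # step 9: back to a plain .tar
mkdir out
tar -xf "etc$timestamp.tar" -C out etc/passwd   # step 10: extract elsewhere

restored_hosts=$(cat etc/hosts)
restored_passwd=$(cat out/etc/passwd)
```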

Working with archive using Midnight Commander

Midnight Commander presents an archive as a "virtual file system": if you press Enter on an archive, it will display its content in the panel, and all standard operations are instantly available.

Moreover, you can program additional operations using the so-called "User menu" (available if you hit F2). By default it contains seven archive-related operations.

Operations 3-6 work the following way: they compress the current directory content and put the resulting tarball into the parent directory. By default the mc user menu uses non-parallel versions of the compressors. You can edit it and change that to parallel versions.

You can also add your own operations. For example, instead of the parent directory you can put the archive into the directory shown in the passive panel (%D).
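As an illustration, the sketch below writes one hypothetical user-menu entry to a scratch file. The hotkey p, the entry title and the use of pigz are assumptions; %d and %D are the mc macros for the current and the other panel's directory. To try it, paste the entry into your own mc menu file, whose location depends on the mc version.

```shell
#!/bin/sh
# Scratch file only; this does not modify any real mc configuration.
menu_file=$(mktemp)

# In an mc menu file the first line of an entry is the hotkey and title;
# the indented lines below it are the shell commands that mc runs.
cat >> "$menu_file" <<'EOF'
p       Compress the current directory with pigz into the other panel
        Pwd=`basename %d /`
        cd .. && \
        tar cf - "$Pwd" | pigz > %D/"$Pwd".tar.gz && \
        echo "%D/$Pwd.tar.gz created."
EOF

echo "entry written to $menu_file"
```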



Old News ;-)

18 Tar Command Examples in Linux

This is a weak article with multiple mistakes and typos. Most examples are trivial, but a couple are useful.

## Untar files in Current Directory ##
# tar -xvf public_html-14-09-12.tar

## Untar files in specified Directory ##
# tar -xvf public_html-14-09-12.tar -C /home/public_html/videos/

/home/public_html/videos/
/home/public_html/videos/views.php
/home/public_html/videos/index.php
/home/public_html/videos/logout.php
/home/public_html/videos/all_categories.php
/home/public_html/videos/feeds.xml

Extract Group of Files using Wildcard

To extract a group of files, use wildcard-based extraction. For example, to extract all files whose names end in .php from a tar, tar.gz or tar.bz2 archive:

# tar -xvf Phpfiles-org.tar --wildcards '*.php'

# tar -zxvf Phpfiles-org.tar.gz --wildcards '*.php'

# tar -jxvf Phpfiles-org.tar.bz2 --wildcards '*.php'

/home/php/iframe_ew.php
/home/php/videos_all.php
/home/php/rss.php
/home/php/index.php
/home/php/vendor.php
/home/php/video_title.php
/home/php/report.php
/home/php/video.php
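A self-contained sketch of the same technique on a made-up site tree. It assumes GNU tar, since --wildcards is a GNU option; the pattern is quoted so that the shell passes it to tar unexpanded.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/site/php" "$work/out"
echo "<?php ?>" > "$work/site/php/index.php"
echo "<?php ?>" > "$work/site/php/rss.php"
echo "body {}"  > "$work/site/style.css"
tar -cf "$work/site.tar" -C "$work" site

# GNU tar: with --wildcards the member argument is treated as a glob
# pattern matched against the stored member names.
tar -xf "$work/site.tar" -C "$work/out" --wildcards '*.php'

extracted=$(cd "$work/out" && find . -type f | sort)
echo "$extracted"
```

Only the two .php files are extracted; style.css stays in the archive.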

Bhasker Reddy, November 4, 2015 at 3:58 am

While extracting a particular file from the tarball, just giving a file name doesn't work. You need to give the path of the file within the tarball.

For example: I have taken a backup of my home directory called "/home/aziz" and it contains test1 and test2 files.

$ tar -xvf /tmp/aziz.tar test1 - This command does not work while extracting a particular file called test1
$ tar -xvf /tmp/aziz.tar home/aziz/test1 - This command works while extracting a file called test1

OR

$ tar --extract --file=/tmp/aziz.tar home/aziz/test1
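One way to avoid guessing the path is to look it up in the archive listing first. A minimal sketch that recreates the commenter's layout in a scratch directory:

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/home/aziz" "$work/restore"
echo "one" > "$work/home/aziz/test1"
echo "two" > "$work/home/aziz/test2"
tar -cf "$work/aziz.tar" -C "$work" home/aziz

# Look up the member name exactly as it is stored in the archive...
member=$(tar -tf "$work/aziz.tar" | grep '/test1$')
echo "$member"

# ...and pass that path, not the bare file name, to the extraction.
tar -xf "$work/aziz.tar" -C "$work/restore" "$member"
restored=$(cat "$work/restore/home/aziz/test1")
```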






Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


Last modified: December 07, 2018