Tar is a very old (it first appeared in Version 7 of AT&T UNIX, circa 1979) zero-compression archiver. But it is not easy to replace it with any archiver that has a zero-compression option (for example zip), as over time it acquired some unique capabilities and became the de facto standard for zero compression.
An archive created by tar is usually called a tarball. Historically a tarball was written to magnetic tape, but now it is usually a disk file. The default device, /dev/rmt0, is seldom used today; the most common practice is to create a file, which is often postprocessed by gzip, bzip2, or -- for small archives containing mostly text files (source code archives) -- xz (option -J).
You can view it as a poor relative of the ISO format :-). A tarball can be mounted as a filesystem using FUSE. But for small archives (say, less than 5GB) there are a few simpler methods:
As for Nautilus ( https://askubuntu.com/questions/168795/how-do-i-extract-a-specific-file-from-a-tar-archive ):
1. Extract it with the Archive Manager: open the tar in Archive Manager from Nautilus, go down into the folder hierarchy to find the file you need, and extract it.
2. On a server or command-line system, use a text-based file manager such as Midnight Commander (mc) to accomplish the same.
3. Use Nautilus/Archive-Mounter: right-click the tar in Nautilus and select Open with ArchiveMounter. The tar will now appear similar to a removable drive on the left, and you can explore/navigate it like a normal drive and drag/copy/paste any file(s) you need to any destination.
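For the FUSE approach mentioned above, here is a minimal sketch using the archivemount tool (assuming the archivemount package is installed; the archive name and mount point are hypothetical):

mkdir -p /mnt/tarview
archivemount backup.tar /mnt/tarview   # mount the tarball via FUSE
ls /mnt/tarview                        # browse it like a normal directory
fusermount -u /mnt/tarview             # unmount when done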
Unlike ISO, tar does not store the content listing of the archive separately. All it has is a header for each archive entry. So listing the contents of the archive involves scanning the whole body of the archive and recovering the entry headers one by one, which for large archives (say, over 10TB) takes a long time (often over 24 hours) and as such is extremely painful. So for large archives used for backup (which usually never change) it makes sense to store them with a manifest -- a listing of the contents of the archive.
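A minimal sketch of the manifest idea (paths are hypothetical): when the archive is written to a file, the -v listing goes to standard output and can be captured in the same pass:

tar -cvf /backup/projects.tar /projects > /backup/projects.manifest   # archive and listing in one pass
grep report2019 /backup/projects.manifest                             # later: search the manifest instead of scanning the archive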
In other words, for large archives tar becomes as monolithic as tar.gz, and as such tar.gz or tar.bz2 formats are preferable for storing large backups.
Unlike most archivers, tar can be used as the head or tail of a pipeline. This makes tar very convenient for moving directory hierarchies between servers while preserving all attributes as well as symbolic and hard links.
Please note that the tar.gz format is monolithic -- you can't extract individual files from it without decompressing the stream from the beginning. GNU tar can do this on the fly (no temporary tarball is needed), but it still has to read and decompress the whole archive. For large compressed tarballs this adds a substantial delay (it can be a week if the compressed tarball is several terabytes).
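For example (names are hypothetical), extracting one member from a compressed tarball is a single command, but it is slow for huge archives for the reason above:

tar -xzf big.tar.gz home/joeuser/notes.txt   # works, but reads and decompresses the whole stream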
The tar command can specify a list of files or directories and can include name substitution characters. The basic form is
tar keystring options -f tarball filenames...
The keystring is a string of characters starting with a function letter (c, r, t , u, or x) and zero or more function modifiers (letters or digits), depending on the function letter used.
The name of the tarball is specified after option -f. For example, tar cvf /root/etc190826.tar -C /etc . Omitting -f is the most common mistake for novice users of tar.
You can also specify the key letters as regular dashed options; previous versions of tar had only the keystring form, and it is preserved for compatibility:
tar options files_to_include
You can perform on a tarball several operations that are standard for archivers; in this sense tar is just another member of the family. Typical operations include creating an archive, listing its contents, extracting files, appending and updating members, and comparing the archive against the file system.
We will discuss the GNU tar implementation. There is an alternative implementation called star that many prefer, but GNU tar is standard in commercial Linux distributions and is available for all other flavors of Unix, so we prefer it. GNU tar has several interesting additional features, such as exclude and include lists:
tar cvf myarchive.tar -X exclude.lst -T include.lst
Not all tar implementations are created equal. In the past only Solaris tar understood ACLs, but later GNU tar caught up and is now the best implementation.
As of August 2012 the current version of tar was 1.26 (dated 2011-03-13). The version used differs between Linux distributions.
While being just a zero-compression archiver, tar has several capabilities that regular archivers usually lack, and as such it is very convenient for many tasks such as backups and replication of a filesystem from one server to another.
File limits differ between OSes. On older Unixes tar often had a 2GB file limit. Obviously, older tar programs also won't be able to handle files or archives that are larger than 2^31-1 bytes (2.1 gigabytes). Try running tar --version. If the first line indicates you are using GNU tar, then any version newer than 1.12.64 is able to work with large files. You can also try the command:
strings `which tar` | grep 64
You should see some lines mentioning lseek64, creat64, fopen64. If yes, your tar contains support for large files. The GNU tar docs also mention that the official POSIX tar spec limits files to 8GB, but that GNU tar will generate non-POSIX (and therefore possibly non-portable) files with sizes up to something like 2^88 bytes. Still, formally tarballs that you want to use on any POSIX computer are limited to 8GB.
Since the majority of tarballs are gzipped, the maximum file size may also be limited by gzip. Newer versions of gzip (1.3.5 and above) support large files; before that the limit was 2GB. To check the gzip version use
gzip --version
Unlike many other utilities, tar does not assume that the first argument is the archive to operate on. This creates problems for novices. For example, unlike most Unix and DOS utilities, listing a tar archive requires two options: t and f (the file to be listed):
tar tf myfiles.tar
or
tar tvf myfiles.tar
Adding option -v allows you to see the stored attributes.
The limit on the length of file names is around 256 characters, but can be as low as 100 on older OSes. See also Google Answers: UNIX Question! tar size constraint. In old 32-bit versions of tar, because of limitations on header block space, user identification numbers (UIDs) and group identification numbers (GIDs) larger than 65,535 will be corrupted when restored by GNU tar.
Note:
Relax-and-Recover uses the tar format for creating backups. It is written in Bash, so you can learn a lot about tar usage for bare-metal backups by analyzing and tracing its code. See also Tar options for bare metal recovery
Tar is one of the few backup programs that is pipeable. The functions supported by tar are the same as for any archiver: creating, listing, extracting, appending, updating, and comparing.
Note: you need to specify option -f, if you want to list a content of tarball.
Information can be listed about all files or a given set of files. If no file argument is given, the names of all files in the tarball are listed. With the option -v, additional information for the specified files is displayed. For example:
tar -tvf etc_baseline110315_0900.tar
tar -tvf etc_baseline110315_0900.tar hosts
If you can't get information about an individual file you are interested in, that means that you used the wrong stored path to it. In this case, to see the path, you can use
tar -tf etc_baseline110315_0900.tar | grep hosts
The following command extracts the tar contents into a particular directory:
tar xfz filename.tar.gz -C PathToDirectory
You need to use option c (create), which implies that writing begins at the beginning of the tarball, instead of at the end. The tar command for creating an archive takes a string of options as well as a list of the files to be included, and names the device to be written to. The default device, /dev/rmt0, is seldom used today; most commonly you archive into a file that is often additionally processed by gzip:
tar cvf myarchive.tar *
The cvf argument string specifies that you are writing files (c for create), providing feedback to the user (v for verbose), and specifying the device rather than using the default (f for file). You can pipe the tar archive straight through gzip by specifying the z option:
tar cvzf myarchive.tgz *
The convention is to use the extension .tar for tar archives and .tgz for tarred and gzipped archives. Please note that tgz archives are monolithic bricks designed just to store files, while with a regular tarball you can perform several operations. In the past the standard compress utility was used instead of gzip; in that case archives have the suffix .tar.Z.
To create an archive, you use the c function:
tar -cvf archivename.tar directory_or_list_of_files_or_regular_expression
Always use option -v to see what is happening. To put files in an archive, you need at least read permission to the files. For example, to archive your home directory and put the resulting archive in /tmp you can use the command
tar -cvf /tmp/joeuser.tar /home/joeuser
Notice the options that are used; the order of these options is important -- the last one should be option f, which specifies where to put the archive we are creating.
Similarly, you can create an archive of /etc:
tar -cvf /root/etc181006.tar /etc
Originally, tar did not use the dash (-) in front of its options. Modern tar implementations use that dash, as do all other Linux programs, but they still allow the old usage without a dash for backward compatibility.
While managing archives with tar, it is also possible to add a file to an existing archive, or to update an archive. To add a file to an archive, you use the -r option. For example:
tar -rvf /root/etc181006.tar /root/*cfg /root/.bash*
To update a currently existing archive file, you can use the -u option:
tar -uvf /root/etc181006.tar /root/*cfg /root/.bash*
This appends newer versions of the specified files to the end of the archive. Older copies are not deleted, so the archive grows in size.
Archives of /etc and home directories are usually compressed with gzip, achieving approximately 3:1 compression (the compressed file is one third of the size of the directory):
# mkdir /root/Archive
# tar -cv /etc | gzip > /root/Archive/etc`date +"%y%m%d"`.tgz
# du -sh /etc
30M     /etc
# ll /root/Archive
total 9568
-rw-r--r--. 1 root root 9795626 Oct  7 17:47 etc181007.tgz
Binary files can also be compressed, to at least 50% of the original size. So you can save a tremendous amount of space used for storing those archives (and typically archives are stored for a couple of years in an enterprise environment).
You can compress a tarball after it was created, or on the fly. Compression via an explicit pipe makes sense mainly for the parallel versions of the compressors, since the regular versions can be invoked via tar options:
tar -cv /etc | bzip2 > /root/etc181006.tar.bgz
On the /etc directory bzip2 provides ~12% better compression than gzip, while xz provides ~30% better compression. So you save one archive's worth of space every ten days if you do it daily from cron. For very large archives (say, over 500MB) it makes sense to use only the parallel versions of the compressors -- pigz and pbzip2. Especially the latter, as it provides a better compression ratio while maintaining adequate (although slightly slower than pigz) speed.
On-the-fly compression is specified via one of the tar options: -z for gzip, -j for bzip2, and -J for xz. For example:
tar -cvzf /root/etc181006.tgz /etc
To decompress files that have been compressed with gzip or bzip2, you can use the gunzip and bunzip2 utilities. Decompression speed is approximately the same for all three compression programs we discussed.
Here are compressed sizes of default content of /etc directory in RHEL 7 using those three compression programs:
[root@test01 Archive]# ll
total 24600
-rw-r--r--. 1 root root 8580949 Oct  7 17:56 etc181006.tbz
-rw-r--r--. 1 root root 9795626 Oct  7 17:47 etc181006.tgz
-rw-r--r--. 1 root root 6808940 Oct  7 18:04 etc181006.txz
For large directories it does not make sense to use the built-in options; you are better off compressing the tarball via a pipe using the parallel versions of the compressors. For xz it makes sense to do this in all cases, as the non-parallel version is excruciatingly slow even on the /etc directory.
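A minimal sketch of compressing via a pipe with parallel compressors (assuming pigz and pbzip2 are installed; paths are hypothetical):

tar -cv /etc | pigz > /root/etc181006.tar.gz         # parallel gzip
tar -cv /etc | pbzip2 -c > /root/etc181006.tar.bz2   # parallel bzip2
tar -cv /etc | xz -T0 > /root/etc181006.tar.xz       # xz with built-in threading (recent versions)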
There is no need to specify these options while extracting: the tar utility recognizes the type of compressed content and automatically decompresses it.
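For example, both of the following work with GNU tar without -z or -J:

tar -tvf etc181006.tgz   # listing; compression detected automatically
tar -xvf etc181006.txz   # extraction; same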
To list the contents of a tar file without extracting, use the t option as shown below. Including the v option as well results in a long listing.
You also need to specify option -f to list the files in the tarball. The most common problem that novices experience is forgetting to use option -f to specify the file.
TIP: As most people forget to specify option f to list the file, it is better to create an alias, for example
alias tls='tar -tf'
and put it in your .bash_profile file.
The amount of detail in the listing is controlled by the -v option.
With the -v option all major attributes of each file are displayed; without it just the name is displayed. So there is a difference between
tar tf myfiles.tar
and
tar tvf myfiles.tar
NOTE: Tar requires the -f option to specify the file. This is one of tar's idiosyncrasies and a source of much grief for system administrators, who type tar -t myfiles.tar and receive nothing, as tar tries to read standard input.
To extract a file from an archive to standard output you can use -O or --to-stdout, for example:
tar -xOf myfiles.tar hosts | more
However, --to-command may be more convenient for use with multiple files.
Archives created with tar include the file ownership, file permissions, and access and creation dates of the files. The p (preserve) option restores file permissions to the original state. This is usually good, since you'll ordinarily want to preserve permissions as well as dates so that executables will execute and you can determine how old they are. In some situations, you might not like that original owners are restored, since the original owners may be people at some other organization altogether. The tar command will set up ownership according to the numeric UID of the original owner. If someone in your local passwd file or network information service has the same UID, that person will become the owner; otherwise the owner will display numerically. Obviously, ownership can be altered later.
tar xvpf myarchive.tar
Extract each file from a shell prompt by typing tar xvzf file.tar.gz from the directory where you saved the file.
I would like to remind you again that you can also extract an individual file to standard output (via option -O) and redirect it, for example:
tar -xvOf /root/etc_baseline110628_0900.tar hosts > /etc/hosts110628
One of the rarely mentioned capabilities of tar is its ability to serve as a diff between a directory and a tarball. The command is (assuming the current directory matches the root used when the tarball was created):
cd /etc && tar -df /tmp/etc20141020.tar 2>/dev/null
If some files present in the archive are missing from the directory, they are reported as differences.
From the tar manual:
4.2.6 Comparing Archive Members with the File System
The `--compare' (`-d'), or `--diff' operation compares specified archive members against files with the same names, and then reports differences in file size, mode, owner, modification date and contents. You should only specify archive member names, not file names. If you do not name any members, then tar will compare the entire archive. If a file is represented in the archive but does not exist in the file system, tar reports a difference.
You have to specify the record size of the archive when modifying an archive with a non-default record size. tar ignores files in the file system that do not have corresponding members in the archive.
The following example compares the archive members `rock', `blues' and `funk' in the archive `bluesrock.tar' with files of the same name in the file system. (Note that there is no file `funk'; tar will report an error message.)
$ tar --compare --file=bluesrock.tar rock blues funk
rock
blues
tar: funk not found in archive
The spirit behind the `--compare' (`--diff', `-d') option is to check whether the archive represents the current state of files on disk, more than validating the integrity of the archive media. For this latter goal, see Verifying Data as It is Stored.
Here is some additional info from the question "diff a gzipped tarball against a directory?" (unix.stackexchange.com):
user394:
Is there a way I can diff a gzipped tarball against an existing directory?
I would like to be able to do it without extracting the data from the tarball.
- Gilles
Mount the tarball as a directory, for example with AVFS. Then use diff -r on the real directory and the point where the tarball is mounted:
mountavfs
diff -r ~/.avfs/path/to/foo.tar.gz\# real-directory
- AXE-Labs
Turns out that GNU tar has diff built in (-d):
$ # Create the archive and a difference:
$ echo one>file1; echo two>file2; tar -czf archive.tgz file*; echo changing>>file1
$ tar -dzf archive.tgz
file1: Size differs
If you are working with tar that does not have this try:
$ for F in `tar -tzf archive.tgz`; do tar -xzOf archive.tgz $F | diff --brief - $F; done
Files - and file1
GNU tar also has a very useful but rarely used and poorly understood feature called tag files. Any directory tagged with such a file is automatically excluded.
The simplest way to avoid operating on files whose names match a particular pattern is to use `--exclude', which causes tar to ignore files that match the pattern. A leading forward slash is OK with this option, for example:
--exclude=/proc --exclude=/sys
The `--exclude=pattern' option prevents any file or member whose name matches the shell wildcard (pattern) from being operated on. For example, to create an archive with all the contents of the directory `src' except for files whose names end in `.o', use the command
tar -cf src.tar --exclude='*.o' src
You may give multiple `--exclude' options.
The `--exclude-from' (`-X') option causes tar to ignore files that match the patterns listed in a file. In other words, tar enables you to create lists of files that should be excluded from the tarball. Option -X accomplishes that, and it can be used together with option -T (include files from a list; see below).
For example
ls *.zip > exclude.lst
tar cvf myarchive.tar -X exclude.lst *
In this case zip files will not be included in the archive. It's a good idea to exclude the exclude file itself, as well as the tar file that you are creating, in your exclude file. Notice that this has been done in the following example:
tar cvf myarchive.tar -X exclude.lst *
When archiving directories that are under some version control system (VCS), it is often convenient to read exclusion patterns from this VCS' ignore files (e.g. `.cvsignore', `.gitignore', etc.) The following options provide such possibility:
The patterns are treated much as the corresponding VCS would treat them, i.e.:
Any line beginning with a `#' is a comment. Backslash escapes the comment character.
Before dumping a directory, tar checks whether it contains the specified file; if so, exclusion patterns are read from this file. The patterns affect only the directory itself. As of version 1.29, the files used internally by the supported VCSs are excluded.
When creating an archive, the `--exclude-caches' option family causes tar to exclude all directories that contain a cache directory tag. A cache directory tag is a short file with the well-known name `CACHEDIR.TAG' and having a standard header specified in http://www.brynosaurus.com/cachedir/spec.html. Various applications write cache directory tags into directories they use to hold regenerable, non-precious data, so that such data can be more easily excluded from backups.
There are three `exclude-caches' options, each providing different exclusion semantics: `--exclude-caches' excludes the contents of such directories but archives the directory itself and the tag file; `--exclude-caches-under' excludes everything under the directory, including the tag file, but keeps the directory itself; `--exclude-caches-all' omits such directories entirely.
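A minimal sketch of marking a cache directory (the directory path is hypothetical; the signature line is fixed by the spec):

printf 'Signature: 8a477f597d28d172789f06886806bc55' > /var/cache/myapp/CACHEDIR.TAG
tar -cf backup.tar --exclude-caches /var/cache   # myapp's contents are skipped; tag and directory are kept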
tar has the ability to ignore specified files and directories listed in a special file. The location of the file is specified with option -X or `--exclude-from'. The syntax is one definition per line; each line is interpreted as a shell wildcard (glob) pattern -- the tar documentation calls this a "pattern".
Thus if tar is called as `tar -c -X foo .' and the file `foo' contains a single line `*.o', no files whose names end in `.o' will be added to the archive.
Notice that lines from the file are read verbatim. One of the frequent errors is leaving extra whitespace after a file name, which is difficult to catch using text editors.
However, empty lines are OK and are ignored. They can be used for readability.
For example:
# Not old backups
/opt/backup/arch-full*
# Not temporary files
/tmp/
# Not the cache for pacman
/var/cache/pacman/pkg/
See BackupYourSystem-TAR - Community Help Wiki.
Another option family, `--exclude-tag', provides a generalization of this concept. It takes a single argument, a file name to look for. Any directory that contains this file will be excluded from the dump. Similarly to `exclude-caches', there are three options in this option family:
Multiple `--exclude-tag*' options can be given.
For example, given this directory:
$ find dir
dir
dir/blues
dir/jazz
dir/folk
dir/folk/tagfile
dir/folk/sanjuan
dir/folk/trote
The `--exclude-tag' will produce the following:
$ tar -cf archive.tar --exclude-tag=tagfile -v dir
dir/
dir/blues
dir/jazz
dir/folk/
tar: dir/folk/: contains a cache directory tag tagfile; contents not dumped
dir/folk/tagfile
Both the `dir/folk' directory and its tagfile are preserved in the archive; however, the rest of the files in this directory are not.
Now, using the `--exclude-tag-under' option will exclude `tagfile' from the dump, while still preserving the directory itself, as shown in this example:
$ tar -cf archive.tar --exclude-tag-under=tagfile -v dir
dir/
dir/blues
dir/jazz
dir/folk/
tar: dir/folk/: contains a cache directory tag tagfile; contents not dumped
Finally, using `--exclude-tag-all' omits the `dir/folk' directory entirely:
$ tar -cf archive.tar --exclude-tag-all=tagfile -v dir
dir/
dir/blues
dir/jazz
tar: dir/folk/: contains a cache directory tag tagfile; directory not dumped
From: Problems with Using the exclude Options
Some users find `exclude' options confusing. Here are some common pitfalls:
tar does not act on a file name explicitly listed on the command line if one of its file name components is excluded. In the example above, if you create an archive and exclude files that end with `*.o', but explicitly name the file `dir.o/foo' after all the options have been listed, `dir.o/foo' will be excluded from the archive.
Be sure to quote the pattern parameter, so that tar sees wildcard characters like `*'. If you do not do this, the shell might expand the `*' itself using files at hand, so tar might receive a list of files instead of one pattern, or none at all, making the command somewhat illegal. This might not correspond to what you want.
For example, write:
$ tar -c -f archive.tar --exclude '*.o' directory
rather than:
# Wrong!
$ tar -c -f archive.tar --exclude *.o directory
You must use shell (globbing) syntax, not regexp syntax, when using exclude options in tar. If you try to use regexp syntax to describe files to be excluded, your command might fail.
In earlier versions of tar, what is now the `--exclude-from' option was called `--exclude' instead. Now, `--exclude' applies to patterns listed on the command line and `--exclude-from' applies to patterns listed in a file.
The include file can be used to specify which files should be included.
Combining tar with the find utility, you can archive files based on many criteria, including such things as how old the files are, how big, and how recently used. The following sequence of commands locates files that are newer than a particular file and creates an include file of files to be backed up.
find -newer lastproject -print >> include.lst
tar cvf myfiles.tar -T include.lst
That is especially important if you take an archive to a new place after modifying some files in the morning and need to merge the changes back in the evening, as sketched below.
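A minimal sketch of that morning/evening workflow (file and directory names are hypothetical):

touch ~/sync_point                                        # morning: record a time stamp
# ... work on files during the day ...
find ~/project -newer ~/sync_point -type f > include.lst  # evening: list what changed
tar -cvf evening_delta.tar -T include.lst                 # archive only the changes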
Both an include and an exclude file are used. Any file that appears in both files, by the way, will be included.
tar cvf myarchive.tar -X exclude.lst -T include.lst
Notice how we use options that require parameters: the first such option is placed last in the initial string of options (cvf), and then each subsequent option is specified separately with its parameter.
GNU tar has some features that enable it to mimic the behavior of find and tar in a single command.
Tar archives can be transferred with remote copy commands such as ssh, scp, ftp, kermit, etc. These utilities know how to deal with binary data. To mail a tarball, you first need to encode the file. The uuencode command turns the contents of files into printable characters using a fixed-width format that allows them to be mailed and subsequently decoded easily. The resultant file will be larger than the original; after all, uuencode must map a larger character set of printable and nonprintable characters to a smaller one of printable characters, so it uses extra bits to do this, and the file will be about a third larger than the original.
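A minimal sketch (assuming uuencode/uudecode from the sharutils package; the archive name and address are hypothetical):

uuencode myarchive.tar myarchive.tar > myarchive.uu   # encode; second argument is the name used on decode
mail -s "tarball" user@example.com < myarchive.uu     # mail the printable form
uudecode myarchive.uu                                 # recipient side: recreates myarchive.tar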
The mv command historically could not be used to move directories across file systems. A file system can be thought of as a hard drive or hard drive partition. The mv command works fine when you want to move a directory between different locations on the same file system (hard drive), but it doesn't always work when you want to move a directory across file systems. Depending on your version of mv, an error message could be generated when you try to do this.
For example, consider this directory:
ls -F /tmp/ch2
ch2-01.doc  ch2.doc@
If you use the mv command to move this directory to the directory /home/mybook on a different file system, an error message similar to the following is generated:
mv: cannot move 'ch22' across filesystems: Not a regular file
Some UNIX versions implement a workaround inside mv that executes the following commands:
rm -rf destination
cp -rPp source destination
rm -rf source
Here source and destination are directories.
The main problem with this strategy is that while symbolic links are copied, hard links in the source directory are not always copied correctly. Sometimes copying hard links is simply impossible, as the target of the move operation can be in a different filesystem.
In addition to this, there are two other minor problems with using cp:
The workaround for these problems is to use the tar (as in tape archive) command to copy directories. This is usually accomplished with a pipe, and such an operation became a Unix idiom for copying a large tree of files, possibly the whole filesystem:
(cd mydata; tar cvf - *) | tar xpBf -
What this command does is move to a subdirectory and read files, which it then pipes to an extract at the current working directory. The parentheses group the cd and tar commands so that you can be working in two directories at the same time. The two - characters in this command represent standard output and standard input, informing the respective tar commands where to write and read data. The - designator thereby allows tar commands to be chained in this way.
You can also use the cd command with the target directory instead of the source directory. For example:
tar cvz joeuser | (cd /Archive && tar xpz)
Another example:
tar cBf - * | (cd todir; tar xvpBf -)
NOTE: Errors in this command provide a perfect opportunity to wipe out a lot of data. For example, if the source and target directories are identical, you will wipe out all the files:
cd /Archive; tar cvz joeuser | (cd /Archive && tar xpz)
WARNING: Never do such things on the fly in a hurry. You can easily wipe out the data if you make a mistake. View this as an equivalent of the rm command and behave accordingly. Read Creative uses of rm. Then read it again. Make sure that the command works as expected on test data before starting to work with actual data.
If you are not sure how a tar archive was created, it is safer to expand it first in /tmp and see the results before restoring it to the original tree (especially if this is a system directory).
You can also move files from one server to another. See Cross network copy below.
Archives created with tar preserve the file ownership, file permissions, and access and creation dates of the files. Once the files are extracted from a tar file they look the same in content and description as they did when archived. The p (preserve) option will restore file permissions to the original state. This is usually a good idea since you'll ordinarily want to preserve permissions as well as dates so that executables will execute and you can determine how old they are.
In some situations, you might not like that original owners are restored, since the original owners may be people at some other organization altogether. The tar command will set up ownership according to the numeric UID of the original owner. If someone in your local /etc/passwd file or network information service has the same UID, that person will become the owner; otherwise the owner will display numerically. Obviously, ownership can be altered later. But in this case you may want to unpack the archive as a regular user instead of root. If you unpack the archive with non-root privileges, all UIDs and GIDs will be replaced with the UID and GID of that user.
Keep in mind that for backups and restores you practically always need to use UID 0 (root).
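If you must preserve exact numeric ownership (for example, when restoring a backup on a host whose /etc/passwd differs), GNU tar has the --numeric-owner option. A minimal sketch with hypothetical paths:

tar -xpf backup.tar --numeric-owner -C /restore/target   # keep UIDs/GIDs as stored; skip name remapping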
No tar tutorial is complete without an example of a cross-network copy. This is often called a tar-to-tar file transfer: one tar command creates an archive while the other extracts from it, without ever creating a *.tar file. The only problem with this command is that it looks a bit awkward. This capability depends on the presence of ssh or rsh.
To copy a directory hierarchy using this technique, first position yourself in the source directory:
cd fromdir
Next, tar the contents of the directory using the create (i.e., the "c") option. Pipe the output to a tar extract (i.e., the "x" option) command. The tar extract should be enclosed in parentheses and contain two parts: a cd to the target directory and the extracting tar command.
For example:
tar cBf - * | (cd todir; tar xvpBf -)
WARNING: Never do such operations on the fly in a hurry. The target directory needs to be verified twice. You can easily wipe out the data if you make a mistake. View this as an equivalent of the rm command and behave accordingly. Read Creative uses of rm. Then read it again. Make sure that the command works as expected on test data before starting to work with actual data. If you are not sure how a tar archive was created, it is safer to expand it first in /tmp and see the results before restoring it to the original tree (especially if this is a system directory).
The hyphens in the tar command inform tar that no file is involved in the operation. The option B forces multiple reads and allows the command, as needed, to work across a network. The "p" is the preserve option - generally the default when the superuser uses this command.
Using tar-to-tar commands across the network, you can move a tree of directories from one server to another with a single command:
cd /home/joeuser
tar cBf - * | ssh new_server "cd /home/joeuser; tar xvBf -"
Notice how we group the remote commands to clearly separate what we are running on the remote host from what we are doing locally. You might also use tar in conjunction with dd to read files from, or write files to, a tape device on a remote system. In the following command, we copy the files from the current directory and write them to a tape device on a remote host.
tar cvfb - 20 * | ssh boson dd of=/dev/rmt0 obs=20b
Back-to-back tar commands have been used for many years. Remember that if you unpack the archive as a non-root user, all UIDs and GIDs will be replaced with those of that user; practically always do any backup/restore with UID 0 (root).
If directories contained in the tar archive already exist, their permissions will be changed to those in the tar archive and files will be overwritten. If this is an important system directory like /etc, the net result can be a large SNAFU.
If you are not sure how a tar archive was created, it is safer to expand it first in /tmp and see the results before restoring it to the original tree (especially if this is a system directory).
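A minimal sketch of that advice (the archive name is hypothetical):

mkdir /tmp/restore_check
tar -xpf etc_backup.tar -C /tmp/restore_check   # unpack into a scratch directory
ls -lR /tmp/restore_check | less                # inspect paths and permissions before touching /etc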
The use of the -f option to specify the file on which tar operates is the source of a lot of grievances. This is not a standard arrangement, and as such it takes time to get used to it.
Exclusion of files is tricky: you need to understand that exclusion patterns are essentially applied to the names as they appear in tar's listing of the archive. If you did not use the absolute-path option when creating the archive, there is no leading slash in such a listing, and you can't use one in your exclusion patterns.
If a tarball becomes corrupted, you can still recover most of the files because of the zero-compression format, but if the file was compressed with gzip, recovery is much trickier.
Sep 18, 2020 | www.redhat.com
How to append or add files to a backup
In this example, we add onto a backup backup.tar. This allows you to add additional files to the pre-existing backup backup.tar.
# tar -rvf backup.tar /path/to/file.xml
Let's break down these options:
-r - Append to archive
-v - Verbose output
-f - Name the file

How to split a backup into smaller backups
In this example, we split the existing backup into smaller archived files. You can pipe the tar command into the split command.
# tar cvf - /dir | split --bytes=200MB - backup.tar
Let's break down these options:
-c - Create the archive
-v - Verbose output
-f - Name the file
In this example, dir/ is the directory that you want to split the backup content from. We are making 200MB backups from the /dir folder.

How to check the integrity of a tar.gz backup
In this example, we check the integrity of an existing tar archive.
To test that the gzip file is not corrupt:
# gunzip -t backup.tar.gz
To test the tar file content's integrity:
# gunzip -c backup.tar.gz | tar t > /dev/null
OR
# tar -tvWf backup.tar
Let's break down these options:
-W - Verify an archive file
-t - List files of archived file
-v - Verbose output

Use pipes and greps to locate content
In this example, we use pipes and greps to locate content. The best option is already made for you. Zgrep can be utilized for gzip archives.
# zgrep <keyword> backup.tar.gz
You can also use the zcat command. This shows the content of the archive, then pipes that output to a grep.
# zcat backup.tar.gz | grep <keyword>
Egrep is a great one to use just for regular file types.
May 20, 2018 | www.cyberciti.biz
I had only one backup copy of my QT project and I just wanted to get a directory called functions. I ended up deleting the entire backup (note the -c switch instead of -x):
cd /mnt/bacupusbharddisk
tar -zcvf project.tar.gz functions
I had no backup. Similarly, I ended up running an rsync command and deleted all new files by overwriting files from the backup set (now I have switched to rsnapshot):
rsync -av -delete /dest /src
Again, I had no backup.... ... ...
All men make mistakes, but only wise men learn from their mistakes -- Winston Churchill.
From all those mistakes I have learned that:
- You must keep a good set of backups. Test your backups regularly too.
- The clear choice for preserving all data of UNIX file systems is dump, which is the only tool that guarantees recovery under all conditions. (See the Torture-testing Backup and Archive Programs paper.)
- Never use rsync with a single backup directory. Create snapshots using rsync or rsnapshot.
- Use CVS/git to store configuration files.
- Wait and read command line twice before hitting the dam [Enter] key.
- Use your well-tested perl/shell scripts and open source configuration management software such as Puppet, Ansible, Cfengine or Chef to configure all servers. This also applies to day-to-day jobs such as creating users and more.
Mistakes are inevitable, so have you made any mistakes that have caused some sort of downtime? Please add them in the comments section below.
Dec 31, 2013 | unix.stackexchange.com
user2013619 ,Dec 31, 2013 at 0:43
I would like to use MC (Midnight Commander) to compress the selected dir with the date in its name, e.g. dirname_20131231.tar.gz
The command in the User menu is :
tar -czf dirname_`date '+%Y%m%d'`.tar.gz %d
The archive is missing because %m and %d have another meaning in MC. I made an alias for the date, but it also doesn't work. Has anybody ever solved this problem?
John1024 ,Dec 31, 2013 at 1:06
To escape the percent signs, double them:
tar -czf dirname_$(date '+%%Y%%m%%d').tar.gz %d
The above would compress the current directory (%d) to a file also in the current directory. If you want to compress the directory pointed to by the cursor rather than the current directory, use %f instead:
tar -czf %f_$(date '+%%Y%%m%%d').tar.gz %f
mc handles escaping of special characters so there is no need to put %f in quotes.
By the way, Midnight Commander's special treatment of percent signs occurs not just in the user menu file but also at the command line. This is an issue when using shell commands with constructs like
${var%.c}. At the command line, the same as in the user menu file, percent signs can be escaped by doubling them.
Aug 07, 2019 | stackoverflow.com
porges ,Sep 6, 2012 at 17:43
Alright, so simple problem here. I'm working on a simple back up code. It works fine except if the files have spaces in them. This is how I'm finding files and adding them to a tar archive:
find . -type f | xargs tar -czvf backup.tar.gz
The problem is when the file has a space in the name, because tar thinks that it's a folder. Basically, is there a way I can add quotes around the results from find? Or a different way to fix this?
Brad Parks ,Mar 2, 2017 at 18:35
Use this:
find . -type f -print0 | tar -czvf backup.tar.gz --null -T -
It will:
- deal with files with spaces, newlines, leading dashes, and other funniness
- handle an unlimited number of files
- won't repeatedly overwrite your backup.tar.gz like using
tar -c
withxargs
will do when you have a large number of filesAlso see:
- GNU tar manual
- How can I build a tar from stdin? , search for null
czubehead ,Mar 19, 2018 at 11:51
There could be another way to achieve what you want. Basically,
- Use the find command to output path to whatever files you're looking for. Redirect stdout to a filename of your choosing.
- Then tar with the -T option which allows it to take a list of file locations (the one you just created with find!)
find . -name "*.whatever" > yourListOfFiles tar -cvf yourfile.tar -T yourListOfFilesgsteff ,May 5, 2011 at 2:05
Try running:
find . -type f | xargs -d "\n" tar -czvf backup.tar.gz
Caleb Kester ,Oct 12, 2013 at 20:41
Why not:
tar czvf backup.tar.gz *
Sure it's clever to use find and then xargs, but you're doing it the hard way.
Update: Porges has commented with a find-option that I think is a better answer than my answer, or the other one:
find -print0 ... | xargs -0 ....
Kalibur x ,May 19, 2016 at 13:54
If you have multiple files or directories and you want to zip them into independent *.gz files you can do this. (Optional: -type f -atime.)
file you can do this. Optional-type f -atime
find -name "httpd-log*.txt" -type f -mtime +1 -exec tar -vzcf {}.gz {} \;This will compress
httpd-log01.txt httpd-log02.txt
to
httpd-log01.txt.gz httpd-log02.txt.gz
Frank Eggink ,Apr 26, 2017 at 8:28
Why not give something like this a try:
tar cvf scala.tar `find src -name *.scala`
tommy.carstensen ,Dec 10, 2017 at 14:55
Another solution as seen here:
find var/log/ -iname "anaconda.*" -exec tar -cvzf file.tar.gz {} +
Robino ,Sep 22, 2016 at 14:26
The best solution seems to be to create a file list and then archive the files, because you can use other sources and do something else with the list.
For example, this allows using the list to calculate the size of the files being archived:
#!/bin/sh
backupFileName="backup-big-$(date +"%Y%m%d-%H%M")"
backupRoot="/var/www"
backupOutPath=""

archivePath=$backupOutPath$backupFileName.tar.gz
listOfFilesPath=$backupOutPath$backupFileName.filelist

#
# Make a list of files/directories to archive
#
echo "" > $listOfFilesPath
echo "${backupRoot}/uploads" >> $listOfFilesPath
echo "${backupRoot}/extra/user/data" >> $listOfFilesPath
find "${backupRoot}/drupal_root/sites/" -name "files" -type d >> $listOfFilesPath

#
# Size calculation
#
sizeForProgress=`
cat $listOfFilesPath | while read nextFile; do
    if [ ! -z "$nextFile" ]; then
        du -sb "$nextFile"
    fi
done | awk '{size+=$1} END {print size}'
`

#
# Archive with progress
#
## simple with dump of all files currently archived
#tar -czvf $archivePath -T $listOfFilesPath

## progress bar
sizeForShow=$(($sizeForProgress/1024/1024))
echo -e "\nRunning backup [source files are $sizeForShow MiB]\n"
tar -cPp -T $listOfFilesPath | pv -s $sizeForProgress | gzip > $archivePath

user3472383 ,Jun 27 at 1:11
Would add a comment to @Steve Kehlet's post but need 50 rep (RIP).
For anyone that has found this post through numerous googling, I found a way to not only find specific files given a time range, but also NOT include the relative paths OR whitespaces that would cause tarring errors. (THANK YOU SO MUCH STEVE.)
find . -name "*.pdf" -type f -mtime 0 -printf "%f\0" | tar -czvf /dir/zip.tar.gz --null -T -
. - relative directory
-name "*.pdf" - look for pdfs (or any file type)
-type f - type to look for is a file
-mtime 0 - look for files created in last 24 hours
-printf "%f\0" - regular -print0 OR -printf "%f" did NOT work for me. From the man pages:
This quoting is performed in the same way as for GNU ls. This is not the same quoting mechanism as the one used for -ls and -fls. If you are able to decide what format to use for the output of find then it is normally better to use '\0' as a terminator than to use newline, as file names can contain white space and newline characters.
-czvf - create archive, filter the archive through gzip, verbosely list files processed, archive name
Aug 06, 2019 | stackoverflow.com
Tar archiving that takes input from a list of files
Kurt McKee ,Apr 29 at 10:22
I have a file that contains a list of files I want to archive with tar. Let's call it mylist.txt.
It contains:
file1.txt
file2.txt
...
file10.txt
Is there a way I can issue the tar command that takes mylist.txt as input? Something like
tar -cvf allfiles.tar -[someoption?] mylist.txt
So that it is similar as if I issued this command:
tar -cvf allfiles.tar file1.txt file2.txt file10.txt
Stphane ,May 25 at 0:11
Yes:
tar -cvf allfiles.tar -T mylist.txt
drue ,Jun 23, 2014 at 14:56
Assuming GNU tar (as this is Linux), the -T or --files-from option is what you want.
Stphane ,Mar 1, 2016 at 20:28
You can also pipe in the file names, which might be useful:
find /path/to/files -name \*.txt | tar -cvf allfiles.tar -T -
David C. Rankin ,May 31, 2018 at 18:27
Some versions of tar, for example, the default versions on HP-UX (I tested 11.11 and 11.31), do not include a command line option to specify a file list, so a decent work-around is to do this:
tar cvf allfiles.tar $(cat mylist.txt)
Jan ,Sep 25, 2015 at 20:18
On Solaris, you can use the option -I to read the filenames that you would normally state on the command line from a file. In contrast to the command line, this can create tar archives with hundreds of thousands of files (just did that).So the example would read
tar -cvf allfiles.tar -I mylist.txt
For me on AIX, it worked as follows:
tar -L List.txt -cvf BKP.tar
Aug 06, 2019 | stackoverflow.com
Shell command to tar directory excluding certain files/folders
Rekhyt ,Jun 24, 2014 at 16:06
Is there a simple shell command/script that supports excluding certain files/folders from being archived?I have a directory that need to be archived with a sub directory that has a number of very large files I do not need to backup.
Not quite solutions:
The
tar --exclude=PATTERN
command matches the given pattern and excludes those files, but I need specific files & folders to be ignored (full file path), otherwise valid files might be excluded.I could also use the find command to create a list of files and exclude the ones I don't want to archive and pass the list to tar, but that only works with for a small amount of files. I have tens of thousands.
I'm beginning to think the only solution is to create a file with a list of files/folders to be excluded, then use rsync with
--exclude-from=file
to copy all the files to a tmp directory, and then use tar to archive that directory.Can anybody think of a better/more efficient solution?
EDIT: Charles Ma's solution works well. The big gotcha is that the --exclude='./folder' MUST be at the beginning of the tar command. Full command (cd first, so backup is relative to that directory):
cd /folder_to_backup
tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .
You can have multiple exclude options for tar, so
$ tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .
etc. will work. Make sure to put --exclude before the source and destination items.
Johan Soderberg ,Jun 11, 2009 at 23:10
You can exclude directories with --exclude for tar. If you want to archive everything except /usr you can use:
tar -zcvf /all.tgz / --exclude=/usr
In your case perhaps something like
tar -zcvf archive.tgz arc_dir --exclude=dir/ignore_this_dir
cstamas ,Oct 8, 2018 at 18:02
Possible options to exclude files/directories from backup using tar:
Exclude files using multiple patterns:
tar -czf backup.tar.gz --exclude=PATTERN1 --exclude=PATTERN2 ... /path/to/backup
Exclude files using an exclude file filled with a list of patterns:
tar -czf backup.tar.gz -X /path/to/exclude.txt /path/to/backup
Exclude files using tags by placing a tag file in any directory that should be skipped:
tar -czf backup.tar.gz --exclude-tag-all=exclude.tag /path/to/backup
Anish Ramaswamy ,Apr 1 at 16:18
This is an old question with many answers, but I found that none were quite clear enough for me, so I would like to add my try.
If you have the following structure
/home/ftp/mysite/
with the following files/folders
/home/ftp/mysite/file1
/home/ftp/mysite/file2
/home/ftp/mysite/file3
/home/ftp/mysite/folder1
/home/ftp/mysite/folder2
/home/ftp/mysite/folder3
and you want to make a tar file that contains everything inside /home/ftp/mysite (to move the site to a new server), but file3 is just junk, and everything in folder3 is also not needed, so we will skip those two.
We use the format
tar -czvf <name of tar file> <what to tar> <any excludes>
where c = create, z = zip, and v = verbose (you can see the files as they are entered, useful to make sure none of the files you exclude are being added), and f = file.
So, my command would look like this:
cd /home/ftp/
tar -czvf mysite.tar.gz mysite --exclude='file3' --exclude='folder3'
Note the files/folders excluded are relative to the root of your tar (I have tried full path here relative to / but I can not make that work).
hope this will help someone (and me next time I google it)
not2qubit ,Apr 4, 2018 at 3:24
You can use standard "ant notation" to exclude directories relatively. This works for me and excludes any .git or node_modules directories:
tar -cvf myFile.tar --exclude=**/.git/* --exclude=**/node_modules/* -T /data/txt/myInputFile.txt 2> /data/txt/myTarLogFile.txt
myInputFile.txt contains:
/dev2/java
/dev2/javascriptGeertVc ,Feb 9, 2015 at 13:37
I've experienced that, at least with the Cygwin version of tar I'm using ("CYGWIN_NT-5.1 1.7.17(0.262/5/3) 2012-10-19 14:39 i686 Cygwin" on a Windows XP Home Edition SP3 machine), the order of options is important.
While this construction worked for me:
tar cfvz target.tgz --exclude='<dir1>' --exclude='<dir2>' target_dir
that one didn't work:
tar cfvz --exclude='<dir1>' --exclude='<dir2>' target.tgz target_dir
tar --help
reveals the following:tar [OPTION...] [FILE]So, the second command should also work, but apparently it doesn't seem to be the case...
Best rgds,
Scott Stensland ,Feb 12, 2015 at 20:55
This exclude pattern handles filename suffixes like png or mp3 as well as directory names like .git and node_modules:
tar --exclude={*.png,*.mp3,*.wav,.git,node_modules} -Jcf ${target_tarball} ${source_dirname}
Michael ,May 18 at 23:29
I found this somewhere else so I won't take credit, but it worked better than any of the solutions above for my Mac-specific issues (even though this is closed):
tar zc --exclude __MACOSX --exclude .DS_Store -f <archive> <source(s)>
J. Lawson ,Apr 17, 2018 at 23:28
For those who have issues with it, some versions of tar would only work properly without the './' in the exclude value.
$ tar --version
tar (GNU tar) 1.27.1
Command syntax that works:
tar -czvf ../allfiles-butsome.tar.gz * --exclude=acme/foo
These will not work:
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude=./acme/foo
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude='./acme/foo'
$ tar --exclude=./acme/foo -czvf ../allfiles-butsome.tar.gz *
$ tar --exclude='./acme/foo' -czvf ../allfiles-butsome.tar.gz *
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude=/full/path/acme/foo
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude='/full/path/acme/foo'
$ tar --exclude=/full/path/acme/foo -czvf ../allfiles-butsome.tar.gz *
$ tar --exclude='/full/path/acme/foo' -czvf ../allfiles-butsome.tar.gz *
Jerinaw ,May 6, 2017 at 20:07
For Mac OSX I had to do
tar -zcv --exclude='folder' -f theOutputTarFile.tar folderToTar
Note the
-f
after the--exclude=
Aaron Votre ,Jul 15, 2016 at 15:56
I agree the --exclude flag is the right approach.$ tar --exclude='./folder_or_file' --exclude='file_pattern' --exclude='fileA'A word of warning for a side effect that I did not find immediately obvious: The exclusion of 'fileA' in this example will search for 'fileA' RECURSIVELY!
Example:A directory with a single subdirectory containing a file of the same name (data.txt)
data.txt config.txt --+dirA | data.txt | config.docx
- If using
--exclude='data.txt'
the archive will not contain EITHER data.txt file. This can cause unexpected results if archiving third party libraries, such as a node_modules directory.- To avoid this issue make sure to give the entire path, like
--exclude='./dirA/data.txt'
Znik ,Nov 15, 2014 at 5:12
To avoid possible'xargs: Argument list too long'
errors due to the use offind ... | xargs ...
when processing tens of thousands of files, you can pipe the output offind
directly totar
usingfind ... -print0 | tar --null ...
# archive a given directory, but exclude various files & directories
# specified by their full file paths
find "$(pwd -P)" -type d \( -path '/path/to/dir1' -or -path '/path/to/dir2' \) -prune \
  -or -not \( -path '/path/to/file1' -or -path '/path/to/file2' \) -print0 |
gnutar --null --no-recursion -czf archive.tar.gz --files-from -
#bsdtar --null -n -czf archive.tar.gz -T -
Mike ,May 9, 2014 at 21:29
After reading this thread, I did a little testing on RHEL 5 and here are my results for tarring up the abc directory.
This will exclude the directories error and logs and all files under the directories:
tar cvpzf abc.tgz abc/ --exclude='abc/error' --exclude='abc/logs'
Adding a wildcard after the excluded directory will exclude the files but preserve the directories:
tar cvpzf abc.tgz abc/ --exclude='abc/error/*' --exclude='abc/logs/*'
Alex B ,Jun 11, 2009 at 23:03
Use the find command in conjunction with the tar append (-r) option. This way you can add files to an existing tar in a single step, instead of a two-pass solution (create list of files, create tar):
find /dir/dir -prune ... -o etc etc.... -exec tar rvf ~/tarfile.tar {} \;
frommelmak ,Sep 10, 2012 at 14:08
You can also use one of the "--exclude-tag" options depending on your needs:
- --exclude-tag=FILE
- --exclude-tag-all=FILE
- --exclude-tag-under=FILE
The folder hosting the specified FILE will be excluded.
camh ,Jun 12, 2009 at 5:53
You can use cpio(1) to create tar files. cpio takes the files to archive on stdin, so if you've already figured out the find command you want to use to select the files to archive, pipe it into cpio to create the tar file:
find ... | cpio -o -H ustar | gzip -c > archive.tar.gz
PicoutputCls ,Aug 21, 2018 at 14:13
With GNU tar v1.26 the --exclude needs to come after the archive file and backup directory arguments, should have no leading or trailing slashes, and prefers no quotes (single or double). So relative to the PARENT directory to be backed up, it's:
tar cvfz /path_to/mytar.tgz ./dir_to_backup --exclude=some_path/to_exclude
user2553863 ,May 28 at 21:41
After reading all this good answers for different versions and having solved the problem for myself, I think there are very small details that are very important, and rare to GNU/Linux general use , that aren't stressed enough and deserves more than comments.So I'm not going to try to answer the question for every case, but instead, try to register where to look when things doesn't work.
IT IS VERY IMPORTANT TO NOTICE:
- THE ORDER OF THE OPTIONS MATTER: it is not the same put the --exclude before than after the file option and directories to backup. This is unexpected at least to me, because in my experience, in GNU/Linux commands, usually the order of the options doesn't matter.
- Different tar versions expects this options in different order: for instance, @Andrew's answer indicates that in GNU tar v 1.26 and 1.28 the excludes comes last, whereas in my case, with GNU tar 1.29, it's the other way.
- THE TRAILING SLASHES MATTER: at least in GNU tar 1.29, there shouldn't be any.
In my case, for GNU tar 1.29 on Debian stretch, the command that worked was
tar --exclude="/home/user/.config/chromium" --exclude="/home/user/.cache" -cf file.tar /dir1/ /home/ /dir3/
The quotes didn't matter, it worked with or without them.
I hope this will be useful to someone.
jørgensen ,Dec 19, 2015 at 11:10
Your best bet is to use find with tar, via xargs (to handle the large number of arguments). For example:
find / -print0 | xargs -0 tar cjf tarfile.tar.bz2
Ashwini Gupta ,Jan 12, 2018 at 10:30
tar -cvzf destination_folder source_folder -X /home/folder/excludes.txt
-X indicates a file which contains a list of filenames which must be excluded from the backup. For instance, you can specify *~ in this file to not include any filenames ending with ~ in the backup.
George ,Sep 4, 2013 at 22:35
Possibly redundant answer, but since I found it useful, here it is:
While a FreeBSD root (i.e. using csh) I wanted to copy my whole root filesystem to /mnt but without /usr and (obviously) /mnt. This is what worked (I am at /):
tar --exclude ./usr --exclude ./mnt --create --file - . | (cd /mnt && tar xvd -)
My whole point is that it was necessary (by putting the ./) to specify to tar that the excluded directories were part of the greater directory being copied.
My €0.02
t0r0X ,Sep 29, 2014 at 20:25
I had no luck getting tar to exclude a 5 gigabyte subdirectory a few levels deep. In the end, I just used the unix zip command. It worked a lot easier for me.
So for this particular example from the original post
(tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .)
The equivalent would be:
zip -r /backup/filename.zip . -x upload/folder/**\* upload/folder2/**\*
(NOTE: Here is the post I originally used that helped me https://superuser.com/questions/312301/unix-zip-directory-but-excluded-specific-subdirectories-and-everything-within-t )
RohitPorwal ,Jul 21, 2016 at 9:56
Check it out:
tar cvpzf zip_folder.tgz . --exclude=./public --exclude=./tmp --exclude=./log --exclude=fileName
tripleee ,Sep 14, 2017 at 4:38
The following bash script should do the trick. It uses the answer given here by Marcus Sundman.
#!/bin/bash
echo -n "Please enter the name of the tar file you wish to create with out extension "
read nam
echo -n "Please enter the path to the directories to tar "
read pathin
echo tar -czvf $nam.tar.gz
excludes=`find $pathin -iname "*.CC" -exec echo "--exclude \'{}\'" \;|xargs`
echo $pathin
echo tar -czvf $nam.tar.gz $excludes $pathin
This will print out the command you need and you can just copy and paste it back in. There is probably a more elegant way to provide it directly to the command line.
Just change *.CC for any other common extension, file name or regex you want to exclude and this should still work.
EDIT
Just to add a little explanation; find generates a list of files matching the chosen regex (in this case *.CC). This list is passed via xargs to the echo command. This prints --exclude 'one entry from the list'. The backslashes (\) are escape characters for the ' marks.
Aug 06, 2019 | stackoverflow.com
More efficient way to find & tar millions of files
theomega ,Apr 29, 2010 at 13:51
I've got a job running on my server at the command line prompt for two days now:

find data/ -name filepattern-*2009* -exec tar uf 2009.tar {} \;

It is taking forever , and then some. Yes, there are millions of files in the target directory. (Each file is a measly 8 bytes in a well hashed directory structure.) But just running...
find data/ -name filepattern-*2009* -print > filesOfInterest.txt

...takes only two hours or so. At the rate my job is running, it won't be finished for a couple of weeks. That seems unreasonable. Is there a more efficient way to do this? Maybe with a more complicated bash script?
A secondary question is "why is my current approach so slow?"
Stu Thompson ,May 6, 2013 at 1:11
If you already did the second command that created the file list, just use the -T option to tell tar to read the file names from that saved file list. Running 1 tar command vs N tar commands will be a lot better.
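For example, assuming the list was saved to filesOfInterest.txt as in the question, a single invocation along these lines should do it (a sketch, not tested against the poster's data):

# One tar process reads the whole list instead of spawning one per file.
tar -uf 2009.tar -T filesOfInterest.txt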
Matthew Mott ,Jul 3, 2014 at 19:21
One option is to use cpio to generate a tar-format archive:

$ find data/ -name "filepattern-*2009*" | cpio -ov --format=ustar > 2009.tar

cpio works natively with a list of filenames from stdin, rather than a top-level directory, which makes it an ideal tool for this situation.
bashfu ,Apr 23, 2010 at 10:05
Here's a find-tar combination that can do what you want without the use of xargs or exec (which should result in a noticeable speed-up):

tar --version    # tar (GNU tar) 1.14

# FreeBSD find (on Mac OS X)
find -x data -name "filepattern-*2009*" -print0 | tar --null --no-recursion -uf 2009.tar --files-from -

# for GNU find use -xdev instead of -x
gfind data -xdev -name "filepattern-*2009*" -print0 | tar --null --no-recursion -uf 2009.tar --files-from -

# added: set permissions via tar
find -x data -name "filepattern-*2009*" -print0 | \
    tar --null --no-recursion --owner=... --group=... --mode=... -uf 2009.tar --files-from -

Stu Thompson ,Apr 28, 2010 at 12:50
There is xargs for this:

find data/ -name filepattern-*2009* -print0 | xargs -0 tar uf 2009.tar

Guessing why it is slow is hard as there is not much information. What is the structure of the directory, what filesystem do you use, how was it configured when created? Having millions of files in a single directory is quite a hard situation for most filesystems.
bashfu ,May 1, 2010 at 14:18
To correctly handle file names with weird (but legal) characters (such as newlines, ...) you should write your file list to filesOfInterest.txt using find's -print0:

find -x data -name "filepattern-*2009*" -print0 > filesOfInterest.txt
tar --null --no-recursion -uf 2009.tar --files-from filesOfInterest.txt

Michael Aaron Safyan ,Apr 23, 2010 at 8:47
The way you currently have things, you are invoking the tar command every single time it finds a file, which is unsurprisingly slow. Instead of taking the two hours to print plus the amount of time it takes to open the tar archive, see if the files are out of date, and add them to the archive, you are actually multiplying those times together. You might have better success invoking the tar command once, after you have batched together all the names, possibly using xargs to achieve the invocation. By the way, I hope you are using 'filepattern-*2009*' and not filepattern-*2009* , as the stars will be expanded by the shell without quotes.

ruffrey ,Nov 20, 2018 at 17:13
There is a utility for this called tarsplitter.

tarsplitter -m archive -i folder/*.json -o archive.tar -p 8

will use 8 threads to archive the files matching "folder/*.json" into an output archive of "archive.tar".
syneticon-dj ,Jul 22, 2013 at 8:47
Simplest (also removes the file after archive creation):

find *.1 -exec tar czf '{}.tgz' '{}' --remove-files \;
Aug 06, 2019 | unix.stackexchange.com
Fastest way to combine many files into one (tar czf is too slow)
Gilles ,Nov 5, 2013 at 0:05
Currently I'm running tar czf to combine backup files. The files are in a specific directory. But the number of files is growing. Using tar czf takes too much time (more than 20 minutes and counting). I need to combine the files more quickly and in a scalable fashion.

I've found genisoimage , readom and mkisofs . But I don't know which is fastest and what the limitations are for each of them.

Rufo El Magufo ,Aug 24, 2017 at 7:56
You should check if most of your time is being spent on CPU or on I/O. Either way, there are ways to improve it:

A: don't compress

You didn't mention "compression" in your list of requirements, so try dropping the "z" from your arguments list: tar cf . This might speed things up a bit.

There are other techniques to speed up the process, like using "-N " to skip files you already backed up before.
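For instance, GNU tar's -N (--newer) takes a date, so a recurring job could skip everything already captured; a sketch with invented names and date:

# Add only files modified since the given date to the archive.
tar -cf backup-incr.tar -N '2019-08-01' /path/to/backup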
B: backup the whole partition with dd
Alternatively, if you're backing up an entire partition, take a copy of the whole disk image instead. This would save processing and a lot of disk head seek time.
tar and any other program working at a higher level have the overhead of having to read and process directory entries and inodes to find where the file content is, and of doing more disk head seeks, reading each file from a different place on the disk. To back up the underlying data much faster, use:
dd bs=16M if=/dev/sda1 of=/another/filesystem
(This assumes you're not using RAID, which may change things a bit)
To repeat what others have said: we need to know more about the files that are being backed up. I'll go with some assumptions here.

Append to the tar file

If files are only being added to the directories (that is, no file is being deleted), make sure you are appending to the existing tar file rather than re-creating it every time. You can do this by specifying the existing archive filename in your tar command instead of a new one (or deleting the old one); see the sketch at the end of this answer.

Write to a different disk

Reading from the same disk you are writing to may be killing performance. Try writing to a different disk to spread the I/O load. If the archive file needs to be on the same disk as the original files, move it afterwards.

Don't compress

Just repeating what @Yves said. If your backup files are already compressed, there's not much need to compress again. You'll just be wasting CPU cycles.
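The append idea from above, sketched concretely (archive and file names invented here; note that -r only works on plain, uncompressed tar archives):

# First run: create the archive once.
tar -cf backup.tar /data/reports
# Subsequent runs: append new files instead of re-creating everything.
tar -rf backup.tar /data/reports/new-report.txt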
Aug 02, 2019 | superuser.com
How to tar directory and then remove originals including the directory?
mit ,Dec 7, 2016 at 1:22
I'm trying to tar a collection of files in a directory called 'my_directory' and remove the originals by using the command:

tar -cvf files.tar my_directory --remove-files

However it is only removing the individual files inside the directory and not the directory itself (which is what I specified in the command). What am I missing here?
EDIT:
Yes, I suppose the 'remove-files' option is fairly literal. Although I too found the man page unclear on that point. (In Linux I tend not to really distinguish much between directories and files, and forget sometimes that they are not the same thing). It looks like the consensus is that it doesn't remove directories.
However, my major prompting point for asking this question stems from tar's handling of absolute paths. Because you must specify a relative path to a file/s to be compressed, you therefore must change to the parent directory to tar it properly. As I see it using any kind of follow-on 'rm' command is potentially dangerous in that situation. Thus I was hoping to simplify things by making tar itself do the remove.
For example, imagine a backup script where the directory to backup (ie. tar) is included as a shell variable. If that shell variable value was badly entered, it is possible that the result could be deleted files from whatever directory you happened to be in last.
Arjan ,Feb 13, 2016 at 13:08
You are missing the part which says the --remove-files option removes files after adding them to the archive.

You could follow the archive and file-removal operation with a command like:
find /path/to/be/archived/ -depth -type d -empty -exec rmdir {} \;
Update: You may be interested in reading this short Debian discussion on Bug 424692: --remove-files complains that directories "changed as we read it" .

Kim ,Feb 13, 2016 at 13:08
Since the --remove-files option only removes files , you could try

tar -cvf files.tar my_directory && rm -R my_directory

so that the directory is removed only if tar returns an exit status of 0.
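To make the failure path explicit, the same guard can be written with an if, so a failed tar leaves the directory in place (a sketch):

if tar -cvf files.tar my_directory; then
    rm -R my_directory
else
    echo "tar failed; keeping my_directory" >&2
fi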
redburn ,Feb 13, 2016 at 13:08
Have you tried putting the --remove-files directive after the archive name? It works for me.

tar -cvf files.tar --remove-files my_directory

shellking ,Oct 4, 2010 at 19:58
source={directory argument}, e.g. source={FULL ABSOLUTE PATH}/my_directory
parent={parent directory of argument}, e.g. parent={ABSOLUTE PATH of 'my_directory'}
logFile={path to a run log that captures status messages}

Then you could execute something along the lines of:

cd ${parent}
tar cvf Tar_File.`date +%Y%m%d_%H%M%S` ${source}
if [ $? != 0 ]
then
    echo "Backup FAILED for ${source} at `date`" >> ${logFile}
else
    echo "Backup SUCCESS for ${source} at `date`" >> ${logFile}
    rm -rf ${source}
fi

mit ,Nov 14, 2011 at 13:21
This was probably a bug.

Also the word "file" is ambiguous in this case. But because this is a command line switch I would expect it to also mean directories, because in Unix/Linux everything is a file, including directories. (The other interpretation is of course also valid, but it makes no sense to keep directories in such a case. I would consider it unexpected and confusing behavior.)
But I have found that on some distributions GNU tar actually removes the directory tree. Another indication that keeping the tree was a bug, or at least some workaround until they fixed it.
This is what I tried out on an Ubuntu 10.04 console:
mit:/var/tmp$ mkdir tree1
mit:/var/tmp$ mkdir tree1/sub1
mit:/var/tmp$ > tree1/sub1/file1
mit:/var/tmp$ ls -la
drwxrwxrwt  4 root root 4096 2011-11-14 15:40 .
drwxr-xr-x 16 root root 4096 2011-02-25 03:15 ..
drwxr-xr-x  3 mit  mit  4096 2011-11-14 15:40 tree1
mit:/var/tmp$ tar -czf tree1.tar.gz tree1/ --remove-files
# AS YOU CAN SEE THE TREE IS GONE NOW:
mit:/var/tmp$ ls -la
drwxrwxrwt  3 root root 4096 2011-11-14 15:41 .
drwxr-xr-x 16 root root 4096 2011-02-25 03:15 ..
-rw-r--r--  1 mit  mit   159 2011-11-14 15:41 tree1.tar.gz
mit:/var/tmp$ tar --version
tar (GNU tar) 1.22
Copyright © 2009 Free Software Foundation, Inc.

If you want to see it on your machine, paste this into a console at your own risk:
tar --version
cd /var/tmp
mkdir -p tree1/sub1
> tree1/sub1/file1
tar -czf tree1.tar.gz tree1/ --remove-files
ls -la
Jul 31, 2019 | www.linux.com
Mounting archives with FUSE and archivemount

Author: Ben Martin

The archivemount FUSE filesystem lets you mount a possibly compressed tarball as a filesystem. Because FUSE exposes its filesystems through the Linux kernel, you can use any application to load and save files directly into such mounted archives. This lets you use your favourite text editor, image viewer, or music player on files that are still inside an archive file. Going one step further, because archivemount also supports write access for some archive formats, you can edit a text file directly from inside an archive too.
I couldn't find any packages that let you easily install archivemount for mainstream distributions. Its distribution includes a single source file and a Makefile.
archivemount depends on libarchive for the heavy lifting. Packages of libarchive exist for Ubuntu Gutsy and openSUSE but not for Fedora. To compile libarchive you need to have uudecode installed; my version came with the sharutils package on Fedora 8. Once you have uudecode, you can build libarchive using the standard
./configure; make; sudo make install
process.

With libarchive installed, either from source or from packages, simply invoke make to build archivemount itself. To install archivemount, copy its binary into /usr/local/bin and set permissions appropriately:

# cp -av archivemount /usr/local/bin/

A common setup on Linux distributions is to have a fuse group that a user must be a member of in order to mount a FUSE filesystem. It makes sense to have the archivemount command owned by this group as a reminder to users that they require that permission in order to use the tool. Setup is shown below:
# chown root:fuse /usr/local/bin/archivemount
# chmod 550 /usr/local/bin/archivemountTo show how you can use archivemount I'll first create a trivial compressed tarball, then mount it with archivemount. You can then explore the directory structure of the contents of the tarball with the ls command, and access a file from the archive directly with cat.
$ mkdir -p /tmp/archivetest
$ cd /tmp/archivetest
$ date >datefile1
$ date >datefile2
$ mkdir subA
$ date >subA/foobar
$ cd /tmp
$ tar czvf archivetest.tar.gz archivetest
$ mkdir testing
$ archivemount archivetest.tar.gz testing
$ ls -l testing/archivetest/
-rw-r--r-- 0 root root 29 2008-04-02 21:04 datefile1
-rw-r--r-- 0 root root 29 2008-04-02 21:04 datefile2
drwxr-xr-x 0 root root 0 2008-04-02 21:04 subA
$ cat testing/archivetest/datefile2
Wed Apr 2 21:04:08 EST 2008Next, I'll create a new file in the archive and read its contents back again. Notice that the first use of the tar command directly on the tarball does not show that the newly created file is in the archive. This is because archivemount delays all write operations until the archive is unmounted. After issuing the
$ date > testing/archivetest/new-file1fusermount -u
command, the new file is added to the archive itself.
$ cat testing/archivetest/new-file1
Wed Apr 2 21:12:07 EST 2008
$ tar tzvf archivetest.tar.gz
drwxr-xr-x root/root 0 2008-04-02 21:04 archivetest/
-rw-r--r-- root/root 29 2008-04-02 21:04 archivetest/datefile2
-rw-r--r-- root/root 29 2008-04-02 21:04 archivetest/datefile1
drwxr-xr-x root/root 0 2008-04-02 21:04 archivetest/subA/
-rw-r--r-- root/root 29 2008-04-02 21:04 archivetest/subA/foobar

$ fusermount -u testing
$ tar tzvf archivetest.tar.gz
drwxr-xr-x root/root 0 2008-04-02 21:04 archivetest/
-rw-r--r-- root/root 29 2008-04-02 21:04 archivetest/datefile2
-rw-r--r-- root/root 29 2008-04-02 21:04 archivetest/datefile1
drwxr-xr-x root/root 0 2008-04-02 21:04 archivetest/subA/
-rw-r--r-- root/root 29 2008-04-02 21:04 archivetest/subA/foobar
-rw-rw-r-- ben/ben 29 2008-04-02 21:12 archivetest/new-file1When you unmount a FUSE filesystem, the unmount command can return before the FUSE filesystem has fully exited. This can lead to a situation where the FUSE filesystem might run into an error in some processing but not have a good place to report that error. The archivemount documentation warns that if there is an error writing changes to an archive during unmount then archivemount cannot be blamed for a loss of data. Things are not quite as grim as they sound though. I mounted a tar.gz archive to which I had only read access and attempted to create new files and write to existing ones. The operations failed immediately with a "Read-only filesystem" message.
In an effort to trick archivemount into losing data, I created an archive in a format that libarchive has only read support for. I created archivetest.zip with the original contents of the archivetest directory and mounted it. Creating a new file worked, and reading it back was fine. As expected from the warnings in the README file for archivemount, I did not see any error message when I unmounted the zip file. However, attempting to view the manifest of the zip file with unzip -l failed. It turns out that my archivemount operations had turned the file into archivetest.zip, which was now a non-compressed POSIX tar archive. Using tar tvf I saw that the manifest of the archivetest.zip tar archive included the contents, including the new file that I created. There was also an archivetest.zip.orig which was in zip format and contained the contents of the zip archive when I mounted it with archivemount.

So it turns out to be fairly tricky to get archivemount to lose data. Mounting a read-only archive file didn't work, and modifying an archive format that libarchive could only read from didn't work, though in the last case you will have to contend with the archive format being silently changed. One other situation could potentially trip you up: Because archivemount creates a new archive at unmount time, you should make sure that you will not run out of disk space where the archives are stored.
To test archivemount's performance, I used the bonnie++ filesystem benchmark version 1.03. Because archivemount holds off updating the actual archive until the filesystem is unmounted, you will get good performance when accessing and writing to a mounted archive. As shown below, when comparing the use of archivemount on an archive file stored in /tmp to direct access to a subdirectory in /tmp, seek times for archivemount were halved on average relative to direct access, and you can expect about 70% of the performance of direct access when using archivemount for rewriting. The bonnie++ documentation explains that for the rewrite test, a chunk of data is read, dirtied, and written back to a file, and this requires a seek, so archivemount's slower seek performance likely causes this benchmark to be slower as well.
$ cd /tmp
$ mkdir empty
$ ls -d empty | cpio -ov > empty.cpio
$ mkdir empty-mounted
$ archivemount empty.cpio empty-mounted
$ mkdir bonnie-test
$ /usr/sbin/bonnie++ -d /tmp/bonnie-test
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
v8tsrv 2G 14424 25 14726 4 13930 6 28502 49 52581 17 8322 123

$ /usr/sbin/bonnie++ -d /tmp/empty-mounted
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
v8tsrv 2G 12016 19 12918 7 9766 6 27543 40 52937 6 4457 24

When you want to pluck a few files out of a tarball, archivemount might be just the command for the job. Instead of expanding the archive into /tmp just to load a few files into Emacs, just mount the archive and run Emacs directly on the archivemount filesystem. As the bonnie++ benchmarks above show, an application using an archivemount filesystem does not necessarily suffer a performance hit.
Jul 31, 2019 | www.gnu.org
In the last chapter, you learned about the first three operations to tar . This chapter presents the remaining five operations to tar : `--append' , `--update' , `--concatenate' , `--delete' , and `--compare' .

You are not likely to use these operations as frequently as those covered in the last chapter; however, since they perform specialized functions, they are quite useful when you do need to use them. We will give examples using the same directory and files that you created in the last chapter. As you may recall, the directory is called `practice' , the files are `jazz' , `blues' , `folk' , and the two archive files you created are `collection.tar' and `music.tar' .
We will also use the archive files `afiles.tar' and `bfiles.tar' . The archive `afiles.tar' contains the members `apple' , `angst' , and `aspic' ; `bfiles.tar' contains the members `./birds' , `baboon' , and `./box' .
Unless we state otherwise, all practicing you do and examples you follow in this chapter will take place in the `practice' directory that you created in the previous chapter; see Preparing a Practice Directory for Examples . (Below in this section, we will remind you of the state of the examples where the last chapter left them.)
The five operations that we will cover in this chapter are:
- `--append'
- `-r'
- Add new entries to an archive that already exists.
- `--update'
- `-u'
- Add more recent copies of archive members to the end of an archive, if they exist.
- `--concatenate'
- `--catenate'
- `-A'
- Add one or more pre-existing archives to the end of another archive.
- `--delete'
- Delete items from an archive (does not work on tapes).
- `--compare'
- `--diff'
- `-d'
- Compare archive members to their counterparts in the file system.
4.2.2 How to Add Files to Existing Archives: `--append'
If you want to add files to an existing archive, you don't need to create a new archive; you can use `--append' ( `-r' ). The archive must already exist in order to use `--append' . (A related operation is the `--update' operation; you can use this to add newer versions of archive members to an existing archive. To learn how to do this with `--update' , see section Updating an Archive .)
If you use `--append' to add a file that has the same name as an archive member to an archive containing that archive member, then the old member is not deleted. What does happen, however, is somewhat complex.
tar allows you to have an infinite number of files with the same name. Some operations treat these same-named members no differently than any other set of archive members: for example, if you view an archive with `--list' ( `-t' ), you will see all of those members listed, with their data modification times, owners, etc.

Other operations don't deal with these members as perfectly as you might prefer; if you were to use `--extract' to extract the archive, only the most recently added copy of a member with the same name as other members would end up in the working directory. This is because `--extract' extracts an archive in the order the members appeared in the archive; the most recently archived members will be extracted last. Additionally, an extracted member will replace a file of the same name which existed in the directory already, and tar will not prompt you about this (10) . Thus, only the most recently archived member will end up being extracted, as it will replace the one extracted before it, and so on.

There exists a special option that allows you to get around this behavior and extract (or list) only a particular copy of the file. This is the `--occurrence' option. If you run tar with this option, it will extract only the first copy of the file. You may also give this option an argument specifying the number of the copy to be extracted. Thus, for example, if the archive `archive.tar' contained three copies of the file `myfile' , then the command

tar --extract --file archive.tar --occurrence=2 myfile

would extract only the second copy. See section --occurrence , for the description of the `--occurrence' option.
If you want to replace an archive member, use `--delete' to delete the member you want to remove from the archive, and then use `--append' to add the member you want to be in the archive. Note that you can not change the order of the archive; the most recently added member will still appear last. In this sense, you cannot truly "replace" one member with another. (Replacing one member with another will not work on certain types of media, such as tapes; see Removing Archive Members Using `--delete' and Tapes and Other Archive Media , for more information.)
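A minimal sketch of that delete-then-append sequence, using the chapter's example archive (disk archives only, not tapes):

# Remove every stored copy of 'blues', then append the current file.
tar --delete --file=collection.tar blues
tar --append --file=collection.tar blues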
4.2.2.1 Appending Files to an Archive

The simplest way to add a file to an already existing archive is the `--append' ( `-r' ) operation, which writes specified files into the archive whether or not they are already among the archived files.
When you use `--append' , you must specify file name arguments, as there is no default. If you specify a file that already exists in the archive, another copy of the file will be added to the end of the archive. As with other operations, the member names of the newly added files will be exactly the same as their names given on the command line. The `--verbose' ( `-v' ) option will print out the names of the files as they are written into the archive.
`--append' cannot be performed on some tape drives, unfortunately, due to deficiencies in the formats those tape drives use. The archive must be a valid tar archive, or else the results of using this operation will be unpredictable. See section Tapes and Other Archive Media .

To demonstrate using `--append' to add a file to an archive, create a file called `rock' in the `practice' directory. Make sure you are in the `practice' directory. Then, run the following tar command to add `rock' to `collection.tar' :

$ tar --append --file=collection.tar rock

If you now use the `--list' ( `-t' ) operation, you will see that `rock' has been added to the archive:

$ tar --list --file=collection.tar
-rw-r--r-- me/user 28 1996-10-18 16:31 jazz
-rw-r--r-- me/user 21 1996-09-23 16:44 blues
-rw-r--r-- me/user 20 1996-09-23 16:44 folk
-rw-r--r-- me/user 20 1996-09-23 16:44 rock

4.2.2.2 Multiple Members with the Same Name

You can use `--append' ( `-r' ) to add copies of files which have been updated since the archive was created. (However, we do not recommend doing this since there is another tar option called `--update' ; see section Updating an Archive , for more information. We describe this use of `--append' here for the sake of completeness.) When you extract the archive, the older version will be effectively lost. This works because files are extracted from an archive in the order in which they were archived. Thus, when the archive is extracted, a file archived later in time will replace a file of the same name which was archived earlier, even though the older version of the file will remain in the archive unless you delete all versions of the file.

Supposing you change the file `blues' and then append the changed version to `collection.tar' . As you saw above, the original `blues' is in the archive `collection.tar' . If you change the file and append the new version of the file to the archive, there will be two copies in the archive. When you extract the archive, the older version of the file will be extracted first, and then replaced by the newer version when it is extracted.
You can append the new, changed copy of the file `blues' to the archive in this way:
$ tar --append --verbose --file=collection.tar blues
blues

Because you specified the `--verbose' option, tar has printed the name of the file being appended as it was acted on. Now list the contents of the archive:

$ tar --list --verbose --file=collection.tar
-rw-r--r-- me/user 28 1996-10-18 16:31 jazz
-rw-r--r-- me/user 21 1996-09-23 16:44 blues
-rw-r--r-- me/user 20 1996-09-23 16:44 folk
-rw-r--r-- me/user 20 1996-09-23 16:44 rock
-rw-r--r-- me/user 58 1996-10-24 18:30 blues

The newest version of `blues' is now at the end of the archive (note the different creation dates and file sizes). If you extract the archive, the older version of the file `blues' will be replaced by the newer version. You can confirm this by extracting the archive and running `ls' on the directory.
If you wish to extract the first occurrence of the file `blues' from the archive, use `--occurrence' option, as shown in the following example:
$ tar --extract -vv --occurrence --file=collection.tar blues
-rw-r--r-- me/user 21 1996-09-23 16:44 blues

See section Changing How tar Writes Files , for more information on `--extract' , and see --occurrence , for a description of the `--occurrence' option.

4.2.3 Updating an Archive

In the previous section, you learned how to use `--append' to add a file to an existing archive. A related operation is `--update' ( `-u' ). The `--update' operation updates a tar archive by comparing the date of the specified archive members against the date of the file with the same name. If the file has been modified more recently than the archive member, then the newer version of the file is added to the archive (as with `--append' ).
Both `--update' and `--append' work by adding to the end of the archive. When you extract a file from the archive, only the version stored last will wind up in the file system, unless you use the `--backup' option. See section Multiple Members with the Same Name , for a detailed discussion.
4.2.3.1 How to Update an Archive Using `--update'

You must use file name arguments with the `--update' ( `-u' ) operation. If you don't specify any files, tar won't act on any files and won't tell you that it didn't do anything (which may end up confusing you).

To see the `--update' option at work, create a new file, `classical' , in your practice directory, and add some extra text to the file `blues' , using any text editor. Then invoke tar with the `--update' operation and the `--verbose' ( `-v' ) option specified, using the names of all the files in the `practice' directory as file name arguments:
$ tar --update -v -f collection.tar blues folk rock classical
blues
classical
$

Because we have specified verbose mode, tar prints out the names of the files it is working on, which in this case are the names of the files that needed to be updated. If you run `tar --list' and look at the archive, you will see `blues' and `classical' at its end. There will be a total of two versions of the member `blues' ; the one at the end will be newer and larger, since you added text before updating it.

The reason tar does not overwrite the older file when updating it is that writing to the middle of a section of tape is a difficult process. Tapes are not designed to go backward. See section Tapes and Other Archive Media , for more information about tapes.

`--update' ( `-u' ) is not suitable for performing backups for two reasons: it does not change directory content entries, and it lengthens the archive every time it is used. The GNU tar options intended specifically for backups are more efficient. If you need to run backups, please consult Performing Backups and Restoring Files .
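For reference, the backup-oriented mechanism that section describes is based on incremental snapshots; a minimal sketch with invented file names:

# Level-0 (full) backup; the snapshot file records what was archived.
tar --create --listed-incremental=practice.snar --file=full.tar practice
# Later runs archive only what changed since the snapshot.
tar --create --listed-incremental=practice.snar --file=incr.tar practice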
4.2.4 Combining Archives with `--concatenate'
Sometimes it may be convenient to add a second archive onto the end of an archive rather than adding individual files to the archive. To add one or more archives to the end of another archive, you should use the `--concatenate' ( `--catenate' , `-A' ) operation.
To use `--concatenate' , give the first archive with the `--file' option and name the rest of the archives to be concatenated on the command line. The members, and their member names, will be copied verbatim from those archives to the first one (11) . The new, concatenated archive will be called by the same name as the one given with the `--file' option. As usual, if you omit `--file' , tar will use the value of the environment variable TAPE , or, if this has not been set, the default archive name.
To demonstrate how `--concatenate' works, create two small archives called `bluesrock.tar' and `folkjazz.tar' , using the relevant files from `practice' :
$ tar -cvf bluesrock.tar blues rock
blues
rock
$ tar -cvf folkjazz.tar folk jazz
folk
jazz

If you like, you can run `tar --list' to make sure the archives contain what they are supposed to:

$ tar -tvf bluesrock.tar
-rw-r--r-- melissa/user 105 1997-01-21 19:42 blues
-rw-r--r-- melissa/user 33 1997-01-20 15:34 rock
$ tar -tvf folkjazz.tar
-rw-r--r-- melissa/user 20 1996-09-23 16:44 folk
-rw-r--r-- melissa/user 65 1997-01-30 14:15 jazz

We can concatenate these two archives with tar :

$ cd ..
$ tar --concatenate --file=bluesrock.tar folkjazz.tar

If you now list the contents of `bluesrock.tar' , you will see that it now also contains the archive members of `folkjazz.tar' :
$ tar --list --file=bluesrock.tar
blues
rock
folk
jazz

When you use `--concatenate' , the source and target archives must already exist and must have been created using compatible format parameters. Notice that tar does not check whether the archives it concatenates have compatible formats; it does not even check if the files are really tar archives.

Like `--append' ( `-r' ), this operation cannot be performed on some tape drives, due to deficiencies in the formats those tape drives use.
It may seem more intuitive to you to want or try to use cat to concatenate two archives instead of using the `--concatenate' operation; after all, cat is the utility for combining files.

However, tar archives incorporate an end-of-file marker which must be removed if the concatenated archives are to be read properly as one archive. `--concatenate' removes the end-of-archive marker from the target archive before each new archive is appended. If you use cat to combine the archives, the result will not be a valid tar format archive. If you need to retrieve files from an archive that was added to using the cat utility, use the `--ignore-zeros' ( `-i' ) option. See section Ignoring Blocks of Zeros , for further information on dealing with archives improperly combined using the cat shell utility.
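A sketch of the failure mode and the workaround (archive names invented): reading a cat-combined archive stops at the first end-of-archive marker unless `--ignore-zeros' is given:

cat first.tar second.tar > combined.tar
tar -tf combined.tar                  # lists only the members of first.tar
tar -tf combined.tar --ignore-zeros   # lists the members of both archives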
4.2.5 Removing Archive Members Using `--delete'
You can remove members from an archive by using the `--delete' option. Specify the name of the archive with `--file' ( `-f' ) and then specify the names of the members to be deleted; if you list no member names, nothing will be deleted. The `--verbose' option will cause tar to print the names of the members as they are deleted. As with `--extract' , you must give the exact member names when using `tar --delete' . `--delete' will remove all versions of the named file from the archive. The `--delete' operation can run very slowly.

Unlike other operations, `--delete' has no short form.
This operation will rewrite the archive. You can only use `--delete' on an archive if the archive device allows you to write to any point on the media, such as a disk; because of this, it does not work on magnetic tapes. Do not try to delete an archive member from a magnetic tape; the action will not succeed, and you will be likely to scramble the archive and damage your tape. There is no safe way (except by completely re-writing the archive) to delete files from most kinds of magnetic tape. See section Tapes and Other Archive Media .
To delete all versions of the file `blues' from the archive `collection.tar' in the `practice' directory, make sure you are in that directory, and then,
$ tar --list --file=collection.tar
blues
folk
jazz
rock
$ tar --delete --file=collection.tar blues
$ tar --list --file=collection.tar
folk
jazz
rock
The `--delete' option has been reported to work properly when tar acts as a filter from stdin to stdout .

4.2.6 Comparing Archive Members with the File System

The `--compare' ( `-d' ), or `--diff' operation compares specified archive members against files with the same names, and then reports differences in file size, mode, owner, modification date and contents. You should only specify archive member names, not file names. If you do not name any members, then tar will compare the entire archive. If a file is represented in the archive but does not exist in the file system, tar reports a difference.

You have to specify the record size of the archive when modifying an archive with a non-default record size.

tar ignores files in the file system that do not have corresponding members in the archive.

The following example compares the archive members `rock' , `blues' and `funk' in the archive `bluesrock.tar' with files of the same name in the file system. (Note that there is no file, `funk' ; tar will report an error message.)
$ tar --compare --file=bluesrock.tar rock blues funk
rock
blues
tar: funk not found in archive

The spirit behind the `--compare' ( `--diff' , `-d' ) option is to check whether the archive represents the current state of files on disk, more than validating the integrity of the archive media. For this latter goal, see Verifying Data as It is Stored .
Jul 30, 2019 | askubuntu.com
- @ChristopheDeTroyer Tarballs are compressed in such a way that you have to decompress them in full, then take out the file you want. I think that .zip folders are different, so if you want to be able to take out individual files fast, try them. – GKFX Jun 3 '16 at 13:04
Jan 05, 2017 | stackoverflow.com
How do I tar a directory of files and folders without including the directory itself?
tvanfosson ,Jan 5, 2017 at 12:29
I typically do:

tar -czvf my_directory.tar.gz my_directory

What if I just want to include everything (including any hidden system files) in my_directory, but not the directory itself? I don't want:

my_directory
   --- my_file
   --- my_file
   --- my_file

I want:

my_file
my_file
my_file

PanCrit ,Feb 19 at 13:04
cd my_directory/ && tar -zcvf ../my_dir.tgz . && cd -

should do the job in one line. It works well for hidden files as well. "*" doesn't expand hidden files by path name expansion, at least in bash. Below is my experiment:
$ mkdir my_directory
$ touch my_directory/file1
$ touch my_directory/file2
$ touch my_directory/.hiddenfile1
$ touch my_directory/.hiddenfile2
$ cd my_directory/ && tar -zcvf ../my_dir.tgz . && cd ..
./
./file1
./file2
./.hiddenfile1
./.hiddenfile2
$ tar ztf my_dir.tgz
./
./file1
./file2
./.hiddenfile1
./.hiddenfile2

JCotton ,Mar 3, 2015 at 2:46
Use the -C switch of tar:

tar -czvf my_directory.tar.gz -C my_directory .

The -C my_directory tells tar to change the current directory to my_directory , and then . means "add the entire current directory" (including hidden files and sub-directories).

Make sure you do -C my_directory before you do . or else you'll get the files in the current directory.

Digger ,Mar 23 at 6:52
You can also create the archive as usual and extract it with:

tar --strip-components 1 -xvf my_directory.tar.gz

jwg ,Mar 8, 2017 at 12:56
Have a look at --transform / --xform , it gives you the opportunity to massage the file name as the file is added to the archive:

% mkdir my_directory
% touch my_directory/file1
% touch my_directory/file2
% touch my_directory/.hiddenfile1
% touch my_directory/.hiddenfile2
% tar -v -c -f my_dir.tgz --xform='s,my_directory/,,' $(find my_directory -type f)
my_directory/file2
my_directory/.hiddenfile1
my_directory/.hiddenfile2
my_directory/file1
% tar -t -f my_dir.tgz
file2
.hiddenfile1
.hiddenfile2
file1

Transform expression is similar to that of sed , and we can use separators other than / ( , in the above example).
https://www.gnu.org/software/tar/manual/html_section/tar_52.htmlAlex ,Mar 31, 2017 at 15:40
TL;DR

find /my/dir/ -printf "%P\n" | tar -czf mydir.tgz --no-recursion -C /my/dir/ -T -

With some conditions (archive only files, dirs and symlinks):

find /my/dir/ -printf "%P\n" -type f -o -type l -o -type d | tar -czf mydir.tgz --no-recursion -C /my/dir/ -T -

Explanation

The below unfortunately includes a parent directory ./ in the archive:

tar -czf mydir.tgz -C /my/dir .

You can move all the files out of that directory by using the --transform configuration option, but that doesn't get rid of the . directory itself. It becomes increasingly difficult to tame the command.

You could use $(find ...) to add a file list to the command (like in magnus' answer), but that potentially causes a "file list too long" error. The best way is to combine it with tar's -T option, like this:

find /my/dir/ -printf "%P\n" -type f -o -type l -o -type d | tar -czf mydir.tgz --no-recursion -C /my/dir/ -T -

Basically what it does is list all files ( -type f ), links ( -type l ) and subdirectories ( -type d ) under your directory, make all filenames relative using -printf "%P\n" , and then pass that to the tar command (it takes filenames from STDIN using -T - ). The -C option is needed so tar knows where the files with relative names are located. The --no-recursion flag is so that tar doesn't recurse into folders it is told to archive (causing duplicate files).

If you need to do something special with filenames (filtering, following symlinks etc), the find command is pretty powerful, and you can test it by just removing the tar part of the above command:

$ find /my/dir/ -printf "%P\n" -type f -o -type l -o -type d
> textfile.txt
> documentation.pdf
> subfolder2
> subfolder
> subfolder/.gitignore

For example if you want to filter PDF files, add ! -name '*.pdf' :

$ find /my/dir/ -printf "%P\n" -type f ! -name '*.pdf' -o -type l -o -type d
> textfile.txt
> subfolder2
> subfolder
> subfolder/.gitignore

Non-GNU find

The command uses printf (available in GNU find ) which tells find to print its results with relative paths. However, if you don't have GNU find , this works to make the paths relative (removes parents with sed ):

find /my/dir/ -type f -o -type l -o -type d | sed s,^/my/dir/,, | tar -czf mydir.tgz --no-recursion -C /my/dir/ -T -

BrainStone ,Dec 21, 2016 at 22:14
This Answer should work in most situations. Notice however how the filenames are stored in the tar file as, for example, ./file1 rather than just file1 . I found that this caused problems when using this method to manipulate tarballs used as package files in BuildRoot .

One solution is to use some Bash globs to list all files except for .. , like this:

tar -C my_dir -zcvf my_dir.tar.gz .[^.]* ..?* *

This is a trick I learnt from this answer. Now tar will return an error if there are no files matching ..?* or .[^.]* , but it will still work. If the error is a problem (you are checking for success in a script), this works:

shopt -s nullglob
tar -C my_dir -zcvf my_dir.tar.gz .[^.]* ..?* *
shopt -u nullglob

Though now we are messing with shell options, we might decide that it is neater to have * match hidden files:

shopt -s dotglob
tar -C my_dir -zcvf my_dir.tar.gz *
shopt -u dotglob

This might not work where your shell globs * in the current directory, so alternatively, use:

shopt -s dotglob
cd my_dir
tar -zcvf ../my_dir.tar.gz *
cd ..
shopt -u dotglob

PanCrit ,Jun 14, 2010 at 6:47
cd my_directory
tar zcvf ../my_directory.tar.gz *

anion ,May 11, 2018 at 14:10
If it's a Unix/Linux system, and you care about hidden files (which will be missed by *), you need to do:

cd my_directory
tar zcvf ../my_directory.tar.gz * .??*

I don't know what hidden files look like under Windows.
gpz500 ,Feb 27, 2014 at 10:46
I would propose the following Bash function (first argument is the path to the dir, second argument is the basename of the resulting archive):

function tar_dir_contents ()
{
    local DIRPATH="$1"
    local TARARCH="$2.tar.gz"
    local ORGIFS="$IFS"
    IFS=$'\n'
    tar -C "$DIRPATH" -czf "$TARARCH" $( ls -a "$DIRPATH" | grep -v '\(^\.$\)\|\(^\.\.$\)' )
    IFS="$ORGIFS"
}

You can run it in this way:
$ tar_dir_contents /path/to/some/dir my_archive

and it will generate the archive my_archive.tar.gz within the current directory. It works with hidden (.*) elements and with elements with spaces in their filename.

med ,Feb 9, 2017 at 17:19
cd my_directory && tar -czvf ../my_directory.tar.gz $(ls -A) && cd ..

This one worked for me and it includes all hidden files without putting all files in a root directory named "." like in tomoe's answer:
Breno Salgado ,Apr 16, 2016 at 15:42
Use pax. Pax is a deprecated package but does the job perfectly and in a simple fashion.

pax -w > mydir.tar mydir

asynts ,Jun 26 at 16:40
Simplest way I found:

cd my_dir && tar -czvf ../my_dir.tar.gz *
marcingo ,Aug 23, 2016 at 18:04
# tar all files within and deeper in a given directory
# with no prefixes ( neither <directory>/ nor ./ )
# parameters: <source directory> <target archive file>
function tar_all_in_dir {
    { cd "$1" && find -type f -print0; } \
        | cut --zero-terminated --characters=3- \
        | tar --create --file="$2" --directory="$1" --null --files-from=-
}

Safely handles filenames with spaces or other unusual characters. You can optionally add a -name '*.sql' or similar filter to the find command to limit the files included.

user1456599 ,Feb 13, 2013 at 21:37
tar -cvzf tarlearn.tar.gz --remove-files mytemp/*

If the folder is mytemp, then if you apply the above it will archive and remove all the files in the folder but leave the folder itself alone.

tar -cvzf tarlearn.tar.gz --remove-files --exclude='*12_2008*' --no-recursion mytemp/*

You can give exclude patterns and also specify not to look into subfolders too.
Aaron Digulla ,Jun 2, 2009 at 15:33
tar -C my_dir -zcvf my_dir.tar.gz `ls my_dir`
Jul 28, 2019 | askubuntu.com
CMCDragonkai, Jun 3, 2016 at 13:04

1. Using the Command-line tar

Yes, just give the full stored path of the file after the tarball name.

Example: suppose you want the file etc/apt/sources.list from etc.tar :

tar -xf etc.tar etc/apt/sources.list

This will extract sources.list and create the directories etc/apt under the current directory.

- You can use the -t listing option instead of -x , maybe along with grep , to find the path of the file you want
- You can also extract a single directory
- tar has other options like --wildcards , etc. for more advanced partial extraction scenarios; see man tar

2. Extract it with the Archive Manager

Open the tar in Archive Manager from Nautilus, go down into the folder hierarchy to find the file you need, and extract it.

- On a server or command-line system, use a text-based file manager such as Midnight Commander ( mc ) to accomplish the same.

3. Using Nautilus/Archive-Mounter

Right-click the tar in Nautilus, and select Open with ArchiveMounter.

The tar will now appear similar to a removable drive on the left, and you can explore/navigate it like a normal drive and drag/copy/paste any file(s) you need to any destination.
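Tying the command-line tips together, a small sketch against the same hypothetical etc.tar (the --wildcards flag is GNU tar):

tar -tf etc.tar | grep sources.list        # locate the stored path first
tar -xf etc.tar --wildcards 'etc/apt/*'    # extract a whole directory by pattern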
Jul 28, 2019 | unix.stackexchange.com
Midnight Commander uses a virtual filesystem ( VFS ) for displaying files, such as the contents of a .tar.gz archive, or of an .iso image. This is configured in mc.ext with rules such as this one ( Open is Enter , View is F3 ):

regex/\.([iI][sS][oO])$ Open=%cd %p/iso9660:// View=%view{ascii} isoinfo -d -i %f

When I press Enter on an .iso file, mc will open the .iso and I can browse individual files. This is very useful.

Now my question: I also have files which are disk images, i.e. created with

pv /dev/sda1 > sda1.img

I would like mc to "browse" the files inside these images in the same fashion as .iso .

Is this possible? How would such a rule look?
Jul 28, 2019 | raymii.org
Find files in tar archives and extract specific files from tar archives
Published: 17-10-2018 | Author: Remy van Elst
This is a small tip, to find specific files in tar archives and how to extract those specific files from said archive. Useful when you have a 2 GB large tar file with millions of small files, and you need just one.

Finding files in tar archives

Using the command line flags -ft (long flags are --file --list ) we can list the contents of an archive. Using grep you can search that list for the correct file. Example:

tar -ft large_file.tar.gz | grep "the-filename-you-want"

Output:

"full-path/to-the-file/in-the-archive/the-filename-you-want"

With a modern tar on modern Linux you can omit the flags for compressed archives and just pass a .tar.gz or .tar.bz2 file directly.

Extracting one file from a tar archive

When extracting a tar archive, you can specify the filename of the file you want (full path, use the command above to find it) as the second command line option. Example:

tar -xf large_file.tar.gz "full-path/to-the-file/in-the-archive/the-filename-you-want"

It might just take a long time, at least for my 2 GB file it took a while.
An alternative is to use "mc" (Midnight Commander), which can open archive files just as a local folder.
Jul 28, 2019 | www.linode.com
- Another tool that can save you time is Midnight Commander's user menu. Go back to /tmp/test where you created nine files. Press F2 and bring up the user menu. Select Compress the current subdirectory (tar.gz) . After you choose the name for the archive, this will be created in /tmp (one level up from the directory being compressed). If you highlight the .tar.gz file and press ENTER you'll notice it will open like a regular directory. This allows you to browse archives and extract files by simply copying them ( F5 ) to the opposite panel's working directory.
- To find out the size of a directory (actually, the size of all the files it contains), highlight the directory and then press CTRL+SPACE .
- To search, go up in your directory tree until you reach the top level, / , called the root directory. Now press F9 , then c , followed by f . After the Find File dialog opens, type *.gz . This will find any accessible gzip archive on the system. In the results dialog, press l (L) for Panelize . All the results will be fed to one of your panels so you can easily browse, copy, view and so on. If you enter a directory from that list, you lose the list of found files, but you can easily return to it with F9 , l (L) then z (to select Panelize from the Left menu).
- Managing files is not always done locally. Midnight Commander also supports accessing remote filesystems through SSH's Secure File Transfer Protocol, SFTP . This way you can easily transfer files between servers.
Press F9 , followed by l (L), then select the SFTP link menu entry. In the dialog box titled SFTP to machine enter sftp://example@203.0.113.1 . Replace example with the username you have created on the remote machine and 203.0.113.1 with the IP address of your server. This will work only if the server at the other end accepts password logins. If you're logging in with SSH keys, then you'll first need to create and/or edit ~/.ssh/config . It could look something like this:
~/.ssh/config

Host sftp_server
    HostName 203.0.113.1
    Port 22
    User your_user
    IdentityFile ~/.ssh/id_rsa

You can choose whatever you want as the Host value, it's only an identifier. IdentityFile is the path to your private SSH key.
After the config file is set up, access your SFTP server by typing the identifier value you set after Host in the SFTP to machine dialog. In this example, enter sftp_server .
Jul 28, 2019 | kosiara87.blogspot.com
Midnight Commander how to compress a file/directory; Make a tar archive with midnight commander
To compress a file in Midnight Commander (e.g. to make a tar.gz archive), navigate to the directory you want to pack and press 'F2'. This will bring up the 'User menu'. Choose the option 'Compress the current subdirectory'. This will compress the WHOLE directory you're currently in, not the highlighted directory.
Jun 23, 2019 | stackoverflow.com
user1118764 , Sep 7, 2012 at 6:58
I normally compress using tar zcvf and decompress using tar zxvf (using gzip due to habit).

I've recently gotten a quad core CPU with hyperthreading, so I have 8 logical cores, and I notice that many of the cores are unused during compression/decompression.
Is there any way I can utilize the unused cores to make it faster?
Warren Severin , Nov 13, 2017 at 4:37
The solution proposed by Xiong Chiamiov above works beautifully. I had just backed up my laptop with .tar.bz2 and it took 132 minutes using only one cpu thread. Then I compiled and installed tar from source: gnu.org/software/tar I included the options mentioned in the configure step: ./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip I ran the backup again and it took only 32 minutes. That's better than 4X improvement! I watched the system monitor and it kept all 4 cpus (8 threads) flatlined at 100% the whole time. THAT is the best solution. – Warren Severin Nov 13 '17 at 4:37Mark Adler , Sep 7, 2012 at 14:48
You can use pigz instead of gzip, which does gzip compression on multiple cores. Instead of using the -z option, you would pipe it through pigz:

tar cf - paths-to-archive | pigz > archive.tar.gz

By default, pigz uses the number of available cores, or eight if it could not query that. You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can request better compression with -9. E.g.
tar cf - paths-to-archive | pigz -9 -p 32 > archive.tar.gz

user788171 , Feb 20, 2013 at 12:43
How do you use pigz to decompress in the same fashion? Or does it only work for compression?Mark Adler , Feb 20, 2013 at 16:18
pigz does use multiple cores for decompression, but only with limited improvement over a single core. The deflate format does not lend itself to parallel decompression. The decompression portion must be done serially. The other cores for pigz decompression are used for reading, writing, and calculating the CRC. When compressing, on the other hand, pigz gets close to a factor of n improvement with n cores.
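So a pigz pipeline still helps somewhat on extraction; a sketch (archive name invented):

# pigz decompresses on one core while side threads handle read/write/CRC.
pigz -dc archive.tar.gz | tar -xf -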
Garrett , Mar 1, 2014 at 7:26
The hyphen here is stdout (see this page ).

Mark Adler , Jul 2, 2014 at 21:29
Yes. 100% compatible in both directions.

Mark Adler , Apr 23, 2015 at 5:23
There is effectively no CPU time spent tarring, so it wouldn't help much. The tar format is just a copy of the input file with header blocks in between files.

Jen , Jun 14, 2013 at 14:34
You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use.

For example use:
tar -c --use-compress-program=pigz -f tar.file dir_to_zip

Valerio Schiavoni , Aug 5, 2014 at 22:38
Unfortunately by doing so the concurrent feature of pigz is lost. You can see for yourself by executing that command and monitoring the load on each of the cores. – Valerio Schiavoni Aug 5 '14 at 22:38bovender , Sep 18, 2015 at 10:14
@ValerioSchiavoni: Not here, I get full load on all 4 cores (Ubuntu 15.04 'Vivid'). – bovender Sep 18 '15 at 10:14Valerio Schiavoni , Sep 28, 2015 at 23:41
On compress or on decompress ? – Valerio Schiavoni Sep 28 '15 at 23:41Offenso , Jan 11, 2017 at 17:26
I prefer

tar -c dir_to_zip | pv | pigz > tar.file

pv helps me estimate, you can skip it. But still it is easier to write and remember. – Offenso Jan 11 '17 at 17:26
pv helps me estimate, you can skip it. But still it easier to write and remember. – Offenso Jan 11 '17 at 17:26Maxim Suslov , Dec 18, 2014 at 7:31
Common approach

There is an option for the tar program:

-I, --use-compress-program PROG
       filter through PROG (must accept -d)

You can use a multithreaded version of an archiver or compressor utility.
Most popular multithread archivers are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:
$ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
$ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive

The archiver must accept -d. If your replacement utility doesn't have this parameter and/or you need to specify additional parameters, then use pipes (add parameters if necessary):
$ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.bz2
$ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz

Input and output of singlethread and multithread are compatible. You can compress using the multithread version and decompress using the singlethread version, and vice versa.
p7zip

For p7zip compression you need a small shell script like the following:
#!/bin/sh
case $1 in
  -d) 7za -txz -si -so e ;;
   *) 7za -txz -si -so a . ;;
esac 2>/dev/null

Save it as 7zhelper.sh. Here is an example of usage:
$ tar -I 7zhelper.sh -cf OUTPUT_FILE.tar.7z paths_to_archive
$ tar -I 7zhelper.sh -xf OUTPUT_FILE.tar.7z

xz

Regarding multithreaded XZ support: if you are running version 5.2.0 or above of XZ Utils, you can utilize multiple cores for compression by setting -T or --threads to an appropriate value via the environment variable XZ_DEFAULTS (e.g. XZ_DEFAULTS="-T 0" ).
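For example, combined with GNU tar's -J (xz) option (paths invented):

# Let xz use all available cores for this archive run.
XZ_DEFAULTS="-T 0" tar -cJf archive.tar.xz paths_to_archive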
Multithreaded compression and decompression are not implemented yet, so this option has no effect for now.
However this will not work for decompression of files that haven't also been compressed with threading enabled. From man for version 5.2.2:
Threaded decompression hasn't been implemented yet. It will only work on files that contain multiple blocks with size information in block headers. All files compressed in multi-threaded mode meet this condition, but files compressed in single-threaded mode don't even if --block-size=size is used.

Recompiling with replacement
If you build tar from sources, then you can recompile with parameters
--with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzipAfter recompiling tar with these options you can check the output of tar's help:
$ tar --help | grep "lbzip2\|plzip\|pigz"
  -j, --bzip2                      filter the archive through lbzip2
      --lzip                       filter the archive through plzip
  -z, --gzip, --gunzip, --ungzip   filter the archive through pigz

mpibzip2 , Apr 28, 2015 at 20:57
user1985657 , Apr 28, 2015 at 20:57
I just found pbzip2 and mpibzip2 . mpibzip2 looks very promising for clusters or if you have a laptop and a multicore desktop computer for instance. – user1985657 Apr 28 '15 at 20:57
oᴉɹǝɥɔ , Jun 10, 2015 at 17:39
Processing STDIN may in fact be slower. – oᴉɹǝɥɔ Jun 10 '15 at 17:39
selurvedu , May 26, 2016 at 22:13
Plus 1 for the xz option. It's the simplest, yet effective approach. – selurvedu May 26 '16 at 22:13
panticz.de , Sep 1, 2014 at 15:02
You can use the shortcut -I for tar's --use-compress-program switch, and invoke pbzip2 for bzip2 compression on multiple cores:

tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 DIRECTORY_TO_COMPRESS/

einpoklum , Feb 11, 2017 at 15:59
A nice TL;DR for @MaximSuslov's answer. – einpoklum Feb 11 '17 at 15:59

If you want to have more flexibility with filenames and compression options, you can use:

find /my/path/ -type f -name "*.sql" -o -name "*.log" -exec \
tar -P --transform='s@/my/path/@@g' -cf - {} + | \
pigz -9 -p 4 > myarchive.tar.gz

Step 1: find
find /my/path/ -type f -name "*.sql" -o -name "*.log" -exec
This command will look for the files you want to archive, in this case /my/path/*.sql and /my/path/*.log . Add as many -o -name "pattern" as you want.

Step 2: tar

-exec will execute the next command using the results of find :

tar -P --transform='s@/my/path/@@g' -cf - {} +
--transform is a simple string replacement parameter. It will strip the path of the files from the archive, so the tarball's root becomes the current directory when extracting. Note that you can't use the -C option to change directory, as you would lose the benefits of find : all files of the directory would be included.

-P tells tar to use absolute paths, so it doesn't trigger the warning "Removing leading `/' from member names". The leading '/' will be removed by --transform anyway.

-cf - tells tar to write the archive to standard output; the actual file name is given later, after the pipe.

{} + inserts every file that find found previously.

Step 3: pigz
pigz -9 -p 4

Use as many parameters as you want. In this case, -9 is the compression level and -p 4 is the number of cores dedicated to compression. If you run this on a heavily loaded web server, you probably don't want to use all available cores.

Step 4: archive name

> myarchive.tar.gz

Finally, the compressed stream is redirected into the archive file.
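To sanity-check the result, the archive can be listed the same way it was written, decompressing through pigz again (a quick sketch using the archive name from above):

pigz -dc myarchive.tar.gz | tar tvf - | head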
Jul 20, 2017 | www.linuxjournal.com
Anonymous, 11/08/2002
At an unnamed location it happened thus... The customer had been using a home-built tar-based backup system for a long time. They were informed enough to have even tested and verified that recovery would work as well.
Everything had been working fine, and they even had to do a recovery which went fine. Well, one day something evil happened to a disk and they had to replace the unit and do a full recovery.
Things looked fine until someone noticed that a directory with critically important and sensitive data was missing. Turned out that some manager had decided to 'secure' the directory by doing 'chmod 000 dir' to protect the data from inquisitive eyes when the data was not being used.
Of course, tar complained about the situation and returned with a non-zero exit status, but since the backup procedure had seemed to work fine, no one thought it necessary to view the logs...
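The moral generalizes: backup jobs should check tar's exit status rather than assume that silence means success. A minimal sketch of such a check (the paths are hypothetical; the exit-code meanings are GNU tar's):

#!/bin/sh
# back up /etc; capture errors and refuse to stay silent on failure
tar cf /backup/etc-backup.tar -C / etc 2>/var/log/etc-backup.err
status=$?
if [ "$status" -ne 0 ]; then
    # GNU tar: 1 = some files changed while being read, 2 = fatal error
    echo "Backup FAILED with tar exit status $status" >&2
    cat /var/log/etc-backup.err >&2
    exit "$status"
fi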
Jul 20, 2017 | www.linuxjournal.com
Anonymous on Fri, 11/08/2002 - 03:00.
Anonymous on Sun, 11/10/2002 - 03:00.
The Subject, not the content, really brings back memories.
Imagine this: you're tasked with complete control over the network in a multi-million dollar company. You've had some experience in the real world of network maintenance, but mostly you've learned from breaking things at home.
Time comes to implement (yes this was a startup company), a backup routine. You carefully consider the best way to do it and decide copying data to a holding disk before the tape run would be perfect in the situation, faster restore if the holding disk is still alive.
So off you go configuring all your servers for ssh pass through, and create the rsync scripts. Then before the trial run you think it would be a good idea to create a local backup of all the websites.
You logon to the web server, create a temp directory and start testing your newly advanced rsync skills. After a couple of goes, you think you're ready for the real thing, but you decide to run the test one more time.
Everything seems fine, so you delete the temp directory. You pause for a second and your mouth drops open wider than it has ever opened before, and a feeling of terror overcomes you. You want to hide in a hole and hope you didn't see what you saw.
I RECURSIVELY DELETED ALL THE LIVE CORPORATE WEBSITES ON FRIDAY AFTERNOON AT 4PM!
This is why it's ALWAYS A GOOD IDEA to use Midnight Commander or something similar to delete directories!!
...Root for (5) years and never trashed a filesystem yet (knockwoody)...
Anonymous on Fri, 11/08/2002 - 03:00.
rsync with ssh as the transport mechanism works very well with my nightly LAN backups. I've found this page to be very helpful: http://www.mikerubel.org/computers/rsync_snapshots/
Nov 01, 2018 | opensource.com
In a well-known data center (whose name I do not want to remember), one cold October night we had a production outage in which thousands of web servers stopped responding due to downtime in the main database. The database administrator asked me, the rookie sysadmin, to recover the database's last full backup and restore it to bring the service back online.
But, at the end of the process, the database was still broken. I didn't worry, because there were other full backup files in stock. However, even after doing the process several times, the result didn't change.
With great fear, I asked the senior sysadmin what to do to fix this behavior.
"You remember when I showed you, a few days ago, how the full backup script was running? Something about how important it was to validate the backup?" responded the sysadmin.
"Of course! You told me that I had to stay a couple of extra hours to perform that task," I answered. "Exactly! But you preferred to leave early without finishing that task," he said. "Oh my! I thought it was optional!" I exclaimed.
"It was, it was "
Moral of the story: Even with the best solution that promises to make the most thorough backups, the ghost of the failed restoration can appear, darkening our job skills, if we don't make a habit of validating the backup every time.
Nov 08, 2002 | www.linuxjournal.com
Anonymous on Fri, 11/08/2002
Why don't you just buy an extra hard disk and have a copy of your important data there. With today's prices it doesn't cost anything.
Anonymous on Fri, 11/08/2002 - 03:00. A lot of people seem to have this idea, and in many situations it should work fine.

However, there is the human factor. Sometimes simple things go wrong (as simple as copying a file), and it takes a while before anybody notices that the contents of this file are not what is expected. This means you have to have many "generations" of backups of the file in order to be able to restore it, and in order not to put all the "eggs in the same basket," each of the file backups should be on a separate physical device.

Also, backing up to another disk in the same computer will probably not save you when lightning strikes, as the backup disk is just as likely to be fried as the main disk.

In real life, the backup strategy and the hardware/software choices to support it are (as with most other things) a balancing act. The important thing is that you have a strategy, and that you test it regularly to make sure it works as intended (as the main point is in the article). Also, realizing that achieving 100% backup security is impossible might save a lot of time in setting up the strategy.
(I.e. you have to say that this strategy has certain specified limits, like not being able to restore a file to its intermediate state sometime during a workday, only to the state it had when it was last backed up, which should be a maximum of xxx hours ago and so on...)
Hallvard P
Feb 05, 2012 | www.mysysad.com
Frankly speaking, I did not want to waste time and bandwidth downloading images. Here is the syntax to exclude a directory:

# tar cvfp mytarball.tar /mypath/Example.com_DIR --exclude=/mypath/Example.com_DIR/images
Tar everything in the current directory but exclude two files
# tar cvpf mytar.tar * --exclude=index.html --exclude=myimage.png
Apr 28, 2018 | serverfault.com
I'm new to Linux backup.

I'm thinking of a full system backup of my Linux server using tar. I came up with the following code:

tar -zcvpf /archive/fullbackup.tar.gz \
  --exclude=/archive \
  --exclude=/mnt \
  --exclude=/proc \
  --exclude=/lost+found \
  --exclude=/dev \
  --exclude=/sys \
  --exclude=/tmp \
  /

and if in need of any hardware problem, restore it with

cd /
tar -zxpvf fullbackup.tar.gz

But does my above code back up MBR and filesystem? Will the above code be enough to bring the same server back?
But does my above code back up MBR and filesystem?
Hennes
No. It backs up the contents of the filesystem.
Not the MBR, which is not a file but is contained in a sector outside the file systems. And not the filesystem itself, with its potentially tweaked settings and/or errors, just the contents of the file system (granted, that is a minor difference).
and if in need of any hardware problem, restore it with

cd /
tar -zxpvf fullbackup.tar.gz

Will the above code be enough to bring the same server back?
Probably, as long as you use the same setup. The tarball will just contain the files, not the partition scheme used for the disks. So you will have to partition the disk in the same way, or copy the old partition scheme, e.g. with:

dd if=/dev/sda of=myMBRbackup bs=512 count=1

Note that there are better ways to create backups, some of which have already been answered in other posts. Personally, I would just back up the configuration and the data. Everything else is merely a matter of reinstalling, possibly even with the latest version.
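For the reverse, a minimal sketch (same file and device names as above); writing the saved sector back replaces the boot code and the legacy partition table, so handle it with care:

dd if=myMBRbackup of=/dev/sda bs=512 count=1

Note that bs=512 count=1 covers only the MBR itself; layouts with logical partitions keep additional data in extended boot records that this does not capture.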
Also note that tar will back up all files. The first time that is a good thing.

But if you run that weekly or daily you will get a lot of large backups. In that case look at rsync (which does incremental changes) or one of the many other options.
- Could you paste the URL of the other post about better ways to create backups? I have been looking for it but couldn't figure out the right way of doing it. – misamisa Feb 14 '13 at 22:37
- It depends on how you want to make backups. If you have several servers, try using Amanda ( amanda.org ); if you just have two servers, use rsync (comes with the OS). If you have a built-in tape drive, use your tar command. If you want a one-off backup with a nice graphical front end, consider Acronis True Image. Etc etc. – Hennes Feb 14 '13 at 22:41
- It's a single Linux server (CentOS - shell access only) of which I want to take a full system backup once. Once the backup is taken locally, I will transfer that backup to another PC to keep it safe. – misamisa Feb 14 '13 at 23:02
- In that case, why create the backup on a server and then transfer it? You can do this instead: on the host that will store your backup, run

nc -l 1234 > MyBackupOfComputerA.tgz

and on the computer to archive, run the tar command with - as the output filename and append | nc ComputerB 1234 . The - makes tar write to stdout instead of to a file. The | pipes that output to netcat, which forwards it to the other computer. The nc on the other computer listens for the information and writes it to a file. No intermediate copy is needed. – Hennes Feb 14 '13 at 23:05
- As you told me above, the tar command won't back up the MBR and filesystem. I am looking for a way to have them in a backup in case of hard disk failure. – misamisa Feb 14 '13 at 23:08
Using tar to backup/restore a system is pretty rudimentary, and by that I mean that there are probably more elegant ways out there to back up your system... If you really want to stick to tar, here's a very good guide I found (it includes instructions on backing up the MBR; grub specifically): https://help.ubuntu.com/community/BackupYourSystem/TAR While it's on the Ubuntu wiki website, there's no reason why it wouldn't work on any UNIX/Linux machine.
You may also wish to check out this: https://help.ubuntu.com/community/BackupYourSystem
If you'd like something with a nice web GUI that's relatively straightforward to set up and use: http://backuppc.sourceforge.net/
Floyd Feb 14 '13 at 6:40
- 1 Let me second the pointer to backuppc. If you have several machines to back up, backuppc is a dream come true. The best thing about it is its deduplication and compression.
Using remastersys :
You can create a live ISO of your existing system: install all the required packages on your Ubuntu system and then make an ISO using remastersys. Then, using a startup disk, you can create a bootable USB from this ISO.
Edit your /etc/apt/sources.list file and add the following line at the end of the file:
deb http://www.remastersys.com/ubuntu precise main
Then run the following command:
sudo apt-get update
sudo apt-get install remastersys
sudo apt-get install remastersys-gui
sudo apt-get install remastersys-gtk
To run the remastersys in gui mode, type the following command:
sudo remastersys-gui
Apr 28, 2018 | www.lylebackenroth.com
Source: help.ubuntu.com
Preparing for backup
Just a quick note. You are about to back up your entire system. Don't forget to empty your Wastebasket, remove any unwanted files in your /home directory, and clean up your desktop.
Depending on why you're backing up, you might want to: Delete all your emails; Clear your browser search history; Wipe your saved browser personal details
- Unmount any external media devices, and remove any CDs/DVDs not needed for the backup process.
- This will lessen the amount of exclusions you need to type later in the process.
.... ... ...
Files that are bigger than 2GB are not supported by some implementations of ISO9660 and may not be restorable. So don't simply burn a DVD with a huge .iso file on it. Split it up using the command split or use a different way to get it onto the DVD. See man split for further information on split.
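For instance, a sketch of splitting a large tarball into DVD-sized chunks and rejoining them later (the chunk size and file names are arbitrary):

split -b 1024m backup.tar.bz2 backup.tar.bz2.part-
cat backup.tar.bz2.part-* > backup.tar.bz2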
A possible workaround is the following:
sudo tar --create --bzip2 --exclude /tmp --one-file-system --sparse / | growisofs -use-the-force-luke -Z /dev/hda=/proc/self/fd/0

Note that this only backs up one file system. You might want to use --exclude instead of --one-file-system to filter out the stuff you don't want backed up. This assumes your DVD drive is /dev/hda. This will not create a mountable DVD. To restore it you will reference the device file:

sudo tar --extract --bzip2 --file /dev/hda
At the end of the process you might get a message along the lines of 'tar: Error exit delayed from previous errors' or something, but in most cases you can just ignore that.
Another workaround is to use bzip2 to compress your backup. Bzip2 provides a higher compression ratio at the expense of speed. If compression is important to you, just substitute the z in the command with j, and change the file name to backup.tar.bz2. That would make the command look like this:

tar -cvpjf /backup.tar.bz2 --exclude=/proc --exclude=/lost+found --exclude=/backup.tar.bz2 --exclude=/mnt --exclude=/sys /
Apr 28, 2018 | stackoverflow.com
Udo G ,May 9, 2012 at 7:13
I'm using tar to make daily backups of a server and want to avoid backing up the /proc and /sys system directories, but without excluding any directories named "proc" or "sys" somewhere else in the file tree.

For example, having the following directory tree ("bla" being normal files):

# find .
./sys
./sys/bla
./foo
./foo/sys
./foo/sys/bla

I would like to exclude ./sys but not ./foo/sys . I can't seem to find an --exclude pattern that does that...

# tar cvf /dev/null * --exclude=sys
foo/

or...

# tar cvf /dev/null * --exclude=/sys
foo/
foo/sys/
foo/sys/bla
sys/
sys/bla

Any ideas? (Linux Debian 6)
drinchev ,May 9, 2012 at 7:19
Are you sure there is no exclude? If you are using MAC OS it is a different story! Look here – drinchev May 9 '12 at 7:19
Udo G ,May 9, 2012 at 7:21
Not sure I understand your question. There is an --exclude option, but I don't know how to match it for single, absolute file names (not any file by that name) - see my examples above. – Udo G May 9 '12 at 7:21
paulsm4 ,May 9, 2012 at 7:22
Look here: stackoverflow.com/questions/984204/ – paulsm4 May 9 '12 at 7:22
CharlesB ,May 9, 2012 at 7:29
You can specify absolute paths in the exclude pattern; this way other sys or proc directories will be archived:

tar --exclude=/sys --exclude=/proc /
True, but the important detail about this is that the excluded file name must match exactly the notation reported by the tar listing. For my example that would be ./sys - as I just found out now. – Udo G May 9 '12 at 7:34
pjv ,Apr 9, 2013 at 18:14
In this case you might want to use:

--anchored --exclude=sys/\*

because in case your tar does not show the leading "/" you have a problem with the filter.
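Spelled out against the example tree above, that suggestion would look something like this sketch (the archive name is arbitrary; with --anchored the pattern only matches at the start of member names, so foo/sys survives):

tar cvf backup.tar * --anchored --exclude='sys/*'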
Savvas Radevic ,May 9, 2013 at 10:44
This did the trick for me, thank you! I wanted to exclude a specific directory, not all directories/subdirectories matching the pattern. bsdtar does not have "--anchored" option though, and with bsdtar we can use full paths to exclude specific folders. – Savvas Radevic May 9 '13 at 10:44Savvas Radevic ,May 9, 2013 at 10:58
ah found it! in bsdtar the anchor is "^":

bsdtar cvjf test.tar.bz2 --exclude myfile.avi --exclude "^myexcludedfolder" *

– Savvas Radevic May 9 '13 at 10:58
Stephen Donecker ,Nov 8, 2012 at 19:12
Using tar you can exclude directories by placing a tag file in any directory that should be skipped.

Create tag files:

touch /sys/.exclude_from_backup
touch /proc/.exclude_from_backup

Then:

tar -czf backup.tar.gz --exclude-tag-all=.exclude_from_backup *
Good idea in theory but often /sys and /proc cannot be written to. – pjv Apr 9 '13 at 17:58
Apr 27, 2018 | stackoverflow.com
deepwell ,Jun 11, 2009 at 22:57
Is there a simple shell command/script that supports excluding certain files/folders from being archived?

I have a directory that needs to be archived, with a subdirectory that has a number of very large files I do not need to back up.
Not quite solutions:
The tar --exclude=PATTERN command matches the given pattern and excludes those files, but I need specific files & folders to be ignored (full file path), otherwise valid files might be excluded.

I could also use the find command to create a list of files, exclude the ones I don't want to archive, and pass the list to tar, but that only works for a small number of files. I have tens of thousands.

I'm beginning to think the only solution is to create a file with a list of files/folders to be excluded, then use rsync with --exclude-from=file to copy all the files to a tmp directory, and then use tar to archive that directory.

Can anybody think of a better/more efficient solution?
EDIT: cma 's solution works well. The big gotcha is that the --exclude='./folder' MUST be at the beginning of the tar command. Full command (cd first, so backup is relative to that directory):

cd /folder_to_backup
tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .

Rekhyt ,May 1, 2012 at 12:55
Another thing caught me out on that, might be worth a note:

Trailing slashes at the end of excluded folders will cause tar to not exclude those folders at all. – Rekhyt May 1 '12 at 12:55
Brice ,Jun 24, 2014 at 16:06
I had to remove the single quotation marks in order to successfully exclude the directories. (tar -zcvf gatling-charts-highcharts-1.4.6.tar.gz /opt/gatling-charts-highcharts-1.4.6 --exclude=results --exclude=target) – Brice Jun 24 '14 at 16:06
Charles Ma ,Jun 11, 2009 at 23:11
You can have multiple exclude options for tar, so

$ tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .

etc. will work. Make sure to put --exclude before the source and destination items.
shasi kanth ,Feb 27, 2015 at 10:49
As an example, if you are trying to backup your wordpress project folder, excluding the uploads folder, you can use this command:

tar -cvf wordpress_backup.tar wordpress --exclude=wp-content/uploads
Alfred Bez ,Jul 16, 2015 at 7:28
I came up with the following command:

tar -zcv --exclude='file1' --exclude='pattern*' --exclude='file2' -f /backup/filename.tgz .

Note that the -f flag needs to precede the tar file; see:
A "/" on the end of the exclude directory will cause it to fail. I guess tar thinks an ending / is part of the directory name to exclude. BAD: --exclude=mydir/ GOOD: --exclude=mydir – flickerfly Aug 21 '15 at 16:22NightKnight on Cloudinsidr.com ,Nov 24, 2016 at 9:55
> Make sure to put --exclude before the source and destination items. OR use an absolute path for the exclude:

tar -cvpzf backups/target.tar.gz --exclude='/home/username/backups' /home/username

– NightKnight on Cloudinsidr.com Nov 24 '16 at 9:55
Johan Soderberg ,Jun 11, 2009 at 23:10
To clarify, you can use a full path for --exclude. – Johan Soderberg Jun 11 '09 at 23:10
Stephen Donecker ,Nov 8, 2012 at 0:22
Possible options to exclude files/directories from backup using tar:

Exclude files using multiple patterns:

tar -czf backup.tar.gz --exclude=PATTERN1 --exclude=PATTERN2 ... /path/to/backup

Exclude files using an exclude file filled with a list of patterns:

tar -czf backup.tar.gz -X /path/to/exclude.txt /path/to/backup
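As a sketch of the exclude-file variant just shown (the file name and patterns are hypothetical), the file simply lists one pattern per line:

$ cat /path/to/exclude.txt
*.log
./cache
./tmp/*

$ tar -czf backup.tar.gz -X /path/to/exclude.txt /path/to/backup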
Exclude files using tags by placing a tag file in any directory that should be skipped:

tar -czf backup.tar.gz --exclude-tag-all=exclude.tag /path/to/backup

Anish Ramaswamy ,May 16, 2015 at 0:11
This answer definitely helped me! The gotcha for me was that my command looked something like tar -czvf mysite.tar.gz mysite --exclude='./mysite/file3' --exclude='./mysite/folder3' , and this didn't exclude anything. – Anish Ramaswamy May 16 '15 at 0:11
Hubert ,Feb 22, 2017 at 7:38
Nice and clear, thank you. For me the issue was that other answers include absolute or relative paths. But all you have to do is add the name of the folder you want to exclude. – Hubert Feb 22 '17 at 7:38
GeertVc ,Dec 31, 2013 at 13:35
Just want to add to the above that it is important that the directory to be excluded should NOT contain a trailing slash. So, --exclude='/path/to/exclude/dir' is CORRECT , --exclude='/path/to/exclude/dir/' is WRONG . – GeertVc Dec 31 '13 at 13:35
Eric Manley ,May 14, 2015 at 14:10
You can use standard "ant notation" to exclude directories relative.
This works for me and excludes any .git or node_modules directories:

tar -cvf myFile.tar --exclude=**/.git/* --exclude=**/node_modules/* -T /data/txt/myInputFile.txt 2> /data/txt/myTarLogFile.txt

myInputFile.txt contains:
/dev2/java
/dev2/javascriptnot2qubit ,Apr 4 at 3:24
I believe this requires the Bash shell option globstar to be enabled (check with shopt globstar , enable with shopt -s globstar ). I think it is off by default on most Unix-based OSes. From the Bash manual: "globstar: If set, the pattern ** used in a filename expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a '/', only directories and subdirectories match." – not2qubit Apr 4 at 3:24
Don't forget COPYFILE_DISABLE=1 when using tar, otherwise you may get ._ files in your tarball – Benoit Duffez Jun 19 '16 at 21:14
Scott Stensland ,Feb 12, 2015 at 20:55
This exclude pattern handles filename suffixes like png or mp3 as well as directory names like .git and node_modules:

tar --exclude={*.png,*.mp3,*.wav,.git,node_modules} -Jcf ${target_tarball} ${source_dirname}

Alex B ,Jun 11, 2009 at 23:03
Use the find command in conjunction with the tar append (-r) option. This way you can add files to an existing tar in a single step, instead of a two-pass solution (create list of files, create tar):

find /dir/dir -prune ... -o etc etc.... -exec tar rvf ~/tarfile.tar {} \;

carlo ,Mar 4, 2012 at 15:18
To avoid possible 'xargs: Argument list too long' errors due to the use of find ... | xargs ... when processing tens of thousands of files, you can pipe the output of find directly to tar using find ... -print0 | tar --null ... :

# archive a given directory, but exclude various files & directories
# specified by their full file paths
find "$(pwd -P)" -type d \( -path '/path/to/dir1' -or -path '/path/to/dir2' \) -prune \
  -or -not \( -path '/path/to/file1' -or -path '/path/to/file2' \) -print0 |
gnutar --null --no-recursion -czf archive.tar.gz --files-from -
#bsdtar --null -n -czf archive.tar.gz -T -

Znik ,Mar 4, 2014 at 12:20
You can quote the 'exclude' string, like this: 'somedir/filesdir/*' ; then the shell isn't going to expand asterisks and other wildcard characters.
xargs -n 1 is another option to avoid the xargs: Argument list too long error ;) – Tuxdude Nov 15 '14 at 5:12
I agree the --exclude flag is the right approach.

$ tar --exclude='./folder_or_file' --exclude='file_pattern' --exclude='fileA'

A word of warning for a side effect that I did not find immediately obvious: The exclusion of 'fileA' in this example will search for 'fileA' RECURSIVELY!

Example: a directory with a single subdirectory containing a file of the same name (data.txt):

data.txt
config.txt
--+dirA
  |  data.txt
  |  config.docx

- If using --exclude='data.txt' the archive will not contain EITHER data.txt file. This can cause unexpected results if archiving third party libraries, such as a node_modules directory.
- To avoid this issue make sure to give the entire path, like --exclude='./dirA/data.txt'
Mike ,May 9, 2014 at 21:26
After reading this thread, I did a little testing on RHEL 5 and here are my results for tarring up the abc directory.

This will exclude the directories error and logs and all files under the directories:

tar cvpzf abc.tgz abc/ --exclude='abc/error' --exclude='abc/logs'

Adding a wildcard after the excluded directory will exclude the files but preserve the directories:

tar cvpzf abc.tgz --exclude='abc/error/*' --exclude='abc/logs/*' abc/

camh ,Jun 12, 2009 at 5:53
You can use cpio(1) to create tar files. cpio takes the files to archive on stdin, so if you've already figured out the find command you want to use to select the files to archive, pipe it into cpio to create the tar file:

find ... | cpio -o -H ustar | gzip -c > archive.tar.gz
You can also use one of the "--exclude-tag" options depending on your needs:
- --exclude-tag=FILE
- --exclude-tag-all=FILE
- --exclude-tag-under=FILE
The folder hosting the specified FILE will be excluded.
Joe ,Jun 11, 2009 at 23:04
Your best bet is to use find with tar, via xargs (to handle the large number of arguments). For example:

find / -print0 | xargs -0 tar cjf tarfile.tar.bz2

jørgensen ,Mar 4, 2012 at 15:23
That can cause tar to be invoked multiple times - and will also pack files repeatedly. Correct is:

find / -print0 | tar -T- --null --no-recursion -cjf tarfile.tar.bz2

– jørgensen Mar 4 '12 at 15:23
Stphane ,Dec 19, 2015 at 11:10
I read somewhere that when using xargs , one should use the tar r option instead of c , because when find actually finds loads of results, xargs will split those results (based on the local command line arguments limit) into chunks and invoke tar on each part. This will result in an archive containing only the last chunk returned by xargs , and not all results found by the find command. – Stphane Dec 19 '15 at 11:10
Andrew ,Apr 14, 2014 at 16:21
In GNU tar v1.26 the --exclude needs to come after the archive file and backup directory arguments, should have no leading or trailing slashes, and prefers no quotes (single or double). So relative to the PARENT directory to be backed up, it's:

tar cvfz /path_to/mytar.tgz ./dir_to_backup --exclude=some_path/to_exclude
Ashwini Gupta ,Jan 12 at 10:30
tar -cvzf destination_folder source_folder -X /home/folder/excludes.txt

-X indicates a file which contains a list of filenames which must be excluded from the backup. For instance, you can specify *~ in this file to not include any filenames ending with ~ in the backup.
Georgios ,Sep 4, 2013 at 22:35
Possibly redundant answer, but since I found it useful, here it is:

While a FreeBSD root (i.e. using csh) I wanted to copy my whole root filesystem to /mnt but without /usr and (obviously) /mnt. This is what worked (I am at /):

tar --exclude ./usr --exclude ./mnt --create --file - . | (cd /mnt && tar xvf -)

My whole point is that it was necessary (by putting the ./ ) to specify to tar that the excluded directories were part of the greater directory being copied.
My €0.02
user2792605 ,Sep 30, 2013 at 20:07
I had no luck getting tar to exclude a 5 Gigabyte subdirectory a few levels deep. In the end, I just used the unix zip command. It worked a lot easier for me.

So for this particular example from the original post

(tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz . )

the equivalent would be:

zip -r /backup/filename.zip . -x upload/folder/**\* upload/folder2/**\*
(NOTE: Here is the post I originally used that helped me https://superuser.com/questions/312301/unix-zip-directory-but-excluded-specific-subdirectories-and-everything-within-t )
t0r0X ,Sep 29, 2014 at 20:25
Beware: zip does not pack empty directories, but tar does! – t0r0X Sep 29 '14 at 20:25
RohitPorwal ,Jul 21, 2016 at 9:56
Check it out:

tar cvpzf zip_folder.tgz . --exclude=./public --exclude=./tmp --exclude=./log --exclude=fileName

James ,Oct 28, 2016 at 14:01
The following bash script should do the trick. It uses the answer given here by Marcus Sundman.

#!/bin/bash
echo -n "Please enter the name of the tar file you wish to create with out extension "
read nam
echo -n "Please enter the path to the directories to tar "
read pathin
echo tar -czvf $nam.tar.gz
excludes=`find $pathin -iname "*.CC" -exec echo "--exclude \'{}\'" \; | xargs`
echo $pathin
echo tar -czvf $nam.tar.gz $excludes $pathin
Just change *.CC for any other common extension, file name or regex you want to exclude and this should still work.
EDIT
Just to add a little explanation; find generates a list of files matching the chosen regex (in this case *.CC). This list is passed via xargs to the echo command. This prints --exclude 'one entry from the list'. The backslashes (\) are escape characters for the ' marks.
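A sketch of the same idea reworked to take command-line parameters and survive paths with whitespace, along the lines the comments below suggest (the script name and usage are hypothetical):

#!/bin/bash
# usage: maketar.sh NAME PATH
# NAME: output tar file without extension; PATH: directory to archive
nam=$1
pathin=$2
excludes=()
# build one --exclude per matching file, null-delimited so spaces are safe
while IFS= read -r -d '' f; do
    excludes+=(--exclude="$f")
done < <(find "$pathin" -iname '*.CC' -print0)
tar -czvf "$nam.tar.gz" "${excludes[@]}" "$pathin"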
tripleee ,Sep 14, 2017 at 4:27
Requiring interactive input is a poor design choice for most shell scripts. Make it read command-line parameters instead and you get the benefit of the shell's tab completion, history completion, history editing, etc. – tripleee Sep 14 '17 at 4:27
tripleee ,Sep 14, 2017 at 4:38
Additionally, your script does not work for paths which contain whitespace or shell metacharacters. You should basically always put variables in double quotes unless you specifically require the shell to perform whitespace tokenization and wildcard expansion. For details, please see stackoverflow.com/questions/10067266/ – tripleee Sep 14 '17 at 4:38
> ,Apr 18 at 0:31
For those who have issues with it, some versions of tar would only work properly without the './' in the exclude value.

$ tar --version
tar (GNU tar) 1.27.1
Command syntax that works:
tar -czvf ../allfiles-butsome.tar.gz * --exclude=acme/fooThese will not work:
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude=./acme/foo
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude='./acme/foo'
$ tar --exclude=./acme/foo -czvf ../allfiles-butsome.tar.gz *
$ tar --exclude='./acme/foo' -czvf ../allfiles-butsome.tar.gz *
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude=/full/path/acme/foo
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude='/full/path/acme/foo'
$ tar --exclude=/full/path/acme/foo -czvf ../allfiles-butsome.tar.gz *
$ tar --exclude='/full/path/acme/foo' -czvf ../allfiles-butsome.tar.gz *
Nov 08, 2002 | www.linuxjournal.com
Anonymous on Fri, 11/08/2002 - 03:00.
It's here... Unbelievable...
[I had intended to leave the discussion of "rm -r *" behind after the compendium I sent earlier, but I couldn't resist this one.
I also received a response from rutgers!seismo!hadron!jsdy (Joseph S. D. Yao) that described building a list of "dangerous" commands into a shell and dropping into a query when a glob turns up. They built it in so it couldn't be removed, like an alias. Anyway, on to the story! RWH.] I didn't see the message that opened up the discussion on rm, but thought you might like to read this sorry tale about the perils of rm....
(It was posted to net.unix some time ago, but I think our postnews didn't send it as far as it should have!)
----------------------------------------------------------------
Have you ever left your terminal logged in, only to find when you came back to it that a (supposed) friend had typed "rm -rf ~/*" and was hovering over the keyboard with threats along the lines of "lend me a fiver 'til Thursday, or I hit return"? Undoubtedly the person in question would not have had the nerve to inflict such a trauma upon you, and was doing it in jest. So you've probably never experienced the worst of such disasters....
It was a quiet Wednesday afternoon. Wednesday, 1st October, 15:15 BST, to be precise, when Peter, an office-mate of mine, leaned away from his terminal and said to me, "Mario, I'm having a little trouble sending mail." Knowing that msg was capable of confusing even the most capable of people, I sauntered over to his terminal to see what was wrong. A strange error message of the form (I forget the exact details) "cannot access /foo/bar for userid 147" had been issued by msg.
My first thought was "Who's userid 147?; the sender of the message, the destination, or what?" So I leant over to another terminal, already logged in, and typed
grep 147 /etc/passwd

only to receive the response

/etc/passwd: No such file or directory.

Instantly, I guessed that something was amiss. This was confirmed when in response to

ls /etc

I got

ls: not found.

I suggested to Peter that it would be a good idea not to try anything for a while, and went off to find our system manager. When I arrived at his office, his door was ajar, and within ten seconds I realised what the problem was. James, our manager, was sat down, head in hands, hands between knees, as one whose world has just come to an end. Our newly-appointed system programmer, Neil, was beside him, gazing listlessly at the screen of his terminal. And at the top of the screen I spied the following lines:
# cd
# rm -rf *

Oh, *****, I thought. That would just about explain it.
I can't remember what happened in the succeeding minutes; my memory is just a blur. I do remember trying ls (again), ps, who and maybe a few other commands beside, all to no avail. The next thing I remember was being at my terminal again (a multi-window graphics terminal), and typing
cd /
echo *

I owe a debt of thanks to David Korn for making echo a built-in of his shell; needless to say, /bin, together with /bin/echo, had been deleted. What transpired in the next few minutes was that /dev, /etc and /lib had also gone in their entirety; fortunately Neil had interrupted rm while it was somewhere down below /news, and /tmp, /usr and /users were all untouched.
Meanwhile James had made for our tape cupboard and had retrieved what claimed to be a dump tape of the root filesystem, taken four weeks earlier. The pressing question was, "How do we recover the contents of the tape?". Not only had we lost /etc/restore, but all of the device entries for the tape deck had vanished. And where does mknod live?
You guessed it, /etc.
How about recovery across Ethernet of any of this from another VAX? Well, /bin/tar had gone, and thoughtfully the Berkeley people had put rcp in /bin in the 4.3 distribution. What's more, none of the Ether stuff wanted to know without /etc/hosts at least. We found a version of cpio in /usr/local, but that was unlikely to do us any good without a tape deck.
Alternatively, we could get the boot tape out and rebuild the root filesystem, but neither James nor Neil had done that before, and we weren't sure that the first thing to happen would be that the whole disk would be re-formatted, losing all our user files. (We take dumps of the user files every Thursday; by Murphy's Law this had to happen on a Wednesday).
Another solution might be to borrow a disk from another VAX, boot off that, and tidy up later, but that would have entailed calling the DEC engineer out, at the very least. We had a number of users in the final throes of writing up PhD theses and the loss of a maybe a weeks' work (not to mention the machine down time) was unthinkable.
So, what to do? The next idea was to write a program to make a device descriptor for the tape deck, but we all know where cc, as and ld live. Or maybe make skeletal entries for /etc/passwd, /etc/hosts and so on, so that /usr/bin/ftp would work. By sheer luck, I had a gnuemacs still running in one of my windows, which we could use to create passwd, etc., but the first step was to create a directory to put them in.
Of course /bin/mkdir had gone, and so had /bin/mv, so we couldn't rename /tmp to /etc. However, this looked like a reasonable line of attack.
By now we had been joined by Alasdair, our resident UNIX guru, and as luck would have it, someone who knows VAX assembler. So our plan became this: write a program in assembler which would either rename /tmp to /etc, or make /etc, assemble it on another VAX, uuencode it, type in the uuencoded file using my gnu, uudecode it (some bright spark had thought to put uudecode in /usr/bin), run it, and hey presto, it would all be plain sailing from there. By yet another miracle of good fortune, the terminal from which the damage had been done was still su'd to root (su is in /bin, remember?), so at least we stood a chance of all this working.
Off we set on our merry way, and within only an hour we had managed to concoct the dozen or so lines of assembler to create /etc. The stripped binary was only 76 bytes long, so we converted it to hex (slightly more readable than the output of uuencode), and typed it in using my editor. If any of you ever have the same problem, here's the hex for future reference:
070100002c000000000000000000000000000000000000000000000000000000
0000dd8fff010000dd8f27000000fb02ef07000000fb01ef070000000000bc8f
8800040000bc012f65746300

I had a handy program around (doesn't everybody?) for converting ASCII hex to binary, and the output of /usr/bin/sum tallied with our original binary. But hang on---how do you set execute permission without /bin/chmod? A few seconds thought (which as usual, lasted a couple of minutes) suggested that we write the binary on top of an already existing binary, owned by me...problem solved.
So along we trotted to the terminal with the root login, carefully remembered to set the umask to 0 (so that I could create files in it using my gnu), and ran the binary. So now we had a /etc, writable by all.
From there it was but a few easy steps to creating passwd, hosts, services, protocols, (etc), and then ftp was willing to play ball. Then we recovered the contents of /bin across the ether (it's amazing how much you come to miss ls after just a few, short hours), and selected files from /etc. The key file was /etc/rrestore, with which we recovered /dev from the dump tape, and the rest is history.
Now, you're asking yourself (as I am), what's the moral of this story? Well, for one thing, you must always remember the immortal words, DON'T PANIC. Our initial reaction was to reboot the machine and try everything as single user, but it's unlikely it would have come up without /etc/init and /bin/sh. Rational thought saved us from this one.
The next thing to remember is that UNIX tools really can be put to unusual purposes. Even without my gnuemacs, we could have survived by using, say, /usr/bin/grep as a substitute for /bin/cat. And the final thing is, it's amazing how much of the system you can delete without it falling apart completely. Apart from the fact that nobody could login (/bin/login?), and most of the useful commands had gone, everything else seemed normal. Of course, some things can't stand life without say /etc/termcap, or /dev/kmem, or /etc/utmp, but by and large it all hangs together.
I shall leave you with this question: if you were placed in the same situation, and had the presence of mind that always comes with hindsight, could you have got out of it in a simpler or easier way?
Answers on a postage stamp to:
Mario Wolczko
------------------------------------------------------------------------
Dept. of Computer Science ARPA: miw%[email protected]
The University USENET: mcvax!ukc!man.cs.ux!miw
Manchester M13 9PL JANET: [email protected]
U.K. 061-273 7121 x 5699
Jul 20, 2017 | www.makeuseof.com
Back in college, I used to work just about every day as a computer cluster consultant. I remember a month after getting promoted to a supervisor, I was in the process of training a new consultant in the library computer cluster. Suddenly, someone tapped me on the shoulder, and when I turned around I was confronted with a frantic graduate student – a 30-something year old man who I believe was Eastern European based on his accent – who was nearly in tears.
"Please need help – my document is all gone and disk stuck!" he said as he frantically pointed to his PC.
Now, right off the bat I could have told you three facts about the guy. One glance at the blue screen of the archaic DOS-based version of Wordperfect told me that – like most of the other graduate students at the time – he had not yet decided to upgrade to the newer, point-and-click style word processing software. For some reason, graduate students had become so accustomed to all of the keyboard hot-keys associated with typing in a DOS-like environment that they all refused to evolve into point-and-click users.
The second fact, gathered from a quick glance at his blank document screen and the sweat on his brow told me that he had not saved his document as he worked. The last fact, based on his thick accent, was that communicating the gravity of his situation wouldn't be easy. In fact, it was made even worse by his answer to my question when I asked him when he last saved.
"I wrote 30 pages."
Calculated out at about 600 words a page, that's 18000 words. Ouch.
Then he pointed at the disk drive. The floppy disk was stuck, and from the marks on the drive he had clearly tried to get it out with something like a paper clip. By the time I had carefully fished the torn and destroyed disk out of the drive, it was clear he'd never recover anything off of it. I asked him what was on it.
"My thesis."
I gulped. I asked him if he was serious. He was. I asked him if he'd made any backups. He hadn't.
Making Backups of BackupsIf there is anything I learned during those early years of working with computers (and the people that use them), it was how critical it is to not only save important stuff, but also to save it in different places. I would back up floppy drives to those cool new zip drives as well as the local PC hard drive. Never, ever had a single copy of anything.
Unfortunately, even today, people have not learned that lesson. Whether it's at work, at home, or talking with friends, I keep hearing stories of people losing hundreds to thousands of files, sometimes they lose data worth actual dollars in time and resources that were used to develop the information.
To drive that lesson home, I wanted to share a collection of stories that I found around the Internet about some recent cases were people suffered that horrible fate – from thousands of files to entire drives worth of data completely lost. These are people where the only remaining option is to start running recovery software and praying, or in other cases paying thousands of dollars to a data recovery firm and hoping there's something to find.
Not Backing Up ProjectsThe first example comes from Yahoo Answers , where a user that only provided a "?" for a user name (out of embarrassment probably), posted:
"I lost all my files from my hard drive? help please? I did a project that took me 3 days and now i lost it, its powerpoint presentation, where can i look for it? its not there where i save it, thank you"
The folks answering immediately dove into suggesting that the person run recovery software, and one person suggested that the person run a search on the computer for *.ppt.
... ... ...
Doing Backups Wrong
Then, there's the scenario of actually trying to do a backup and doing it wrong, losing all of the files on the original drive. That was the case for the person who posted on Tech Support Forum who, after purchasing a brand new Toshiba laptop and attempting to transfer old files from an external hard drive, inadvertently wiped the files on the hard drive.
Please someone help me I last week brought a Toshiba Satellite laptop running windows 7, to replace my blue screening Dell vista laptop. On plugged in my sumo external hard drive to copy over some much treasured photos and some of my (work – music/writing.) it said installing driver. it said completed I clicked on the hard drive and found a copy of my documents from the new laptop and nothing else.
While the description of the problem is a little broken, from the sound of it, the person thought they were backing up from one direction, while they were actually backing up in the other direction. At least in this case not all of the original files were deleted, but a majority were.
May 01, 2018 | Techdirt
Here's a random story, found via Kottke , highlighting how Pixar came very close to losing a very large portion of Toy Story 2 , because someone did an rm * (non geek: "remove all" command). And that's when they realized that their backups hadn't been working for a month. Then, the technical director of the film noted that, because she wanted to see her family and kids, she had been making copies of the entire film and transferring it to her home computer. After a careful trip from the Pixar offices to her home and back, they discovered that, indeed, most of the film was saved:
Now, mostly, this is just an amusing little anecdote, but two things struck me: How in the world do they not have more "official" backups of something as major as Toy Story 2 ? In the clip they admit that it was potentially 20 to 30 man-years of work that may have been lost. It makes no sense to me that this would rely on a single backup system. I wonder if the copy, made by technical director Galyn Susman, was outside of corporate policy. You would have to imagine that at a place like Pixar, there were significant concerns about things "getting out," and so the policy likely wouldn't have looked all that kindly on copies being used on home computers.
The Mythbusters folks wonder if this story was a little over-dramatized , and others have wondered how the technical director would have "multiple terabytes of source material" on her home computer back in 1999. That resulted in an explanation from someone who was there that what was deleted was actually the database containing the master copies of the characters, sets, animation, etc. rather than the movie itself. Of course, once again, that makes you wonder how it is that no one else had a simple backup. You'd think such a thing would be backed up in dozens of places around the globe for safe keeping...
Hans B PUFAL ( profile ), 18 May 2012 @ 5:53am
Reminds me of ....
Some decades ago I was called to a customer site, a bank, to diagnose a computer problem. On my arrival early in the morning I noted a certain panic in the air. On querying my hosts I was told that there had been an "issue" the previous night and that they were trying, unsuccessfully, to recover data from backup tapes. The process was failing and panic ensued.
Anonymous Coward , 18 May 2012 @ 6:00am
What I found was that for months if not years the customer had been performing backups of indexed sequential files, that is data files with associated index files, without once verifying that the backed-up data could be recovered. On the first occasion of a problem requiring such a recovery they discovered that they just did not work.
The answer? Simply recreate the index files from the data. For efficiency reasons (this was a LONG time ago) the index files referenced the data files by physical disk addresses. When the backup tapes were restored the data was of course no longer at the original place on the disk and the index files were useless. A simple procedure to recreate the index files solved the problem.
Clearly whoever had designed that system had never tested a recovery, nor read the documentation which clearly stated the issue and its simple solution.
So here is a case of making backups, but then finding them flawed when needed.
Re: Reminds me of .... That's why, in the IT world, you ALWAYS do a "dry run" when you want to deploy something, and you monitor the heck out of critical systems.Rich Kulawiec , 18 May 2012 @ 6:30amTwo notes on backupssaulgoode ( profile ), 18 May 2012 @ 6:38am1. Everyone who has worked in computing for any period of time has their own backup horror story. I'll spare you mine, but note that as a general observation, large organizations/corporations tend to opt for incredibly expensive, incredibly complex, incredibly overblown backup "solutions" sold to them by vendors rather than using the stock, well-tested, reliable tools that they already have. (e.g., "why should we use dump, which is open-source/reliable/portable/tested/proven/efficient/etc., when we could drop $40K on closed-source/proprietary/non-portable/slow/bulky software from a vendor?"
Okay, okay, one comment: in over 30 years of working in the field, the second-worst product I have ever had the misfortune to deal with is Legato (now EMC) NetWorker.
2. Hollywood has a massive backup and archiving problem. How do we know? Because they keep telling us about it. There are a series of self-promoting commercials that they run in theaters before movies, in which they talk about all of the old films that are slowly decaying in their canisters in vast warehouses, and how terrible this is, and how badly they need charitable contributions from the public to save these treasures of cinema before they erode into dust, etc.
Let's skip the irony of Hollywood begging for money while they're paying professional liar Chris Dodd millions and get to the technical point: the easiest and cheapest way to preserve all of these would be to back them up to the Internet. Yes, there's a one-time expense of cleaning up the analog versions and then digitizing them at high resolution, but once that's done, all the copies are free. There's no need for a data center or elaborate IT infrastructure: put 'em on BitTorrent and let the world do the work. Or give copies to the Internet Archive. Whatever -- the point is that once we get past the analog issues, the only reason that this is a problem is that they
made it a problem by refusing to surrender control.
Re: Two notes on backups
"Real Men don't make backups. They upload it via ftp and let the world mirror it." - Linus Torvalds
Anonymous Coward , 18 May 2012 @ 7:02am
a retelling by Oren Jacob Oren Jacob, the Pixar director featured in the animation, has made a comment on the Quora post that explains things in much more detail. The narration and animation was telling a story, as in storytelling. Despite the 99% true caption at the end, a lot of details were left out which misrepresented what had happened. Still, it was a fun tale for anyone who had dealt with backup problems. Oren Jacob's retelling in the comment makes it much more realistic and believable.Mason Wheeler , 18 May 2012 @ 10:01am
The terabytes level of data came from whoever posted the video on Quora. The video itself never mentions the actual amount of data lost or the total amount the raw files represent. Oren says, vaguely, that it was much less than a terabyte. There were backups! The last one was from two days previous to the delete event. The backup was flawed in that it produced files that when tested, by rendering, exhibited errors.They ended up patching a two-month old backup together with the home computer version (two weeks old). This was labor intensive as some 30k files had to be individually checked.
The moral of the story.
- Firstly, always test a restore at some point when implementing a backup system.
- Secondly, don't panic! Panic can lead to further problems. They could well have introduced corruption in files by abruptly unplugging the computer.
- Thirdly, don't panic! Despite, somehow, deleting a large set of files these can be recovered apart from a backup system.
Deleting files, under Linux as well as just about any OS, only involves deleting the directory entries. There is software which can recover those files as long as further use of the computer system doesn't end up overwriting what is now free space.
Re: a retelling by Oren Jacobaldestrawk ( profile ), 18 May 2012 @ 10:38amPanic can lead to further problems. They could well have introduced corruption in files by abruptly unplugging the computer.What's worse? Corrupting some files or deleting all files?
Re: Re: a retelling by Oren JacobDanny ( profile ), 18 May 2012 @ 10:49amIn this case they were not dealing with unknown malware that was steadily erasing the system as they watched. There was, apparently, a delete event at a single point in time that had repercussions that made things disappear while people worked on the movie.
I'll bet things disappeared when whatever editing was being done required a file to be refreshed.
A refresh operation would make the related object disappear when the underlying file was no longer available.
Apart from the set of files that had already been deleted, more files could have been corrupted when the computer was unplugged.
Having said that, this occurred in 1999 when they were probably using the Ext2 filesystem under Linux. These days most everyone uses a filesystem that includes journaling which protects against corruption that may occur when a computer loses power. Ext3 is a journaling filesystem and was introduced in 2001.
In 1998 I had to rebuild my entire home computer system. A power glitch introduced corruption in a Windows 95 system file and use of a Norton recovery tool rendered the entire disk into a handful of unusable files. It took me ten hours to rebuild the OS and re-install all the added hardware, software, and copy personal files from backup floppies. The next day I went out and bought a UPS. Nowadays, sometimes the UPS for one of my computers will fail during one of the three dozen power outages a year I get here. I no longer have problems with that because of journaling.
I've gotta story like this too
I've posted in the past on Techdirt that I used to work for Ticketmaster. There is an interesting TM story that I don't think ever made it into the public, so I will tell it now.
Each city had itts own operations policies, but generally, the systems ran with mirrored drives, the database was backed up every night, archival copies were made monthly. In Chicago, where I worked, we did not have offsite backup in the 1980s. The Bay Area had the most interesting system for offsite backup.
The Bay Area BASS operation, bought by TM in the mid 1980s, had a deal with a taxi driver. They would make their nightly backup copies in house, and make an extra copy on a spare disk platter. Tis cabbie would come by the office about 2am each morning, and they'd put the spare disk platter in his trunk, swapping it for the previous day's copy that had been his truck for 24 hours. So, for the cost of about two platters ($700 at the time) and whatever cash they'd pay the cabbie, they had a mobile offsite copy of their database circulating the Bay Area at all times.
When the World Series earthquake hit in October 1989, the TM office in downtown Oakland was badly damaged. The only copy of the database that survived was the copy in the taxi cab.
That incident led TM corporate to establish much more sophisticated and redundant data-protection policies.
Re: I've gotta story like this too
I like that story. Not that it matters anymore, but taxi cab storage was probably a bad idea. The disks were undoubtedly the "Winchester" type, and when powered down the head would be parked on a "landing strip". Still, subjecting these drives to jolts from a taxi riding over bumps in the road could damage the head or cause it to be misaligned. You would have known, though, if that actually turned out to be a problem. Also, I wouldn't trust a taxi driver with the company database. Although that is probably due to an unreasonable bias towards cab drivers. I won't mention the numerous arguments with them (not in the U.S.) over fares and the one physical fight with a driver who nearly ran me down while I was walking.

Huw Davies, 19 May 2012 @ 1:20am
Re: Re: I've gotta story like this too
RM05s are removable pack drives. The heads stay in the washing-machine-size unit -- all you remove are the platters.

That One Guy (profile), 18 May 2012 @ 5:00pm
What I want to know is this... She copied bits of a movie to her home system... how hard did they have to pull in the leashes to keep Disney's lawyers from suing her to infinity and beyond after she admitted she'd done so (never mind the fact that her doing so saved them apparently years of work...)?

Lance, 3 May 2014 @ 8:53am
http://thenextweb.com/media/2012/05/21/how-pixars-toy-story-2-was-deleted-twice-once-by-technology-and-again-for-its-own-good/
Evidently, the film data only took up 10 GB in those days. Nowhere near TB...
Nov 07, 2002 | Linux Journal
The dangers of not testing your backup procedures and some common pitfalls to avoid.
Backups. We all know the importance of making a backup of our most important systems. Unfortunately, some of us also know that realizing the importance of performing backups often is a lesson learned the hard way. Everyone has their scary backup stories. Here are mine.

Scary Story #1
Like a lot of people, my professional career started out in technical support. In my case, I was part of a help-desk team for a large professional practice. Among other things, we were responsible for performing PC LAN backups for a number of systems used by other departments. For one especially important system, we acquired fancy new tape-backup equipment and a large collection of tapes. A procedure was put in place, and before-you-go-home-at-night backups became a standard. Some months later, a crash brought down the system, and all the data was lost. Shortly thereafter, a call came in for the latest backup tape. It was located and dispatched, and a recovery was attempted. The recovery failed, however, as the tape was blank. A call came in for the next-to-last backup tape. Nervously, it was located and dispatched, and a recovery was attempted. It also failed because this tape also was blank. Amid long silences and pink-slip glares, panic started to set in as the tape from three nights prior was called up. This attempt resulted in a lot of shouting.
All the tapes were then checked, and they were all blank. To add insult to injury, the problem wasn't only that the tapes were blank--they weren't even formatted! The fancy new backup equipment wasn't smart enough to realize the tapes were not formatted, so it allowed them to be used. Note: writing good data to an unformatted tape is never a good idea.
Now, don't get me wrong, the backup procedures themselves were good. The problem was that no one had ever tested the whole process--no one had ever attempted a recovery. Small wonder, then, that each recovery failed.
For backups to work, you need to do two things: (1) define and implement a good procedure and (2) test that it works.
To this day, I can't fathom how my boss (who had overall responsibility for the backup procedures) managed not to get fired over this incident. And what happened there has always stayed with me.
A Good Solution
When it comes to doing backups on Linux systems, a number of standard tools can help avoid the problems discussed above. Marcel Gagné's excellent book (see Resources) contains a simple yet useful script that not only performs the backup but verifies that things went well. Then, after each backup, the script sends an e-mail to root detailing what occurred.
I'll run through the guts of a modified version of Marcel's script here, to show you how easy this process actually is. This bash script starts by defining the location of a log and an error file. Two mv commands then copy the previous log and error files to allow for the examination of the next-to-last backup (if required):
#!/bin/bash
backup_log=/usr/local/.Backups/backup.log
backup_err=/usr/local/.Backups/backup.err
mv $backup_log $backup_log.old
mv $backup_err $backup_err.old

With the log and error files ready, a few echo commands append messages (note the use of >>) to each of the files. The messages include the current date and time (which is accessed using the back-ticked date command). The cd command then changes to the location of the directory to be backed up. In this example, that directory is /mnt/data, but it could be any location:
echo "Starting backup of /mnt/data: `date`." >> $backup_log echo "Errors reported for backup/verify: `date`." >> $backup_err cd /mnt/dataThe backup then starts, using the tried and true tar command. The -cvf options request the creation of a new archive (c), verbose mode (v) and the name of the file/device to backup to (f). In this example, we backup to /dev/st0, the location of an attached SCSI tape drive:
tar -cvf /dev/st0 . 2>>$backup_err

Any errors produced by this command are sent to STDERR (standard error). The above command exploits this behaviour by appending anything sent to STDERR to the error file as well (using the 2>> redirection).
When the backup completes, the script then rewinds the tape using the mt command, before listing the files on the tape with another tar command (the -t option lists the files in the named archive). This is a simple way of verifying the contents of the tape. As before, we append any errors reported during this tar command to the error file. Additionally, informational messages are added to the log file at appropriate times:
mt -f /dev/st0 rewind
echo "Verifying this backup: `date`" >>$backup_log
tar -tvf /dev/st0 2>>$backup_err
echo "Backup complete: `date`" >>$backup_log

To conclude the script, we concatenate the error file to the log file (with cat), then e-mail the log file to root (where the -s option to the mail command allows the specification of an appropriate subject line):
cat $backup_err >> $backup_log
mail -s "Backup status report for /mnt/data" root < $backup_log

And there you have it, Marcel's deceptively simple solution to performing a verified backup and e-mailing the results to an interested party. If only we'd had something similar all those years ago.
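To run such a script unattended, a cron entry is the natural fit. A minimal sketch, assuming the script above is saved as /usr/local/sbin/nightly-backup.sh and made executable (both the path and the schedule are illustrative):

# root's crontab: run the backup at 01:30 every night
30 1 * * * /usr/local/sbin/nightly-backup.sh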
... ... ...
Jul 20, 2017 | www.tldp.org
Backing up with ``tar'': If you decide to use ``tar'' as your backup solution, you should probably take the time to get to know the various command-line options that are available; type "man tar" for a comprehensive list. You will also need to know how to access the appropriate backup media; although all devices are treated like files in the Unix world, if you are writing to a character device such as a tape, the name of the "file" is the device name itself (eg. ``/dev/nst0'' for a SCSI-based tape drive).
The following command will perform a backup of your entire Linux system onto the `` /archive/ '' file system, with the exception of the `` /proc/ '' pseudo-filesystem, any mounted file systems in `` /mnt/ '', the `` /archive/ '' file system (no sense backing up our backup sets!), as well as Squid's rather large cache files (which are, in my opinion, a waste of backup media and unnecessary to back up):
tar -zcvpf /archive/full-backup-`date '+%d-%B-%Y'`.tar.gz \
    --directory / --exclude=mnt --exclude=proc --exclude=var/spool/squid .

Don't be intimidated by the length of the command above! As we break it down into its components, you will see the beauty of this powerful utility.
The above command specifies the options ``z'' (compress; the backup data will be compressed with ``gzip''), ``c'' (create; an archive file is being created), ``v'' (verbose; display a list of files as they get backed up), and ``p'' (preserve permissions; file protection information will be "remembered" so it can be restored). The ``f'' (file) option states that the very next argument will be the name of the archive file (or device) being written. Notice how a filename which contains the current date is derived, simply by enclosing the ``date'' command between two back-quote characters. A common naming convention is to add a ``tar'' suffix for non-compressed archives, and a ``tar.gz'' suffix for compressed ones.
The `` --directory '' option tells tar to first switch to the following directory path (the `` / '' directory in this example) prior to starting the backup. The `` --exclude '' options tell tar not to bother backing up the specified directories or files. Finally, the `` . '' character tells tar that it should back up everything in the current directory.
Note: It is important to realize that the options to tar are cAsE-sEnSiTiVe! In addition, most of the options can be specified as either single mnemonic characters (eg. ``f''), or by their easier-to-memorize full option names (eg. ``file''). The mnemonic representations are identified by prefixing them with a ``-'' character, while the full names are prefixed with two such characters. Again, see the "man" pages for information on using tar.
Another example, this time writing only the specified file systems (as opposed to writing them all with exceptions as demonstrated in the example above) onto a SCSI tape drive follows:
tar -cvpf /dev/nst0 --label="Backup set created on `date '+%d-%B-%Y'`." \
    --directory / --exclude=var/spool/squid etc home usr/local var/spool

In the above command, notice that the ``z'' (compress) option is not used. I strongly recommend against writing compressed data to tape, because if data on a portion of the tape becomes corrupted, you will lose your entire backup set! However, archive files stored without compression have a very high recoverability for non-affected files, even if portions of the tape archive are corrupted.
Because the tape drive is a character device, it is not possible to specify an actual file name. Therefore, the file name used as an argument to tar is simply the name of the device, `` /dev/nst0 '', the first tape device on the SCSI bus.
Note: The ``/dev/nst0'' device does not rewind after the backup set is written; therefore it is possible to write multiple sets on one tape. (You may also refer to the device as ``/dev/st0'', in which case the tape is automatically rewound after the backup set is written.)
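A minimal sketch of what the non-rewinding device makes possible (the device name and the mt positioning commands are the standard SCSI-tape idioms, not taken from the text above):

# write two backup sets to the same tape, no rewind in between
tar -cvpf /dev/nst0 --directory / etc
tar -cvpf /dev/nst0 --directory / home

# later: rewind, skip the first set, list the second
mt -f /dev/nst0 rewind
mt -f /dev/nst0 fsf 1
tar -tvf /dev/nst0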
Since we aren't able to specify a filename for the backup set, the `` --label '' option can be used to write some information about the backup set into the archive file itself.
Finally, only the files contained in the `` /etc/ '', `` /home/ '', `` /usr/local '', and `` /var/spool/ '' (with the exception of Squid's cache data files) are written to the tape.
When working with tapes, you can use the following commands to rewind, and then eject your tape:
mt -f /dev/nst0 rewind
mt -f /dev/nst0 offline

Tip: You will notice that leading ``/'' (slash) characters are stripped by tar when an archive file is created. This is tar's default mode of operation, and it is intended to protect you from overwriting critical files with older versions of those files, should you mistakenly recover the wrong file(s) in a restore operation. If you really dislike this behavior (remember, it's a feature!) you can specify the ``--absolute-names'' option to tar, which will preserve the leading slashes. However, I don't recommend doing so, as it is Dangerous!
Aug 26, 2010 | www.linuxjournal.com
One of the most common programs on Linux systems for packaging files is the venerable tar. tar is short for tape archive, and originally, it would archive your files to a tape device. Now, you're more likely to use a file to make your archive. To use a tarfile, use the command-line option -f . To create a new tarfile, use the command-line option -c. To extract files from a tarfile, use the option -x. You also can compress the resulting tarfile via two methods. To use bzip2, use the -j option, or for gzip, use the -z option.
Instead of using a tarfile, you can output your tarfile to stdout or input your tarfile from stdin by using a hyphen (-). With these options, you can tar up a directory and all of its subdirectories by using:
tar cf archive.tar dir

Then, extract it in another directory with:
tar xf archive.tar

When creating a tarfile, you can assign a volume name with the option -V. You can move an entire directory structure with tar by executing:
tar cf - dir1 | (cd dir2; tar xf -)

You can go even farther and move an entire directory structure over the network by executing:
tar cf - dir1 | ssh remote_host "( cd /path/to/dir2; tar xf - )"

GNU tar includes an option that lets you skip the cd part, -C /path/to/dest. You also can interact with tarfiles over the network by including a host part to the tarfile name. For example:
tar cvf username@remotehost:/path/to/dest/archive.tar dir1

This is done by using rsh as the communication mechanism. If you want to use something else, like ssh, use the command-line option --rsh-command CMD. Sometimes, you also may need to give the path to the rmt executable on the remote host. On some hosts, it won't be in the default location /usr/sbin/rmt. So, all together, this would look like:
tar -c -v --rsh-command ssh --rmt-command /sbin/rmt \
    -f username@host:/path/to/dest/archive.tar dir1

Although tar originally used to write its archive to a tape drive, it can be used to write to any device. For example, if you want to get a dump of your current filesystem to a secondary hard drive, use:
tar -cvzf /dev/hdd /

Of course, you need to run the above command as root. If you are writing your tarfile to a device that is too small, you can tell tar to do a multivolume archive with the -M option. For those of you who are old enough to remember floppy disks, you can back up your home directory to a series of floppy disks by executing:
tar -cvMf /dev/fd0 $HOME

If you are doing backups, you may want to preserve the file permissions. You can do this with the -p option. If you have symlinked files on your filesystem, you can dereference the symlinks with the -h option. This tells tar actually to dump the file that the symlink points to, not just the symlink.
Along the same lines, if you have several filesystems mounted, you can tell tar to stick to only one filesystem with the option --one-file-system (older GNU tar spelled this -l). Hopefully, this gives you lots of ideas for ways to archive your files.
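A minimal sketch combining the flags above (the path is illustrative): permissions preserved, symlinks dereferenced, and the archive confined to the filesystem /home lives on:

tar -cvph --one-file-system -f home.tar /home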
Feb 04, 2017 | superuser.com
linux - Undo tar file extraction mess - Super User
First try to issue

tar tf archive

tar will list the contents line by line. This can be piped to xargs directly, but beware: do the deletion very carefully. You don't want to just rm -r everything that tar tf tells you, since it might include directories that were not empty before unpacking! You could do

tar tf archive.tar | xargs -d'\n' rm -v
tar tf archive.tar | sort -r | xargs -d'\n' rmdir -v

to first remove all files that were in the archive, and then the directories that are left empty.

sort -r (glennjackman suggested tac instead of sort -r in the comments to the accepted answer, which also works since tar's output is regular enough) is needed to delete the deepest directories first; otherwise a case where dir1 contains a single empty directory dir2 will leave dir1 after the rmdir pass, since it was not empty before dir2 was removed.

This will generate a lot of

rm: cannot remove `dir/': Is a directory

and

rmdir: failed to remove `dir/': Directory not empty
rmdir: failed to remove `file': Not a directory

Shut this up with 2>/dev/null if it annoys you, but I'd prefer to keep as much information on the process as possible.

And don't do it until you are sure that you match the right files. And perhaps try rm -i to confirm everything. And have backups, eat your breakfast, brush your teeth, etc.

===
List the contents of the tar file like so:

tar tzf myarchive.tar.gz

Then, delete those file names by iterating over that list:

while IFS= read -r file; do echo "$file"; done < <(tar tzf myarchive.tar.gz)

This will still just list the files that would be deleted. Replace echo with rm if you're really sure these are the ones you want to remove. And maybe make a backup to be sure.

In a second pass, remove the directories that are left over:

while IFS= read -r file; do rmdir "$file"; done < <(tar tzf myarchive.tar.gz)

This prevents directories from being deleted if they already existed (and were non-empty) before the extraction, since rmdir refuses to remove a non-empty directory.

Another nice trick by @glennjackman, which preserves the order of files, starting from the deepest ones. Again, remove echo when done.

tar tf myarchive.tar | tac | xargs -d'\n' echo rm

This could then be followed by the normal rmdir cleanup.
Here's a possibility that will take the extracted files and move them to a subdirectory, cleaning up your main folder.

#!/usr/bin/perl -w
use strict;
use Getopt::Long;

my $clean_folder = "clean";
my $DRY_RUN;
die "Usage: $0 [--dry] [--clean=dir-name]\n"
    if ( ! GetOptions("dry!" => \$DRY_RUN, "clean=s" => \$clean_folder) );

# Protect the 'clean_folder' string from shell substitution
$clean_folder =~ s/'/'\\''/g;

# Process the "tar tv" listing and output a shell script.
print "#!/bin/sh\n" if ( ! $DRY_RUN );

while (<>) {
    chomp;

    # Strip out permissions string and the directory entry from the 'tar' list
    my $perms  = substr($_, 0, 10);
    my $dirent = substr($_, 48);

    # Drop entries that are in subdirectories
    next if ( $dirent =~ m:/.: );

    # If we're in "dry run" mode, just list the permissions and the
    # directory entries.
    if ( $DRY_RUN ) {
        print "$perms|$dirent\n";
        next;
    }

    # Emit the shell code to clean up the folder
    $dirent =~ s/'/'\\''/g;
    print "mv -i '$dirent' '$clean_folder'/.\n";
}

Save this to the file fix-tar.pl and then execute it like this:

$ tar tvf myarchive.tar | perl fix-tar.pl --dry

This will confirm that your tar list is like mine. You should get output like:

-rw-rw-r--|batch
-rw-rw-r--|book-report.png
-rwx------|CaseReports.png
-rw-rw-r--|caseTree.png
-rw-rw-r--|tree.png
drwxrwxr-x|sample/

If that looks good, then run it again like this:

$ mkdir cleanup
$ tar tvf myarchive.tar | perl fix-tar.pl --clean=cleanup > fixup.sh

The fixup.sh script will be the shell commands that will move the top-level files and directories into a "clean" folder (in this instance, the folder called cleanup). Have a peek through this script to confirm that it's all kosher. If it is, you can now clean up your mess with:

$ sh fixup.sh

I prefer this kind of cleanup because it doesn't destroy anything that isn't already destroyed by being overwritten by that initial tar xv.

Note: if that initial dry run output doesn't look right, you should be able to fiddle with the numbers in the two substr function calls until they look proper. The $perms variable is used only for the dry run, so really only the $dirent substring needs to be proper.

One other thing: you may need to use the tar option --numeric-owner if the user names and/or group names in the tar listing make the names start in an unpredictable column.

===

That kind of (antisocial) archive is called a tar bomb because of what it does. Once one of these "explodes" on you, the solutions in the other answers are way better than what I would have suggested.
The best "solution", however, is to prevent the problem in the first place.
The easiest (laziest) way to do that is to always unpack a tar archive into an empty directory. If it includes a top level directory, then you just move that to the desired destination. If not, then just rename your working directory (the one that was empty) and move that to the desired location.
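A minimal sketch of that habit (archive and directory names are illustrative):

mkdir unpack
tar -xf archive.tar -C unpack
ls unpack   # one top-level directory, or a tar bomb?
# then move unpack/<topdir>, or the renamed unpack directory itself, into place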
If you just want to get it right the first time, you can run tar -tvf archive-file.tar | less and it will list the contents of the archive so you can see how it is structured and then do what is necessary to extract it to the desired location to start with.
The t option also comes in handy if you want to inspect the contents of an archive just to see if it has something you're looking for in it. If it does, you can, optionally, just extract the file(s) you want.
www.unix.com
Q:

tar -cjpf /backup /bin /etc /home /opt /root /sbin /usr /var /boot
When I include the / directory it also tars the /lib, /sys, /proc and /dev filesystems too (and more, but these seem to be the problem directories).
Although I have never tried to restore the /sys, /proc and /dev directories, I have not seen anyone mention that you can't restore /lib; but when I tried, the server crashed and would not even start the kernel (not even in single-user mode).
Can anyone let me know why this happened and provide a more comprehensive list of directories than the four mentioned as to what should and shouldn't be backed up and restored? Or point me to a useful site that might explain why you should or shouldn't back up each one?
A: There's no point in backing up things like /proc because that's the dynamic handling of processes and memory working sets (virtual memory). However, directories like /lib, although problematic to restore on a running system, are ones you would definitely need in a disaster-recovery situation. You would restore /lib to hard disk in single-user or CD-boot mode.
So you need to back up all non-process, non-memory files for the backup to be sufficient to recover. It doesn't mean, however, that you should attempt to restore them on a running (multi-user) system.
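A minimal sketch of such a backup (paths are illustrative; the excludes keep tar away from the process/memory pseudo-filesystems, and --one-file-system additionally keeps it off other mounts):

tar -cpzf /backup/full.tar.gz --one-file-system \
    --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/run \
    --exclude=/backup /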
www.gnu.org
The `-C' option allows you to avoid using subshells:
$ tar -C sourcedir -cf - . | tar -C targetdir -xpf -
serverfault.com
Antonio Alimba Jun 9 '14 at 13:01
How can I restore from a backup.tgz file generated from another Linux server on my own server? I tried the following command:
tar xvpfz backup.tgz -C /

The above command worked, but it replaced the existing system files, which made my Linux server stop working properly.
How can I restore without running into trouble?
You can use the --skip-old-files option to tell tar not to overwrite existing files.
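A minimal sketch (note: --skip-old-files appeared in GNU tar 1.28; older versions only offer -k/--keep-old-files, which treats existing files as errors instead of silently skipping them):

tar --skip-old-files -xvpzf backup.tgz -C /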
You could still run into problems with the backup files if the software versions are different between the two servers. Some data file structure changes might have happened, and things might stop working.
A more refined backup process should be developed.
# save everything except /mnt and /proc
time tar -cpPzf $TARBALL --directory=/ --one-file-system --xattrs \
    --exclude /mnt --exclude /proc

Where:
- c - create a new backup archive.
- -C, --directory=DIR - change to directory DIR
- v - verbose mode, tar will print what it's doing to the screen.
- p - preserves the permissions of the files put in the archive for restoration later.
- P - preserve absolute names (do not strip the leading '/')
- z - compress the backup file with 'gzip' to make it smaller.
- f <filename> - specifies where to store the backup; $TARBALL is the filename used in this example. A relative filename would be stored in the current working directory, the one you set when you used the cd command.
- --xattrs - save the user/root xattrs to the archive
- --exclude=/example/path - The options following this model instruct tar which directories NOT to back up. We don't want to back up everything, since some directories aren't very useful to include. The first exclusion rule directs tar not to back itself up; this is important to avoid errors during the operation.
- --one-file-system - available at least from tar 1.15 (RHEL 5). Do not include files on a different filesystem. If you want other filesystems backed up, such as a /home partition or external media mounted in /media, you either need to back them up separately, or omit this flag. If you do omit this flag, you will need to add several more --exclude= arguments to avoid filesystems you do not want. These would be the /proc, /sys, /mnt, /media, /run and /dev directories in root. /proc and /sys are virtual filesystems that provide windows into variables of the running kernel, so you do not want to try to back up or restore them. /dev is a tmpfs whose contents are created and deleted dynamically by udev, so you also do not want to back it up or restore it. Likewise, /run is a tmpfs that holds variables about the running system that do not need to be backed up.
- It is important to note that these exclusions are recursive. This means that all folders located within the one excluded will be ignored as well. In the example, excluding the /media folder excludes all mounted drives and media there.
Warning: an --exclude argument is actually treated as a pattern (a shell-style glob in GNU tar), not as a literal path.
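A small illustration of that pattern behavior (paths are hypothetical; by default GNU tar matches exclude patterns against any component of the name):

tar -czf backup.tar.gz --exclude='*.tmp' --exclude='cache' /data
# '*.tmp'  skips every file ending in .tmp, at any depth
# 'cache'  skips anything named cache, and everything beneath it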
1. Extract tar.bz2/bzip2 archives

Files with the extension bz2 are compressed with the bzip2 algorithm, and the tar command can deal with them as well. Use the j option instead of the z option.
$ tar -xvjf archivefile.tar.bz2
2. Extract files to a specific directory or path
To extract the files to a specific directory, specify the path using the "-C" option. Note that it's a capital C.
$ tar -xvzf abc.tar.gz -C /opt/folder/

However, first make sure that the destination directory exists, since tar is not going to create the directory for you and will fail if it does not exist.
3. Extract a single file
To extract a single file out of an archive, just add the file name after the command like this:
$ tar -xz -f abc.tar.gz "./new/abc.txt"

More than one file can be specified in the above command like this:
$ tar -xv -f abc.tar.gz "./new/cde.txt" "./new/abc.txt"
4. Extract multiple files using wildcards
Wildcards can be used to extract out a bunch of files matching the given wildcards. For example all files with ".txt" extension.
$ tar -xv -f abc.tar.gz --wildcards "*.txt"
tardiff.tar.gz v2.1.4 (41,334 bytes)

"tardiff" is a Perl script used to quickly make a tarball of changes between versions of an archive, or between pre- and post-build of an application. There are many, many other possible uses.
More complete documentation is now available here.
Some documentation for applying patches of various sorts is now available here.
Stack Overflow
tarsum is almost what you need. Take its output, run it through sort to get the ordering identical on each, and then compare the two with diff. That should get you a basic implementation going, and it would be easy enough to pull those steps into the main program by modifying the Python code to do the whole job.
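If you'd rather not script it at all, a brute-force sketch of the same comparison (archive names are illustrative):

tar -tf a.tar | sort > a.list
tar -tf b.tar | sort > b.list
diff a.list b.list      # member names only
mkdir a.d b.d
tar -xf a.tar -C a.d
tar -xf b.tar -C b.d
diff -r a.d b.d         # actual contents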
If you try to expand a tar file in some production directory, you can accidentally overwrite and change ownership of such directories and then spend a lot of time restoring the status quo. It is safer to expand such tar files in /tmp first and, only after seeing the results, decide whether to copy some directories over or re-expand the tar file in the production directory.
Sometimes directories are very similar, for example numbered directories created by some application, such as task0001, task0002, ... task0256. In this case you can easily perform the operation on the wrong directory -- for example, sending tech support a tar file of a directory that contains a production run instead of test data.
Viktor Balogh's HP-UX blog
This is how to tar a bunch of files and send them over the network to another machine over SSH, in one go:
# cd /etc; tar cf - passwd | ssh hp01a01.w1 "cd /root;tar xf - passwd"

Note that with tar you should always use relative paths; otherwise the files on the target system would be extracted with their full path and the original files would be overwritten. GNU tar also offers some options which allow the user to modify/transform the paths when files are extracted. You can find GNU tar on HP-UX under the name gtar; you can download it from the HP-UX porting center:
# which gtar
/usr/local/bin/gtar

If you have a 'tar' archive that was made with absolute paths, use 'pax' to extract it to a different directory:
# pax -r -s '|/tmp/|/opt/|' -f test.tar

If you unpack an archive with non-root user privileges, all uids and gids will be replaced with the uid and gid of that user. Keep that in mind; if you make backups/restores, practically always do any backup/restore with UID 0.
The use of tar with find isn't apt to work if there are lots of files. Instead use pax(1):
# find . -atime +7 | pax -w | gzip > backup.tgz
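With GNU tar the same find pipeline works without pax, reading the file list from stdin (a sketch; -print0 and --null guard against unusual filenames):

find . -atime +7 -print0 | tar --null -czf backup.tgz -T -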
Splitting big files into pieces is a common task. Another common task is to create a tar archive, and split it into smaller chunks that can be burned onto CD/DVD. The straightforward approach is to create the archive and then use 'split.' To do this, you will need more free space on your disk. In fact, you'll need space twice the size of the created archive. To avoid this limitation, split the archive as it is being created.
To create a tar archive that splits itself on the fly use the following set of commands:
First create the archive:
tar -czf /dev/stdout $(DIRECTORY_OR_FILE_TO_COMPRESS) | split -d -b $(CHUNK_SIZE_IN_BYTES) - $(FILE_NAME_PREFIX)

To extract the contents:
cat $(FILE_NAME_PREFIX)* | tar -xzf /dev/stdin

The above set of commands works on the fly. You don't need additional free space for temporary files.
A few notes about this exercise:
- 'tar -L' prompts you on every chunk created, and compression cannot be used with the -L option. The above command is not interactive and does not prompt for anything, and compression can be used.
- The maximum number of separate files is 100. This is because we use numeric suffixes -- 'split -d.' If the specified chunk size is small, you will get a 'split: Output file suffixes exhausted' error. Try a bigger chunk size or alphabetic suffixes.
- 'cat' will concatenate the files properly if they are not renamed. This is due to the fact that the sort order is retained by the appended chunk suffixes.
- Replace 'tar -z' with 'tar -j' for bzip2 compression or try your favourite compression program. Almost all 'tar' and 'split' options should be possible.
- The resulting chunk files are not valid tar archives. They can not be extracted separately. If you want such functionality use 'split-tar,' which also needs more free space.
The information provided in this article is for your information only. The origin of this information may be internal or external to Red Hat. While Red Hat attempts to verify the validity of this information before it is posted, Red Hat makes no express or implied claims to its validity.
vsego:
Isn't it easier to just omit the "f"?
tar cz $(DIRECTORY_OR_FILE_TO_COMPRESS) | split -d -b $(CHUNK_SIZE_IN_BYTES) - $(FILE_NAME_PREFIX)
cat $(FILE_NAME_PREFIX)* | tar xz

Alexander Todorov:
You are right. Using /dev/stdin and /dev/stdout is just meant to be clearer.
- Klaus Lichtenwalder says:
December 14th, 2007 at 2:49 pm
Just a few nitbits... If you want to use stdin/stdout with tar, it's simply a -, e.g.:

tar cf - . | (cd /elsewhere; tar xf -)

cat always appends its arguments to stdout, so

cat $(prefix)* | command

is sufficient. I don't know and (honestly) don't care if gnu-tar sends its output to stdout if no f argument is given; every other Unix uses the default tape device (which is /dev/rmt) if no f argument is given (I have to work with Solaris and AIX too...).
To back up to multiple tapes, use the following command (backing up the /home file system):
# tar -clpMzvf /dev/st0 /home
To compare tape backup, enter:
# tar -dlpMzvf /dev/st0 /home
To restore tape in case of data loss or hard disk failure:
# tar -xlpMzvf /dev/st0 /home
Where,
- d : find differences between archive and file system
- x : extract files from an archive
- l : list the contents of an archive
- p : ignore umask when extracting files
- M : create/list/extract multi-volume archive (multiple tapes)
- z : Compress backup using gzip. (Note: GNU tar refuses to combine -z with -M, so a true multi-volume archive must be written uncompressed.)
- v : verbosely list files processed
- f /dev/st0 : Tape device name
- /home : Backup /home file system
freshmeat.net
Changes: This release adds support for xz compression (with the --xz option) and reassigns the short option -J as a shortcut for --xz. The option -I is now a shortcut for --use-compress-program,... and the --no-recursive option works with --incremental
Changes: This release adds new options: --lzop, --no-auto-compress, and --no-null. It has compressed format recognition and VCS support (--exclude-vcs). It fixes the --null option and... fixes record size autodetection
Changes: This release has new options: -a (selects a compression algorithm basing on the suffix of the archive file name), --lzma (selects the LZMA compression algorithm), and --hard-dereference,... which dereferences hard links during archive creation and stores the files they refer to (instead of creating the usual hard link members)
Posted: 12 Jun 2002 23:47 PDT
Expires: 19 Jun 2002 23:47 PDT
Question ID: 25116

What is the size constraint to "tar" in a UNIX or Linux environment?

Subject: Re: UNIX Question! tar size constraint.
Answered By: philip_lynx-ga on 13 Jun 2002 01:05 PDT
Rated:

Hi pwharff,

The quick answer is: 2^63-1 bytes for the archive, 68'719'476'735 (8^12-1) bytes for each file, if your environment permits that.

As I understand your question, you want to know if you can produce tar files that are bigger than 2 GBytes (and how big you can really make them). The answer to this question depends on a few simple parameters: 1) Does your operating system support large files? 2) What version of tar are you using? 3) What is the underlying file system?

You can answer question 1) for yourself by verifying that your kernel supports 64-bit file descriptors. For Linux this has been the case for several years now. A quick look in /usr/include/sys/features.h will tell you, if there is any line containing 'FILE_OFFSET_BITS'. If there is, your OS very probably has support for large files. For Solaris, just check whether 'man largefile' works, or try 'getconf -a | grep LARGEFILE'. If it works, then you have support for large files in the operating system. Again, support for large files has been there for several years. For other operating systems, try 'man -k large file', and see what you get -- I'll gladly provide help if you need to ask for clarification to this answer. Something like "cd /usr/include; grep 'FILE_OFFSET_BITS' * */*" should tell you quickly if there is standard large file support.

2) What version of tar are you using? This is important. Obviously, older tar programs won't be able to handle files or archives that are larger than 2^31-1 bytes (2.1 Gigabytes). Try running 'tar --version'. If the first line indicates you are using GNU tar, then any version newer than 1.12.64 will in principle be able to provide you with large files. Try to run this command: "strings `which tar` | grep 64", and you should see some lines saying lseek64, creat64, fopen64. If yes, your tar contains support for large files. If your tar program does not contain support for large files (most really do, but maybe you are working on a machine older than 1998?), you can download the newest GNU tar from ftp://ftp.gnu.org/pub/gnu/tar and compile it for yourself. The size of files you put into a tar archive (not the archive itself) is limited to 11 octal digits; the max size of a single file is thus ca. 68 GBytes.

3) Given that both your operating system (and C library) and tar application support large files, the only really limiting factor is the file system that you try to create the file in. The theoretical limit for the tar archive size is 2^63-1 (9'223'372 Terabytes), but you will reach more practical limits (disk or tape size) much quicker. Also take into consideration what the file system is. DOS FAT12 filesystems don't allow files as big as the Linux ext2 or Sun UFS file systems do.

If you need more precise data (for a specific file system type, or for the OS, etc.) please do not hesitate to ask for clarification.

I hope my answer is helpful to you,
--philip
Linux Forums
rockytopvols (Just Joined!)
Join Date: Sep 2006
Posts: 2

ftp tar file size limit?

I am trying to back up my Linux box to my Windows box's hard drive. To do this I am using the Knoppix distro to boot my Linux box. Then I am tarring and ftping every file and sending it to my Windows box through ftp. (I wanted to tar the files first, so I can preserve permissions.) On my Windows XP box I am running FileZilla's FTP server, and I am transferring to an external 320 GB NTFS-formatted hard drive attached to it through USB. I don't have enough space left on my Linux box to tar everything and then transfer, so I am using the following commands:

ftp 192.168.1.101 21
binary
put |"tar -cvlO *.*" stuff.tar

It always stops transferring just before 2 GB (1,972,460 KB), and the file should be 20 GB or so. What am I doing wrong? Is there some file size limit that I don't know of for ftp or tar? The NTFS file system should allow bigger files from what I have read. I couldn't find any limit for FileZilla. Is this the right place to ask?
Thanks

Marsolin (Linux Newbie)
Join Date: Aug 2006
Posts: 222

I believe NTFS has a 2GB file limitation unless you are running a storage driver with 44-bit LBA support.
__________________
Chad
http://linuxappfinder.com

rockytopvols (Just Joined!)
Join Date: Sep 2006
Posts: 2

Everywhere I have read, the NTFS limit is in the tens of terabytes range. I have some files that are bigger than that now.

nlsteffens

tar file size limit

Generally, tar can't handle files larger than 2GB. I suggest using an alternative to tar, 'star'. A more comprehensive answer is available here:

http://answers.google.com/answers/threadview?id=25116
By the looks of it, gnu tar versions newer than 1.12.64 can handle large files but I can't confirm this.
Regards,
Nick
Alex Stan
Join Date: Aug 2006
Location: Hamilton, Ontario

I have a similar problem with big files: I have a 2.2 Gig file on a Linux computer, and I mounted Shared Documents (smbfs) from another (Windows) computer. So when I try to copy the file, it stops at 2 GB. I even tried moving the file into Apache, so I could download it, but Apache won't let me.

I can't archive it either.

Is there any way to move that file?
__________________
I like linux!

sbhagat

If you are using smbclient, then follow the kbase article:

Regards,
Subodh Bhagat
ONLamp.com
One last thing about creating archives with tar: tar was designed to back up everything in the specified directory. This means that every single file and subdirectory that exists beneath the specified directory will be backed up. It is possible to specify which files you don't want backed up using the X switch.

Let's say I want to back up everything in the www directory except for the apache2 and zope subdirectories. In order to use the X switch, I have to create a file containing the names of the files I wish to exclude. I've found that if you try to create this file using a text editor, it doesn't always work. However, if you create the file using echo, it does. So I'll make a file called exclude:

echo apache2 > exclude
echo zope >> exclude

Here, I used the echo command to redirect (>) the word apache2 to a new file called exclude. I then asked it to append (>>) the word zope to that same file. If I had forgotten to use two >'s, I would have overwritten the word apache2 with the word zope.

Now that I have a file to use with the X switch, I can make that backup:

tar cvfX backup.tar exclude www

This is the first backup I've demonstrated where the order of the switches is important. I need to tell tar that the f switch belongs with the word backup.tar and the X switch belongs with the word exclude. So if I decide to place the f switch before the X switch, I need to have the word backup.tar before the word exclude.

This command will also work, as the right switch is still associated with the right word:

tar cvXf exclude backup.tar www

But this command would not work the way I want it to:

tar cvfX exclude backup.tar www
tar: can't open backup.tar : No such file or directory

Here you'll note that the X switch told tar to look for a file called backup.tar to tell it which files to exclude, which isn't what I meant to tell tar.

Let's return to the command that did work. To test that it didn't back up the file called apache2, I used grep to sort through tar's listing:

tar tf backup.tar | grep apache2

Since I just received my prompt back, I know my exclude file worked. It is interesting to note that since apache2 was really a subdirectory of www, all of the files in the apache2 subdirectory were also excluded from the backup. I then tested to see if the zope subdirectory was also excluded in the backup:

tar tf backup.tar | grep zope
www/zope-zpt/
www/zope-zpt/Makefile
www/zope-zpt/distinfo
www/zope-zpt/pkg-comment
<output snipped>

This time I got some information back, as there were other subdirectories that started with the term "zope," but the subdirectory that was just called zope was excluded from the backup.

Now that we know how to make backups, let's see how we can restore data from a backup. Remember from last week the difference between a relative and an absolute pathname, as this has an impact when you are restoring data. Relative pathnames are considered a good thing in a backup. Fortunately, the tar utility that comes with your FreeBSD system strips the leading slash, so it will always use a relative pathname -- unless you specifically override this default by using the P switch.

It's always a good idea to do a listing of the data in an archive before you try to restore it, especially if you receive a tar archive from someone else. You want to make sure that the listed files do not begin with "/", as that indicates an absolute pathname. I'll check the first few lines in my backup:

tar tf backup.tar | head
www/
www/mod_trigger/
www/mod_trigger/Makefile
www/mod_trigger/distinfo
www/mod_trigger/pkg-comment
www/mod_trigger/pkg-descr
www/mod_trigger/pkg-plist
www/Mosaic/
www/Mosaic/files/
www/Mosaic/files/patch-ai

None of these files begin with a "/", so I'll be able to restore this backup anywhere I would like. I'll practice a restore by making a directory I'll call testing, and then I'll restore the entire backup to that directory:

mkdir testing
cd testing
tar xvf ~test/backup.tar

You'll note that I cd'ed into the directory to contain the restored files, then told tar to restore or extract the entire backup.tar file using the x switch. Once the restore was complete, I did a listing of the testing directory:

ls
www

I then did a listing of that new www directory and saw that I had successfully restored the entire www directory structure, including all of its subdirectories and files.

It's also possible to just restore a specific file from the archive. Let's say I only need to restore one file from the www/chimera directory. First, I'll need to know the name of the file, so I'll get a listing from tar and use grep to search for the files in the chimera subdirectory:

tar tf backup.tar | grep chimera
www/chimera/
www/chimera/files/
www/chimera/files/patch-aa
www/chimera/scripts/
www/chimera/scripts/configure
www/chimera/pkg-comment
www/chimera/Makefile
<snip>

I'd like to just restore the file www/chimera/Makefile, and I'd like to restore it to the home directory of the user named genisis. First, I'll cd to the directory to which I want that file restored, and then I'll tell tar just to restore that one file:

cd ~genisis
tar xvf ~test/backup.tar www/chimera/Makefile

You'll note some interesting things if you try this at home. When I did a listing of genisis' home directory, I didn't see a file called Makefile, but I did see a directory called www. This directory contained a subdirectory called chimera, which contained a file called Makefile. Remember, when you make an archive, you are including a directory structure, and when you restore from an archive, you recreate that directory structure.

You'll also note that the original ownership, permissions, and file creation time were also restored with that file:

ls -l ~genisis/www/chimera/Makefile
-rw-r--r-- 1 test wheel 406 May 11 09:52 www/chimera/Makefile

That should get you started with using the tar utility. In next week's article, I'll continue with some of the interesting options that can be used with tar, and then I'll introduce the cpio archiver.
The tar program is an archiving program designed to store and extract files from an archive file known as a tarball. A tarball may be made on a tape drive; however, it is also common to write a tarball to a normal file.
Making backups with tar
A full backup can easily be made with tar:
# tar --create --file /dev/ftape /usr/src
tar: Removing leading / from absolute path names in the archive

The example above uses the GNU version of tar and its long option names. The traditional version of tar only understands single-character options. The GNU version can also handle backups that don't fit on one tape or floppy, and also very long paths; not all traditional versions can do these things. (Linux only uses GNU tar.)
If your backup doesn't fit on one tape, you need to use the --multi-volume (-M) option:
# tar -cMf /dev/fd0H1440 /usr/src
tar: Removing leading / from absolute path names in the archive
Prepare volume #2 for /dev/fd0H1440 and hit return:

Note that you should format the floppies before you begin the backup, or else use another window or virtual terminal and do it when tar asks for a new floppy.
After you've made a backup, you should check that it is OK, using the --compare (-d) option:
# tar --compare --verbose -f /dev/ftape
usr/src/
usr/src/linux
usr/src/linux-1.2.10-includes/Failing to check a backup means that you will not notice that your backups aren't working until after you've lost the original data.
An incremental backup can be done with tar using the --newer (-N) option:
# tar --create --newer '8 Sep 1995' --file /dev/ftape /usr/src --verbose
tar: Removing leading / from absolute path names in the archive
usr/src/
usr/src/linux-1.2.10-includes/
usr/src/linux-1.2.10-includes/include/
usr/src/linux-1.2.10-includes/include/linux/
usr/src/linux-1.2.10-includes/include/linux/modules/
usr/src/linux-1.2.10-includes/include/asm-generic/
usr/src/linux-1.2.10-includes/include/asm-i386/
usr/src/linux-1.2.10-includes/include/asm-mips/
usr/src/linux-1.2.10-includes/include/asm-alpha/
usr/src/linux-1.2.10-includes/include/asm-m68k/
usr/src/linux-1.2.10-includes/include/asm-sparc/
usr/src/patch-1.2.11.gz

Unfortunately, tar can't notice when a file's inode information has changed, for example, when its permission bits have been changed, or when its name has been changed. This can be worked around using find and comparing the current filesystem state with lists of files that have been previously backed up. Scripts and programs for doing this can be found on Linux ftp sites.
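A rough sketch of that find-based workaround (file locations and the record format are illustrative; GNU find's -printf records exactly the metadata that --newer misses):

# snapshot name, mode, owner, group and mtime of every file
find /usr/src -printf '%p %m %u %g %T@\n' | sort > /var/backups/state.now
# anything listed here changed since the previous backup
diff /var/backups/state.prev /var/backups/state.now
mv /var/backups/state.now /var/backups/state.prev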
12.4.2. Restoring files with tar
The --extract (-x) option for tar extracts files:

# tar --extract --same-permissions --verbose --file /dev/fd0H1440
usr/src/
usr/src/linux
usr/src/linux-1.2.10-includes/
usr/src/linux-1.2.10-includes/include/
usr/src/linux-1.2.10-includes/include/linux/
usr/src/linux-1.2.10-includes/include/linux/hdreg.h
usr/src/linux-1.2.10-includes/include/linux/kernel.h

You can also extract only specific files or directories (which includes all their files and subdirectories) by naming them on the command line:
# tar xpvf /dev/fd0H1440 usr/src/linux-1.2.10-includes/include/linux/hdreg.h
usr/src/linux-1.2.10-includes/include/linux/hdreg.h

Use the --list (-t) option, if you just want to see what files are on a backup volume:

# tar --list --file /dev/fd0H1440
usr/src/
usr/src/linux
usr/src/linux-1.2.10-includes/
usr/src/linux-1.2.10-includes/include/
usr/src/linux-1.2.10-includes/include/linux/
usr/src/linux-1.2.10-includes/include/linux/hdreg.h
usr/src/linux-1.2.10-includes/include/linux/kernel.hNote that tar always reads the backup volume sequentially, so for large volumes it is rather slow. It is not possible, however, to use random access database techniques when using a tape drive or some other sequential medium.
tar doesn't handle deleted files properly.

How-to Using Tar (Taring)
By SuperHornet from http://www.fluidgravity.com/
Ok, well, here is a short listing on how to use the tar command to back up your data.
Tar is solely an archiving app. Tar by itself won't compress files. But you say, "then what is a .tar.gz?"
It's a tar file that has been compressed with a separate compression utility. The .gz = gzip, the compression app used to compress it.
Here is tar in its simplest form
tar -cvf filename.tar /path/to/files
- -c means create
- -f means filename (-f should always be the last option in the bundle)
- -v Verbose; will display all the files it puts in the tar and any errors you might have incurred
You should see the filename.tar file in whatever directory you ran tar from.
You say "But I want to make the tarball compressed"
Well then -z is the option you want to include in your syntax
tar -zvcf filename.tar.gz /path/to/files
#notice I had to add the .gz extension.
-Z (no, not -z) will run it thru the old compress app.

Now when I make a tarball I like to keep the full paths the files are in. For this use the -P (absolute path) option:

tar -zPvcf filename.tar.gz /path/to/file

When I extract it I will see a new directory called /path, and under that I will see the "to" directory, and the "file" is under "to".

Now you say "I want to backup ALL my files in my home directory EXCEPT the temp directory I use". No problem.
tar -zPvcf myhomebackup.tar.gz --exclude /home/erik/temp /home/erik
The --exclude will give you this option; just slip it in between the tar filename and the path you're going to back up. This will exclude the whole temp directory.

You say "Ok, this tar thing is pretty cool, but I want to back up only single files from all around the drive."
No problem, this requires a bit more work, but hey this is UNIX, get used to it.
Make a file called locations (call it anything you like). In locations, place the full path to each file you want to back up, one per line. Please be aware that you have to have read rights to the files you are going to back up.
/etc/mail/sendmail.cf
/usr/local/apache/conf/httpd.conf
/home/erik/scripts
Now with the -T option I can tell it to use the locations file:

tar -zPvcf backup.tar.gz -T locations
Now if you want to backup the whole drive. Then you will have to exclude lots of files like /var/log/* and /usr/local/named/*
Using the -X option you can create an exclude file just like the locations file.
tar -zPvcf fullbackup.tar.gz -X /path/to/excludefile -T /path/to/locationsfile
Now a month has gone by and you need to update your myhomebackup.tar.gz with new or changed files.
This requires an extra step (quit your bitching, I already told you why).
You have to uncompress it first, but not untar it:

gunzip /path/to/myhomebackup.tar.gz
This will leave your backup as myhomebackup.tar, missing the .gz.
Now we can update your tarball with -u, and then we are going to compress it again:

tar -Puvf myhomebackup.tar /home/erik
gzip myhomebackup.tar
It will add the .gz for you.

Tar is a pretty old app and has lots of features.
I suggest reading the man pages to get a list of all the options.
I have included a little perl script that I made so I can run it as a cron job every night and get a full backup each time.
It wouldn't be that hard to update the tarball but I just like full backups.
Feel free to use it.

If you want to extract a tarball that is compressed:
tar -zxvf filename.tar.gz
-x extract

If it is not compressed then:
tar -xvf filename.tar
#!/usr/bin/perl
# sysbkup.pl
# Created by Erik Mathis [email protected] 7/02

# Change these paths to fit your needs.
my $filename="/home/sysbkup/backup";
my $exclude="/home/erik/exclude";
my $data="/home/erik/locations";
my $tar="\.tar";
my $gz="\.gz";

$file=$filename.$tar.$gz;
system("tar -Pzcvf $file -X $exclude -T $data");
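To get the nightly run the author describes, a hypothetical crontab entry (the script path and schedule are illustrative) would be:

# run sysbkup.pl every night at 23:45
45 23 * * * /usr/bin/perl /home/erik/sysbkup.pl >> /var/log/sysbkup.log 2>&1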
Sort of answered my own question. I downloaded and installed star:
http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/star.html
an enhanced version of tar that includes a name-modification option:
-s replstr

Modify file or archive member names named by a pattern according to the substitution expression replstr. The format of replstr is:

-s /old/new/[gp]
eks
- To: LUCI List <[email protected]>
- Subject: Re: Tar question -- star is recommended until GNU Tar 1.14 is
- From: "Bryan J. Smith" <[email protected]>
- Date: Sat, 07 Aug 2004 01:24:33 -0400
- In-Reply-To: <Pine.LNX.4.44.0408052156230.16610-100000@demeter.museum.state.il.us>
- Organization: Linux Users of Central Illinois
- References: <Pine.LNX.4.44.0408052156230.16610-100000@demeter.museum.state.il.us>
- Reply-To: [email protected]
- Sender: [email protected]
On Thu, 2004-08-05 at 22:58, Erich Schroeder wrote:
> Sort of answered my own question. I downloaded and install star:
> http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/star.html
> an enhanced version of tar

It's not really an "enhanced version of tar" but a more _POSIX_ compliant version. That's why it has been a part of Fedora Core (FC) since version 0.8* and _recommended_over_ GNU Tar 1.13.

Understand that cpio, tar and their new replacement, pax, just write what is known as "ustar" format. The latest IEEE POSIX 2001 and X/Open Single Unix Specification (SUS) 3 from the "Austin Group" defines a lot of new functionality that really makes up for the lack of capability in the older 1988 and subsequent releases until the late-'90s drafts. This includes overcoming POSIX 1988+ path/naming limitations, as well as newer POSIX 2001 capabilities like storing POSIX EA/ACLs.

In the meanwhile, the GNU maintainers decided to release their own extensions that are not compliant. It was a necessary evil, but now that the POSIX/SUS standard has been updated, it's time for GNU to come around. The current GNU Tar 1.14 alpha adds these capabilities. star actually had EA/ACL support on Solaris** _before_ the POSIX standardization, so adopting it for the POSIX 2001 / SUS 3 ustar meta-data format was easy.

Unfortunately POSIX 2001 / SUS 3 still does _not_ address the issue of compression. I hate the idea of block-compressing the entire archive, which renders it largely unrecoverable after a single byte error (at least with LZ77/gzip or LZO/lzop -- BWT/bzip2 may be better at recovery though). That's my "beef" with the whole ustar format in general. I would have really liked a flexible per-file compression meta-data tag in the standard. Until then, we have aging cpio replacements like afio.

-- Bryan

*NOTE: This is the actual "disttag" versioning (i.e., technical reasons) for pre-Fedora Core "community Linux" releases from Red Hat that are now recommended for Fedora Legacy support (i.e., FC 0.8 is fka "RHL" 8), in addition to any relevant trademark (i.e., non-technical) considerations.

**NOTE: legacy star used Sun's tar approach -- an ACL "attribute file" preceding the data file, but using the same name. That way, if the tar program extracting it was "Sun EA/ACL aware," it would read it, but if not, it would just overwrite the attribute file with the actual file when extracted. Quite an ingenious approach.

--
Engineers scoff at me because I have IT certifications
IT Pros scoff at me because I am a degreed engineer
I see and understand both of their viewpoints
Unfortunately Engineers and IT Pros only see in me what they dislike about the other trade
------------------------------------------------------
Bryan J. Smith [email protected]
Martin Maney maney at pobox.com
Sun Jun 8 21:25:09 CDT 2003
- Previous message: [LUNI] Incremental Tar
- Next message: [LUNI] Incremental Tar
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sun, Jun 08, 2003 at 05:31:10PM -0500, Patrick R. White wrote:
> So isn't this a good reason to use the dump/restore utilities to begin
> with?
Maybe, but dump/restore is no panacea. Back in 1991, at LISA V, Elizabeth Zwicky of SRI presented a fascinating paper comparing the performance and problems of then-extant versions of tar, cpio, pax, afio, as well as dump.

dump did well on most of the tests, but by its design it is capable of really frightening errors if the filesystem is not quiesced (in practice, unmounted or mounted r/o would appear to be necessary) during the dump. Also worth noting is that dump is quite filesystem-specific, and I seem to recall hearing that the ext2 version was interestingly broken a while ago. Since I don't employ dump, I can't tell you any more than that, sorry.
The only link I can find to the paper is from here:
http://www.phys.washington.edu/~belonis/
There's a postscript file and jpegs of the printed document. I thought there used to be something less cumbersome, but Google isn't finding it for me. It did find a good number of now-dead links, though. :-(
Ah, "zwicky backup torture" is a better search key. Still mostly passing mentions of this seminal work. Here's a more recent survey paper about *nix backup techniques:
http://citeseer.nj.nec.com/chervenak98protecting.html
Here's another useful compendium that seems to be currently maintained:
http://www.cybertiggyr.com/gene/htdocs/unix_tape/unix_tape.html
OTOH, "cpio: Some Linux folks appear to use this" seems... odd.
Ah, google-diving!
--
A delicate balance is necessary between sticking with the things you know and can rely upon, and exploring things which have the potential to be better. Assuming that either of these strategies is the one true way is silly. -- Graydon Hoare
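Since the thread above is about incremental tar, it is worth showing GNU tar's own alternative to dump for incremental backups: listed-incremental snapshot files. A minimal sketch with illustrative file names:

    # Level 0 (full) backup; tar records file state in the snapshot file
    tar --listed-incremental=/var/backup/home.snar -czf /backup/home-0.tar.gz /home

    # Level 1 (incremental): work on a copy of the snapshot file so the
    # level 0 state is preserved for future level 1 runs
    cp /var/backup/home.snar /var/backup/home-1.snar
    tar --listed-incremental=/var/backup/home-1.snar -czf /backup/home-1.tar.gz /home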
>From: Paul Eggert <[email protected]>
>
>> From: Joey Hess <[email protected]>
>> Date: Mon, 25 Mar 2002 14:57:20 -0500
>>
>> According to the test suite documentation, POSIX 10.1.1-12(A) says
>> that fields mode, uid, gid, size, mtime, chksum, devmajor and
>> devminor are leading-zero-filled octal numbers in ASCII and are
>> terminated by one or more space or null characters.
>
>OK, I'll change the behavior of GNU "tar" in a future release.

I am not sure what the text from Joey Hess should be related to... his mail did not reach this group.

From looking at the archives created by GNU tar, I see the following deviations:

- The checksum field repeats a bug found in ancient tar implementations. This seems to be a rudiment from early tests done by John Gilmore in PD tar, where he tried to run "cmp" on PD-tar vs. Sun-tar archives. This is a minor deviation and easy to fix.

- The devmajor/devminor fields are missing if the file is not a block/char device -- here we see fields that are not left-zero-filled. A minor deviation that is easy to fix.

- The magic version field contains spaces instead of "00". This is just proof that GNU tar is not POSIX.1-1990 compliant, and it should not be changed before GNU tar has been validated to create POSIX.1 compliant archives.

...

>conformance by running the "tar" command. A POSIX test suite should
>invoke the "pax" command instead.

While this is the correct answer in theory, you should take into account that "pax" has not been accepted by a large number of people in the community. AFAIK, LSB intends to be UNIX-98 compliant, so it would make sense to support cpio/pax/tar in a way compliant with the SUSv2 document.

Let me comment on the current Linux status. We have:

- GNU cpio, which is neither POSIX.1-1990 compliant nor able to archive files >= 2 GB. For a list of problems look into: ftp://ftp.fokus.gmd.de/pub/unix/star/README.otherbugs

- GNU tar, which is not POSIX compliant either but supports files >= 2 GB. Problems with archive exchange with POSIX compliant platforms:
  - It does not handle long filenames in a POSIX compliant way. This has become better with recent alpha releases, but "gnutar -tvf archive" still does not work at all. Archives containing long filenames and created with gtar cannot be read correctly by POSIX-only tar implementations.
  - It is, for unknown reasons, unable to list archives created with other tar implementations (e.g. Sun's tar on Solaris, or star). For an example look into: ftp://ftp.fokus.gmd.de/pub/unix/star/testscripts/README.gtarfail

- Pax (the version fixed by Thorsten Kukuk), which is POSIX.1-1990 compliant but not able to handle files >= 2 GB.

as part of commercial Linux distributions. From the standpoint of what people might like to see, this could be better. A year-2002 POSIX OS should include at least one program that creates POSIX compliant tar archives _and_ supports large files. People who get and compile software themselves may also use "star", which is POSIX.1-1990 and POSIX.1-2001 compliant and supports files >= 2 GB. So why is star missing from Linux distributions?

>Also, I should mention that GNU tar does not generate POSIX-format
>ustar archives, nor does it claim to. Volunteers to fix this
>deficiency would be welcome, but that's a different topic. It is a
>quality-of-implementation issue, and is not strictly a
>POSIX-conformance issue.

There is "star", which is POSIX compliant. A good idea would be to move gnutar to /bin/gtar on Linux and put star on /bin/star and /bin/tar.
This way, Linux gets a POSIX compliant tar, and users of gnutar retain 100% backward compatibility when calling "gtar". ftp://ftp.fokus.gmd.de/pub/unix/star/aplha/

If you don't want to make the transition too fast, here is an idea for an intermediate step: put star on /bin/star, install the star man page for "star" and "tar", and move the GNU tar man page to "gtar".

/*--------------------------------------------------------------------------*/

Another topic: from a discussion at CeBIT, I am now aware of the fact that LSB did "standardise" on the GNU tar options at: http://www.linuxbase.org/spec/gLSB/gLSB/tar.html

Let me comment on this too: it seems to be a bad idea to standardize tar options that are incompatible with POSIX standards. So let me first introduce a list of incompatible options found in GNU tar. The complete list is in: ftp://ftp.fokus.gmd.de/pub/unix/star/aplha/STARvsGNUTAR

/*--------------------------------------------------------------------------*/

GNU tar options that (in the single-char variant) are incompatible:

  BsS    -F, --info-script=FILE     run script at end of each tape (implies -M)
  s      -L, --tape-length=NUM      change tape after writing NUM x 1024 bytes
  s      -M, --multi-volume         create/list/extract multi-volume archive
  s      -O, --to-stdout            extract files to standard output
  sS (+) -P, --absolute-names       don't strip leading `/'s from file names
  s      -S, --sparse               handle sparse files efficiently
  s      -T, -I, --files-from=NAME  get names to extract or create from file NAME
  s      -U, --unlink-first         remove each file prior to extracting over it
  s      -V, --label=NAME           create archive with volume name NAME
  s      -d, --diff, --compare      find differences between archive and file system
  sP     -l, --one-file-system      stay in local file system when creating archive
  sP     -o, --old-archive, --portability  write a V7 format archive

  B  Incompatible with BSD tar
  s  Incompatible with star
  S  Incompatible with Sun's/SVr4 tar
  P  Incompatible with POSIX

+) This option is the only one where star deviates from other tar implementations, but as there is no other nice way to have an option specifying that the last record should be partial, and the star option -/ is as easy to remember as -P for "Partial record", I see no need to change star.

/*--------------------------------------------------------------------------*/

Please note that all these incompatibilities are "against" other tar implementations that are much older than GNU tar. As an example, take the -M (do not cross mount points) option in star, which has been available since 1985.

It looks inappropriate to me to include single-char options from GNU tar that are not found in other tar implementations in something like the LSB. To avoid LSB systems breaking POSIX.1-1990 and SUSv2, I would recommend changing http://www.linuxbase.org/spec/gLSB/gLSB/tar.html so that the following single-char options disappear (the order is the order from the web page):

-A      This option has low importance and there is no need to have a single-char option for it.
-d (*)  Used by star with different semantics; the short option should not be in the LSB standard.
-F (*)  Used with different semantics by BSD tar for a long time; the short option should not be in the LSB standard.
-G      The short option should not be in the LSB standard.
-g      The short option should not be in the LSB standard.
-K      The short option should not be in the LSB standard.
-l      This option violates the POSIX/SUSv2 semantics; it needs to be removed from the LSB standard.
-L (*)  The short option should not be in the LSB standard.
-M (*)  The short option should not be in the LSB standard.
-N      The short option should not be in the LSB standard.
-o      This option violates the POSIX/SUSv2 semantics; it needs to be removed from the LSB standard.
-O (*)  The short option should not be in the LSB standard.
-P (*)  The short option should not be in the LSB standard.
-R      The short option should not be in the LSB standard.
-s      The short option should not be in the LSB standard.
-S (*)  The short option should not be in the LSB standard.
-T (*)  The short option should not be in the LSB standard.
-V (*)  The short option should not be in the LSB standard.
-W      The short option should not be in the LSB standard.

*) Used by one or more other tar implementations with different semantics, so defining it in the LSB creates problems.

Jörg

EMail: [email protected] (home) Jörg Schilling D-13353 Berlin
       [email protected] (uni)  If you don't have iso-8859-1
       [email protected] (work) chars I am J"org Schilling
URL: http://www.fokus.gmd.de/usr/schilling ftp://ftp.fokus.gmd.de/pub/unix
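Whatever one thinks of the /bin/tar replacement proposal, the interchange problems Schilling lists can largely be sidestepped on a modern GNU tar by requesting the POSIX.1-2001 (pax) archive format explicitly, which stores long names and large files in a standardized way. A minimal sketch:

    # Create a POSIX.1-2001 (pax) archive instead of the GNU-specific format
    tar --format=posix -cf project.tar project/

    # Older spelling accepted by GNU tar
    tar --posix -cf project.tar project/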
Google matched content
[Jun 23, 2019] Utilizing multi core for tar+gzip-bzip compression-decompression Published on Jun 23, 2019 | stackoverflow.com (see the pigz sketch after this list)
[Apr 27, 2018] Shell command to tar directory excluding certain files-folders Published on Apr 27, 2018 | stackoverflow.com
[Feb 04, 2017] How do I fix mess created by accidentally untarred files in the current dir, aka tar bomb Published on Feb 04, 2017 | superuser.com
...
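On the multi-core question from the first link above: gzip and bzip2 are single-threaded, so the usual fix is to swap in a parallel compressor such as pigz or pbzip2. A minimal sketch, assuming pigz is installed and using illustrative paths:

    # Pipe tar through pigz, which compresses with all available cores
    tar -cf - /data | pigz > /backup/data.tar.gz

    # Or let tar invoke the compressor itself
    tar --use-compress-program=pigz -cf /backup/data.tar.gz /data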
Full system backup with tar - ArchWiki
Mounting archives with FUSE and archivemount by Ben Martin Linux.com, April 16, 2008
How to properly backup your system using TAR
Solaris 9 tar manpage - Solaris tar understands ACLs, but GNU tar doesn't
(gnu)tar - GNU version of tar archiving utility
Linux and Solaris ACLs - Backup
The Star tape archiver by Jörg Schilling, available at ftp://ftp.berlios.de/pub/star/, since version 1.4a07 supports backing up and restoring of POSIX Access Control Lists. For best results, it is recommended to use a recent star-1.5 version. Star is compatible with SUSv2 tar (UNIX-98 tar), understands the GNU tar archive extensions, and can generate pax archives.
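A sketch of an ACL-preserving backup with star, based on the description above. The H=exustar header format and the -acl flag are as documented for star-1.5; the paths are illustrative, and the exact syntax should be verified against star(1) on your system:

    # Back up /home including POSIX ACLs, using star's extended ustar format
    star -c -acl H=exustar f=/backup/home.star /home

    # Restore, re-applying the stored ACLs
    star -x -acl f=/backup/home.star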
- -A, --catenate, --concatenate
- append tar files to an archive
- -c, --create
- create a new archive
- -d, --diff, --compare
- find differences between archive and file system
- -r, --append
- append files to the end of an archive
- -t, --list
- list the contents of an archive
- -u, --update
- only append files that are newer than the copy in the archive
- -x, --extract, --get
- extract files from an archive
- --delete
- delete from the archive (not for use on mag tapes!)
- --atime-preserve
- don't change access times on dumped files
- -b, --blocking-factor N
- block size of Nx512 bytes (default N=20)
- -B, --read-full-blocks
- reblock as we read (for reading 4.2BSD pipes)
- --backup BACKUP-TYPE
- back up files instead of deleting them, using BACKUP-TYPE "simple" or "numbered"
- --block-compress
- block the output of the compression program for tapes
- -C, --directory DIR
- change to directory DIR
- --check-links
- warn if the number of hard links to a file on the filesystem does not match the number of links recorded in the archive
- --checkpoint
- print directory names while reading the archive
- -f, --file [HOSTNAME:]F
- use archive file or device F (default "-", meaning stdin/stdout)
- -F, --info-script F --new-volume-script F
- run script at end of each tape (implies --multi-volume)
- --force-local
- archive file is local even if it has a colon
- --format FORMAT
- selects output archive format
FORMAT is one of: v7 (Unix V7), oldgnu (GNU tar <= 1.12), gnu (GNU tar 1.13), ustar (POSIX.1-1988), posix (POSIX.1-2001)
- -g, --listed-incremental F
- create/list/extract new GNU-format incremental backup
- -G, --incremental
- create/list/extract old GNU-format incremental backup
- -h, --dereference
- don't dump symlinks; dump the files they point to
- --help
- like this manpage, but not as cool
- -i, --ignore-zeros
- ignore blocks of zeros in archive (normally mean EOF)
- --ignore-case
- ignore case when excluding files
- --ignore-failed-read
- don't exit with non-zero status on unreadable files
- --index-file FILE
- send verbose output to FILE instead of stdout
- -j, --bzip2
- filter archive through bzip2, use to decompress .bz2 files
- -k, --keep-old-files
- keep existing files; don't overwrite them from archive
- -K, --starting-file F
- begin at file F in the archive
- --keep-newer-files
- do not overwrite files which are newer than the archive
- -l, --one-file-system
- stay in local file system when creating an archive
- -L, --tape-length N
- change tapes after writing N*1024 bytes
- -m, --touch, --modification-time
- don't restore file modification times when extracting
- -M, --multi-volume
- create/list/extract multi-volume archive
- --mode PERMISSIONS
- apply PERMISSIONS while adding files (see chmod(1))
- -N, --after-date DATE, --newer DATE
- only store files newer than DATE
- --newer-mtime DATE
- like --newer, but compares only the file's data modification time (mtime)
- --no-anchored
- match any subsequence of the name's components with --exclude
- --no-ignore-case
- use case-sensitive matching with --exclude
- --no-recursion
- don't recurse into directories
- --no-same-permissions
- apply user's umask when extracting files instead of recorded permissions
- --no-wildcards
- don't use wildcards with --exclude
- --no-wildcards-match-slash
- wildcards do not match slashes (/) with --exclude
- --null
- --files-from reads null-terminated names; disables --directory
- --numeric-owner
- always use numbers for user/group names
- -o, --old-archive, --portability
- like --format=v7; -o exhibits this behavior when creating an archive (deprecated behavior)
- -o, --no-same-owner
- do not attempt to restore ownership when extracting; -o exhibits this behavior when extracting an archive
- -O, --to-stdout
- extract files to standard output
- --occurrence NUM
- process only NUM occurrences of each named file; used with --delete, --diff, --extract, or --list
- --overwrite
- overwrite existing files and directory metadata when extracting
- --overwrite-dir
- overwrite directory metadata when extracting
- --owner USER
- change owner of extracted files to USER
- -p, --same-permissions, --preserve-permissions
- extract all protection information
- -P, --absolute-names
- don't strip leading '/'s from file names
- --pax-option KEYWORD-LIST
- used only with POSIX.1-2001 archives to modify the way tar handles extended header keywords
- --posix
- like --format=posix
- --preserve
- like --preserve-permissions --same-order
- --acls
- this option causes tar to store each file's ACLs in the archive.
- --selinux
- this option causes tar to store each file's SELinux security context information in the archive.
- --xattrs
- this option causes tar to store each file's extended attributes in the archive. It also enables --acls and --selinux if they haven't been set already, because the data for those is stored in special xattrs (see the sketch after this options list).
- --no-acls
- This option causes tar not to store each file's ACLs in the archive and not to extract any ACL information in an archive.
- --no-selinux
- this option causes tar not to store each file's SELinux security context information in the archive and not to extract any SELinux information in an archive.
- --no-xattrs
- this option causes tar not to store each file's extended attributes in the archive and not to extract any extended attributes in an archive. This option also enables --no-acls and --no-selinux if they haven't been set already.
- -R, --record-number
- show record number within archive with each message
- --record-size SIZE
- use SIZE bytes per record when accessing archives
- --recursion
- recurse into directories
- --recursive-unlink
- remove existing directories before extracting directories of the same name
- --remove-files
- remove files after adding them to the archive
- --rmt-command CMD
- use CMD instead of the default /usr/sbin/rmt
- --ssh-command CMD
- use remote CMD instead of ssh(1)
- -s, --same-order, --preserve-order
- list of names to extract is sorted to match archive
- -S, --sparse
- handle sparse files efficiently
- --same-owner
- create extracted files with the same ownership
- --show-defaults
- display the default options used by tar
- --show-omitted-dirs
- print directories tar skips while operating on an archive
- --strip-components NUMBER, --strip-path NUMBER
- strip NUMBER of leading components from file names before extraction (tar-1.14 uses --strip-path; tar-1.14.90+ uses --strip-components)
- --suffix SUFFIX
- use SUFFIX instead of default '~' when backing up files
- -T, --files-from F
- get names to extract or create from file F
- --totals
- print total bytes written with --create
- -U, --unlink-first
- remove existing files before extracting files of the same name
- --use-compress-program PROG
- access the archive through PROG, which is generally a compression program (it must accept -d to decompress)
- --utc
- display file modification dates in UTC
- -v, --verbose
- verbosely list files processed
- -V, --label NAME
- create archive with volume name NAME
- --version
- print tar program version number
- --volno-file F
- keep track of which volume of a multi-volume archive tar is working on, in file F; used with --multi-volume
- -w, --interactive, --confirmation
- ask for confirmation for every action
- -W, --verify
- attempt to verify the archive after writing it
- --wildcards
- use wildcards with --exclude
- --wildcards-match-slash
- wildcards match slashes (/) with --exclude
- --exclude PATTERN
- exclude files based upon PATTERN
- -X, --exclude-from FILE
- exclude files listed in FILE
- -Z, --compress, --uncompress
- filter the archive through compress
- -z, --gzip, --gunzip, --ungzip
- filter the archive through gzip
- -[0-7][lmh]
- specify drive and density
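As referenced in the --xattrs entry above, the metadata options combine naturally for system backups. A minimal sketch (these options exist in the Red Hat-patched tar described here and in GNU tar 1.27+; paths are illustrative):

    # Archive /etc preserving ACLs, SELinux contexts, and extended attributes
    tar --acls --selinux --xattrs -czf /backup/etc.tar.gz /etc

    # Extract restoring the same metadata; run as root so ownership can be restored
    tar --acls --selinux --xattrs -xzf /backup/etc.tar.gz -C /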