
Scriptorama: A Slightly Skeptical View
on Scripting Languages


This is the central page of the Softpanorama website, because I am strongly convinced that the development of scripting languages, not the replication of the efforts of the BSD group undertaken by Stallman and Torvalds, is the central part of open source. See Scripting languages as VHLL for more details.

 
Ordinarily technology changes fast. But programming languages are different: programming languages are not just technology, but what programmers think in.

They're half technology and half religion. And so the median language, meaning whatever language the median programmer uses, moves as slow as an iceberg.

Paul Graham: Beating the Averages

Libraries are more important than the language.

Donald Knuth


Introduction

A fruitful way to think about language development is to consider it to be a special type of theory building. Peter Naur suggested that programming in general is a theory-building activity in his 1985 paper "Programming as Theory Building". But the idea is especially applicable to compilers and interpreters. What Peter Naur failed to understand was that the design of programming languages has religious overtones and sometimes represents an activity which is pretty close to the process of creating a new, obscure cult ;-). Clueless academics publishing junk papers at obscure conferences are the high priests of the church of programming languages. Some, like Niklaus Wirth and Edsger W. Dijkstra, (temporarily) reached a status close to that of (false) prophets :-).

On a deep conceptual level, building a new language is a human way of solving complex problems. That means that compiler construction is probably the most underappreciated paradigm for programming large systems -- much more so than the greatly oversold object-oriented programming, whose benefits are greatly overstated.

For users, programming languages distinctly have religious aspects, so decisions about what language to use are often far from rational and are mainly cultural. Indoctrination at the university plays a very important role; recently universities were instrumental in making Java the new Cobol.

The second important observation about programming languages is that the language per se is just a tiny part of what can be called the language programming environment. The latter includes libraries, IDEs, books, the level of adoption at universities, popular and important applications written in the language, the level of support, the key players that back the language on major platforms such as Windows and Linux, and other similar things.

A mediocre language with a good programming environment can give a run for the money to languages of superior design that come "naked." This is the story behind the success of Java and PHP. A critical application is also very important, and this is the story of the success of PHP, which is nothing but a bastardized derivative of Perl (with all the most interesting Perl features surgically removed ;-) adapted to the creation of dynamic web sites using the so-called LAMP stack.

Progress in programming languages has been very uneven and contains several setbacks. Currently this progress is mainly limited to the development of so-called scripting languages; the field of traditional high-level languages has been stagnant for decades. From 2000 to 2017 we observed the huge success of JavaScript; Python encroached on Perl territory (including genomics/bioinformatics), and R in turn started squeezing Python in several areas. At the same time Ruby, despite initial success, remained a niche language. PHP still holds its own in web-site design.

Some observations about scripting language design and  usage

At the same time there are some mysterious, unanswered questions about the factors that help a particular scripting language to increase its user base, or cause it to fail in popularity. Among them:

Nothing succeeds like success

Those are difficult questions to answer without some way of classifying languages into different categories. Several such classifications exist. First of all, as with natural languages, the number of people who speak a given language is a tremendous force that can overcome any real or perceived deficiencies of the language. In programming languages, as in natural languages, nothing succeeds like success.

The second interesting criterion is the number of applications written in a particular language that became part of Linux or, at least, are included in the standard RHEL/Fedora/CentOS or Debian/Ubuntu repositories.

The third relevant criterion is the number and quality of books available for the particular language.

Complexity Curse

The history of programming languages raises interesting general questions about the limits of complexity in programming languages. There is strong historical evidence that a language with a simpler, or even simplistic, core (Basic, Pascal) has better chances of acquiring a high level of popularity.

The underlying fact here is probably that most programmers are at best mediocre, and such programmers tend, on an intuitive level, to avoid more complex, richer languages and prefer, say, Pascal to PL/1 and PHP to Perl. Or they at least avoid them at a particular phase of language development (C++ is not a simpler language than PL/1, but it was widely adopted because of the progress of hardware, the availability of compilers and, not least, because it was associated with OO exactly at the time OO became a mainstream fashion).

Complex, non-orthogonal languages can succeed only as the result of a long period of language development from a smaller core (which usually adds complexity -- just compare Fortran IV with Fortran 90, or PHP 3 with PHP 5). Attempts to ride a fashionable new trend by extending an existing popular language to the new "paradigm" have also proved relatively successful (OO programming in the case of C++, which is a superset of C).

Historically, few complex languages were successful (PL/1, Ada, Perl, C++), and even when they were successful, their success typically proved temporary rather than permanent (PL/1, Ada, Perl). As Professor Wilkes noted (iee90):

Things move slowly in the computer language field but, over a sufficiently long period of time, it is possible to discern trends. In the 1970s, there was a vogue among system programmers for BCPL, a typeless language. This has now run its course, and system programmers appreciate some typing support. At the same time, they like a language with low level features that enable them to do things their way, rather than the compiler’s way, when they want to.

They continue to have a strong preference for a lean language. At present they tend to favor C in its various versions. For applications in which flexibility is important, Lisp may be said to have gained strength as a popular programming language.

Further progress is necessary in the direction of achieving modularity. No language has so far emerged which exploits objects in a fully satisfactory manner, although C++ goes a long way. ADA was progressive in this respect, but unfortunately it is in the process of collapsing under its own great weight.

ADA is an example of what can happen when an official attempt is made to orchestrate technical advances. After the experience with PL/1 and ALGOL 68, it should have been clear that the future did not lie with massively large languages.

I would direct the reader’s attention to Modula-3, a modest attempt to build on the appeal and success of Pascal and Modula-2 [12].

The complexity of the compiler/interpreter also matters, as it affects portability: this is one thing that probably doomed PL/1 (and later Ada), although these days a new language typically comes with an open source compiler (or, in the case of scripting languages, an interpreter), so this is less of a problem.

Here is an interesting take on language design from the preface to The D Programming Language book (the D language failed to achieve any significant level of popularity):

Programming language design seeks power in simplicity and, when successful, begets beauty.

Choosing the trade-offs among contradictory requirements is a difficult task that requires good taste from the language designer as much as mastery of theoretical principles and of practical implementation matters. Programming language design is software-engineering-complete.

D is a language that attempts to consistently do the right thing within the constraints it chose: system-level access to computing resources, high performance, and syntactic similarity with C-derived languages. In trying to do the right thing, D sometimes stays with tradition and does what other languages do, and other times it breaks tradition with a fresh, innovative solution. On occasion that meant revisiting the very constraints that D ostensibly embraced. For example, large program fragments or indeed entire programs can be written in a well-defined memory-safe subset of D, which entails giving away a small amount of system-level access for a large gain in program debuggability.

You may be interested in D if the following values are important to you:

The role of fashion

At the initial, most difficult stage of language development, the language should solve an important problem that is inadequately solved by currently popular languages. But at the same time the language has few chances to succeed unless it fits perfectly into the current software fashion. This "fashion factor" is probably as important as several other factors combined, with the notable exception of the "language sponsor" factor; the latter can make or break a language.

As in women's dress, fashion rules in language design, and with time this trend has become more and more pronounced. A new language should represent the current fashionable trend. For example, OO programming was the calling card for entry into the world of "big, successful languages" since probably the early 90s (C++, Java, Python). Before that, "structured programming" and "verification" (Pascal, Modula) played a similar role.

Programming environment and the role of "powerful sponsor" in language success

PL/1, Java, C#, Ada, and Python are languages that had powerful sponsors. Pascal, Basic, Forth, and partially Perl (O'Reilly was a sponsor for a short period of time) are examples of languages that had no such sponsor during the initial period of development. C and C++ are somewhere in between.

But the language itself is not enough. Any language now needs a "programming environment," which consists of a set of libraries, a debugger, and other tools (a make tool, lint, a pretty-printer, etc.). The set of standard libraries and the debugger are probably the two most important elements. They cost a lot of time (or money) to develop, and here the role of a powerful sponsor is difficult to overestimate.

While this is not a necessary condition for becoming popular, it really helps: other things being equal, the weight of the language's sponsor does matter. For example, Java, being a weak, inconsistent language (C-- with garbage collection and OO), was forced down everyone's throat on the strength of marketing and the huge amount of money spent on creating the Java programming environment.

The same is partially true for C# and Python. That's why Python, despite its "non-Unix" origin, is a more viable scripting language now than, say, Perl (which is better integrated with Unix and has support for pointers and regular expressions that is pretty innovative for scripting languages), or Ruby (which has had support for coroutines from day one, not as a "bolted-on" feature as in Python).

As in political campaigns, negative advertising also matters. For example, Perl suffered greatly from smears comparing programs written in it to "white noise," and then from O'Reilly's withdrawal from the role of sponsor of the language (although it continues to milk the Perl book-publishing franchise ;-)

People have proved to be pretty gullible, and in this sense language marketing is not that different from the marketing of women's clothing :-)

Language level and success

One very important classification of programming languages is based on the so-called level of the language. Essentially, once there is at least one language that is successful at a given level, the success of other languages at the same level becomes more problematic. The best chances for success belong to languages whose level is even slightly higher than that of their successful predecessors.

The level of a language can informally be described as the number of statements (or, more correctly, the number of lexical units, or tokens) needed to write a solution to a particular problem in one language versus another. This way we can distinguish several levels of programming languages; the small Perl sketch below gives the flavor of the comparison.
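As an illustration of mine (not a formal metric), here is a minimal Perl sketch: a word-frequency counter that needs only a handful of tokens because hashes, regular expressions and sorting are built into the language, whereas an equivalent C program needs a hand-written hash table and easily runs to a hundred or more lines.

#!/usr/bin/perl
# Count word frequencies on stdin and print them in descending order.
use strict;
use warnings;

my %count;
while ( my $line = <STDIN> ) {
    $count{ lc $1 }++ while $line =~ /(\w+)/g;
}
print "$_ $count{$_}\n"
    for sort { $count{$b} <=> $count{$a} } keys %count;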

 "Nanny languages" vs "Sharp razor" languages

Some people distinguish between "nanny languages" and "sharp razor" languages. The latter do not attempt to protect the user from his errors, while the former usually go too far... The right compromise is extremely difficult to find.

For example, I consider the explicit availability of pointers an important feature of a language: it greatly increases the language's expressive power and far outweighs the risk of errors in the hands of unskilled practitioners. In other words, attempts to make a language "safer" often misfire.
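A minimal Perl sketch of what I mean (references are Perl's tamed flavor of pointers; the data is made up purely for illustration):

#!/usr/bin/perl
use strict;
use warnings;

# References let you build nested data structures and pass them around
# without copying: most of the expressive power of pointers, with far
# fewer ways to shoot yourself in the foot.
my %host = (
    name  => 'web01',
    ports => [ 22, 80, 443 ],             # reference to an anonymous array
    owner => { team => 'ops', id => 7 },  # reference to an anonymous hash
);

my $ports = $host{ports};    # take a "pointer" to the embedded array
push @$ports, 8080;          # modify the original structure through it

print "last port: $host{ports}[-1]\n";   # prints: last port: 8080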

Expressive style of the languages

Another useful typology is based on the expressive style of the language:

Those categories are not pure and somewhat overlap. For example, it is possible to program in an object-oriented style in C, or even in assembler. Some scripting languages, like Perl, have a built-in regular expression engine that is part of the language, so they have a functional component despite being procedural. Some relatively low-level (Algol-style) languages implement garbage collection; a good example is Java. There are also scripting languages that compile into a common language framework designed for high-level languages; for example, IronPython compiles into .NET.
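A small Perl sketch of such mixing (my own example): a nominally procedural script that leans on the built-in regex engine and the functional map/grep operators.

#!/usr/bin/perl
use strict;
use warnings;

# Procedural skeleton, functional middle: grep and map are applied to a
# list and the built-in regex engine does the extraction, with no
# library calls and no objects.
my @lines   = ( 'alpha 1', 'beta 2', 'gamma 3', 'noise' );
my @numbers = map  { /(\d+)/ ? $1 : () }   # pull out the digits
              grep { /\d/ }                # keep only lines that have any
              @lines;

print "@numbers\n";   # prints: 1 2 3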

Weak correlation between quality of design and popularity

The popularity of programming languages is not strongly connected to their quality. Some languages that look like a collection of language-designer blunders (PHP, Java) became quite popular. Java became the new Cobol, and PHP dominates the construction of dynamic Web sites. The dominant technology for such sites is often called LAMP, which stands for Linux - Apache - MySQL - PHP. Being a highly simplified but badly constructed subset of Perl, a kind of new Basic for dynamic Web site construction, PHP provides a rather depressing experience. I was unpleasantly surprised when I learned that the Wikipedia engine was rewritten from Perl to PHP some time ago, but this fact illustrates the trend quite well.

So the quality of language design has little to do with a language's success in the marketplace. Simpler languages have wider appeal, as the success of PHP (which at the beginning came at the expense of Perl) suggests. In addition, much depends on whether the language has a powerful sponsor, as was the case with Java (Sun and IBM) as well as Python (Google).

Progress in programming languages has been very uneven and contains several setbacks, like Java. Currently this progress is usually associated with scripting languages. The history of programming languages raises interesting general questions about the "laws" of programming language design. First, let's reproduce several notable quotes:

  1. Knuth law of optimization: "Premature optimization is the root of all evil (or at least most of it) in programming." - Donald Knuth
  2. "Greenspun's Tenth Rule of Programming: any sufficiently complicated C or Fortran program contains an ad hoc informally-specified bug-ridden slow implementation of half of Common Lisp." - Phil Greenspun
  3. "The key to performance is elegance, not battalions of special cases." - Jon Bentley and Doug McIlroy
  4. "Some may say Ruby is a bad rip-off of Lisp or Smalltalk, and I admit that. But it is nicer to ordinary people." - Matz, LL2
  5. Most papers in computer science describe how their author learned what someone else already knew. - Peter Landin
  6. "The only way to learn a new programming language is by writing programs in it." - Kernighan and Ritchie
  7. "If I had a nickel for every time I've written "for (i = 0; i < N; i++)" in C, I'd be a millionaire." - Mike Vanier
  8. "Language designers are not intellectuals. They're not as interested in thinking as you might hope. They just want to get a language done and start using it." - Dave Moon
  9. "Don't worry about what anybody else is going to do. The best way to predict the future is to invent it." - Alan Kay
  10. "Programs must be written for people to read, and only incidentally for machines to execute." - Abelson & Sussman, SICP, preface to the first edition

Please note that it is one thing to read a language manual and appreciate how good the concepts are, and quite another to bet your project on a new, unproven language without good debuggers, manuals and, most importantly, libraries. The debugger is very important, but standard libraries are crucial: they are the factor that makes or breaks new languages.

In this sense languages are much like cars. For many people a car is the thing they use to get to work and the shopping mall, and they are not very interested in whether the engine is inline or V-type, or whether the transmission uses fuzzy logic. What they care about is safety, reliability, mileage, insurance, and the size of the trunk. In this sense "worse is better" is very true. I have already mentioned the importance of the debugger; the other important criterion is the quality and availability of libraries. Actually, libraries account for perhaps 80% of the usability of a language; in a sense, libraries are more important than the language itself...

The popular belief that scripting is an "unsafe," "second-rate," or "prototype-only" solution is completely wrong. If a project dies, it does not matter what the implementation language was; so for any successful project with a tough schedule a scripting language (especially in a dual scripting-language-plus-C combination, for example Tcl+C) is an optimal blend for a large class of tasks. Such an approach separates architectural decisions from implementation details much better than any OO model does.
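Here is a minimal sketch of that dual-language idea using Perl plus C instead of Tcl plus C (it assumes the CPAN module Inline::C is installed; the function name and the task are invented for illustration):

#!/usr/bin/perl
use strict;
use warnings;

# The hot inner loop lives in C; the surrounding logic stays in the
# scripting language.  Inline::C compiles the embedded C on first run
# and binds it as an ordinary Perl subroutine.
use Inline C => <<'END_C';
long sum_of_squares(long n) {
    long i, s = 0;
    for (i = 1; i <= n; i++)
        s += i * i;
    return s;
}
END_C

print sum_of_squares(1000), "\n";   # called like any other Perl sub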

Moreover, even for tasks that involve a fair amount of computation and data (computationally intensive tasks), languages such as Python and Perl are often (but not always!) competitive with C++, C# and, especially, Java.



Programming Language Development Timeline

Here is the timeline of programming languages, modified from Byte (for the original see BYTE.com, September 1995 / 20th Anniversary).

Forties: ca. 1946, 1949

Fifties: 1951, 1952, 1957, 1958, 1959

Sixties: 1960, 1962, 1963, 1964, 1965, 1966, 1967, 1969

Seventies: 1970, 1972, 1974, 1975, 1976, 1977, 1978, 1979

Eighties: 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989

Nineties: 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997

Two thousands and later: 2000, 2006, 2007, 2008, 2009, 2011, 2017

Special note on Scripting languages

Scripting helps to avoid the OO trap that is pushed by
  "a horde of practically illiterate researchers
publishing crap papers in junk conferences."

Despite the fact that scripting languages are a really important computer science phenomenon, they are usually happily ignored in university curricula. Students are instead indoctrinated (or, in less politically correct terms, "brainwashed") in Java and OO programming ;-)

This site tries to give scripting languages proper emphasis and promotes them as an alternative to the mainstream reliance on the "Java as the new Cobol" approach to software development. Please read my introduction to the topic, which was recently converted into the article A Slightly Skeptical View on Scripting Languages.

The tragedy of the scripting language designer is that there is no way to overestimate the level of abuse of any feature of the language. Half of all programmers are by definition below average, and it is this half that matters most in an enterprise environment. In a way, the higher the level of the programmer, the less relevant the limitations of the language are to him. That's why statements like "Perl is poorly suited to large project development" are plain silly. With proper discipline it is perfectly suitable, and programmers can be more productive with Perl than with Java. The real question is "What is the quality and size of the team?"

Scripting is part of the Unix cultural tradition, and Unix was the initial development platform for most mainstream scripting languages, with the exception of REXX. But they are portable, and all of them can now be used on Windows and other OSes.


Different scripting languages provide different levels of integration with the base OS API (for example, Unix or Windows). IronPython, for example, compiles into .NET and provides a pretty high level of integration with Windows. The same is true of Perl and Unix: almost all Unix system calls are available directly from Perl. Moreover, Perl integrates most of the Unix API in a very natural way, making it a perfect replacement for the shell when coding complex scripts. It also has a very good debugger, which is a weak point of shells like bash and ksh93.
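A small sketch of that integration (the file name is just an example): stat(), getpwuid() and friends are ordinary Perl builtins that map almost one-to-one onto the underlying system calls and C library.

#!/usr/bin/perl
use strict;
use warnings;

# Query file metadata the same way a C program or ls -l would,
# but without leaving the language.
my $file = '/etc/passwd';
my @st   = stat($file) or die "stat $file: $!";
my ( $mode, $uid, $size, $mtime ) = @st[ 2, 4, 7, 9 ];

my $owner = getpwuid($uid);   # numeric uid -> user name
printf "%s: owner=%s size=%d mode=%04o mtime=%s\n",
    $file, $owner, $size, $mode & 07777, scalar localtime $mtime;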

Unix proved that treating everything as a file is a powerful OS paradigm. In a similar way, scripting languages proved that "everything is a string" is an equally powerful programming paradigm.
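A tiny Perl illustration of the point (the value is invented): a scalar that arrives as text is usable as a number or as a string depending only on the context in which it appears.

#!/usr/bin/perl
use strict;
use warnings;

# "Everything is a string": no explicit conversion is ever written.
my $field  = "42";             # came in as text, say from a log line
my $double = $field * 2;       # numeric context:  84
my $label  = $field . "nd";    # string  context:  "42nd"

print "$double $label\n";      # prints: 84 42nd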


There are also several separate pages devoted to scripting in different applications. The main emphasis is on shells and Perl. Right now I am trying to convert my old Perl lecture notes into an eBook: Nikolai Bezroukov, Introduction to Perl for Unix System Administrators.

Along with the pages devoted to major scripting languages, this site has more than a dozen "Perl/scripting tools for a particular area" pages. The most well-developed and up-to-date pages of this set are probably Shells and Perl. The main purpose of this page is to follow the changes in programming practice that can be called the "rise of scripting," as predicted in John Ousterhout's famous article Scripting: Higher Level Programming for the 21st Century, published in IEEE Computer (1998). In this brilliant paper he wrote:

...Scripting languages such as Perl and Tcl represent a very different style of programming than system programming languages such as C or Java. Scripting languages are designed for "gluing" applications; they use typeless approaches to achieve a higher level of programming and more rapid application development than system programming languages. Increases in computer speed and changes in the application mix are making scripting languages more and more important for applications of the future.

...Scripting languages and system programming languages are complementary, and most major computing platforms since the 1960's have provided both kinds of languages. The languages are typically used together in component frameworks, where components are created with system programming languages and glued together with scripting languages. However, several recent trends, such as faster machines, better scripting languages, the increasing importance of graphical user interfaces and component architectures, and the growth of the Internet, have greatly increased the applicability of scripting languages. These trends will continue over the next decade, with more and more new applications written entirely in scripting languages and system programming languages used primarily for creating components.

My e-book Portraits of Open Source Pioneers contains several chapters on scripting (most are in early draft stage) that expand on this topic. 

The reader must understand that the treatment of scripting languages in the press, and especially the academic press, is far from fair: entrenched academic interests often promote old or commercially supported paradigms until they retire, so a change of paradigm is often possible only with a change of generations. And people tend to live longer these days... Please also be aware that even respectable academic magazines like Communications of the ACM and IEEE Software often promote "cargo cult software engineering" such as the Capability Maturity Model (CMM).

Dr. Nikolai Bezroukov



Old News ;-)


[Sep 17, 2019] How can a Perl regex re-use part of the previous match for the next match?

Sep 17, 2019 | stackoverflow.com



dlw ,Aug 16, 2009 at 3:52

I need some Perl regular expression help. The following snippet of code:
use strict; 
use warnings; 
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L"; 
my $word = "plus"; 
my @results = ();
1 while $str =~ s/(.{2}\b$word\b.{2})/push(@results,"$1\n")/e;
print @results;

Produces the following output:

A plus B
D plus E
2 plus F
H plus I
4 plus J
5 plus K

What I want to see is this, where a character already matched can appear in a new match in a different context:

A plus B
D plus E
E plus F
H plus I
I plus J
J plus K

How do I change the regular expression to get this result? Thanks --- Dan

Michael Carman ,Aug 16, 2009 at 4:11

General advice: Don't use s/// when you want m// . Be specific in what you match.

The answer is pos :

#!/usr/bin/perl -l

use strict;
use warnings;

my $str = 'In this example, ' . 'A plus B equals C, ' .
          'D plus E plus F equals G ' .
          'and H plus I plus J plus K equals L';

my $word = "plus";

my @results;

while ( $str =~ /([A-Z] $word [A-Z])/g ) {
    push @results, $1;
    pos($str) -= 1;
}

print "'$_'" for @results;

Output:

C:\Temp> b
'A plus B'
'D plus E'
'E plus F'
'H plus I'
'I plus J'
'J plus K'

Michael Carman ,Aug 16, 2009 at 2:56

You can use a m//g instead of s/// and assign to the pos function to rewind the match location before the second term:
use strict;
use warnings;

my $str  = 'In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L';
my $word = 'plus';
my @results;

while ($str =~ /(.{2}\b$word\b(.{2}))/g) {
    push @results, "$1\n";
    pos $str -= length $2;
}
print @results;

dlw ,Aug 18, 2009 at 13:00

Another option is to use a lookahead:
use strict; 
use warnings; 
my $str = "In this example, A plus B equals C, D plus E "
        . "plus F equals G and H plus I plus J plus K equals L"; 
my $word = "plus"; 
my $chars = 2;
my @results = ();

push @results, $1 
  while $str =~ /(?=((.{0,$chars}?\b$word\b).{0,$chars}))\2/g;

print "'$_'\n" for @results;

Within the lookahead, capturing group 1 matches the word along with a variable number of leading and trailing context characters, up to whatever maximum you've set. When the lookahead finishes, the backreference \2 matches "for real" whatever was captured by group 2, which is the same as group 1 except that it stops at the end of the word. That sets pos where you want it, without requiring you to calculate how many characters you actually matched after the word.

ysth ,Aug 16, 2009 at 9:01

Given the "Full Disclosure" comment (but assuming .{0,35} , not .{35} ), I'd do
use List::Util qw/max min/;
my $context = 35;
while ( $str =~ /\b$word\b/g ) {
    my $pre = substr( $str, max(0, $-[0] - $context), min( $-[0], $context ) );
    my $post = substr( $str, $+[0], $context );
    my $match = substr( $str, $-[0], $+[0] - $-[0] );
    $pre =~ s/.*\n//s;
    $post =~ s/\n.*//s;
    push @results, "$pre$match$post";
}
print for @results;

You'd skip the substitutions if you really meant (?s:.{0,35}) .

Greg Hewgill ,Aug 16, 2009 at 2:29

Here's one way to do it:
use strict; 
use warnings; 
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L"; 
my $word = "plus"; 
my @results = ();
my $i = 0;
while (substr($str, $i) =~ /(.{2}\b$word\b.{2})/) {
    push @results, "$1\n";
    $i += $-[0] + 1;
}
print @results;

It's not terribly Perl-ish, but it works and it doesn't use too many obscure regular expression tricks. However, you might have to look up the function of the special variable @- in perlvar .

ghostdog74 ,Aug 16, 2009 at 3:44

You don't have to use a regex. Basically, just split up the string, use a loop to go over each item, check for "plus", then get the words before and after.
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L"; 
@s = split /\s+/,$str;
for($i=0;$i<scalar @s;$i++){
    if ( "$s[$i]"  eq "plus" ){
        print "$s[$i-1] plus $s[$i+1]\n";
    }
}

[Sep 16, 2019] How can I capture multiple matches from the same Perl regex - Stack Overflow

Sep 16, 2019 | stackoverflow.com



brian d foy ,May 22, 2010 at 15:42

I'm trying to parse a single string and get multiple chunks of data out from the same string with the same regex conditions. I'm parsing a single HTML doc that is static (For an undisclosed reason, I can't use an HTML parser to do the job.) I have an expression that looks like:
$string =~ /\<img\ssrc\="(.*)"/;

and I want to get the value of $1. However, in the one string, there are many img tags like this, so I need something like an array returned (@1?) is this possible?

VolatileRig ,Jan 14, 2014 at 19:41

As in Jim's answer, use the /g modifier (in list context or in a loop).

But beware of greediness: you don't want the .* to match more than necessary (and don't escape < or = ; they are not special).

while($string =~ /<img\s+src="(.*?)"/g ) {
  ...
}

Robert Wohlfarth ,May 21, 2010 at 18:44

@list = ($string =~ m/\<img\ssrc\="(.*)"/g);

The g modifier matches all occurrences in the string. List context returns all of the matches. See the m// operator in perlop .

dalton ,May 21, 2010 at 18:42

You just need the global modifier /g at the end of the match. Then loop through until there are no matches remaining
my @matches;
while ($string =~ /\<img\ssrc\="(.*)"/g) {
        push(@matches, $1);
}

VolatileRig ,May 24, 2010 at 16:37

Use the /g modifier and list context on the left, as in
@result = $string =~ /\<img\ssrc\="(.*)"/g;

[Sep 16, 2019] Pretty-print for shell script

Sep 16, 2019 | stackoverflow.com

Benoit ,Oct 21, 2010 at 13:19

I'm looking for something similar to indent but for (bash) scripts. Console only, no colorizing, etc.

Do you know of one ?

Jamie ,Sep 11, 2012 at 3:00

Vim can indent bash scripts. But not reformat them before indenting.
Backup your bash script, open it with vim, type gg=GZZ and indent will be corrected. (Note for the impatient: this overwrites the file, so be sure to do that backup!)

Though, some bugs with << (expecting EOF as first character on a line) e.g.

EDIT: ZZ not ZQ

Daniel Martí ,Apr 8, 2018 at 13:52

A bit late to the party, but it looks like shfmt could do the trick for you.

Brian Chrisman ,Sep 9 at 7:47

In bash I do this:
reindent() {
source <(echo "Zibri () {";cat "$1"; echo "}")
declare -f Zibri|head --lines=-1|tail --lines=+3 | sed -e "s/^\s\s\s\s//"
}

this eliminates comments and reindents the script "bash way".

If you have HEREDOCS in your script, they got ruined by the sed in the previous function.

So use:

reindent() {
source <(echo "Zibri () {";cat "$1"; echo "}")
declare -f Zibri|head --lines=-1|tail --lines=+3"
}

But all your script will have a 4 spaces indentation.

Or you can do:

reindent () 
{ 
    rstr=$(mktemp -u "XXXXXXXXXX");
    source <(echo "Zibri () {";cat "$1"|sed -e "s/^\s\s\s\s/$rstr/"; echo "}");
    echo '#!/bin/bash';
    declare -f Zibri | head --lines=-1 | tail --lines=+3 | sed -e "s/^\s\s\s\s//;s/$rstr/    /"
}

which takes care also of heredocs.


Found this http://www.linux-kheops.com/doc/perl/perl-aubert/fmt.script .

Very nice, only one thing i took out is the [...]->test substitution.

[Sep 16, 2019] A command-line HTML pretty-printer Making messy HTML readable - Stack Overflow

Notable quotes:
"... Have a look at the HTML Tidy Project: http://www.html-tidy.org/ ..."
Sep 16, 2019 | stackoverflow.com

nisetama ,Aug 12 at 10:33

I'm looking for recommendations for HTML pretty printers which fulfill the following requirements:


Have a look at the HTML Tidy Project: http://www.html-tidy.org/

The granddaddy of HTML tools, with support for modern standards.

There used to be a fork called tidy-html5 which since became the official thing. Here is its GitHub repository .

Tidy is a console application for Mac OS X, Linux, Windows, UNIX, and more. It corrects and cleans up HTML and XML documents by fixing markup errors and upgrading legacy code to modern standards.

For your needs, here is the command line to call Tidy:

[Sep 11, 2019] string - Extract substring in Bash - Stack Overflow

Sep 11, 2019 | stackoverflow.com

Jeff ,May 8 at 18:30

Given a filename in the form someletters_12345_moreleters.ext , I want to extract the 5 digits and put them into a variable.

So to emphasize the point, I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.

I am very interested in the number of different ways that this can be accomplished.

Berek Bryan ,Jan 24, 2017 at 9:30

Use cut :
echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2

More generic:

INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo $INPUT| cut -d'_' -f 2)
echo $SUBSTRING

JB. ,Jan 6, 2015 at 10:13

If x is constant, the following parameter expansion performs substring extraction:
b=${a:12:5}

where 12 is the offset (zero-based) and 5 is the length

If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix (respectively) in two steps:

tmp=${a#*_}   # remove prefix ending in "_"
b=${tmp%_*}   # remove suffix starting with "_"

If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone knows how to perform both expansions in a single expression, I'd like to know too.

Both solutions presented are pure bash, with no process spawning involved, hence very fast.

A Sahra ,Mar 16, 2017 at 6:27

Generic solution where the number can be anywhere in the filename, using the first of such sequences:
number=$(echo $filename | egrep -o '[[:digit:]]{5}' | head -n1)

Another solution to extract exactly a part of a variable:

number=${filename:offset:length}

If your filename always have the format stuff_digits_... you can use awk:

number=$(echo $filename | awk -F _ '{ print $2 }')

Yet another solution to remove everything except digits, use

number=$(echo $filename | tr -cd '[[:digit:]]')

sshow ,Jul 27, 2017 at 17:22

In case someone wants more rigorous information, you can also search it in man bash like this
$ man bash [press return key]
/substring  [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]

Result:

${parameter:offset}
       ${parameter:offset:length}
              Substring Expansion.  Expands to  up  to  length  characters  of
              parameter  starting  at  the  character specified by offset.  If
              length is omitted, expands to the substring of parameter  start‐
              ing at the character specified by offset.  length and offset are
              arithmetic expressions (see ARITHMETIC  EVALUATION  below).   If
              offset  evaluates  to a number less than zero, the value is used
              as an offset from the end of the value of parameter.  Arithmetic
              expressions  starting  with  a - must be separated by whitespace
              from the preceding : to be distinguished from  the  Use  Default
              Values  expansion.   If  length  evaluates to a number less than
              zero, and parameter is not @ and not an indexed  or  associative
              array,  it is interpreted as an offset from the end of the value
              of parameter rather than a number of characters, and the  expan‐
              sion is the characters between the two offsets.  If parameter is
              @, the result is length positional parameters beginning at  off‐
              set.   If parameter is an indexed array name subscripted by @ or
              *, the result is the length members of the array beginning  with
              ${parameter[offset]}.   A  negative  offset is taken relative to
              one greater than the maximum index of the specified array.  Sub‐
              string  expansion applied to an associative array produces unde‐
              fined results.  Note that a negative offset  must  be  separated
              from  the  colon  by  at least one space to avoid being confused
              with the :- expansion.  Substring indexing is zero-based  unless
              the  positional  parameters are used, in which case the indexing
              starts at 1 by default.  If offset  is  0,  and  the  positional
              parameters are used, $0 is prefixed to the list.

Aleksandr Levchuk ,Aug 29, 2011 at 5:51

Building on jor's answer (which doesn't work for me):
substring=$(expr "$filename" : '.*_\([^_]*\)_.*')

kayn ,Oct 5, 2015 at 8:48

I'm surprised this pure bash solution didn't come up:
a="someletters_12345_moreleters.ext"
IFS="_"
set $a
echo $2
# prints 12345

You probably want to reset IFS to what value it was before, or unset IFS afterwards!

zebediah49 ,Jun 4 at 17:31

Here's how i'd do it:
FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}

Note: the above is a regular expression and is restricted to your specific scenario of five digits surrounded by underscores. Change the regular expression if you need different matching.

TranslucentCloud ,Jun 16, 2014 at 13:27

Following the requirements

I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.

I found some grep ways that may be useful:

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+" 
12345

or better

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}" 
12345

And then with -Po syntax:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+' 
12345

Or if you want to make it fit exactly 5 characters:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}' 
12345

Finally, to make it be stored in a variable it is just need to use the var=$(command) syntax.

Darron ,Jan 9, 2009 at 16:13

Without any sub-processes you can:
shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}

A very small variant of this will also work in ksh93.

user2350426 ,Aug 5, 2014 at 8:11
If we focus on the concept of:
"A run of (one or several) digits"

We could use several external tools to extract the numbers.
We could quite easily erase all other characters, using either sed or tr:

name='someletters_12345_moreleters.ext'

echo $name | sed 's/[^0-9]*//g'    # 12345
echo $name | tr -c -d 0-9          # 12345

But if $name contains several runs of numbers, the above will fail:

If "name=someletters_12345_moreleters_323_end.ext", then:

echo $name | sed 's/[^0-9]*//g'    # 12345323
echo $name | tr -c -d 0-9          # 12345323

We need to use regular expressions (regex).
To select only the first run (12345 not 323) in sed and perl:

echo $name | sed 's/[^0-9]*\([0-9]\{1,\}\).*$/\1/'
perl -e 'my $name='$name';my ($num)=$name=~/(\d+)/;print "$num\n";'

But we could as well do it directly in bash (1) :

regex=[^0-9]*([0-9]{1,}).*$; \
[[ $name =~ $regex ]] && echo ${BASH_REMATCH[1]}

This allows us to extract the FIRST run of digits of any length
surrounded by any other text/characters.

Note : regex=[^0-9]*([0-9]{5,5}).*$; will match only exactly 5 digit runs. :-)

(1) : faster than calling an external tool for each short texts. Not faster than doing all processing inside sed or awk for large files.

codist ,May 6, 2011 at 12:50

Here's a prefix-suffix solution (similar to the solutions given by JB and Darron) that matches the first block of digits and does not depend on the surrounding underscores:
str='someletters_12345_morele34ters.ext'
s1="${str#"${str%%[[:digit:]]*}"}"   # strip off non-digit prefix from str
s2="${s1%%[^[:digit:]]*}"            # strip off non-digit suffix from s1
echo "$s2"                           # 12345

Campa ,Oct 21, 2016 at 8:12

I love sed 's capability to deal with regex groups:
> var="someletters_12345_moreletters.ext"
> digits=$( echo $var | sed "s/.*_\([0-9]\+\).*/\1/p" -n )
> echo $digits
12345

A slightly more general option would be not to assume that you have an underscore _ marking the start of your digits sequence, hence for instance stripping off all non-numbers you get before your sequence: s/[^0-9]\+\([0-9]\+\).*/\1/p .


> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
    Attempt to match regexp against the pattern space.  If successful, replace that portion matched with replacement.  The replacement may contain the special  character  &  to
    refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.

More on this, in case you're not too confident with regexps:

All escapes \ are there to make sed 's regexp processing work.

Dan Dascalescu ,May 8 at 18:28

Given test.txt is a file containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
cut -b19-20 test.txt > test1.txt # This will extract chars 19 & 20 "ST"
while read -r; do
    x=$REPLY
done < test1.txt
echo $x
ST

Alex Raj Kaliamoorthy ,Jul 29, 2016 at 7:41

My answer will have more control on what you want out of your string. Here is the code on how you can extract 12345 out of your string
str="someletters_12345_moreleters.ext"
str=${str#*_}
str=${str%_more*}
echo $str

This will be more efficient if you want to extract something that has any chars like abc or any special characters like _ or - . For example: If your string is like this and you want everything that is after someletters_ and before _moreleters.ext :

str="someletters_123-45-24a&13b-1_moreleters.ext"

With my code you can mention what exactly you want. Explanation:

#* : removes the preceding string including the matching key. Here the key we mentioned is _
%  : removes the following string including the matching key. Here the key we mentioned is '_more*'

Do some experiments yourself and you would find this interesting.

Dan Dascalescu ,May 8 at 18:27

similar to substr('abcdefg', 2-1, 3) in php:
echo 'abcdefg'|tail -c +2|head -c 3

olibre ,Nov 25, 2015 at 14:50

Ok, here goes pure Parameter Substitution with an empty string. Caveat is that I have defined someletters and moreletters as only characters. If they are alphanumeric, this will not work as it is.
shopt -s extglob   # the +() and @() patterns below need extended globbing
filename=someletters_12345_moreletters.ext
substring=${filename//@(+([a-z])_|_+([a-z]).*)}
echo $substring
12345

gniourf_gniourf ,Jun 4 at 17:33

There's also the external 'expr' command:
INPUT="someletters_12345_moreleters.ext"  
SUBSTRING=`expr match "$INPUT" '.*_\([[:digit:]]*\)_.*' `  
echo $SUBSTRING

russell ,Aug 1, 2013 at 8:12

A little late, but I just ran across this problem and found the following:
host:/tmp$ asd=someletters_12345_moreleters.ext 
host:/tmp$ echo `expr $asd : '.*_\(.*\)_'`
12345
host:/tmp$

I used it to get millisecond resolution on an embedded system that does not have %N for date:

set `grep "now at" /proc/timer_list`
nano=$3
fraction=`expr $nano : '.*\(...\)......'`
$debug nano is $nano, fraction is $fraction

> ,Aug 5, 2018 at 17:13

A bash solution:
IFS="_" read -r x digs x <<<'someletters_12345_moreleters.ext'

This will clobber a variable called x . The var x could be changed to the var _ .

input='someletters_12345_moreleters.ext'
IFS="_" read -r _ digs _ <<<"$input"

[Sep 10, 2019] How do I avoid an uninitialized value

Sep 10, 2019 | stackoverflow.com

marto ,Jul 15, 2011 at 16:52

I use this scrub function to clean up output from other functions.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my %h = (
    a => 1,
    b => 1
    );

print scrub($h{c});

sub scrub {
    my $a = shift;

    return ($a eq '' or $a eq '~' or not defined $a) ? -1 : $a;
}

The problem occurs when I also would like to handle the case, where the key in a hash doesn't exist, which is shown in the example with scrub($h{c}) .

What change should be make to scrub so it can handle this case?

Sandra Schlichting ,Jun 22, 2017 at 19:00

You're checking whether $a eq '' before checking whether it's defined, hence the warning "Use of uninitialized value in string eq". Simply change the order of things in the conditional:
return (!defined($a) or $a eq '' or $a eq '~') ? -1 : $a;

As soon as anything in the chain of 'or's matches, Perl will stop processing the conditional, thus avoiding the erroneous attempt to compare undef to a string.

Sandra Schlichting ,Jul 14, 2011 at 14:34

Inside scrub it is too late to check whether the hash has an entry for the key. scrub() only sees a scalar, which is undef if the hash key does not exist. But a hash could also have an entry whose value is undef, like this:
my %h = (
 a => 1,
 b => 1,
 c => undef
);

So I suggest to check for hash entries with the exists function.

[Sep 10, 2019] How do I check if a Perl scalar variable has been initialized - Stack Overflow

Sep 10, 2019 | stackoverflow.com



brian d foy ,Sep 18, 2010 at 13:53

Is the following the best way to check if a scalar variable is initialized in Perl, using defined ?
my $var;

if (cond) {
    $var = "string1";
}

# Is this the correct way?
if (defined $var) {
    ...
}

mob ,Sep 25, 2010 at 21:35

Perl doesn't offer a way to check whether or not a variable has been initialized.

However, scalar variables that haven't been explicitly initialized with some value happen to have the value of undef by default. You are right about defined being the right way to check whether or not a variable has a value of undef .

There's several other ways tho. If you want to assign to the variable if it's undef , which your example code seems to indicate, you could, for example, use perl's defined-or operator:

$var //= 'a default value';

vol7ron ,Sep 17, 2010 at 23:17

It depends on what you're trying to do. The proper C way to do things is to initialize variables when they are declared; however, Perl is not C , so one of the following may be what you want:
  1)   $var = "foo" unless defined $var;      # set default after the fact
  2)   $var = defined $var? $var : {...};     # ternary operation
  3)   {...} if !(defined $var);              # another way to write 1)
  4)   $var = $var || "foo";                  # set to $var unless it's falsy, in which case set to 'foo'
  5)   $var ||= "foo";                        # retain value of $var unless it's falsy, in which case set to 'foo' (same as previous line)
  6)   $var = $var // "foo";                  # set to $var unless it's undefined, in which case set to 'foo'
  7)   $var //= "foo";                        # 5.10+ ; retain value of $var unless it's undefined, in which case set to 'foo' (same as previous line)


C way of doing things ( not recommended ):

# initialize the variable to a default value during declaration
#   then test against that value when you want to see if it's been changed
my $var = "foo";
{...}
if ($var eq "foo"){
   ... # do something
} else {
   ... # do something else
}

Another long-winded way of doing this is to create a class and a flag when the variable's been changed, which is unnecessary.

Axeman ,Sep 17, 2010 at 20:39

If you don't care whether or not it's empty, it is. Otherwise you can check
if ( length( $str || '' )) {}

swilliams ,Sep 17, 2010 at 20:53

It depends on what you plan on doing with the variable whether or not it is defined; as of Perl 5.10, you can do this (from perl51000delta ):

A new operator // (defined-or) has been implemented. The following expression:

 $a // $b

is merely equivalent to

defined $a ? $a : $b

and the statement

$c //= $d;

can now be used instead of

$c = $d unless defined $c;

rafl ,Jun 24, 2012 at 7:53

'defined' will return true if a variable has a real value.

As an aside, in a hash, this can be true:

if(exists $h{$e} && !defined $h{$e})

[Sep 10, 2019] logging - Perl - Output the log files - Stack Overflow

Aug 27, 2015 | stackoverflow.com



Arunesh Singh ,Aug 27, 2015 at 8:53

I have created a Perl script that telnets to multiple switches. I would like to check whether telnet functions properly by telneting to the switch.

This is my code to telnet to the switches:

#!/usr/bin/perl
use warnings;
use Net::Cisco;

open( OUTPUT, ">log.txt" );
open( SWITCHIP, "ip.txt" ) or die "couldn't open ip.txt";

my $count = 0;

while (<SWITCHIP>) {
    chomp($_);
    my $switch = $_;
    my $tl     = 0;
    my $t      = Net::Telnet::Cisco->new(
        Host => $switch,
        Prompt =>
            '/(?m:^(?:[\w.\/]+\:)?[\w.-]+\s?(?:\(config[^\)]*\))?\s?[\$#>]\s?(?:\(enable\))?\s*$)/',
        Timeout => 5,
        Errmode => 'return'
    ) or $tl = 1;

    my @output = ();
    if ( $tl != 1 ) {
        print "$switch Telnet success\n";
    }
    else {
        my $telnetstat = "Telnet Failed";
        print "$switch $telnetstat\n";
    }
    close(OUTPUT);
    $count++;
}

This is my output status after I was testing 7 switches:

10.xxx.3.17 Telnet success
10.xxx.10.12 Telnet success
10.xxx.136.10 Telnet success
10.xxx.136.12 Telnet success
10.xxx.188.188 Telnet Failed
10.xxx.136.13 Telnet success

I would like to convert the telnet result as log file.
How to separate successful and failed telnet results by using perl?

Danny Luk ,Aug 28, 2015 at 8:40

Please Try the following
#!/usr/bin/perl
use warnings;
use Net::Cisco;
################################### S
open( OUTPUTS, ">log_Success.txt" );
open( OUTPUTF, ">log_Fail.txt" );
################################### E
open( SWITCHIP, "ip.txt" ) or die "couldn't open ip.txt";

my $count = 0;

while (<SWITCHIP>) {
    chomp($_);
    my $switch = $_;
    my $tl     = 0;
    my $t      = Net::Telnet::Cisco->new(
        Host => $switch,
        Prompt =>
            '/(?m:^(?:[\w.\/]+\:)?[\w.-]+\s?(?:\(config[^\)]*\))?\s?[\$#>]\s?(?:\(enable\))?\s*$)/',
        Timeout => 5,
        Errmode => 'return'
    ) or $tl = 1;

    my @output = ();
################################### S
    if ( $tl != 1 ) {
        print "$switch Telnet success\n"; # for printing it in screen
        print OUTPUTS "$switch Telnet success\n"; # it will print it in the log_Success.txt
    }
    else {
        my $telnetstat = "Telnet Failed";
        print "$switch $telnetstat\n"; # for printing it in screen
        print OUTPUTF "$switch $telnetstat\n"; # it will print it in the log_Fail.txt
    }
################################### E
    $count++;
}
################################### S
close(SWITCHIP);
close(OUTPUTS);
close(OUTPUTF);
################################### E

Danny Luk ,Aug 28, 2015 at 8:39

In print statement after print just write the filehandle name which is OUTPUT in your code:
print OUTPUT "$switch Telnet success\n";

and

print OUTPUT "$switch $telnetstat\n";

A side note: always use a lexical filehandle and three arguments with error handling to open a file. This line open(OUTPUT, ">log.txt"); you can write like this:

open my $fhout, ">", "log.txt" or die $!;

Sobrique ,Aug 28, 2015 at 8:39

Use Sys::Syslog to write log messages.

But since you're opening a log.txt file with the handle OUTPUT , just change your two print statements to have OUTPUT as the first argument and the string as the next (without a comma).

my $telnetstat;
if($tl != 1) {
  $telnetstat = "Telnet success";
} else {
  $telnetstat = "Telnet Failed";
}
print OUTPUT "$switch $telnetstat\n";

# Or the shorter ternary operator line for all the above:
print OUTPUT $switch . (!$tl ? " Telnet success\n" : " Telnet failed\n");

You might consider moving close to an END block:

END {
  close(OUTPUT);
}

Not only because it's in your while loop.

[Sep 08, 2019] How to replace spaces in file names using a bash script

Sep 08, 2019 | stackoverflow.com



Mark Byers ,Apr 25, 2010 at 19:20

Can anyone recommend a safe solution to recursively replace spaces with underscores in file and directory names starting from a given root directory? For example:
$ tree
.
|-- a dir
|   `-- file with spaces.txt
`-- b dir
    |-- another file with spaces.txt
    `-- yet another file with spaces.pdf

becomes:

$ tree
.
|-- a_dir
|   `-- file_with_spaces.txt
`-- b_dir
    |-- another_file_with_spaces.txt
    `-- yet_another_file_with_spaces.pdf

Jürgen Hötzel ,Nov 4, 2015 at 3:03

Use rename (aka prename ) which is a Perl script which may be on your system already. Do it in two steps:
find -name "* *" -type d | rename 's/ /_/g'    # do the directories first
find -name "* *" -type f | rename 's/ /_/g'

Based on Jürgen's answer and able to handle multiple layers of files and directories in a single bound using the "Revision 1.5 1998/12/18 16:16:31 rmb1" version of /usr/bin/rename (a Perl script):

find /tmp/ -depth -name "* *" -execdir rename 's/ /_/g' "{}" \;

oevna ,Jan 1, 2016 at 8:25

I use:
for f in *\ *; do mv "$f" "${f// /_}"; done

Though it's not recursive, it's quite fast and simple. I'm sure someone here could update it to be recursive.

The ${f// /_} part utilizes bash's parameter expansion mechanism to replace a pattern within a parameter with supplied string. The relevant syntax is ${parameter/pattern/string} . See: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html or http://wiki.bash-hackers.org/syntax/pe .
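
As a quick illustration of that syntax (a minimal sketch; the filename is made up):

f="file with spaces.txt"
echo "${f/ /_}"     # one slash: replaces only the first space -> file_with spaces.txt
echo "${f// /_}"    # two slashes: replaces every space -> file_with_spaces.txt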

armandino ,Dec 3, 2013 at 20:51

find . -depth -name '* *' \
| while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done

failed to get it right at first, because I didn't think of directories.

Edmund Elmer ,Jul 3 at 7:12

you can use detox by Doug Harple
detox -r <folder>

Dennis Williamson ,Mar 22, 2012 at 20:33

A find/rename solution. rename is part of util-linux.

You need to descend depth first, because a whitespace filename can be part of a whitespace directory:

find /tmp/ -depth -name "* *" -execdir rename " " "_" "{}" ";"

armandino ,Apr 26, 2010 at 11:49

bash 4.0
#!/bin/bash
shopt -s globstar
for file in **/*\ *
do 
    mv "$file" "${file// /_}"       
done

Itamar ,Jan 31, 2013 at 21:27

you can use this:
find . -name '* *' | while read -r fname
do
    new_fname=$(echo "$fname" | tr " " "_")

    if [ -e "$new_fname" ]
    then
        echo "File $new_fname already exists. Not replacing $fname"
    else
        echo "Creating new file $new_fname to replace $fname"
        mv "$fname" "$new_fname"
    fi
done

yabt ,Apr 26, 2010 at 14:54

Here's a (quite verbose) find -exec solution which writes "file already exists" warnings to stderr:
function trspace() {
   declare dir name bname dname newname replace_char
   [ $# -lt 1 -o $# -gt 2 ] && { echo "usage: trspace dir char"; return 1; }
   dir="${1}"
   replace_char="${2:-_}"
   find "${dir}" -xdev -depth -name $'*[ \t\r\n\v\f]*' -exec bash -c '
      for ((i=1; i<=$#; i++)); do
         name="${@:i:1}"
         dname="${name%/*}"
         bname="${name##*/}"
         newname="${dname}/${bname//[[:space:]]/${0}}"
         if [[ -e "${newname}" ]]; then
            echo "Warning: file already exists: ${newname}" 1>&2
         else
            mv "${name}" "${newname}"
         fi
      done
  ' "${replace_char}" '{}' +
}

trspace rootdir _

degi ,Aug 8, 2011 at 9:10

This one does a little bit more. I use it to rename my downloaded torrents (no special characters (non-ASCII), spaces, multiple dots, etc.).
#!/usr/bin/perl

&rena(`find . -type d`);
&rena(`find . -type f`);

sub rena
{
    ($elems)=@_;
    @t=split /\n/,$elems;

    for $e (@t)
    {
    $_=$e;
    # remove ./ of find
    s/^\.\///;
    # non ascii transliterate
    tr [\200-\377][_];
    tr [\000-\40][_];
    # special characters we do not want in paths
    s/[ \-\,\;\?\+\'\"\!\[\]\(\)\@\#]/_/g;
    # multiple dots except for extension
    while (/\..*\./)
    {
        s/\./_/;
    }
    # only one _ consecutive
    s/_+/_/g;
    next if ($_ eq $e ) or ("./$_" eq $e);
    print "$e -> $_\n";
    rename ($e,$_);
    }
}

Junyeop Lee ,Apr 10, 2018 at 9:44

Recursive version of Naidim's Answers.
find . -name "* *" | awk '{ print length, $0 }' | sort -nr -s | cut -d" " -f2- | while read f; do base=$(basename "$f"); newbase="${base// /_}"; mv "$(dirname "$f")/$(basename "$f")" "$(dirname "$f")/$newbase"; done

ghoti ,Dec 5, 2016 at 21:16

I found this script floating around; it may be interesting :)
 IFS=$'\n';for f in `find .`; do file=$(echo "$f" | tr '[:blank:]' '_'); [ -e "$f" ] && [ ! -e "$file" ] && mv "$f" "$file";done;unset IFS

ghoti ,Dec 5, 2016 at 21:17

Here's a reasonably sized bash script solution
#!/bin/bash
(
IFS=$'\n'
    for y in $(ls $1)
      do
         mv "$1/$y" "$1/$(echo "$y" | sed 's/ /_/g')"
      done
)

user1060059 ,Nov 22, 2011 at 15:15

This only finds files inside the current directory and renames them . I have this aliased.

find ./ -name "* *" -type f -d 1 | perl -ple '$file = $_; $file =~ s/\s+/_/g; rename($_, $file);'

Hongtao ,Sep 26, 2014 at 19:30

I just make one for my own purpose. You may can use it as reference.
#!/bin/bash
cd /vzwhome/c0cheh1/dev_source/UB_14_8
for file in *
do
    echo $file
    cd "/vzwhome/c0cheh1/dev_source/UB_14_8/$file/Configuration/$file"
    echo "==> `pwd`"
    for subfile in *\ *; do [ -d "$subfile" ] && ( mv "$subfile" "$(echo $subfile | sed -e 's/ /_/g')" ); done
    ls
    cd /vzwhome/c0cheh1/dev_source/UB_14_8
done

Marcos Jean Sampaio ,Dec 5, 2016 at 20:56

For files in folder named /files
for i in `IFS="";find /files -name *\ *`
do
   echo $i
done > /tmp/list


while read line
do
   mv "$line" `echo $line | sed 's/ /_/g'`
done < /tmp/list

rm /tmp/list

Muhammad Annaqeeb ,Sep 4, 2017 at 11:03

For those struggling through this using macOS, first install all the tools:
 brew install tree findutils rename

Then, when you need to rename, alias GNU find (gfind) as find and run the code from @Michel Krelin:

alias find=gfind 
find . -depth -name '* *' \
| while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done

[Sep 07, 2019] As soon as you stop writing code on a regular basis you stop being a programmer. You lose your qualification very quickly. That's a typical tragedy of talented programmers who become mediocre managers or, worse, theoretical computer scientists

Programming skills are somewhat similar to the skills of people who play violin or piano. As soon as you stop playing, the skills start to evaporate: first slowly, then more quickly. In two years you will probably lose 80%.
Notable quotes:
"... I happened to look the other day. I wrote 35 programs in January, and 28 or 29 programs in February. These are small programs, but I have a compulsion. I love to write programs and put things into it. ..."
Sep 07, 2019 | archive.computerhistory.org

Dijkstra said he was proud to be a programmer. Unfortunately he changed his attitude completely, and I think he wrote his last computer program in the 1980s. At this conference I went to in 1967 about simulation language, Chris Strachey was going around asking everybody at the conference what was the last computer program you wrote. This was 1967. Some of the people said, "I've never written a computer program." Others would say, "Oh yeah, here's what I did last week." I asked Edsger this question when I visited him in Texas in the 90s and he said, "Don, I write programs now with pencil and paper, and I execute them in my head." He finds that a good enough discipline.

I think he was mistaken on that. He taught me a lot of things, but I really think that if he had continued... One of Dijkstra's greatest strengths was that he felt a strong sense of aesthetics, and he didn't want to compromise his notions of beauty. They were so intense that when he visited me in the 1960s, I had just come to Stanford. I remember the conversation we had. It was in the first apartment, our little rented house, before we had electricity in the house.

We were sitting there in the dark, and he was telling me how he had just learned about the specifications of the IBM System/360, and it made him so ill that his heart was actually starting to flutter.

He intensely disliked things that he didn't consider clean to work with. So I can see that he would have distaste for the languages that he had to work with on real computers. My reaction to that was to design my own language, and then make Pascal so that it would work well for me in those days. But his response was to do everything only intellectually.

So, programming.

I happened to look the other day. I wrote 35 programs in January, and 28 or 29 programs in February. These are small programs, but I have a compulsion. I love to write programs and put things into it. I think of a question that I want to answer, or I have part of my book where I want to present something. But I can't just present it by reading about it in a book. As I code it, it all becomes clear in my head. It's just the discipline. The fact that I have to translate my knowledge of this method into something that the machine is going to understand just forces me to make that crystal-clear in my head. Then I can explain it to somebody else infinitely better. The exposition is always better if I've implemented it, even though it's going to take me more time.

[Sep 07, 2019] Knuth about computer science and money: At that point I made the decision in my life that I wasn't going to optimize my income;

Sep 07, 2019 | archive.computerhistory.org

So I had a programming hat when I was outside of Cal Tech, and at Cal Tech I am a mathematician taking my grad studies. A startup company, called Green Tree Corporation because green is the color of money, came to me and said, "Don, name your price. Write compilers for us and we will take care of finding computers for you to debug them on, and assistance for you to do your work. Name your price." I said, "Oh, okay. $100,000," assuming that this was... In that era this was not quite at Bill Gates' level today, but it was sort of out there.

The guy didn't blink. He said, "Okay." I didn't really blink either. I said, "Well, I'm not going to do it. I just thought this was an impossible number."

At that point I made the decision in my life that I wasn't going to optimize my income; I was really going to do what I thought I could do for well, I don't know. If you ask me what makes me most happy, number one would be somebody saying "I learned something from you". Number two would be somebody saying "I used your software". But number infinity would be Well, no. Number infinity minus one would be "I bought your book". It's not as good as "I read your book", you know. Then there is "I bought your software"; that was not in my own personal value. So that decision came up. I kept up with the literature about compilers. The Communications of the ACM was where the action was. I also worked with people on trying to debug the ALGOL language, which had problems with it. I published a few papers, like "The Remaining Trouble Spots in ALGOL 60" was one of the papers that I worked on. I chaired a committee called "Smallgol" which was to find a subset of ALGOL that would work on small computers. I was active in programming languages.

[Sep 07, 2019] Knuth: maybe 1 in 50 people have the "computer scientist's" type of intellect

Sep 07, 2019 | conservancy.umn.edu

Frana: You have made the comment several times that maybe 1 in 50 people have the "computer scientist's mind."

Knuth: Yes.

Frana: I am wondering if a large number of those people are trained professional librarians? [laughter] There is some strangeness there. But can you pinpoint what it is about the mind of the computer scientist that is....

Knuth: That is different?

Frana: What are the characteristics?

Knuth: Two things: one is the ability to deal with non-uniform structure, where you have case one, case two, case three, case four. Or that you have a model of something where the first component is integer, the next component is a Boolean, and the next component is a real number, or something like that, you know, non-uniform structure. To deal fluently with those kinds of entities, which is not typical in other branches of mathematics, is critical. And the other characteristic ability is to shift levels quickly, from looking at something in the large to looking at something in the small, and many levels in between, jumping from one level of abstraction to another. You know that, when you are adding one to some number, that you are actually getting closer to some overarching goal. These skills, being able to deal with nonuniform objects and to see through things from the top level to the bottom level, these are very essential to computer programming, it seems to me. But maybe I am fooling myself because I am too close to it.

Frana: It is the hardest thing to really understand that which you are existing within.

Knuth: Yes.

[Sep 07, 2019] Knuth: I can be a writer, who tries to organize other people's ideas into some kind of a more coherent structure so that it is easier to put things together

Sep 07, 2019 | conservancy.umn.edu

Knuth: I can be a writer, who tries to organize other people's ideas into some kind of a more coherent structure so that it is easier to put things together. I can see that I could be viewed as a scholar that does his best to check out sources of material, so that people get credit where it is due. And to check facts over, not just to look at the abstract of something, but to see what the methods were that did it and to fill in holes if necessary. I look at my role as being able to understand the motivations and terminology of one group of specialists and boil it down to a certain extent so that people in other parts of the field can use it. I try to listen to the theoreticians and select what they have done that is important to the programmer on the street; to remove technical jargon when possible.

But I have never been good at any kind of a role that would be making policy, or advising people on strategies, or what to do. I have always been best at refining things that are there and bringing order out of chaos. I sometimes raise new ideas that might stimulate people, but not really in a way that would be in any way controlling the flow. The only time I have ever advocated something strongly was with literate programming; but I do this always with the caveat that it works for me, not knowing if it would work for anybody else.

When I work with a system that I have created myself, I can always change it if I don't like it. But everybody who works with my system has to work with what I give them. So I am not able to judge my own stuff impartially. So anyway, I have always felt bad about if anyone says, 'Don, please forecast the future,'...

[Sep 06, 2019] Knuth: No, I stopped going to conferences. It was too discouraging. Computer programming keeps getting harder because more stuff is discovered

Sep 06, 2019 | conservancy.umn.edu

Knuth: No, I stopped going to conferences. It was too discouraging. Computer programming keeps getting harder because more stuff is discovered. I can cope with learning about one new technique per day, but I can't take ten in a day all at once. So conferences are depressing; it means I have so much more work to do. If I hide myself from the truth I am much happier.

[Sep 06, 2019] How TAOCP was hatched

Notable quotes:
"... Also, Addison-Wesley was the people who were asking me to do this book; my favorite textbooks had been published by Addison Wesley. They had done the books that I loved the most as a student. For them to come to me and say, "Would you write a book for us?", and here I am just a secondyear gradate student -- this was a thrill. ..."
"... But in those days, The Art of Computer Programming was very important because I'm thinking of the aesthetical: the whole question of writing programs as something that has artistic aspects in all senses of the word. The one idea is "art" which means artificial, and the other "art" means fine art. All these are long stories, but I've got to cover it fairly quickly. ..."
Sep 06, 2019 | archive.computerhistory.org

Knuth: This is, of course, really the story of my life, because I hope to live long enough to finish it. But I may not, because it's turned out to be such a huge project. I got married in the summer of 1961, after my first year of graduate school. My wife finished college, and I could use the money I had made -- the $5000 on the compiler -- to finance a trip to Europe for our honeymoon.

We had four months of wedded bliss in Southern California, and then a man from Addison-Wesley came to visit me and said "Don, we would like you to write a book about how to write compilers."

The more I thought about it, I decided "Oh yes, I've got this book inside of me."

I sketched out that day -- I still have the sheet of tablet paper on which I wrote -- I sketched out 12 chapters that I thought ought to be in such a book. I told Jill, my wife, "I think I'm going to write a book."

As I say, we had four months of bliss, because the rest of our marriage has all been devoted to this book. Well, we still have had happiness. But really, I wake up every morning and I still haven't finished the book. So I try to -- I have to -- organize the rest of my life around this, as one main unifying theme. The book was supposed to be about how to write a compiler. They had heard about me from one of their editorial advisors, that I knew something about how to do this. The idea appealed to me for two main reasons. One is that I did enjoy writing. In high school I had been editor of the weekly paper. In college I was editor of the science magazine, and I worked on the campus paper as copy editor. And, as I told you, I wrote the manual for that compiler that we wrote. I enjoyed writing, number one.

Also, Addison-Wesley was the people who were asking me to do this book; my favorite textbooks had been published by Addison Wesley. They had done the books that I loved the most as a student. For them to come to me and say, "Would you write a book for us?", and here I am just a second-year graduate student -- this was a thrill.

Another very important reason at the time was that I knew that there was a great need for a book about compilers, because there were a lot of people who even in 1962 -- this was January of 1962 -- were starting to rediscover the wheel. The knowledge was out there, but it hadn't been explained. The people who had discovered it, though, were scattered all over the world and they didn't know of each other's work either, very much. I had been following it. Everybody I could think of who could write a book about compilers, as far as I could see, they would only give a piece of the fabric. They would slant it to their own view of it. There might be four people who could write about it, but they would write four different books. I could present all four of their viewpoints in what I would think was a balanced way, without any axe to grind, without slanting it towards something that I thought would be misleading to the compiler writer for the future. I considered myself as a journalist, essentially. I could be the expositor, the tech writer, that could do the job that was needed in order to take the work of these brilliant people and make it accessible to the world. That was my motivation. Now, I didn't have much time to spend on it then, I just had this page of paper with 12 chapter headings on it. That's all I could do while I'm a consultant at Burroughs and doing my graduate work. I signed a contract, but they said "We know it'll take you a while." I didn't really begin to have much time to work on it until 1963, my third year of graduate school, as I'm already finishing up on my thesis. In the summer of '62, I guess I should mention, I wrote another compiler. This was for Univac; it was a FORTRAN compiler. I spent the summer, I sold my soul to the devil, I guess you say, for three months in the summer of 1962 to write a FORTRAN compiler. I believe that the salary for that was $15,000, which was much more than an assistant professor. I think assistant professors were getting eight or nine thousand in those days.

Feigenbaum: Well, when I started in 1960 at [University of California] Berkeley, I was getting $7,600 for the nine-month year.

Knuth: Yeah, so you see it. I got $15,000 for a summer job in 1962 writing a FORTRAN compiler. One day during that summer I was writing the part of the compiler that looks up identifiers in a hash table. The method that we used is called linear probing. Basically you take the variable name that you want to look up, you scramble it, like you square it or something like this, and that gives you a number between one and, well in those days it would have been between 1 and 1000, and then you look there. If you find it, good; if you don't find it, go to the next place and keep on going until you either get to an empty place, or you find the number you're looking for. It's called linear probing. There was a rumor that one of Professor Feller's students at Princeton had tried to figure out how fast linear probing works and was unable to succeed. This was a new thing for me. It was a case where I was doing programming, but I also had a mathematical problem that would go into my other [job]. My winter job was being a math student, my summer job was writing compilers. There was no mix. These worlds did not intersect at all in my life at that point. So I spent one day during the summer while writing the compiler looking at the mathematics of how fast does linear probing work. I got lucky, and I solved the problem. I figured out some math, and I kept two or three sheets of paper with me and I typed it up. ["Notes on 'Open' Addressing', 7/22/63] I guess that's on the internet now, because this became really the genesis of my main research work, which developed not to be working on compilers, but to be working on what they call analysis of algorithms, which is, have a computer method and find out how good is it quantitatively. I can say, if I got so many things to look up in the table, how long is linear probing going to take. It dawned on me that this was just one of many algorithms that would be important, and each one would lead to a fascinating mathematical problem. This was easily a good lifetime source of rich problems to work on. Here I am then, in the middle of 1962, writing this FORTRAN compiler, and I had one day to do the research and mathematics that changed my life for my future research trends. But now I've gotten off the topic of what your original question was.

Feigenbaum: We were talking about sort of the.. You talked about the embryo of The Art of Computing. The compiler book morphed into The Art of Computer Programming, which became a seven-volume plan.

Knuth: Exactly. Anyway, I'm working on a compiler and I'm thinking about this. But now I'm starting, after I finish this summer job, then I began to do things that were going to be relating to the book. One of the things I knew I had to have in the book was an artificial machine, because I'm writing a compiler book but machines are changing faster than I can write books. I have to have a machine that I'm totally in control of. I invented this machine called MIX, which was typical of the computers of 1962.

In 1963 I wrote a simulator for MIX so that I could write sample programs for it, and I taught a class at Caltech on how to write programs in assembly language for this hypothetical computer. Then I started writing the parts that dealt with sorting problems and searching problems, like the linear probing idea. I began to write those parts, which are part of a compiler, of the book. I had several hundred pages of notes gathering for those chapters for The Art of Computer Programming. Before I graduated, I've already done quite a bit of writing on The Art of Computer Programming.

I met George Forsythe about this time. George was the man who inspired both of us [Knuth and Feigenbaum] to come to Stanford during the '60s. George came down to Southern California for a talk, and he said, "Come up to Stanford. How about joining our faculty?" I said "Oh no, I can't do that. I just got married, and I've got to finish this book first." I said, "I think I'll finish the book next year, and then I can come up [and] start thinking about the rest of my life, but I want to get my book done before my son is born." Well, John is now 40-some years old and I'm not done with the book. Part of my lack of expertise is any good estimation procedure as to how long projects are going to take. I way underestimated how much needed to be written about in this book. Anyway, I started writing the manuscript, and I went merrily along writing pages of things that I thought really needed to be said. Of course, it didn't take long before I had started to discover a few things of my own that weren't in any of the existing literature. I did have an axe to grind. The message that I was presenting was in fact not going to be unbiased at all. It was going to be based on my own particular slant on stuff, and that original reason for why I should write the book became impossible to sustain. But the fact that I had worked on linear probing and solved the problem gave me a new unifying theme for the book. I was going to base it around this idea of analyzing algorithms, and have some quantitative ideas about how good methods were. Not just that they worked, but that they worked well: this method worked 3 times better than this method, or 3.1 times better than this method. Also, at this time I was learning mathematical techniques that I had never been taught in school. I found they were out there, but they just hadn't been emphasized openly, about how to solve problems of this kind.

So my book would also present a different kind of mathematics than was common in the curriculum at the time, that was very relevant to analysis of algorithm. I went to the publishers, I went to Addison Wesley, and said "How about changing the title of the book from 'The Art of Computer Programming' to 'The Analysis of Algorithms'." They said that will never sell; their focus group couldn't buy that one. I'm glad they stuck to the original title, although I'm also glad to see that several books have now come out called "The Analysis of Algorithms", 20 years down the line.

But in those days, The Art of Computer Programming was very important because I'm thinking of the aesthetical: the whole question of writing programs as something that has artistic aspects in all senses of the word. The one idea is "art" which means artificial, and the other "art" means fine art. All these are long stories, but I've got to cover it fairly quickly.

I've got The Art of Computer Programming started out, and I'm working on my 12 chapters. I finish a rough draft of all 12 chapters by, I think it was like 1965. I've got 3,000 pages of notes, including a very good example of what you mentioned about seeing holes in the fabric. One of the most important chapters in the book is parsing: going from somebody's algebraic formula and figuring out the structure of the formula. Just the way I had done in seventh grade finding the structure of English sentences, I had to do this with mathematical sentences.

Chapter ten is all about parsing of context-free language, [which] is what we called it at the time. I covered what people had published about context-free languages and parsing. I got to the end of the chapter and I said, well, you can combine these ideas and these ideas, and all of a sudden you get a unifying thing which goes all the way to the limit. These other ideas had sort of gone partway there. They would say "Oh, if a grammar satisfies this condition, I can do it efficiently." "If a grammar satisfies this condition, I can do it efficiently." But now, all of a sudden, I saw there was a way to say I can find the most general condition that can be done efficiently without looking ahead to the end of the sentence. That you could make a decision on the fly, reading from left to right, about the structure of the thing. That was just a natural outgrowth of seeing the different pieces of the fabric that other people had put together, and writing it into a chapter for the first time. But I felt that this general concept, well, I didn't feel that I had surrounded the concept. I knew that I had it, and I could prove it, and I could check it, but I couldn't really intuit it all in my head. I knew it was right, but it was too hard for me, really, to explain it well.

So I didn't put it in The Art of Computer Programming. I thought it was beyond the scope of my book. Textbooks don't have to cover everything when you get to the harder things; then you have to go to the literature. My idea at that time [is] I'm writing this book and I'm thinking it's going to be published very soon, so any little things I discover and put in the book I didn't bother to write a paper and publish in the journal because I figure it'll be in my book pretty soon anyway. Computer science is changing so fast, my book is bound to be obsolete.

It takes a year for it to go through editing, and people drawing the illustrations, and then they have to print it and bind it and so on. I have to be a little bit ahead of the state-of-the-art if my book isn't going to be obsolete when it comes out. So I kept most of the stuff to myself that I had, these little ideas I had been coming up with. But when I got to this idea of left-to-right parsing, I said "Well here's something I don't really understand very well. I'll publish this, let other people figure out what it is, and then they can tell me what I should have said." I published that paper I believe in 1965, at the end of finishing my draft of the chapter, which didn't get as far as that story, LR(k). Well now, textbooks of computer science start with LR(k) and take off from there. But I want to give you an idea of

[Sep 06, 2019] Python vs. Ruby Which is best for web development Opensource.com

Sep 06, 2019 | opensource.com

Python was developed organically in the scientific space as a prototyping language that easily could be translated into C++ if a prototype worked. This happened long before it was first used for web development. Ruby, on the other hand, became a major player specifically because of web development; the Rails framework extended Ruby's popularity with people developing complex websites.

Which programming language best suits your needs? Here is a quick overview of each language to help you choose:

Approach: one best way vs. human-language Python

Python takes a direct approach to programming. Its main goal is to make everything obvious to the programmer. In Python, there is only one "best" way to do something. This philosophy has led to a language strict in layout.

Python's core philosophy consists of three key hierarchical principles, taken from the Zen of Python:

Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.

This regimented philosophy results in Python being eminently readable and easy to learn -- and why Python is great for beginning coders. Python has a big foothold in introductory programming courses . Its syntax is very simple, with little to remember. Because its code structure is explicit, the developer can easily tell where everything comes from, making it relatively easy to debug.

Python's hierarchy of principles is evident in many aspects of the language. Its use of whitespace for flow control as a core part of the language syntax differs from most other languages, including Ruby. The way you indent code determines the meaning of its action. This use of whitespace is a prime example of Python's "explicit" philosophy: the shape a Python app takes spells out its logic and how the app will act.

Ruby

In contrast to Python, Ruby focuses on "human-language" programming, and its code reads like a verbal language rather than a machine-based one, which many programmers, both beginners and experts, like. Ruby follows the principle of " least astonishment ," and offers myriad ways to do the same thing. These similar methods can have multiple names, which many developers find confusing and frustrating.

Unlike Python, Ruby makes use of "blocks," a first-class object that is treated as a unit within a program. In fact, Ruby takes the concept of OOP (Object-Oriented Programming) to its limit. Everything is an object -- even global variables are actually represented within the ObjectSpace object. Classes and modules are themselves objects, and functions and operators are methods of objects. This ability makes Ruby especially powerful, especially when combined with its other primary strength: functional programming and the use of lambdas.

In addition to blocks and functional programming, Ruby provides programmers with many other features, including fragmentation, hashable and unhashable types, and mutable strings.

Ruby's fans find its elegance to be one of its top selling points. At the same time, Ruby's "magical" features and flexibility can make it very hard to track down bugs.

Communities: stability vs. innovation

Although features and coding philosophy are the primary drivers for choosing a given language, the strength of a developer community also plays an important role. Fortunately, both Python and Ruby boast strong communities.

Python

Python's community already includes a large Linux and academic community and therefore offers many academic use cases in both math and science. That support gives the community a stability and diversity that only grows as Python increasingly is used for web development.

Ruby

However, Ruby's community has focused primarily on web development from the get-go. It tends to innovate more quickly than the Python community, but this innovation also causes more things to break. In addition, while it has gotten more diverse, it has yet to reach the level of diversity that Python has.

Final thoughts

For web development, Ruby has Rails and Python has Django. Both are powerful frameworks, so when it comes to web development, you can't go wrong with either language. Your decision will ultimately come down to your level of experience and your philosophical preferences.

If you plan to focus on building web applications, Ruby is popular and flexible. There is a very strong community built upon it and they are always on the bleeding edge of development.

If you are interested in building web applications and would like to learn a language that's used more generally, try Python. You'll get a diverse community and lots of influence and support from the various industries in which it is used.

Tom Radcliffe - Tom Radcliffe has over 20 years experience in software development and management in both academia and industry. He is a professional engineer (PEO and APEGBC) and holds a PhD in physics from Queen's University at Kingston. Tom brings a passion for quantitative, data-driven processes to ActiveState .

[Sep 03, 2019] bash - How to convert strings like 19-FEB-12 to epoch date in UNIX - Stack Overflow

Feb 11, 2013 | stackoverflow.com


hellish ,Feb 11, 2013 at 3:45

In UNIX how to convert to epoch milliseconds date strings like:
19-FEB-12
16-FEB-12
05-AUG-09

I need this to compare these dates with the current time on the server.


To convert a date to seconds since the epoch:
date --date="19-FEB-12" +%s

Current epoch:

date +%s

So, since your dates are in the past:

NOW=`date +%s`
THEN=`date --date="19-FEB-12" +%s`

let DIFF=$NOW-$THEN
echo "The difference is: $DIFF"

Using BSD's date command, you would need

$ date -j -f "%d-%B-%y" 19-FEB-12 +%s

Differences from GNU date :

  1. -j prevents date from trying to set the clock
  2. The input format must be explicitly set with -f
  3. The input date is a regular argument, not an option (viz. -d )
  4. When no time is specified with the date, use the current time instead of midnight.
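
Putting the GNU date invocation from this answer together with the original sample dates, here is a minimal sketch (assuming GNU date) that reports how far in the past each one is:

now=$(date +%s)
for d in 19-FEB-12 16-FEB-12 05-AUG-09; do
    past=$(date --date="$d" +%s)
    echo "$d was $(( (now - past) / 86400 )) days ago"
done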

[Sep 03, 2019] command line - How do I convert an epoch timestamp to a human readable format on the cli - Unix Linux Stack Exchange

Sep 03, 2019 | unix.stackexchange.com

Gilles ,Oct 11, 2010 at 18:14

date -d @1190000000

Replace 1190000000 with your epoch.

Stefan Lasiewski ,Oct 11, 2010 at 18:04

$ echo 1190000000 | perl -pe 's/(\d+)/localtime($1)/e' 
Sun Sep 16 20:33:20 2007

This can come in handy for those applications which use epoch time in the logfiles:

$ tail -f /var/log/nagios/nagios.log | perl -pe 's/(\d+)/localtime($1)/e'
[Thu May 13 10:15:46 2010] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;HOSTA;check_raid;0;check_raid.pl: OK (Unit 0 on Controller 0 is OK)

Stéphane Chazelas ,Jul 31, 2015 at 20:24

With bash-4.2 or above:
printf '%(%F %T)T\n' 1234567890

(where %F %T is the strftime() -type format)

That syntax is inspired from ksh93 .

In ksh93 however, the argument is taken as a date expression where various and hardly documented formats are supported.

For a Unix epoch time, the syntax in ksh93 is:

printf '%(%F %T)T\n' '#1234567890'

ksh93 however seems to use its own algorithm for the timezone and can get it wrong. For instance, in Britain, it was summer time all year in 1970, but:

$ TZ=Europe/London bash -c 'printf "%(%c)T\n" 0'
Thu 01 Jan 1970 01:00:00 BST
$ TZ=Europe/London ksh93 -c 'printf "%(%c)T\n" "#0"'
Thu Jan  1 00:00:00 1970
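
One more detail that is easy to miss (bash 4.2 and later): the special argument -1 stands for the current time, so you can timestamp output without calling the external date command at all. A minimal sketch:

printf 'It is now %(%F %T)T\n' -1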

DarkHeart ,Jul 28, 2014 at 3:56

Custom format with GNU date :
date -d @1234567890 +'%Y-%m-%d %H:%M:%S'

Or with GNU awk :

awk 'BEGIN { print strftime("%Y-%m-%d %H:%M:%S", 1234567890); }'

Linked SO question: https://stackoverflow.com/questions/3249827/convert-from-unixtime-at-command-line


The two I frequently use are:
$ perl -leprint\ scalar\ localtime\ 1234567890
Sat Feb 14 00:31:30 2009

[Sep 02, 2019] bash - Pretty-print for shell script

Oct 21, 2010 | stackoverflow.com


Benoit ,Oct 21, 2010 at 13:19

I'm looking for something similiar to indent but for (bash) scripts. Console only, no colorizing, etc.

Do you know of one ?

Jamie ,Sep 11, 2012 at 3:00

Vim can indent bash scripts. But not reformat them before indenting.
Backup your bash script, open it with vim, type gg=GZZ and indent will be corrected. (Note for the impatient: this overwrites the file, so be sure to do that backup!)

Though, some bugs with << (expecting EOF as first character on a line) e.g.

EDIT: ZZ not ZQ

Daniel Martí ,Apr 8, 2018 at 13:52

A bit late to the party, but it looks like shfmt could do the trick for you.

Brian Chrisman ,Aug 11 at 4:08

In bash I do this:
reindent() {
source <(echo "Zibri () {";cat "$1"; echo "}")
declare -f Zibri|head --lines=-1|tail --lines=+3 | sed -e "s/^\s\s\s\s//"
}

This eliminates comments and reindents the script the "bash way".

If you have HEREDOCS in your script, they got ruined by the sed in the previous function.

So use:

reindent() {
source <(echo "Zibri () {";cat "$1"; echo "}")
declare -f Zibri|head --lines=-1|tail --lines=+3
}

But all of your script will have 4-space indentation.

Or you can do:

reindent () 
{ 
    rstr=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 16 | head -n 1);
    source <(echo "Zibri () {";cat "$1"|sed -e "s/^\s\s\s\s/$rstr/"; echo "}");
    echo '#!/bin/bash';
    declare -f Zibri | head --lines=-1 | tail --lines=+3 | sed -e "s/^\s\s\s\s//;s/$rstr/    /"
}

which takes care also of heredocs.

Pius Raeder ,Jan 10, 2017 at 8:35

Found this http://www.linux-kheops.com/doc/perl/perl-aubert/fmt.script .

Very nice, only one thing i took out is the [...]->test substitution.

[Sep 02, 2019] Negative regex for Perl string pattern match

Sep 02, 2019 | stackoverflow.com

mirod ,Jun 15, 2011 at 17:21

I have this regex:
if($string =~ m/^(Clinton|[^Bush]|Reagan)/i)
  {print "$string\n"};

I want to match with Clinton and Reagan, but not Bush.

It's not working.

Calvin Taylor ,Jul 14, 2017 at 21:03

Sample text:

Clinton said
Bush used crayons
Reagan forgot

Just omitting a Bush match:

$ perl -ne 'print if /^(Clinton|Reagan)/' textfile
Clinton said
Reagan forgot

Or if you really want to specify:

$ perl -ne 'print if /^(?!Bush)(Clinton|Reagan)/' textfile
Clinton said
Reagan forgot

GuruM ,Oct 27, 2012 at 12:54

Your regex does not work because [] defines a character class, but what you want is a lookahead:
(?=) - Positive look ahead assertion foo(?=bar) matches foo when followed by bar
(?!) - Negative look ahead assertion foo(?!bar) matches foo when not followed by bar
(?<=) - Positive look behind assertion (?<=foo)bar matches bar when preceded by foo
(?<!) - Negative look behind assertion (?<!foo)bar matches bar when NOT preceded by foo
(?>) - Once-only subpatterns (?>\d+)bar Performance enhancing when bar not present
(?(x)) - Conditional subpatterns
(?(3)foo|fu)bar - Matches foo if 3rd subpattern has matched, fu if not
(?#) - Comment (?# Pattern does x y or z)

So try: (?!bush)
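
To see the negative assertions in action from the shell, here is a minimal sketch (the sample strings are made up):

perl -e 'print "ok\n" if "Reagan forgot" =~ /^(?!Bush)\w+/'      # negative lookahead: matches
perl -e 'print "ok\n" if "Bush used crayons" =~ /^(?!Bush)\w+/'  # prints nothing
perl -e 'print "ok\n" if "bazbar" =~ /(?<!foo)bar/'              # negative lookbehind: matches
perl -e 'print "ok\n" if "foobar" =~ /(?<!foo)bar/'              # prints nothing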

[Sep 02, 2019] How to get the current line number of a file open using Perl

Sep 02, 2019 | stackoverflow.com


tadmc ,May 8, 2011 at 17:08

open my $fp, '<', $file or die $!;

while (<$fp>) {
    my $line = $_;
    if ($line =~ /$regex/) {
        # How do I find out which line number this match happened at?
    }
}

close $fp;

tchrist ,Apr 22, 2015 at 21:16

Use $. (see perldoc perlvar ).
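
For example, a minimal sketch (/etc/hosts and the pattern are just placeholders): $. holds the line number of the most recently read line, so the question's loop can print it directly:

perl -ne 'print "$.: $_" if /localhost/' /etc/hosts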

tchrist ,May 7, 2011 at 16:48

You can also do it through OO interface:
use IO::Handle;
# later on ...
my $n = $fp->input_line_number();

This is in perldoc perlvar , too.


Don't use $. , nor $_ or any global variable. Use this instead:
while(my $line = <FILE>) {
  print $line unless ${\*FILE}->input_line_number == 1;
}

To avoid this and a lot of other Perl gotchas, you can use packages like linter-perl on Atom or VSCode. Stop making Perl a write-only language!

[Aug 31, 2019] Complexity prevents programmers from ever learning the whole language; only a subset is learned and used

Aug 31, 2019 | ask.slashdot.org

Re:Neither! (Score 2, Interesting) by M. D. Nahas on Friday December 23, 2005 @06:08PM ( #14329127 ) Attached to: Learning Java or C# as a Next Language?

The cleanest languages I've used are C, Java, and OCaml. By "clean", I mean the language has a few concepts that can be completely memorized, which results in fewer "gotchas" and less manual reading. For these languages, you'll see small manuals (e.g., K&R's book for C) which cover the complete language and then lots of pages devoted to the libraries that come with the language. I'd definitely recommend Java (or C, or OCaml) over C# for this reason.

C# seems to have combined every feature of C++, Java, and VBA into a single language. It is very complex and has a ton of concepts, for which I could never memorize the whole language. I have a feeling that most programmers will use the subset of C# that is closest to the language they understand, whether it is C++, Java or VBA. You might as well learn Java's style of programming, and then, if needed, switch to C# using its Java-like features.

[Aug 29, 2019] How do I parse command line arguments in Bash - Stack Overflow

Jul 10, 2017 | stackoverflow.com

Livven, Jul 10, 2017 at 8:11

Update: It's been more than 5 years since I started this answer. Thank you for LOTS of great edits/comments/suggestions. In order to save maintenance time, I've modified the code block to be 100% copy-paste ready. Please do not post comments like "What if you changed X to Y". Instead, copy-paste the code block, see the output, make the change, rerun the script, and comment "I changed X to Y and " I don't have time to test your ideas and tell you if they work.
Method #1: Using bash without getopt[s]

Two common ways to pass key-value-pair arguments are:

Bash Space-Separated (e.g., --option argument ) (without getopt[s])

Usage demo-space-separated.sh -e conf -s /etc -l /usr/lib /etc/hosts

cat >/tmp/demo-space-separated.sh <<'EOF'
#!/bin/bash

POSITIONAL=()
while [[ $# -gt 0 ]]
do
key="$1"

case $key in
    -e|--extension)
    EXTENSION="$2"
    shift # past argument
    shift # past value
    ;;
    -s|--searchpath)
    SEARCHPATH="$2"
    shift # past argument
    shift # past value
    ;;
    -l|--lib)
    LIBPATH="$2"
    shift # past argument
    shift # past value
    ;;
    --default)
    DEFAULT=YES
    shift # past argument
    ;;
    *)    # unknown option
    POSITIONAL+=("$1") # save it in an array for later
    shift # past argument
    ;;
esac
done
set -- "${POSITIONAL[@]}" # restore positional parameters

echo "FILE EXTENSION  = ${EXTENSION}"
echo "SEARCH PATH     = ${SEARCHPATH}"
echo "LIBRARY PATH    = ${LIBPATH}"
echo "DEFAULT         = ${DEFAULT}"
echo "Number files in SEARCH PATH with EXTENSION:" $(ls -1 "${SEARCHPATH}"/*."${EXTENSION}" | wc -l)
if [[ -n $1 ]]; then
    echo "Last line of file specified as non-opt/last argument:"
    tail -1 "$1"
fi
EOF

chmod +x /tmp/demo-space-separated.sh

/tmp/demo-space-separated.sh -e conf -s /etc -l /usr/lib /etc/hosts

output from copy-pasting the block above:

FILE EXTENSION  = conf
SEARCH PATH     = /etc
LIBRARY PATH    = /usr/lib
DEFAULT         =
Number files in SEARCH PATH with EXTENSION: 14
Last line of file specified as non-opt/last argument:
#93.184.216.34    example.com
Bash Equals-Separated (e.g., --option=argument ) (without getopt[s])

Usage demo-equals-separated.sh -e=conf -s=/etc -l=/usr/lib /etc/hosts

cat >/tmp/demo-equals-separated.sh <<'EOF'
#!/bin/bash

for i in "$@"
do
case $i in
    -e=*|--extension=*)
    EXTENSION="${i#*=}"
    shift # past argument=value
    ;;
    -s=*|--searchpath=*)
    SEARCHPATH="${i#*=}"
    shift # past argument=value
    ;;
    -l=*|--lib=*)
    LIBPATH="${i#*=}"
    shift # past argument=value
    ;;
    --default)
    DEFAULT=YES
    shift # past argument with no value
    ;;
    *)
          # unknown option
    ;;
esac
done
echo "FILE EXTENSION  = ${EXTENSION}"
echo "SEARCH PATH     = ${SEARCHPATH}"
echo "LIBRARY PATH    = ${LIBPATH}"
echo "DEFAULT         = ${DEFAULT}"
echo "Number files in SEARCH PATH with EXTENSION:" $(ls -1 "${SEARCHPATH}"/*."${EXTENSION}" | wc -l)
if [[ -n $1 ]]; then
    echo "Last line of file specified as non-opt/last argument:"
    tail -1 $1
fi
EOF

chmod +x /tmp/demo-equals-separated.sh

/tmp/demo-equals-separated.sh -e=conf -s=/etc -l=/usr/lib /etc/hosts

output from copy-pasting the block above:

FILE EXTENSION  = conf
SEARCH PATH     = /etc
LIBRARY PATH    = /usr/lib
DEFAULT         =
Number files in SEARCH PATH with EXTENSION: 14
Last line of file specified as non-opt/last argument:
#93.184.216.34    example.com

To better understand ${i#*=} search for "Substring Removal" in this guide . It is functionally equivalent to `sed 's/[^=]*=//' <<< "$i"` which calls a needless subprocess or `echo "$i" | sed 's/[^=]*=//'` which calls two needless subprocesses.
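
For example, a minimal sketch:

i="--extension=conf"
echo "${i#*=}"    # prints: conf   (strips everything up to and including the first '=')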

Method #2: Using bash with getopt[s]

from: http://mywiki.wooledge.org/BashFAQ/035#getopts

getopt(1) limitations (older, relatively-recent getopt versions):

  1. It can't handle arguments that are empty strings.
  2. It can't handle arguments with embedded whitespace.

More recent getopt versions don't have these limitations.

Additionally, the POSIX shell (and others) offer getopts which doesn't have these limitations. I've included a simplistic getopts example.

Usage demo-getopts.sh -vf /etc/hosts foo bar

cat >/tmp/demo-getopts.sh <<'EOF'
#!/bin/sh

# A POSIX variable
OPTIND=1         # Reset in case getopts has been used previously in the shell.

# Initialize our own variables:
output_file=""
verbose=0

while getopts "h?vf:" opt; do
    case "$opt" in
    h|\?)
        show_help
        exit 0
        ;;
    v)  verbose=1
        ;;
    f)  output_file=$OPTARG
        ;;
    esac
done

shift $((OPTIND-1))

[ "${1:-}" = "--" ] && shift

echo "verbose=$verbose, output_file='$output_file', Leftovers: $@"
EOF

chmod +x /tmp/demo-getopts.sh

/tmp/demo-getopts.sh -vf /etc/hosts foo bar

output from copy-pasting the block above:

verbose=1, output_file='/etc/hosts', Leftovers: foo bar

The advantages of getopts are:

  1. It's more portable, and will work in other shells like dash .
  2. It can handle multiple single options like -vf filename in the typical Unix way, automatically.

The disadvantage of getopts is that it can only handle short options ( -h , not --help ) without additional code.

There is a getopts tutorial which explains what all of the syntax and variables mean. In bash, there is also help getopts , which might be informative.
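
For what it's worth, one commonly seen workaround is to declare - in the optstring, so that getopts hands you --word as option - with OPTARG=word, which you can then translate to the matching short option. This is only a sketch, and the option names here are made up:

#!/bin/bash
verbose=0
while getopts "hv-:" opt; do
    # translate long options into their short equivalents
    if [ "$opt" = "-" ]; then
        case "$OPTARG" in
            help)    opt=h ;;
            verbose) opt=v ;;
            *) echo "unknown option --$OPTARG" >&2; exit 1 ;;
        esac
    fi
    case "$opt" in
        h) echo "usage: $0 [-h|--help] [-v|--verbose] [file...]"; exit 0 ;;
        v) verbose=1 ;;
    esac
done
shift $((OPTIND-1))
echo "verbose=$verbose, leftovers: $*"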

johncip ,Jul 23, 2018 at 15:15

No answer mentions enhanced getopt. And the top-voted answer is misleading: it either ignores -vfd style short options (requested by the OP) or options after positional arguments (also requested by the OP); and it ignores parsing errors. Instead:

The following calls

myscript -vfd ./foo/bar/someFile -o /fizz/someOtherFile
myscript -v -f -d -o/fizz/someOtherFile -- ./foo/bar/someFile
myscript --verbose --force --debug ./foo/bar/someFile -o/fizz/someOtherFile
myscript --output=/fizz/someOtherFile ./foo/bar/someFile -vfd
myscript ./foo/bar/someFile -df -v --output /fizz/someOtherFile

all return

verbose: y, force: y, debug: y, in: ./foo/bar/someFile, out: /fizz/someOtherFile

with the following myscript

#!/bin/bash
# saner programming env: these switches turn some bugs into errors
set -o errexit -o pipefail -o noclobber -o nounset

# -allow a command to fail with !'s side effect on errexit
# -use return value from ${PIPESTATUS[0]}, because ! hosed $?
! getopt --test > /dev/null 
if [[ ${PIPESTATUS[0]} -ne 4 ]]; then
    echo "I'm sorry, 'getopt --test' failed in this environment."
    exit 1
fi

OPTIONS=dfo:v
LONGOPTS=debug,force,output:,verbose

# -regarding ! and PIPESTATUS see above
# -temporarily store output to be able to check for errors
# -activate quoting/enhanced mode (e.g. by writing out "--options")
# -pass arguments only via   -- "$@"   to separate them correctly
! PARSED=$(getopt --options=$OPTIONS --longoptions=$LONGOPTS --name "$0" -- "$@")
if [[ ${PIPESTATUS[0]} -ne 0 ]]; then
    # e.g. return value is 1
    #  then getopt has complained about wrong arguments to stdout
    exit 2
fi
# read getopt's output this way to handle the quoting right:
eval set -- "$PARSED"

d=n f=n v=n outFile=-
# now enjoy the options in order and nicely split until we see --
while true; do
    case "$1" in
        -d|--debug)
            d=y
            shift
            ;;
        -f|--force)
            f=y
            shift
            ;;
        -v|--verbose)
            v=y
            shift
            ;;
        -o|--output)
            outFile="$2"
            shift 2
            ;;
        --)
            shift
            break
            ;;
        *)
            echo "Programming error"
            exit 3
            ;;
    esac
done

# handle non-option arguments
if [[ $# -ne 1 ]]; then
    echo "$0: A single input file is required."
    exit 4
fi

echo "verbose: $v, force: $f, debug: $d, in: $1, out: $outFile"

1 enhanced getopt is available on most "bash-systems", including Cygwin; on OS X try brew install gnu-getopt or sudo port install getopt
2 the POSIX exec() conventions have no reliable way to pass binary NULL in command line arguments; those bytes prematurely end the argument
3 first version released in 1997 or before (I only tracked it back to 1997)

Tobias Kienzler ,Mar 19, 2016 at 15:23

from : digitalpeer.com with minor modifications

Usage myscript.sh -p=my_prefix -s=dirname -l=libname

#!/bin/bash
for i in "$@"
do
case $i in
    -p=*|--prefix=*)
    PREFIX="${i#*=}"

    ;;
    -s=*|--searchpath=*)
    SEARCHPATH="${i#*=}"
    ;;
    -l=*|--lib=*)
    DIR="${i#*=}"
    ;;
    --default)
    DEFAULT=YES
    ;;
    *)
            # unknown option
    ;;
esac
done
echo PREFIX = ${PREFIX}
echo SEARCH PATH = ${SEARCHPATH}
echo DIRS = ${DIR}
echo DEFAULT = ${DEFAULT}

To better understand ${i#*=} search for "Substring Removal" in this guide . It is functionally equivalent to `sed 's/[^=]*=//' <<< "$i"` which calls a needless subprocess or `echo "$i" | sed 's/[^=]*=//'` which calls two needless subprocesses.

Robert Siemer ,Jun 1, 2018 at 1:57

getopt() / getopts() is a good option. Stolen from here :

The simple use of "getopt" is shown in this mini-script:

#!/bin/bash
echo "Before getopt"
for i
do
  echo $i
done
args=`getopt abc:d $*`
set -- $args
echo "After getopt"
for i
do
  echo "-->$i"
done

What we have said is that any of -a, -b, -c or -d will be allowed, but that -c is followed by an argument (the "c:" says that).

If we call this "g" and try it out:

bash-2.05a$ ./g -abc foo
Before getopt
-abc
foo
After getopt
-->-a
-->-b
-->-c
-->foo
-->--

We start with two arguments, and "getopt" breaks apart the options and puts each in its own argument. It also added "--".

hfossli ,Jan 31 at 20:05

More succinct way

script.sh

#!/bin/bash

while [[ "$#" -gt 0 ]]; do case $1 in
  -d|--deploy) deploy="$2"; shift;;
  -u|--uglify) uglify=1;;
  *) echo "Unknown parameter passed: $1"; exit 1;;
esac; shift; done

echo "Should deploy? $deploy"
echo "Should uglify? $uglify"

Usage:

./script.sh -d dev -u

# OR:

./script.sh --deploy dev --uglify

bronson ,Apr 27 at 23:22

At the risk of adding another example to ignore, here's my scheme.

Hope it's useful to someone.

while [ "$#" -gt 0 ]; do
  case "$1" in
    -n) name="$2"; shift 2;;
    -p) pidfile="$2"; shift 2;;
    -l) logfile="$2"; shift 2;;

    --name=*) name="${1#*=}"; shift 1;;
    --pidfile=*) pidfile="${1#*=}"; shift 1;;
    --logfile=*) logfile="${1#*=}"; shift 1;;
    --name|--pidfile|--logfile) echo "$1 requires an argument" >&2; exit 1;;

    -*) echo "unknown option: $1" >&2; exit 1;;
    *) handle_argument "$1"; shift 1;;
  esac
done

Robert Siemer ,Jun 6, 2016 at 19:28

I'm about 4 years late to this question, but want to give back. I used the earlier answers as a starting point to tidy up my old adhoc param parsing. I then refactored out the following template code. It handles both long and short params, using = or space separated arguments, as well as multiple short params grouped together. Finally it re-inserts any non-param arguments back into the $1,$2.. variables. I hope it's useful.
#!/usr/bin/env bash

# NOTICE: Uncomment if your script depends on bashisms.
#if [ -z "$BASH_VERSION" ]; then bash $0 $@ ; exit $? ; fi

echo "Before"
for i ; do echo - $i ; done


# Code template for parsing command line parameters using only portable shell
# code, while handling both long and short params, handling '-f file' and
# '-f=file' style param data and also capturing non-parameters to be inserted
# back into the shell positional parameters.

while [ -n "$1" ]; do
        # Copy so we can modify it (can't modify $1)
        OPT="$1"
        # Detect argument termination
        if [ x"$OPT" = x"--" ]; then
                shift
                for OPT ; do
                        REMAINS="$REMAINS \"$OPT\""
                done
                break
        fi
        # Parse current opt
        while [ x"$OPT" != x"-" ] ; do
                case "$OPT" in
                        # Handle --flag=value opts like this
                        -c=* | --config=* )
                                CONFIGFILE="${OPT#*=}"
                                shift
                                ;;
                        # and --flag value opts like this
                        -c* | --config )
                                CONFIGFILE="$2"
                                shift
                                ;;
                        -f* | --force )
                                FORCE=true
                                ;;
                        -r* | --retry )
                                RETRY=true
                                ;;
                        # Anything unknown is recorded for later
                        * )
                                REMAINS="$REMAINS \"$OPT\""
                                break
                                ;;
                esac
                # Check for multiple short options
                # NOTICE: be sure to update this pattern to match valid options
                NEXTOPT="${OPT#-[cfr]}" # try removing single short opt
                if [ x"$OPT" != x"$NEXTOPT" ] ; then
                        OPT="-$NEXTOPT"  # multiple short opts, keep going
                else
                        break  # long form, exit inner loop
                fi
        done
        # Done with that param. move to next
        shift
done
# Set the non-parameters back into the positional parameters ($1 $2 ..)
eval set -- $REMAINS


echo -e "After: \n configfile='$CONFIGFILE' \n force='$FORCE' \n retry='$RETRY' \n remains='$REMAINS'"
for i ; do echo - $i ; done


I have found writing portable argument parsing in scripts so frustrating that I wrote Argbash - a FOSS code generator that can generate the arguments-parsing code for your script, plus it has some nice features:

https://argbash.io

[Aug 29, 2019] shell - An example of how to use getopts in bash - Stack Overflow

The key thing to understand is that getopts just parses options. You need to shift them off as a separate operation:
shift $((OPTIND-1))
May 10, 2013 | stackoverflow.com


chepner ,May 10, 2013 at 13:42

I want to call myscript file in this way:
$ ./myscript -s 45 -p any_string

or

$ ./myscript -h >>> should display help
$ ./myscript    >>> should display help

My requirements are:

I tried so far this code:

#!/bin/bash
while getopts "h:s:" arg; do
  case $arg in
    h)
      echo "usage" 
      ;;
    s)
      strength=$OPTARG
      echo $strength
      ;;
  esac
done

But with that code I get errors. How to do it with Bash and getopt ?


#!/bin/bash

usage() { echo "Usage: $0 [-s <45|90>] [-p <string>]" 1>&2; exit 1; }

while getopts ":s:p:" o; do
    case "${o}" in
        s)
            s=${OPTARG}
            ((s == 45 || s == 90)) || usage
            ;;
        p)
            p=${OPTARG}
            ;;
        *)
            usage
            ;;
    esac
done
shift $((OPTIND-1))

if [ -z "${s}" ] || [ -z "${p}" ]; then
    usage
fi

echo "s = ${s}"
echo "p = ${p}"

Example runs:

$ ./myscript.sh
Usage: ./myscript.sh [-s <45|90>] [-p <string>]

$ ./myscript.sh -h
Usage: ./myscript.sh [-s <45|90>] [-p <string>]

$ ./myscript.sh -s "" -p ""
Usage: ./myscript.sh [-s <45|90>] [-p <string>]

$ ./myscript.sh -s 10 -p foo
Usage: ./myscript.sh [-s <45|90>] [-p <string>]

$ ./myscript.sh -s 45 -p foo
s = 45
p = foo

$ ./myscript.sh -s 90 -p bar
s = 90
p = bar

[Aug 28, 2019] How do I import a perl module outside of @INC that does not end in .pm - Stack Overflow

Aug 22, 2019 | stackoverflow.com


mob ,Aug 22 at 19:47

Background

I am attempting to import a perl module that does not end in .pm with a method similar to this answer :

use lib "/hard/coded/directory"; use scripting;

However, when I attempt to import a module in this way, I get the following error when running perl -c :

Can't locate scripting.pm in @INC (@INC contains: ... ... ... /hard/coded/directory) at name of script line 47.

BEGIN failed--compilation aborted at name of script line 47.

Question

How do I import a perl module outside of @INC that does not have .pm at the end of the file?

ikegami ,Aug 22 at 20:04

If the file has a package directive, the file name and the package directive need to match, so simply fix the file name.

If the file doesn't have a package directive, you don't have a module , and you shouldn't use use or require . This can cause problems.

What you have is sometimes called a library, and you should use do .

do('/hard/coded/directory/scripting')
   or die $@ || $!;

(For proper error checking, the file needs to result in a true value.)

That said, you are probably trying to do something really awful. I'm guessing you either have a configuration file written in Perl or a poorly written module. Perl is not a suitable choice of language for a configuration file, and avoiding namespaces is just bad programming with no benefit.

ikegami ,Aug 22 at 20:10

If the source file does not define new namespaces or classes and you just want to read the function definitions or data from a file, Perl provides the do and require functions.
do "scripting";
require "scripting";

The difference between them is that require will look for the file to evaluate to a true value (it expects the last statement in the file to resolve to a non-zero, non-empty value), and will emit a fatal error if this does not happen. (You will often see naked 1; statements at the end of modules to satisfy this requirement).

If scripting really contains class code and you do need all the functionality that the use function provides, remember that

use Foo::Bar qw(stuff);

is just syntactic sugar for

BEGIN {
    $file = <find Foo/Bar.pm on @INC>;
    require "$file";
    Foo::Bar->import( qw(stuff) )
}

and suggests how you can workaround your inability to use use :

BEGIN {
    require "scripting";
    scripting->import()
}

In theory, the file scripting might define some other package and begin with a line like package Something::Else; . Then you would load the package in this module with

BEGIN {
    require "scripting";
    Something::Else->import();
}

[Aug 27, 2019] perl defensive programming (die, assert, croak) - Stack Overflow

Aug 27, 2019 | stackoverflow.com



Zaid ,Feb 23, 2014 at 17:11

What is the best (or recommended) approach to do defensive programming in perl? For example if I have a sub which must be called with a (defined) SCALAR, an ARRAYREF and an optional HASHREF.

Three of the approaches I have seen:

sub test1 {
    die if !(@_ == 2 || @_ == 3);
    my ($scalar, $arrayref, $hashref) = @_;
    die if !defined($scalar) || ref($scalar);
    die if ref($arrayref) ne 'ARRAY';
    die if defined($hashref) && ref($hashref) ne 'HASH';
    #do s.th with scalar, arrayref and hashref
}

sub test2 {
    Carp::assert(@_ == 2 || @_ == 3) if DEBUG;
    my ($scalar, $arrayref, $hashref) = @_;
    if(DEBUG) {
        Carp::assert defined($scalar) && !ref($scalar);
        Carp::assert ref($arrayref) eq 'ARRAY';
        Carp::assert !defined($hashref) || ref($hashref) eq 'HASH';
    }
    #do s.th with scalar, arrayref and hashref
}

sub test3 {
    my ($scalar, $arrayref, $hashref) = @_;
    (@_ == 2 || @_ == 3 && defined($scalar) && !ref($scalar) && ref($arrayref) eq 'ARRAY' && (!defined($hashref) || ref($hashref) eq 'HASH'))
        or Carp::croak 'usage: test3(SCALAR, ARRAYREF, [HASHREF])';
    #do s.th with scalar, arrayref and hashref
}

tobyink ,Feb 23, 2014 at 21:44

use Params::Validate qw(:all);

sub Yada {
   my (...)=validate_pos(@_,{ type=>SCALAR },{ type=>ARRAYREF },{ type=>HASHREF,optional=>1 });
   ...
}

ikegami ,Feb 23, 2014 at 17:33

I wouldn't use any of them. Aside from not accepting many array and hash references, the checks you used are almost always redundant.
>perl -we"use strict; sub { my ($x) = @_; my $y = $x->[0] }->( 'abc' )"
Can't use string ("abc") as an ARRAY ref while "strict refs" in use at -e line 1.

>perl -we"use strict; sub { my ($x) = @_; my $y = $x->[0] }->( {} )"
Not an ARRAY reference at -e line 1.

The only advantage to checking is that you can use croak to show the caller in the error message.


Proper way to check if you have a reference to an array:

defined($x) && eval { @$x; 1 }

Proper way to check if you have a reference to a hash:

defined($x) && eval { %$x; 1 }

Borodin ,Feb 23, 2014 at 17:23

None of the options you show display any message to give a reason for the failure, which I think is paramount.

It is also preferable to use croak instead of die from within library subroutines, so that the error is reported from the point of view of the caller.

I would replace all occurrences of if ! with unless . The former is a C programmer's habit.

I suggest something like this

sub test1 {
    croak "Incorrect number of parameters" unless @_ == 2 or @_ == 3;
    my ($scalar, $arrayref, $hashref) = @_;
    croak "Invalid first parameter" unless $scalar and not ref $scalar;
    croak "Invalid second parameter" unless $arrayref eq 'ARRAY';
    croak "Invalid third parameter" if defined $hashref and ref $hashref ne 'HASH';

    # do s.th with scalar, arrayref and hashref
}

[Aug 27, 2019] linux - How to show line number when executing bash script - Stack Overflow

Aug 27, 2019 | stackoverflow.com



dspjm ,Jul 23, 2013 at 7:31

I have a test script which has a lot of commands and generates lots of output. I use set -x or set -v and set -e , so the script stops when an error occurs. However, it's still rather difficult for me to locate at which line execution stopped in order to find the problem. Is there a method which can output the line number of the script before each line is executed? Or output the line number before the command output generated by set -x ? Any method which can help me locate the problem line would be a great help. Thanks.

Suvarna Pattayil ,Jul 28, 2017 at 17:25

You mention that you're already using -x . The variable PS4 holds the prompt that is printed before each command line is echoed when the -x option is set; it defaults to : followed by a space.

You can change PS4 to emit the LINENO (The line number in the script or shell function currently executing).

For example, if your script reads:

$ cat script
foo=10
echo ${foo}
echo $((2 + 2))

Executing it thus would print line numbers:

$ PS4='Line ${LINENO}: ' bash -x script
Line 1: foo=10
Line 2: echo 10
10
Line 3: echo 4
4

http://wiki.bash-hackers.org/scripting/debuggingtips gives the ultimate PS4 that would output everything you will possibly need for tracing:

export PS4='+(${BASH_SOURCE}:${LINENO}): ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'

Deqing ,Jul 23, 2013 at 8:16

In Bash, $LINENO contains the line number where the script is currently executing.

If you need to know the line number where the function was called, try $BASH_LINENO . Note that this variable is an array.

For example:

#!/bin/bash       

function log() {
    echo "LINENO: ${LINENO}"
    echo "BASH_LINENO: ${BASH_LINENO[*]}"
}

function foo() {
    log "$@"
}

foo "$@"

See here for details of Bash variables.

Eliran Malka ,Apr 25, 2017 at 10:14

Simple (but powerful) solution: place echo statements around the code you think causes the problem, and move them line by line until the messages no longer appear on screen - the script has stopped because of an error before reaching them.

Even more powerful solution: install bashdb , the bash debugger, and debug the script line by line.
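
A minimal sketch of the echo approach, assuming a hypothetical script whose second step fails; the markers use $LINENO, so the last marker printed tells you roughly where execution stopped:

#!/bin/bash
set -e

step_one() { true; }    # placeholder commands, just for illustration
step_two() { false; }   # this one fails

echo "reached line $LINENO" >&2   # printed
step_one
echo "reached line $LINENO" >&2   # printed; the last marker you see
step_two                          # aborts here under set -e
echo "reached line $LINENO" >&2   # never printed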

kklepper ,Apr 2, 2018 at 22:44

Workaround for shells without LINENO

In a fairly sophisticated script I wouldn't like to see all line numbers; rather I would like to be in control of the output.

Define a function

echo_line_no () {
    grep -n "$1" $0 |  sed "s/echo_line_no//" 
    # grep the line(s) containing input $1 with line numbers
    # replace the function name with nothing 
} # echo_line_no

Use it with quotes like

echo_line_no "this is a simple comment with a line number"

Output is

16   "this is a simple comment with a line number"

if the number of this line in the source file is 16.

This basically answers the question How to show line number when executing bash script for users of ash or other shells without LINENO .

Anything more to add?

Sure. Why do you need this? How do you work with this? What can you do with this? Is this simple approach really sufficient or useful? Why do you want to tinker with this at all?

Want to know more? Read reflections on debugging

[Aug 27, 2019] How do I get the filename and line number in Perl - Stack Overflow

Aug 27, 2019 | stackoverflow.com



Elijah ,Nov 1, 2010 at 17:35

I would like to get the current filename and line number within a Perl script. How do I do this?

For example, in a file call test.pl :

my $foo = 'bar';
print 'Hello World';
print functionForFilename() . ':' . functionForLineNo();

It would output:

Hello World
test.pl:3

tchrist ,Nov 2, 2010 at 19:13

These are available with the __LINE__ and __FILE__ tokens, as documented in perldoc perldata under "Special Literals":

The special literals __FILE__, __LINE__, and __PACKAGE__ represent the current filename, line number, and package name at that point in your program. They may be used only as separate tokens; they will not be interpolated into strings. If there is no current package (due to an empty package; directive), __PACKAGE__ is the undefined value.

Eric Strom ,Nov 1, 2010 at 17:41

The caller function will do what you are looking for:
sub print_info {
   my ($package, $filename, $line) = caller;
   ...
}

print_info(); # prints info about this line

This will get the information from where the sub is called, which is probably what you are looking for. The __FILE__ and __LINE__ directives only apply to where they are written, so you can not encapsulate their effect in a subroutine. (unless you wanted a sub that only prints info about where it is defined)


You can use:
print __FILE__. " " . __LINE__;

[Aug 26, 2019] bash - How to prevent rm from reporting that a file was not found

Aug 26, 2019 | stackoverflow.com



pizza ,Apr 20, 2012 at 21:29

I am using rm within a BASH script to delete many files. Sometimes the files are not present, so it reports many errors. I do not need this message. I have searched the man page for a command to make rm quiet, but the only option I found is -f , which from the description, "ignore nonexistent files, never prompt", seems to be the right choice, but the name does not seem to fit, so I am concerned it might have unintended consequences.

Keith Thompson ,Dec 19, 2018 at 13:05

The main use of -f is to force the removal of files that would not be removed using rm by itself (as a special case, it "removes" non-existent files, thus suppressing the error message).

You can also just redirect the error message using

$ rm file.txt 2> /dev/null

(or your operating system's equivalent). You can check the value of $? immediately after calling rm to see if a file was actually removed or not.
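
For example, a minimal sketch (the file name is just an illustration):

rm file.txt 2>/dev/null
if [ $? -eq 0 ]; then
    echo "file.txt was removed"
else
    echo "file.txt was not removed (it probably did not exist)"
fi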

vimdude ,May 28, 2014 at 18:10

Yes, -f is the most suitable option for this.

tripleee ,Jan 11 at 4:50

-f is the correct flag, but for the test operator, not rm
[ -f "$THEFILE" ] && rm "$THEFILE"

this ensures that the file exists and is a regular file (not a directory, device node etc...)

mahemoff ,Jan 11 at 4:41

\rm -f file will never report not found.

Idelic ,Apr 20, 2012 at 16:51

As far as rm -f doing "anything else", it does force ( -f is shorthand for --force ) silent removal in situations where rm would otherwise ask you for confirmation. For example, when trying to remove a file not writable by you from a directory that is writable by you.

Keith Thompson ,May 28, 2014 at 18:09

I had the same issue in csh. The only solution I had was to create a dummy file that matched the pattern before running "rm" in my script.

[Aug 26, 2019] shell - rm -rf return codes

Aug 26, 2019 | superuser.com



SheetJS ,Aug 15, 2013 at 2:50

Can anyone let me know the possible return codes for the command rm -rf other than zero, i.e. the possible return codes for failure cases? I want to know a more detailed reason for the failure of the command, rather than just that the command failed (returned something other than 0).

Adrian Frühwirth ,Aug 14, 2013 at 7:00

To see the return code, you can use echo $? in bash.

To see the actual meaning, some platforms (like Debian Linux) have the perror binary available, which can be used as follows:

$ rm -rf something/; perror $?
rm: cannot remove `something/': Permission denied
OS error code   1:  Operation not permitted

rm -rf automatically suppresses most errors. The most likely error you will see is 1 (Operation not permitted), which will happen if you don't have permissions to remove the file. -f intentionally suppresses most errors

Adrian Frühwirth ,Aug 14, 2013 at 7:21

grabbed coreutils from git....

looking at exit we see...

openfly@linux-host:~/coreutils/src $ cat rm.c | grep -i exit
  if (status != EXIT_SUCCESS)
  exit (status);
  /* Since this program exits immediately after calling 'rm', rm need not
  atexit (close_stdin);
          usage (EXIT_FAILURE);
        exit (EXIT_SUCCESS);
          usage (EXIT_FAILURE);
        error (EXIT_FAILURE, errno, _("failed to get attributes of %s"),
        exit (EXIT_SUCCESS);
  exit (status == RM_ERROR ? EXIT_FAILURE : EXIT_SUCCESS);

Now looking at the status variable....

openfly@linux-host:~/coreutils/src $ cat rm.c | grep -i status
usage (int status)
  if (status != EXIT_SUCCESS)
  exit (status);
  enum RM_status status = rm (file, &x);
  assert (VALID_STATUS (status));
  exit (status == RM_ERROR ? EXIT_FAILURE : EXIT_SUCCESS);

looks like there isn't much going on there with the exit status.

I see EXIT_FAILURE and EXIT_SUCCESS and not anything else.

so basically 0 and 1 / -1

To see specific exit() syscalls and how they occur in a process flow try this

openfly@linux-host:~/ $ strace rm -rf $whatever

fairly simple.

ref:

http://www.unix.com/man-page/Linux/EXIT_FAILURE/exit/

[Aug 26, 2019] debugging - How can I debug a Perl script - Stack Overflow

Jun 27, 2014 | stackoverflow.com

Matthew Lock ,Jun 27, 2014 at 1:01

To run your script under the Perl debugger you should use the -d switch:
perl -d script.pl

But Perl is flexible. It supplies some hooks, and you may force the debugger to work as you want.

So to use different debuggers you may do:

perl -d:DebugHooks::Terminal script.pl
# OR
perl -d:Trepan script.pl

Look at these modules here and here

There are several other interesting Perl modules that hook into the Perl debugger internals: Devel::NYTProf, Devel::Cover

And many others


If you want to do remote debugging (for CGI, or if you don't want to mix the program's output with the debugger command line), use this:

given test:

use v5.14;
say 1;
say 2;
say 3;

Start a listener on whatever host and port on terminal 1 (here localhost:12345):

$ nc -v -l localhost -p 12345

for readline support use rlwrap (you can use on perl -d too):

$ rlwrap nc -v -l localhost -p 12345

And start the test on another terminal (say terminal 2):

$ PERLDB_OPTS="RemotePort=localhost:12345" perl -d test

Input/Output on terminal 1:

Connection from 127.0.0.1:42994

Loading DB routines from perl5db.pl version 1.49
Editor support available.

Enter h or 'h h' for help, or 'man perldebug' for more help.

main::(test:2): say 1;
  DB<1> n
main::(test:3): say 2;
  DB<1> select $DB::OUT

  DB<2> n
2
main::(test:4): say 3;
  DB<2> n
3
Debugged program terminated.  Use q to quit or R to restart,
use o inhibit_exit to avoid stopping after program termination,
h q, h R or h o to get additional info.  
  DB<2>

Output on terminal 2:

1

Note the statement to use if you want output on the debug terminal:

select $DB::OUT

If you are vim user, install this plugin: dbg.vim which provides basic support for perl

[Aug 26, 2019] Debugging - How to use the Perl debugger

Aug 26, 2019 | stackoverflow.com
This is like "please can you give me an example how to drive a car" .

I have explained the basic commands that you will use most often. Beyond this you must read the debugger's inline help and reread the perldebug documentation

The debugger will do a lot more than this, but these are the basic commands that you need to know. You should experiment with them and look at the contents of the help text to get more proficient with the Perl debugger

[Aug 25, 2019] How to check if a variable is set in Bash?

Aug 25, 2019 | stackoverflow.com



Jens ,Jul 15, 2014 at 9:46

How do I know if a variable is set in Bash?

For example, how do I check if the user gave the first parameter to a function?

function a {
    # if $1 is set ?
}

Graeme ,Nov 25, 2016 at 5:07

(Usually) The right way
if [ -z ${var+x} ]; then echo "var is unset"; else echo "var is set to '$var'"; fi

where ${var+x} is a parameter expansion which evaluates to nothing if var is unset, and substitutes the string x otherwise.

Quotes Digression

Quotes can be omitted (so we can say ${var+x} instead of "${var+x}" ) because this syntax & usage guarantees this will only expand to something that does not require quotes (since it either expands to x (which contains no word breaks so it needs no quotes), or to nothing (which results in [ -z ] , which conveniently evaluates to the same value (true) that [ -z "" ] does as well)).

However, while quotes can be safely omitted, and it was not immediately obvious to all (it wasn't even apparent to the first author of this quotes explanation who is also a major Bash coder), it would sometimes be better to write the solution with quotes as [ -z "${var+x}" ] , at the very small possible cost of an O(1) speed penalty. The first author also added this as a comment next to the code using this solution giving the URL to this answer, which now also includes the explanation for why the quotes can be safely omitted.

(Often) The wrong way
if [ -z "$var" ]; then echo "var is blank"; else echo "var is set to '$var'"; fi

This is often wrong because it doesn't distinguish between a variable that is unset and a variable that is set to the empty string. That is to say, if var='' , then the above solution will output "var is blank".

The distinction between unset and "set to the empty string" is essential in situations where the user has to specify an extension, or additional list of properties, and that not specifying them defaults to a non-empty value, whereas specifying the empty string should make the script use an empty extension or list of additional properties.

The distinction may not be essential in every scenario though. In those cases [ -z "$var" ] will be just fine.

Flow ,Nov 26, 2014 at 13:49

To check for non-null/non-zero string variable, i.e. if set, use
if [ -n "$1" ]

It's the opposite of -z . I find myself using -n more than -z .

You would use it like:

if [ -n "$1" ]; then
  echo "You supplied the first parameter!"
else
  echo "First parameter not supplied."
fi

Jens ,Jan 19, 2016 at 23:30

Here's how to test whether a parameter is unset , or empty ("Null") or set with a value :
+--------------------+----------------------+-----------------+-----------------+
|                    |       parameter      |     parameter   |    parameter    |
|                    |   Set and Not Null   |   Set But Null  |      Unset      |
+--------------------+----------------------+-----------------+-----------------+
| ${parameter:-word} | substitute parameter | substitute word | substitute word |
| ${parameter-word}  | substitute parameter | substitute null | substitute word |
| ${parameter:=word} | substitute parameter | assign word     | assign word     |
| ${parameter=word}  | substitute parameter | substitute null | assign word     |
| ${parameter:?word} | substitute parameter | error, exit     | error, exit     |
| ${parameter?word}  | substitute parameter | substitute null | error, exit     |
| ${parameter:+word} | substitute word      | substitute null | substitute null |
| ${parameter+word}  | substitute word      | substitute word | substitute null |
+--------------------+----------------------+-----------------+-----------------+

Source: POSIX: Parameter Expansion :

In all cases shown with "substitute", the expression is replaced with the value shown. In all cases shown with "assign", parameter is assigned that value, which also replaces the expression.

Dan ,Jul 24, 2018 at 20:16

While most of the techniques stated here are correct, bash 4.2 supports an actual test for the presence of a variable ( man bash ), rather than testing the value of the variable.
[[ -v foo ]]; echo $?
# 1

foo=bar
[[ -v foo ]]; echo $?
# 0

foo=""
[[ -v foo ]]; echo $?
# 0

Notably, this approach will not cause an error when used to check for an unset variable in set -u / set -o nounset mode, unlike many other approaches, such as using [ -z .
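
A minimal sketch of that difference; the [ -z test is left commented out because it would abort the script under set -u (and [[ -v ]] needs bash 4.2+):

#!/usr/bin/env bash
set -u

if [[ -v undefined_var ]]; then
    echo "undefined_var is set"
else
    echo "undefined_var is unset"      # printed, no error under set -u
fi

# [ -z "$undefined_var" ]              # would abort with "unbound variable"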

chepner ,Sep 11, 2013 at 14:22

There are many ways to do this with the following being one of them:
if [ -z "$1" ]

This succeeds if $1 is null or unset

phkoester ,Feb 16, 2018 at 8:06

To see if a variable is nonempty, I use
if [[ $var ]]; then ...       # `$var' expands to a nonempty string

The opposite tests if a variable is either unset or empty:

if [[ ! $var ]]; then ...     # `$var' expands to the empty string (set or not)

To see if a variable is set (empty or nonempty), I use

if [[ ${var+x} ]]; then ...   # `var' exists (empty or nonempty)
if [[ ${1+x} ]]; then ...     # Parameter 1 exists (empty or nonempty)

The opposite tests if a variable is unset:

if [[ ! ${var+x} ]]; then ... # `var' is not set at all
if [[ ! ${1+x} ]]; then ...   # We were called with no arguments

Palec ,Jun 19, 2017 at 3:25

I always find the POSIX table in the other answer slow to grok, so here's my take on it:
   +----------------------+------------+-----------------------+-----------------------+
   |   if VARIABLE is:    |    set     |         empty         |        unset          |
   +----------------------+------------+-----------------------+-----------------------+
 - |  ${VARIABLE-default} | $VARIABLE  |          ""           |       "default"       |
 = |  ${VARIABLE=default} | $VARIABLE  |          ""           | $(VARIABLE="default") |
 ? |  ${VARIABLE?default} | $VARIABLE  |          ""           |       exit 127        |
 + |  ${VARIABLE+default} | "default"  |       "default"       |          ""           |
   +----------------------+------------+-----------------------+-----------------------+
:- | ${VARIABLE:-default} | $VARIABLE  |       "default"       |       "default"       |
:= | ${VARIABLE:=default} | $VARIABLE  | $(VARIABLE="default") | $(VARIABLE="default") |
:? | ${VARIABLE:?default} | $VARIABLE  |       exit 127        |       exit 127        |
:+ | ${VARIABLE:+default} | "default"  |          ""           |          ""           |
   +----------------------+------------+-----------------------+-----------------------+

Note that each group (with and without preceding colon) has the same set and unset cases, so the only thing that differs is how the empty cases are handled.

With the preceding colon, the empty and unset cases are identical, so I would use those where possible (i.e. use := , not just = , because the empty case is inconsistent).

Headings:

Values:

chepner ,Mar 28, 2017 at 12:26

On a modern version of Bash (4.2 or later I think; I don't know for sure), I would try this:
if [ ! -v SOMEVARIABLE ] #note the lack of a $ sigil
then
    echo "Variable is unset"
elif [ -z "$SOMEVARIABLE" ]
then
    echo "Variable is set to an empty string"
else
    echo "Variable is set to some string"
fi

Gordon Davisson ,May 15, 2015 at 13:53

if [ "$1" != "" ]; then
  echo \$1 is set
else
  echo \$1 is not set
fi

Although for arguments it is normally best to test $#, which is the number of arguments, in my opinion.

if [ $# -gt 0 ]; then
  echo \$1 is set
else
  echo \$1 is not set
fi

Jarrod Chesney ,Dec 9, 2016 at 3:34

You want to exit if it's unset

This worked for me. I wanted my script to exit with an error message if a parameter wasn't set.

#!/usr/bin/env bash

set -o errexit

# Get the value and empty validation check all in one
VER="${1:?You must pass a version of the format 0.0.0 as the only argument}"

This returns with an error when it's run

peek@peek:~$ ./setver.sh
./setver.sh: line 13: 1: You must pass a version of the format 0.0.0 as the only argument
Check only, no exit - Empty and Unset are INVALID

Try this option if you just want to check if the value set=VALID or unset/empty=INVALID.

TSET="good val"
TEMPTY=""
unset TUNSET

if [ "${TSET:-}" ]; then echo "VALID"; else echo "INVALID";fi
# VALID
if [ "${TEMPTY:-}" ]; then echo "VALID"; else echo "INVALID";fi
# INVALID
if [ "${TUNSET:-}" ]; then echo "VALID"; else echo "INVALID";fi
# INVALID

Or, Even short tests ;-)

[ "${TSET:-}"   ] && echo "VALID" || echo "INVALID"
[ "${TEMPTY:-}" ] && echo "VALID" || echo "INVALID"
[ "${TUNSET:-}" ] && echo "VALID" || echo "INVALID"
Check only, no exit - Only empty is INVALID

And this is the answer to the question. Use this if you just want to check if the value set/empty=VALID or unset=INVALID.

NOTE, the "1" in "..-1}" is insignificant, it can be anything (like x)

TSET="good val"
TEMPTY=""
unset TUNSET

if [ "${TSET+1}" ]; then echo "VALID"; else echo "INVALID";fi
# VALID
if [ "${TEMPTY+1}" ]; then echo "VALID"; else echo "INVALID";fi
# VALID
if [ "${TUNSET+1}" ]; then echo "VALID"; else echo "INVALID";fi
# INVALID

Short tests

[ "${TSET+1}"   ] && echo "VALID" || echo "INVALID"
[ "${TEMPTY+1}" ] && echo "VALID" || echo "INVALID"
[ "${TUNSET+1}" ] && echo "VALID" || echo "INVALID"

I dedicate this answer to @mklement0 (comments) who challenged me to answer the question accurately.

Reference http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02

Gilles ,Aug 31, 2010 at 7:30

To check whether a variable is set with a non-empty value, use [ -n "$x" ] , as others have already indicated.

Most of the time, it's a good idea to treat a variable that has an empty value in the same way as a variable that is unset. But you can distinguish the two if you need to: [ -n "${x+set}" ] ( "${x+set}" expands to set if x is set and to the empty string if x is unset).

To check whether a parameter has been passed, test $# , which is the number of parameters passed to the function (or to the script, when not in a function) (see Paul's answer ).
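
A minimal sketch combining both checks (the function name and argument values are just illustrations):

check() {
    if [ -n "${1+set}" ]; then
        echo "first parameter was passed (value: '$1')"
    else
        echo "first parameter was not passed at all"
    fi
    echo "number of parameters: $#"
}

check           # not passed
check ""        # passed, but empty
check hello     # passed with a value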

tripleee ,Sep 12, 2015 at 6:33

Read the "Parameter Expansion" section of the bash man page. Parameter expansion doesn't provide a general test for a variable being set, but there are several things you can do to a parameter if it isn't set.

For example:

function a {
    first_arg=${1-foo}
    # rest of the function
}

will set first_arg equal to $1 if it is assigned, otherwise it uses the value "foo". If a absolutely must take a single parameter, and no good default exists, you can exit with an error message when no parameter is given:

function a {
    : ${1?a must take a single argument}
    # rest of the function
}

(Note the use of : as a null command, which just expands the values of its arguments. We don't want to do anything with $1 in this example, just exit if it isn't set)

AlejandroVD ,Feb 8, 2016 at 13:31

In bash you can use -v inside the [[ ]] builtin:
#! /bin/bash -u

if [[ ! -v SOMEVAR ]]; then
    SOMEVAR='hello'
fi

echo $SOMEVAR

Palec ,Nov 16, 2016 at 15:01

For those that are looking to check for unset or empty when in a script with set -u :
if [ -z "${var-}" ]; then
   echo "Must provide var environment variable. Exiting...."
   exit 1
fi

The regular [ -z "$var" ] check will fail with var: unbound variable if set -u is in effect, but [ -z "${var-}" ] expands to the empty string if var is unset, without failing.

user1387866 ,Jul 30 at 15:57

Note

I'm giving a heavily Bash-focused answer because of the bash tag.

Short answer

As long as you're only dealing with named variables in Bash, this function should always tell you if the variable has been set, even if it's an empty array.

is-variable-set() {
    declare -p $1 &>/dev/null
}
Why this works

In Bash (at least as far back as 3.0), if var is a declared/set variable, then declare -p var outputs a declare command that would set variable var to whatever its current type and value are, and returns status code 0 (success). If var is undeclared, then declare -p var outputs an error message to stderr and returns status code 1 . Using &>/dev/null , redirects both regular stdout and stderr output to /dev/null , never to be seen, and without changing the status code. Thus the function only returns the status code.

Why other methods (sometimes) fail in Bash
  • [ -n "$var" ] : This only checks if ${var[0]} is nonempty. (In Bash, $var is the same as ${var[0]} .)
  • [ -n "${var+x}" ] : This only checks if ${var[0]} is set.
  • [ "${#var[@]}" != 0 ] : This only checks if at least one index of $var is set.
When this method fails in Bash

This only works for named variables (including $_ ), not certain special variables ( $! , $@ , $# , $$ , $* , $? , $- , $0 , $1 , $2 , ..., and any I may have forgotten). Since none of these are arrays, the POSIX-style [ -n "${var+x}" ] works for all of these special variables. But beware of wrapping it in a function since many special variables change values/existence when functions are called.

Shell compatibility note

If your script has arrays and you're trying to make it compatible with as many shells as possible, then consider using typeset -p instead of declare -p . I've read that ksh only supports the former, but haven't been able to test this. I do know that Bash 3.0+ and Zsh 5.5.1 each support both typeset -p and declare -p , differing only in which one is an alternative for the other. But I haven't tested differences beyond those two keywords, and I haven't tested other shells.

If you need your script to be POSIX sh compatible, then you can't use arrays. Without arrays, [ -n "${var+x}" ] works.

Comparison code for different methods in Bash

This function unsets variable var , eval s the passed code, runs tests to determine if var is set by the eval d code, and finally shows the resulting status codes for the different tests.

I'm skipping test -v var , [ -v var ] , and [[ -v var ]] because they yield identical results to the POSIX standard [ -n "${var+x}" ] , while requiring Bash 4.2+. I'm also skipping typeset -p because it's the same as declare -p in the shells I've tested (Bash 3.0 thru 5.0, and Zsh 5.5.1).

is-var-set-after() {
    # Set var by passed expression.
    unset var
    eval "$1"

    # Run the tests, in increasing order of accuracy.
    [ -n "$var" ] # (index 0 of) var is nonempty
    nonempty=$?
    [ -n "${var+x}" ] # (index 0 of) var is set, maybe empty
    plus=$?
    [ "${#var[@]}" != 0 ] # var has at least one index set, maybe empty
    count=$?
    declare -p var &>/dev/null # var has been declared (any type)
    declared=$?

    # Show test results.
    printf '%30s: %2s %2s %2s %2s\n' "$1" $nonempty $plus $count $declared
}
Test case code

Note that test results may be unexpected due to Bash treating non-numeric array indices as "0" if the variable hasn't been declared as an associative array. Also, associative arrays are only valid in Bash 4.0+.

# Header.
printf '%30s: %2s %2s %2s %2s\n' "test" '-n' '+x' '#@' '-p'
# First 5 tests: Equivalent to setting 'var=foo' because index 0 of an
# indexed array is also the nonindexed value, and non-numerical
# indices in an array not declared as associative are the same as
# index 0.
is-var-set-after "var=foo"                        #  0  0  0  0
is-var-set-after "var=(foo)"                      #  0  0  0  0
is-var-set-after "var=([0]=foo)"                  #  0  0  0  0
is-var-set-after "var=([x]=foo)"                  #  0  0  0  0
is-var-set-after "var=([y]=bar [x]=foo)"          #  0  0  0  0
# '[ -n "$var" ]' fails when var is empty.
is-var-set-after "var=''"                         #  1  0  0  0
is-var-set-after "var=([0]='')"                   #  1  0  0  0
# Indices other than 0 are not detected by '[ -n "$var" ]' or by
# '[ -n "${var+x}" ]'.
is-var-set-after "var=([1]='')"                   #  1  1  0  0
is-var-set-after "var=([1]=foo)"                  #  1  1  0  0
is-var-set-after "declare -A var; var=([x]=foo)"  #  1  1  0  0
# Empty arrays are only detected by 'declare -p'.
is-var-set-after "var=()"                         #  1  1  1  0
is-var-set-after "declare -a var"                 #  1  1  1  0
is-var-set-after "declare -A var"                 #  1  1  1  0
# If 'var' is unset, then it even fails the 'declare -p var' test.
is-var-set-after "unset var"                      #  1  1  1  1
Test output

The test mnemonics in the header row correspond to [ -n "$var" ] , [ -n "${var+x}" ] , [ "${#var[@]}" != 0 ] , and declare -p var , respectively.

                         test: -n +x #@ -p
                      var=foo:  0  0  0  0
                    var=(foo):  0  0  0  0
                var=([0]=foo):  0  0  0  0
                var=([x]=foo):  0  0  0  0
        var=([y]=bar [x]=foo):  0  0  0  0
                       var='':  1  0  0  0
                 var=([0]=''):  1  0  0  0
                 var=([1]=''):  1  1  0  0
                var=([1]=foo):  1  1  0  0
declare -A var; var=([x]=foo):  1  1  0  0
                       var=():  1  1  1  0
               declare -a var:  1  1  1  0
               declare -A var:  1  1  1  0
                    unset var:  1  1  1  1
Summary
  • declare -p var &>/dev/null is (100%?) reliable for testing named variables in Bash since at least 3.0.
  • [ -n "${var+x}" ] is reliable in POSIX compliant situations, but cannot handle arrays.
  • Other tests exist for checking if a variable is nonempty, and for checking for declared variables in other shells. But these tests are suited for neither Bash nor POSIX scripts.

Peregring-lk ,Oct 18, 2014 at 22:09

Using [[ -z "$var" ]] is the easiest way to know if a variable was set or not, but that option -z doesn't distinguish between an unset variable and a variable set to an empty string:
$ set=''
$ [[ -z "$set" ]] && echo "Set" || echo "Unset" 
Unset
$ [[ -z "$unset" ]] && echo "Set" || echo "Unset"
Unset

It's best to check it according to the type of variable: env variable, parameter or regular variable.

For a env variable:

[[ $(env | grep "varname=" | wc -l) -eq 1 ]] && echo "Set" || echo "Unset"

For a parameter (for example, to check existence of parameter $5 ):

[[ $# -ge 5 ]] && echo "Set" || echo "Unset"

For a regular variable (using an auxiliary function, to do it in an elegant way):

function declare_var {
   declare -p "$1" &> /dev/null
}
declare_var "var_name" && echo "Set" || echo "Unset"

Notes:

  • $# : gives you the number of positional parameters.
  • declare -p : gives you the definition of the variable passed as a parameter. If it exists, returns 0, if not, returns 1 and prints an error message.
  • &> /dev/null : suppresses output from declare -p without affecting its return code.

Dennis Williamson ,Nov 27, 2013 at 20:56

You can do:
function a {
        if [ ! -z "$1" ]; then
                echo '$1 is set'
        fi
}

LavaScornedOven ,May 11, 2017 at 13:14

The answers above do not work when the Bash option set -u is enabled. Also, they are not dynamic, e.g., how do you test whether a variable with the name "dummy" is defined? Try this:
is_var_defined()
{
    if [ $# -ne 1 ]
    then
        echo "Expected exactly one argument: variable name as string, e.g., 'my_var'"
        exit 1
    fi
    # Tricky.  Since Bash option 'set -u' may be enabled, we cannot directly test if a variable
    # is defined with this construct: [ ! -z "$var" ].  Instead, we must use default value
    # substitution with this construct: [ ! -z "${var:-}" ].  Normally, a default value follows the
    # operator ':-', but here we leave it blank for empty (null) string.  Finally, we need to
    # substitute the text from $1 as 'var'.  This is not allowed directly in Bash with this
    # construct: [ ! -z "${$1:-}" ].  We need to use indirection with eval operator.
    # Example: $1="var"
    # Expansion for eval operator: "[ ! -z \${$1:-} ]" -> "[ ! -z \${var:-} ]"
    # Code  execute: [ ! -z ${var:-} ]
    eval "[ ! -z \${$1:-} ]"
    return $?  # Pedantic.
}

Related: In Bash, how do I test if a variable is defined in "-u" mode

Aquarius Power ,Nov 15, 2014 at 17:55

My prefered way is this:
$var=10
$if ! ${var+false};then echo "is set";else echo "NOT set";fi
is set
$unset var
$if ! ${var+false};then echo "is set";else echo "NOT set";fi
NOT set

So basically, if a variable is set, it becomes "a negation of the resulting false " (what will be true = "is set").

And, if it is unset, it will become "a negation of the resulting true " (as the empty result evaluates to true ) (so will end as being false = "NOT set").

kenorb ,Sep 22, 2014 at 13:57

In a shell you can use the -z operator which is True if the length of string is zero.

A simple one-liner to set default MY_VAR if it's not set, otherwise optionally you can display the message:

[[ -z "$MY_VAR" ]] && MY_VAR="default"
[[ -z "$MY_VAR" ]] && MY_VAR="default" || echo "Variable already set."

Zlatan ,Nov 20, 2013 at 18:53

if [[ ${1:+isset} ]]
then echo "It was set and not null." >&2
else echo "It was not set or it was null." >&2
fi

if [[ ${1+isset} ]]
then echo "It was set but might be null." >&2
else echo "It was was not set." >&2
fi

solidsnack ,Nov 30, 2013 at 16:47

I found a (much) better code to do this if you want to check for anything in $@ .
if [[ $1 = "" ]]
then
  echo '$1 is blank'
else
  echo '$1 is filled up'
fi

Why this all? Everything in $@ exists in Bash, but by default it's blank, so test -z and test -n couldn't help you.

Update: You can also count the number of characters in a parameter.

if [ ${#1} = 0 ]
then
  echo '$1 is blank'
else
  echo '$1 is filled up'
fi

Steven Penny ,May 11, 2014 at 4:59

[[ $foo ]]

Or

(( ${#foo} ))

Or

let ${#foo}

Or

declare -p foo

Celeo ,Feb 11, 2015 at 20:58

if [[ ${!xx[@]} ]] ; then echo xx is defined; fi

HelloGoodbye ,Nov 29, 2013 at 22:41

I always use this one, based on the fact that it seems easy to be understood by anybody who sees the code for the very first time:
if [ "$variable" = "" ]
    then
    echo "Variable X is empty"
fi

And, if wanting to check if not empty;

if [ ! "$variable" = "" ]
    then
    echo "Variable X is not empty"
fi

That's it.

fr00tyl00p ,Nov 29, 2015 at 20:26

This is what I use every day:
#
# Check if a variable is set
#   param1  name of the variable
#
function is_set()
{
    [[ -n "${1}" ]] && test -n "$(eval "echo "\${${1}+x}"")"
}

This works well under Linux and Solaris down to bash 3.0.

bash-3.00$ myvar="TEST"
bash-3.00$ is_set myvar ; echo $?
0
bash-3.00$ myvar=""
bash-3.00$ is_set myvar ; echo $?
0
bash-3.00$ unset myvar
bash-3.00$ is_set myvar ; echo $?
1

Daniel S ,Mar 1, 2016 at 13:12

I like auxiliary functions to hide the crude details of bash. In this case, doing so adds even more (hidden) crudeness:
# The first ! negates the result (can't use -n to achieve this)
# the second ! expands the content of varname (can't do ${$varname})
function IsDeclared_Tricky
{
  local varname="$1"
  ! [ -z ${!varname+x} ]
}

Because I first had bugs in this implementation (inspired by the answers of Jens and Lionel), I came up with a different solution:

# Ask for the properties of the variable - fails if not declared
function IsDeclared()
{
  declare -p $1 &>/dev/null
}

I find it to be more straight-forward, more bashy and easier to understand/remember. Test case shows it is equivalent:

function main()
{
  declare -i xyz
  local foo
  local bar=
  local baz=''

  IsDeclared_Tricky xyz; echo "IsDeclared_Tricky xyz: $?"
  IsDeclared_Tricky foo; echo "IsDeclared_Tricky foo: $?"
  IsDeclared_Tricky bar; echo "IsDeclared_Tricky bar: $?"
  IsDeclared_Tricky baz; echo "IsDeclared_Tricky baz: $?"

  IsDeclared xyz; echo "IsDeclared xyz: $?"
  IsDeclared foo; echo "IsDeclared foo: $?"
  IsDeclared bar; echo "IsDeclared bar: $?"
  IsDeclared baz; echo "IsDeclared baz: $?"
}

main

The test case also shows that local var does NOT declare var (unless followed by '='). For quite some time I thought I declared variables this way, just to discover now that I merely expressed my intention... It's a no-op, I guess.

IsDeclared_Tricky xyz: 1
IsDeclared_Tricky foo: 1
IsDeclared_Tricky bar: 0
IsDeclared_Tricky baz: 0
IsDeclared xyz: 1
IsDeclared foo: 1
IsDeclared bar: 0
IsDeclared baz: 0

BONUS: usecase

I mostly use this test to give (and return) parameters to functions in a somewhat "elegant" and safe way (almost resembling an interface...):

#auxiliary functions
function die()
{
  echo "Error: $1"; exit 1
}

function assertVariableDeclared()
{
  IsDeclared "$1" || die "variable not declared: $1"
}

function expectVariables()
{
  while (( $# > 0 )); do
    assertVariableDeclared $1; shift
  done
}

# actual example
function exampleFunction()
{
  expectVariables inputStr outputStr
  outputStr="$inputStr world!"
}

function bonus()
{
  local inputStr='Hello'
  local outputStr= # remove this to trigger error
  exampleFunction
  echo $outputStr
}

bonus

If called with all required variables declared:

Hello world!

else:

Error: variable not declared: outputStr

Hatem Jaber ,Jun 13 at 12:08

After skimming all the answers, this also works:
if [[ -z $SOME_VAR ]]; then read -p "Enter a value for SOME_VAR: " SOME_VAR; fi
echo "SOME_VAR=$SOME_VAR"

Note that the test needs $SOME_VAR (the $ is necessary for it to work), while the read command takes the bare name SOME_VAR ; if you use $SOME_VAR there, the variable will end up set to an empty value.

Keith Thompson ,Aug 5, 2013 at 19:10

If you wish to test that a variable is bound or unbound, this works well, even after you've turned on the nounset option:
set -o nounset

if printenv variableName >/dev/null; then
    # variable is bound to a value
else
    # variable is unbound
fi

> ,Jan 30 at 18:23

Functions to check if variable is declared/unset including empty $array=()


The following functions test if the given name exists as a variable

# The first parameter needs to be the name of the variable to be checked.
# (See example below)

var_is_declared() {
    { [[ -n ${!1+anything} ]] || declare -p $1 &>/dev/null;}
}

var_is_unset() {
    { [[ -z ${!1+anything} ]] && ! declare -p $1 &>/dev/null;} 
}

These functions would test as shown in the following conditions:

a;       # is not declared
a=;      # is declared
a="foo"; # is declared
a=();    # is declared
a=("");  # is declared
unset a; # is not declared

a;       # is unset
a=;      # is not unset
a="foo"; # is not unset
a=();    # is not unset
a=("");  # is not unset
unset a; # is unset

.

For more details

and a test script see my answer to the question "How do I check if a variable exists in bash?" .

Remark: The similar usage of declare -p , as it is also shown by Peregring-lk 's answer , is truly coincidental. Otherwise I would of course have credited it!

[Aug 20, 2019] Is it possible to insert separator in midnight commander menu?

Jun 07, 2010 | superuser.com


okutane ,Jun 7, 2010 at 3:36

I want to insert some items into mc menu (which is opened by F2) grouped together. Is it possible to insert some sort of separator before them or put them into some submenu?
Probably not.
The format of the menu file is very simple. Lines that start with anything but
space or tab are considered entries for the menu (in order to be able to use
it like a hot key, the first character should be a letter). All the lines that
start with a space or a tab are the commands that will be executed when the
entry is selected.

But MC allows you to make multiple menu entries with same shortcut and title, so you can make a menu entry that looks like separator and does nothing, like:

a hello
  echo world
- --------
b world
  echo hello
- --------
c superuser
  ls /

This will look like a menu in which the dashed entries act as separators.

[Aug 20, 2019] Midnight Commander, using date in User menu

Dec 31, 2013 | unix.stackexchange.com

user2013619 ,Dec 31, 2013 at 0:43

I would like to use MC (midnight commander) to compress the selected dir with date in its name, e.g: dirname_20131231.tar.gz

The command in the User menu is :

tar -czf dirname_`date '+%Y%m%d'`.tar.gz %d

The date is missing from the archive name because %m and %d have another meaning in MC. I made an alias for the date, but it also doesn't work.

Has anybody ever solved this problem?

John1024 ,Dec 31, 2013 at 1:06

To escape the percent signs, double them:
tar -czf dirname_$(date '+%%Y%%m%%d').tar.gz %d

The above would compress the current directory (%d) to a file also in the current directory. If you want to compress the directory pointed to by the cursor rather than the current directory, use %f instead:

tar -czf %f_$(date '+%%Y%%m%%d').tar.gz %f

mc handles escaping of special characters so there is no need to put %f in quotes.

By the way, midnight commander's special treatment of percent signs occurs not just in the user menu file but also at the command line. This is an issue when using shell commands with constructs like ${var%.c} . At the command line, the same as in the user menu file, percent signs can be escaped by doubling them.

[Aug 19, 2019] mc - Is there are any documentation about user-defined menu in midnight-commander - Unix Linux Stack Exchange

Aug 19, 2019 | unix.stackexchange.com



login ,Jun 11, 2014 at 13:13

I'd like to create my own user-defined menu for mc ( menu file). I see some lines like
+ t r & ! t t

or

+ t t

What does it mean?

goldilocks ,Jun 11, 2014 at 13:35

It is documented in the help, the node is "Edit Menu File" under "Command Menu"; if you scroll down you should find "Addition Conditions":

If the condition begins with '+' (or '+?') instead of '=' (or '=?') it is an addition condition. If the condition is true the menu entry will be included in the menu. If the condition is false the menu entry will not be included in the menu.

This is preceded by "Default conditions" (the = condition), which determine which entry will be highlighted as the default choice when the menu appears. Anyway, by way of example:

+ t r & ! t t

t r means if this is a regular file ("t(ype) r"), and ! t t means if the file has not been tagged in the interface.
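
For example, a hypothetical menu entry guarded by that addition condition might look like the following sketch; the condition line precedes the entry, so the entry appears only for regular, untagged files (the hotkey, title and command are just illustrations):

+ t r & ! t t
w       Count lines in the current file
        wc -l %f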

Jarek

On top of what has been written above, this page can be browsed on the Internet when searching for man pages, e.g.: https://www.systutorials.com/docs/linux/man/1-mc/

Search for "Menu File Edit" .

Best regards, Jarek

[Aug 14, 2019] bash - PID background process - Unix Linux Stack Exchange

Aug 14, 2019 | unix.stackexchange.com



Raul ,Nov 27, 2016 at 18:21

As I understand pipes and commands, bash takes each command, spawns a process for each one and connects stdout of the previous one with the stdin of the next one.

For example, in "ls -lsa | grep feb", bash will create two processes, and connect the output of "ls -lsa" to the input of "grep feb".

When you execute a background command like "sleep 30 &" in bash, you get the pid of the background process running your command. Surprisingly for me, when I wrote "ls -lsa | grep feb &" bash returned only one PID.

How should this be interpreted? A process runs both "ls -lsa" and "grep feb"? Several process are created but I only get the pid of one of them?

Raul ,Nov 27, 2016 at 19:21

Spawns 2 processes. The & displays the PID of the second process. Example below.
$ echo $$
13358
$ sleep 100 | sleep 200 &
[1] 13405
$ ps -ef|grep 13358
ec2-user 13358 13357  0 19:02 pts/0    00:00:00 -bash
ec2-user 13404 13358  0 19:04 pts/0    00:00:00 sleep 100
ec2-user 13405 13358  0 19:04 pts/0    00:00:00 sleep 200
ec2-user 13406 13358  0 19:04 pts/0    00:00:00 ps -ef
ec2-user 13407 13358  0 19:04 pts/0    00:00:00 grep --color=auto 13358
$


When you run a job in the background, bash prints the process ID of its subprocess, the one that runs the command in that job. If that job happens to create more subprocesses, that's none of the parent shell's business.

When the background job is a pipeline (i.e. the command is of the form something1 | something2 & , and not e.g. { something1 | something2; } & ), there's an optimization which is strongly suggested by POSIX and performed by most shells including bash: each of the elements of the pipeline are executed directly as subprocesses of the original shell. What POSIX mandates is that the variable $! is set to the last command in the pipeline in this case. In most shells, that last command is a subprocess of the original process, and so are the other commands in the pipeline.

When you run ls -lsa | grep feb , there are three processes involved: the one that runs the left-hand side of the pipe (a subshell that finishes setting up the pipe then executes ls ), the one that runs the right-hand side of the pipe (a subshell that finishes setting up the pipe then executes grep ), and the original process that waits for the pipe to finish.

You can watch what happens by tracing the processes:

$ strace -f -e clone,wait4,pipe,execve,setpgid bash --norc
execve("/usr/local/bin/bash", ["bash", "--norc"], [/* 82 vars */]) = 0
setpgid(0, 24084)                       = 0
bash-4.3$ sleep 10 | sleep 20 &

Note how the second sleep is reported and stored as $! , but the process group ID is the first sleep . Dash has the same oddity, ksh and mksh don't.

[Aug 14, 2019] unix - How to get PID of process by specifying process name and store it in a variable to use further - Stack Overflow

Aug 14, 2019 | stackoverflow.com

Nidhi ,Nov 28, 2014 at 0:54

pids=$(pgrep <name>)

will get you the pids of all processes with the given name. To kill them all, use

kill -9 $pids

To refrain from using a variable and directly kill all processes with a given name issue

pkill -9 <name>

panticz.de ,Nov 11, 2016 at 10:11

On a single line...
pgrep -f process_name | xargs kill -9

flazzarini ,Jun 13, 2014 at 9:54

Another possibility would be to use pidof it usually comes with most distributions. It will return you the PID of a given process by using it's name.
pidof process_name

This way you could store that information in a variable and execute kill -9 on it.

#!/bin/bash
pid=`pidof process_name`
kill -9 $pid

Pawel K ,Dec 20, 2017 at 10:27

Use grep [n]ame so that you can drop the extra grep -v grep step - that is the first point. Second, using xargs the way it is shown above is risky for running a command on whatever gets piped in; you have to use its -i / -I (replace) option, otherwise you may have issues with the command.

Instead of ps axf | grep name | grep -v grep | awk '{print "kill -9 " $1}' , isn't ps aux | grep [n]ame | awk '{print "kill -9 " $2}' better?

[Aug 14, 2019] linux - How to get PID of background process - Stack Overflow

Highly recommended!
Aug 14, 2019 | stackoverflow.com



pixelbeat ,Mar 20, 2013 at 9:11

I start a background process from my shell script, and I would like to kill this process when my script finishes.

How to get the PID of this process from my shell script? As far as I can see variable $! contains the PID of the current script, not the background process.

WiSaGaN ,Jun 2, 2015 at 14:40

You need to save the PID of the background process at the time you start it:
foo &
FOO_PID=$!
# do other stuff
kill $FOO_PID

You cannot use job control, since that is an interactive feature and tied to a controlling terminal. A script will not necessarily have a terminal attached at all so job control will not necessarily be available.

Phil ,Dec 2, 2017 at 8:01

You can use the jobs -l command to get to a particular job:
^Z
[1]+  Stopped                 guard

my_mac:workspace r$ jobs -l
[1]+ 46841 Suspended: 18           guard

In this case, 46841 is the PID.

From help jobs :

-l Report the process group ID and working directory of the jobs.

jobs -p is another option which shows just the PIDs.

Timo ,Dec 2, 2017 at 8:03

Here's a sample transcript from a bash session ( %1 refers to the ordinal number of background process as seen from jobs ):

$ echo $$
3748

$ sleep 100 &
[1] 192

$ echo $!
192

$ kill %1

[1]+  Terminated              sleep 100

lepe ,Dec 2, 2017 at 8:29

An even simpler way to kill all child processes of a bash script:
pkill -P $$

The -P flag works the same way with pkill and pgrep - it gets child processes, only with pkill the child processes get killed and with pgrep child PIDs are printed to stdout.
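
A quick sketch of the difference (the sleeps are just placeholders):

sleep 300 &
sleep 400 &

pgrep -P $$     # prints the PIDs of the two background sleeps
pkill -P $$     # kills every child process of the current shell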

Luis Ramirez ,Feb 20, 2013 at 23:11

this is what I have done. Check it out, hope it can help.
#!/bin/bash
#
# So something to show.
echo "UNO" >  UNO.txt
echo "DOS" >  DOS.txt
#
# Initialize Pid List
dPidLst=""
#
# Generate background processes
tail -f UNO.txt&
dPidLst="$dPidLst $!"
tail -f DOS.txt&
dPidLst="$dPidLst $!"
#
# Report process IDs
echo PID=$$
echo dPidLst=$dPidLst
#
# Show process on current shell
ps -f
#
# Start killing background processes from list
for dPid in $dPidLst
do
        echo killing $dPid. Process is still there.
        ps | grep $dPid
        kill $dPid
        ps | grep $dPid
        echo Just ran "'"ps"'" command, $dPid must not show again.
done

Then just run it as: ./bgkill.sh with proper permissions of course

root@umsstd22 [P]:~# ./bgkill.sh
PID=23757
dPidLst= 23758 23759
UNO
DOS
UID        PID  PPID  C STIME TTY          TIME CMD
root      3937  3935  0 11:07 pts/5    00:00:00 -bash
root     23757  3937  0 11:55 pts/5    00:00:00 /bin/bash ./bgkill.sh
root     23758 23757  0 11:55 pts/5    00:00:00 tail -f UNO.txt
root     23759 23757  0 11:55 pts/5    00:00:00 tail -f DOS.txt
root     23760 23757  0 11:55 pts/5    00:00:00 ps -f
killing 23758. Process is still there.
23758 pts/5    00:00:00 tail
./bgkill.sh: line 24: 23758 Terminated              tail -f UNO.txt
Just ran 'ps' command, 23758 must not show again.
killing 23759. Process is still there.
23759 pts/5    00:00:00 tail
./bgkill.sh: line 24: 23759 Terminated              tail -f DOS.txt
Just ran 'ps' command, 23759 must not show again.
root@umsstd22 [P]:~# ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
root      3937  3935  0 11:07 pts/5    00:00:00 -bash
root     24200  3937  0 11:56 pts/5    00:00:00 ps -f

Phil ,Oct 15, 2013 at 18:22

You might also be able to use pstree:
pstree -p user

This typically gives a text representation of all the processes for the "user" and the -p option gives the process-id. It does not depend, as far as I understand, on having the processes be owned by the current shell. It also shows forks.

Phil ,Dec 4, 2018 at 9:46

pgrep can get you all of the child PIDs of a parent process. As mentioned earlier, $$ is the current script's PID. So, if you want a script that cleans up after itself, this should do the trick:
trap 'kill $( pgrep -P $$ | tr "\n" " " )' SIGINT SIGTERM EXIT

[Aug 10, 2019] How to check the file size in Linux-Unix bash shell scripting by Vivek Gite

Aug 10, 2019 | www.cyberciti.biz

The stat command shows information about the file. The syntax is as follows to get the file size on GNU/Linux stat:

stat -c %s "/etc/passwd"

OR

stat --format=%s "/etc/passwd"
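For illustration, a minimal sketch that uses this output in a test (GNU stat assumed; the 1048576-byte threshold is just an example value):

fsize=$(stat -c %s "/etc/passwd")
if [ "$fsize" -gt 1048576 ]; then
    echo "/etc/passwd is larger than 1 MiB"
else
    echo "/etc/passwd is $fsize bytes"
fi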

[Aug 10, 2019] bash - How to check size of a file - Stack Overflow

Aug 10, 2019 | stackoverflow.com

[ -n file.txt ] doesn't check the file's size; it checks that the string file.txt has non-zero length, so it will always succeed.

If you want to say "size is non-zero", you need [ -s file.txt ].

To get a file's size, you can use wc -c to get the size (file length) in bytes:

file=file.txt
minimumsize=90000
actualsize=$(wc -c <"$file")
if [ $actualsize -ge $minimumsize ]; then
    echo size is over $minimumsize bytes
else
    echo size is under $minimumsize bytes
fi

In this case, it sounds like that's what you want.

But FYI, if you want to know how much disk space the file is using, you could use du -k to get the size (disk space used) in kilobytes:

file=file.txt
minimumsize=90
actualsize=$(du -k "$file" | cut -f 1)
if [ $actualsize -ge $minimumsize ]; then
    echo size is over $minimumsize kilobytes
else
    echo size is under $minimumsize kilobytes
fi

If you need more control over the output format, you can also look at stat. On Linux, you'd start with something like stat -c '%s' file.txt, and on BSD/Mac OS X, something like stat -f '%z' file.txt.

--Mikel

Oz Solomon ,Jun 13, 2014 at 21:44

It surprises me that no one mentioned stat to check file size. Some methods are definitely better: using -s to find out whether the file is empty or not is easier than anything else if that's all you want. And if you want to find files of a size, then find is certainly the way to go.
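For example, a hedged one-liner along those lines (here +90k means "larger than 90 KiB"; adjust the threshold to taste):

find . -type f -size +90k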

I also like du a lot to get file size in kb, but, for bytes, I'd use stat :

size=$(stat -f%z $filename) # BSD stat

size=$(stat -c%s $filename) # GNU stat?
An alternative solution with awk and double parentheses:
FILENAME=file.txt
SIZE=$(du -sb $FILENAME | awk '{ print $1 }')

if ((SIZE<90000)) ; then 
    echo "less"; 
else 
    echo "not less"; 
fi
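A small portability sketch under the same assumptions (it tries the GNU flag form first, then falls back to the BSD form):

file=file.txt
size=$(stat -c %s "$file" 2>/dev/null || stat -f %z "$file")
echo "$file is $size bytes"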

[Aug 10, 2019] command line - How do I add file and directory comparision option to mc user menu - Ask Ubuntu

Aug 10, 2019 | askubuntu.com


sorin ,Mar 30, 2012 at 8:57

I want to add Beyond Compare diff to the mc (midnight commander) user menu.

All I know is that I need to add my custom command to ~/.mc/menu but I have no idea about the syntax to use.

I want to be able to compare two files from the two panes or the directories themselves.

The command that I need to run is bcompare file1 file2 & (same for directories, it will figure it out).

mivk ,Oct 17, 2015 at 15:35

Add this to ~/.mc/menu :
+ t r & ! t t
d       Diff against file of same name in other directory
        if [ "%d" = "%D" ]; then
          echo "The two directores must be different"
          exit 1
        fi
        if [ -f %D/%f ]; then        # if two of them, then
          bcompare %f %D/%f &
        else
          echo %f: No copy in %D/%f
        fi

x       Diff file to file
        if [ -f %D/%F ]; then        # if two of them, then
          bcompare %f %D/%F &
        else
          echo %f: No copy in %D/%f
        fi

D       Diff current directory against other directory
        if [ "%d" = "%D" ]; then
          echo "The two directores must be different"
          exit 1
        fi
        bcompare %d %D &

[Aug 10, 2019] mc - Is there are any documentation about user-defined menu in midnight-commander - Unix Linux Stack Exchange

Aug 10, 2019 | unix.stackexchange.com



login ,Jun 11, 2014 at 13:13

I'd like to create my own user-defined menu for mc ( menu file). I see some lines like
+ t r & ! t t

or

+ t t

What does it mean?

goldilocks ,Jun 11, 2014 at 13:35

It is documented in the help, the node is "Edit Menu File" under "Command Menu"; if you scroll down you should find "Addition Conditions":

If the condition begins with '+' (or '+?') instead of '=' (or '=?') it is an addition condition. If the condition is true the menu entry will be included in the menu. If the condition is false the menu entry will not be included in the menu.

This is preceded by "Default conditions" (the = condition), which determine which entry will be highlighted as the default choice when the menu appears. Anyway, by way of example:

+ t r & ! t t

t r means if this is a regular file ("t(ype) r"), and ! t t means if the file has not been tagged in the interface.
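For example, a minimal user-menu entry gated by that condition might look like this (the 'v'/less pairing is just an illustration; %f is the current file, as in the examples above):

+ t r & ! t t
v       View the current file with less
        less %f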

> ,

On top of what has been written above, the man page can also be browsed on the Internet, e.g.: https://www.systutorials.com/docs/linux/man/1-mc/

Search for "Menu File Edit".

Best regards, Jarek

[Aug 10, 2019] midnight commander - How to configure coloring of the file names in MC - Super User

If colors are crazy, the simplest way to solve this problem is to turn them off.
To turn off color you can use the mc --nocolor option or the -b flag.
You can customize the colors displayed by defining them in ~/.mc/ini. But that requires some work. Have a look here for an example: http://ajnasz.hu/blog/20080101/midnight-commander-coloring .
Aug 10, 2019 | superuser.com

Mike L. ,Jan 9, 2011 at 17:21

Is it possible to configure the Midnight Commander (Ubuntu 10.10) to show certain file and directory names differently, e.g. all hidden (starting with a period) using grey color?

Mike L. ,Feb 20, 2018 at 5:51

Under Options -> Panel Options select File highlight -> File types .

See man mc in the Colors section for ways to choose particular colors by adding entries in your ~/.config/mc/ini file. Unfortunately, there doesn't appear to be a keyword for hidden files.
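As a rough, hedged sketch only (the exact keywords and syntax depend on the mc version, so treat this as an assumption to verify against the Colors section of man mc), such an entry in ~/.config/mc/ini might look roughly like:

[Colors]
base_color=normal=lightgray,blue:directory=white,blue:executable=brightgreen,blue:link=cyan,blue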

[Aug 07, 2019] Find files and tar them (with spaces)

Aug 07, 2019 | stackoverflow.com



porges ,Sep 6, 2012 at 17:43

Alright, so simple problem here. I'm working on a simple back up code. It works fine except if the files have spaces in them. This is how I'm finding files and adding them to a tar archive:
find . -type f | xargs tar -czvf backup.tar.gz

The problem is when the file has a space in the name because tar thinks that it's a folder. Basically is there a way I can add quotes around the results from find? Or a different way to fix this?

Brad Parks ,Mar 2, 2017 at 18:35

Use this:
find . -type f -print0 | tar -czvf backup.tar.gz --null -T -

It will pass the list of file names from find to tar NUL-separated (-print0 on the find side, --null -T - on the tar side), so file names containing spaces or newlines are handled correctly.

czubehead ,Mar 19, 2018 at 11:51

There could be another way to achieve what you want. Basically,
  1. Use the find command to output the paths to whatever files you're looking for. Redirect stdout to a filename of your choosing.
  2. Then tar with the -T option which allows it to take a list of file locations (the one you just created with find!)
    find . -name "*.whatever" > yourListOfFiles
    tar -cvf yourfile.tar -T yourListOfFiles
    

gsteff ,May 5, 2011 at 2:05

Try running:
    find . -type f | xargs -d "\n" tar -czvf backup.tar.gz

Caleb Kester ,Oct 12, 2013 at 20:41

Why not:
tar czvf backup.tar.gz *

Sure it's clever to use find and then xargs, but you're doing it the hard way.

Update: Porges has commented with a find-option that I think is a better answer than my answer, or the other one: find -print0 ... | xargs -0 ....

Kalibur x ,May 19, 2016 at 13:54

If you have multiple files or directories and you want to compress them into independent *.gz files, you can do this (the -type f and -mtime/-atime tests are optional):
find -name "httpd-log*.txt" -type f -mtime +1 -exec tar -vzcf {}.gz {} \;

This will compress

httpd-log01.txt
httpd-log02.txt

to

httpd-log01.txt.gz
httpd-log02.txt.gz

Frank Eggink ,Apr 26, 2017 at 8:28

Why not give something like this a try: tar cvf scala.tar `find src -name *.scala`

tommy.carstensen ,Dec 10, 2017 at 14:55

Another solution as seen here :
find var/log/ -iname "anaconda.*" -exec tar -cvzf file.tar.gz {} +

Robino ,Sep 22, 2016 at 14:26

The best solution seems to be to create a file list and then archive the files, because you can use other sources and do something else with the list.

For example this allows using the list to calculate the size of the files being archived:

#!/bin/sh

backupFileName="backup-big-$(date +"%Y%m%d-%H%M")"
backupRoot="/var/www"
backupOutPath=""

archivePath=$backupOutPath$backupFileName.tar.gz
listOfFilesPath=$backupOutPath$backupFileName.filelist

#
# Make a list of files/directories to archive
#
echo "" > $listOfFilesPath
echo "${backupRoot}/uploads" >> $listOfFilesPath
echo "${backupRoot}/extra/user/data" >> $listOfFilesPath
find "${backupRoot}/drupal_root/sites/" -name "files" -type d >> $listOfFilesPath

#
# Size calculation
#
sizeForProgress=`
cat $listOfFilesPath | while read nextFile;do
    if [ ! -z "$nextFile" ]; then
        du -sb "$nextFile"
    fi
done | awk '{size+=$1} END {print size}'
`

#
# Archive with progress
#
## simple with dump of all files currently archived
#tar -czvf $archivePath -T $listOfFilesPath
## progress bar
sizeForShow=$(($sizeForProgress/1024/1024))
echo -e "\nRunning backup [source files are $sizeForShow MiB]\n"
tar -cPp -T $listOfFilesPath | pv -s $sizeForProgress | gzip > $archivePath

user3472383 ,Jun 27 at 1:11

Would add a comment to @Steve Kehlet post but need 50 rep (RIP).

For anyone that has found this post through numerous googling, I found a way to not only find specific files given a time range, but also NOT include the relative paths OR whitespaces that would cause tarring errors. (THANK YOU SO MUCH STEVE.)

find . -name "*.pdf" -type f -mtime 0 -printf "%f\0" | tar -czvf /dir/zip.tar.gz --null -T -
  1. . relative directory
  2. -name "*.pdf" look for pdfs (or any file type)
  3. -type f type to look for is a file
  4. -mtime 0 look for files created in last 24 hours
  5. -printf "%f\0" Regular -print0 OR -printf "%f" did NOT work for me. From man pages:

This quoting is performed in the same way as for GNU ls. This is not the same quoting mechanism as the one used for -ls and -fls. If you are able to decide what format to use for the output of find then it is normally better to use '\0' as a terminator than to use newline, as file names can contain white space and newline characters.

  6. -czvf create archive, filter the archive through gzip, verbosely list files processed, archive name

[Aug 06, 2019] Tar archiving that takes input from a list of files

Aug 06, 2019 | stackoverflow.com



Kurt McKee ,Apr 29 at 10:22

I have a file that contains a list of files I want to archive with tar. Let's call it mylist.txt

It contains:

file1.txt
file2.txt
...
file10.txt

Is there a way I can issue a tar command that takes mylist.txt as input? Something like

tar -cvf allfiles.tar -[someoption?] mylist.txt

So that it is similar as if I issue this command:

tar -cvf allfiles.tar file1.txt file2.txt file10.txt

Stphane ,May 25 at 0:11

Yes:
tar -cvf allfiles.tar -T mylist.txt

drue ,Jun 23, 2014 at 14:56

Assuming GNU tar (as this is Linux), the -T or --files-from option is what you want.

Stphane ,Mar 1, 2016 at 20:28

You can also pipe in the file names which might be useful:
find /path/to/files -name \*.txt | tar -cvf allfiles.tar -T -

David C. Rankin ,May 31, 2018 at 18:27

Some versions of tar, for example, the default versions on HP-UX (I tested 11.11 and 11.31), do not include a command line option to specify a file list, so a decent work-around is to do this:
tar cvf allfiles.tar $(cat mylist.txt)

Jan ,Sep 25, 2015 at 20:18

On Solaris, you can use the option -I to read the filenames that you would normally state on the command line from a file. In contrast to the command line, this can create tar archives with hundreds of thousands of files (just did that).

So the example would read

tar -cvf allfiles.tar -I mylist.txt

,

For me on AIX, it worked as follows:
tar -L List.txt -cvf BKP.tar

[Aug 06, 2019] Shell command to tar directory excluding certain files-folders

Aug 06, 2019 | stackoverflow.com



Rekhyt ,Jun 24, 2014 at 16:06

Is there a simple shell command/script that supports excluding certain files/folders from being archived?

I have a directory that needs to be archived, with a subdirectory that has a number of very large files I do not need to back up.

Not quite solutions:

The tar --exclude=PATTERN command matches the given pattern and excludes those files, but I need specific files & folders to be ignored (full file path), otherwise valid files might be excluded.

I could also use the find command to create a list of files and exclude the ones I don't want to archive and pass the list to tar, but that only works for a small number of files. I have tens of thousands.

I'm beginning to think the only solution is to create a file with a list of files/folders to be excluded, then use rsync with --exclude-from=file to copy all the files to a tmp directory, and then use tar to archive that directory.

Can anybody think of a better/more efficient solution?

EDIT: Charles Ma's solution works well. The big gotcha is that the --exclude='./folder' MUST be at the beginning of the tar command. Full command (cd first, so the backup is relative to that directory):

cd /folder_to_backup
tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .

James O'Brien ,Nov 24, 2016 at 9:55

You can have multiple exclude options for tar so
$ tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .

etc will work. Make sure to put --exclude before the source and destination items.

Johan Soderberg ,Jun 11, 2009 at 23:10

You can exclude directories with --exclude for tar.

If you want to archive everything except /usr you can use:

tar -zcvf /all.tgz / --exclude=/usr

In your case perhaps something like

tar -zcvf archive.tgz arc_dir --exclude=dir/ignore_this_dir

cstamas ,Oct 8, 2018 at 18:02

Possible options to exclude files/directories from backup using tar:

Exclude files using multiple patterns

tar -czf backup.tar.gz --exclude=PATTERN1 --exclude=PATTERN2 ... /path/to/backup

Exclude files using an exclude file filled with a list of patterns

tar -czf backup.tar.gz -X /path/to/exclude.txt /path/to/backup

Exclude files using tags by placing a tag file in any directory that should be skipped

tar -czf backup.tar.gz --exclude-tag-all=exclude.tag /path/to/backup

Anish Ramaswamy ,Apr 1 at 16:18

Old question with many answers, but I found that none were quite clear enough for me, so I would like to add my attempt.

If you have the following structure

/home/ftp/mysite/

with following file/folders

/home/ftp/mysite/file1
/home/ftp/mysite/file2
/home/ftp/mysite/file3
/home/ftp/mysite/folder1
/home/ftp/mysite/folder2
/home/ftp/mysite/folder3

So, you want to make a tar file that contains everything inside /home/ftp/mysite (to move the site to a new server), but file3 is just junk, and everything in folder3 is also not needed, so we will skip those two.

We use the format

tar -czvf <name of tar file> <what to tar> <any excludes>

where c = create, z = gzip compression, and v = verbose (you can see the files as they are entered, useful to make sure none of the files you exclude are being added), and f = file.

So, my command would look like this

cd /home/ftp/
tar -czvf mysite.tar.gz mysite --exclude='file3' --exclude='folder3'

Note that the files/folders excluded are relative to the root of your tar (I have tried a full path here relative to / but I could not make that work).

Hope this will help someone (and me next time I google it).

not2qubit ,Apr 4, 2018 at 3:24

You can use standard "ant notation" to exclude directories relative.
This works for me and excludes any .git or node_module directories.
tar -cvf myFile.tar --exclude=**/.git/* --exclude=**/node_modules/*  -T /data/txt/myInputFile.txt 2> /data/txt/myTarLogFile.txt

myInputFile.txt Contains:

/dev2/java
/dev2/javascript

GeertVc ,Feb 9, 2015 at 13:37

I've experienced that, at least with the Cygwin version of tar I'm using ("CYGWIN_NT-5.1 1.7.17(0.262/5/3) 2012-10-19 14:39 i686 Cygwin" on a Windows XP Home Edition SP3 machine), the order of options is important.

While this construction worked for me:

tar cfvz target.tgz --exclude='<dir1>' --exclude='<dir2>' target_dir

that one didn't work:

tar cfvz --exclude='<dir1>' --exclude='<dir2>' target.tgz target_dir

This, while tar --help reveals the following:

tar [OPTION...] [FILE]

So the second command should also work, but apparently that is not the case...

Best rgds,

Scott Stensland ,Feb 12, 2015 at 20:55

This exclude pattern handles filename suffixes like png or mp3 as well as directory names like .git and node_modules:
tar --exclude={*.png,*.mp3,*.wav,.git,node_modules} -Jcf ${target_tarball}  ${source_dirname}

Michael ,May 18 at 23:29

I found this somewhere else so I won't take credit, but it worked better than any of the solutions above for my mac specific issues (even though this is closed):
tar zc --exclude __MACOSX --exclude .DS_Store -f <archive> <source(s)>

J. Lawson ,Apr 17, 2018 at 23:28

For those who have issues with it, some versions of tar would only work properly without the './' in the exclude value.
tar --version

tar (GNU tar) 1.27.1

Command syntax that work:

tar -czvf ../allfiles-butsome.tar.gz * --exclude=acme/foo

These will not work:

$ tar -czvf ../allfiles-butsome.tar.gz * --exclude=./acme/foo
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude='./acme/foo'
$ tar --exclude=./acme/foo -czvf ../allfiles-butsome.tar.gz *
$ tar --exclude='./acme/foo' -czvf ../allfiles-butsome.tar.gz *
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude=/full/path/acme/foo
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude='/full/path/acme/foo'
$ tar --exclude=/full/path/acme/foo -czvf ../allfiles-butsome.tar.gz *
$ tar --exclude='/full/path/acme/foo' -czvf ../allfiles-butsome.tar.gz *

Jerinaw ,May 6, 2017 at 20:07

For Mac OSX I had to do

tar -zcv --exclude='folder' -f theOutputTarFile.tar folderToTar

Note the -f after the --exclude=

Aaron Votre ,Jul 15, 2016 at 15:56

I agree the --exclude flag is the right approach.
$ tar --exclude='./folder_or_file' --exclude='file_pattern' --exclude='fileA'

A word of warning for a side effect that I did not find immediately obvious: The exclusion of 'fileA' in this example will search for 'fileA' RECURSIVELY!

Example: A directory with a single subdirectory containing a file of the same name (data.txt):

data.txt
config.txt
--+dirA
  |  data.txt
  |  config.docx

Znik ,Nov 15, 2014 at 5:12

To avoid possible 'xargs: Argument list too long' errors due to the use of find ... | xargs ... when processing tens of thousands of files, you can pipe the output of find directly to tar using find ... -print0 | tar --null ... .
# archive a given directory, but exclude various files & directories 
# specified by their full file paths
find "$(pwd -P)" -type d \( -path '/path/to/dir1' -or -path '/path/to/dir2' \) -prune \
   -or -not \( -path '/path/to/file1' -or -path '/path/to/file2' \) -print0 | 
   gnutar --null --no-recursion -czf archive.tar.gz --files-from -
   #bsdtar --null -n -czf archive.tar.gz -T -

Mike ,May 9, 2014 at 21:29

After reading this thread, I did a little testing on RHEL 5 and here are my results for tarring up the abc directory:

This will exclude the directories error and logs and all files under the directories:

tar cvpzf abc.tgz abc/ --exclude='abc/error' --exclude='abc/logs'

Adding a wildcard after the excluded directory will exclude the files but preserve the directories:

tar cvpzf abc.tgz abc/ --exclude='abc/error/*' --exclude='abc/logs/*'

Alex B ,Jun 11, 2009 at 23:03

Use the find command in conjunction with the tar append (-r) option. This way you can add files to an existing tar in a single step, instead of a two pass solution (create list of files, create tar).
find /dir/dir -prune ... -o etc etc.... -exec tar rvf ~/tarfile.tar {} \;

frommelmak ,Sep 10, 2012 at 14:08

You can also use one of the "--exclude-tag" options depending on your needs:

The folder hosting the specified FILE will be excluded.
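A hedged sketch of the tag approach (GNU tar assumed; the tag file name and paths are arbitrary examples):

touch /path/to/backup/cache/.backup-exclude     # mark a directory that should be skipped
tar -czf backup.tar.gz --exclude-tag-all=.backup-exclude /path/to/backup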

camh ,Jun 12, 2009 at 5:53

You can use cpio(1) to create tar files. cpio takes the files to archive on stdin, so if you've already figured out the find command you want to use to select the files the archive, pipe it into cpio to create the tar file:
find ... | cpio -o -H ustar | gzip -c > archive.tar.gz

PicoutputCls ,Aug 21, 2018 at 14:13

With GNU tar 1.26, the --exclude needs to come after the archive file and backup directory arguments, should have no leading or trailing slashes, and prefers no quotes (single or double). So, relative to the PARENT directory to be backed up, it's:

tar cvfz /path_to/mytar.tgz ./dir_to_backup --exclude=some_path/to_exclude

user2553863 ,May 28 at 21:41

After reading all these good answers for different versions and having solved the problem for myself, I think there are small details that are very important, and rare in general GNU/Linux use, that aren't stressed enough and deserve more than comments.

So I'm not going to try to answer the question for every case, but instead try to record where to look when things don't work.

IT IS VERY IMPORTANT TO NOTICE:

  1. THE ORDER OF THE OPTIONS MATTERS: it is not the same to put the --exclude before the file option and the directories to back up as after them. This is unexpected, at least to me, because in my experience with GNU/Linux commands the order of the options usually doesn't matter.
  2. Different tar versions expect these options in a different order: for instance, @Andrew's answer indicates that in GNU tar 1.26 and 1.28 the excludes come last, whereas in my case, with GNU tar 1.29, it's the other way around.
  3. THE TRAILING SLASHES MATTER: at least in GNU tar 1.29, there shouldn't be any.

In my case, for GNU tar 1.29 on Debian stretch, the command that worked was

tar --exclude="/home/user/.config/chromium" --exclude="/home/user/.cache" -cf file.tar  /dir1/ /home/ /dir3/

The quotes didn't matter, it worked with or without them.

I hope this will be useful to someone.

jørgensen ,Dec 19, 2015 at 11:10

Your best bet is to use find with tar, via xargs (to handle the large number of arguments). For example:
find / -print0 | xargs -0 tar cjf tarfile.tar.bz2

Ashwini Gupta ,Jan 12, 2018 at 10:30

tar -cvzf destination_folder source_folder -X /home/folder/excludes.txt

-X indicates a file which contains a list of patterns which must be excluded from the backup. For instance, you can specify *~ in this file to exclude any filenames ending with ~ from the backup.
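For illustration, a minimal sketch of such an exclude file (the patterns and names are only examples):

cat > excludes.txt <<'EOF'
*~
*.log
node_modules
EOF
tar -cvzf backup.tar.gz -X excludes.txt source_folder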

George ,Sep 4, 2013 at 22:35

Possible redundant answer but since I found it useful, here it is:

While root on FreeBSD (i.e. using csh) I wanted to copy my whole root filesystem to /mnt but without /usr and (obviously) /mnt. This is what worked (I am at /):

tar --exclude ./usr --exclude ./mnt --create --file - . | (cd /mnt && tar xvf -)

My whole point is that it was necessary (by putting the ./ ) to specify to tar that the excluded directories were part of the greater directory being copied.

My €0.02

t0r0X ,Sep 29, 2014 at 20:25

I had no luck getting tar to exclude a 5 Gigabyte subdirectory a few levels deep. In the end, I just used the unix Zip command. It was a lot easier for me.

So for this particular example from the original post
(tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz . )

The equivalent would be:

zip -r /backup/filename.zip . -x upload/folder/**\* upload/folder2/**\*

(NOTE: Here is the post I originally used that helped me https://superuser.com/questions/312301/unix-zip-directory-but-excluded-specific-subdirectories-and-everything-within-t )

RohitPorwal ,Jul 21, 2016 at 9:56

Check it out
tar cvpzf zip_folder.tgz . --exclude=./public --exclude=./tmp --exclude=./log --exclude=fileName

tripleee ,Sep 14, 2017 at 4:38

The following bash script should do the trick. It uses the answer given here by Marcus Sundman.
#!/bin/bash

echo -n "Please enter the name of the tar file you wish to create with out extension "
read nam

echo -n "Please enter the path to the directories to tar "
read pathin

echo tar -czvf $nam.tar.gz
excludes=`find $pathin -iname "*.CC" -exec echo "--exclude \'{}\'" \;|xargs`
echo $pathin

echo tar -czvf $nam.tar.gz $excludes $pathin

This will print out the command you need and you can just copy and paste it back in. There is probably a more elegant way to provide it directly to the command line.

Just change *.CC for any other common extension, file name or regex you want to exclude and this should still work.

EDIT

Just to add a little explanation: find generates a list of files matching the chosen pattern (in this case *.CC). This list is passed via xargs to the echo command. This prints --exclude 'one entry from the list'. The backslashes (\) are escape characters for the ' marks.

[Aug 06, 2019] bash - More efficient way to find & tar millions of files - Stack Overflow

Aug 06, 2019 | stackoverflow.com



theomega ,Apr 29, 2010 at 13:51

I've got a job running on my server at the command line prompt for a two days now:
find data/ -name filepattern-*2009* -exec tar uf 2009.tar {} \;

It is taking forever, and then some. Yes, there are millions of files in the target directory. (Each file is a measly 8 bytes in a well hashed directory structure.) But just running...

find data/ -name filepattern-*2009* -print > filesOfInterest.txt

...takes only two hours or so. At the rate my job is running, it won't be finished for a couple of weeks. That seems unreasonable. Is there a more efficient way to do this? Maybe with a more complicated bash script?

A secondary questions is "why is my current approach so slow?"

Stu Thompson ,May 6, 2013 at 1:11

If you already did the second command that created the file list, just use the -T option to tell tar to read the files names from that saved file list. Running 1 tar command vs N tar commands will be a lot better.

Matthew Mott ,Jul 3, 2014 at 19:21

One option is to use cpio to generate a tar-format archive:
$ find data/ -name "filepattern-*2009*" | cpio -ov --format=ustar > 2009.tar

cpio works natively with a list of filenames from stdin, rather than a top-level directory, which makes it an ideal tool for this situation.

bashfu ,Apr 23, 2010 at 10:05

Here's a find-tar combination that can do what you want without the use of xargs or exec (which should result in a noticeable speed-up):
tar --version    # tar (GNU tar) 1.14 

# FreeBSD find (on Mac OS X)
find -x data -name "filepattern-*2009*" -print0 | tar --null --no-recursion -uf 2009.tar --files-from -

# for GNU find use -xdev instead of -x
gfind data -xdev -name "filepattern-*2009*" -print0 | tar --null --no-recursion -uf 2009.tar --files-from -

# added: set permissions via tar
find -x data -name "filepattern-*2009*" -print0 | \
    tar --null --no-recursion --owner=... --group=... --mode=... -uf 2009.tar --files-from -

Stu Thompson ,Apr 28, 2010 at 12:50

There is xargs for this:
find data/ -name filepattern-*2009* -print0 | xargs -0 tar uf 2009.tar

Guessing why it is slow is hard, as there is not much information. What is the structure of the directory, what filesystem do you use, how was it configured at creation? Having millions of files in a single directory is quite a hard situation for most filesystems.

bashfu ,May 1, 2010 at 14:18

To correctly handle file names with weird (but legal) characters (such as newlines, ...) you should write your file list to filesOfInterest.txt using find's -print0:
find -x data -name "filepattern-*2009*" -print0 > filesOfInterest.txt
tar --null --no-recursion -uf 2009.tar --files-from filesOfInterest.txt

Michael Aaron Safyan ,Apr 23, 2010 at 8:47

The way you currently have things, you are invoking the tar command every single time it finds a file, which is not surprisingly slow. Instead of taking the two hours to print plus the amount of time it takes to open the tar archive, see if the files are out of date, and add them to the archive, you are actually multiplying those times together. You might have better success invoking the tar command once, after you have batched together all the names, possibly using xargs to achieve the invocation. By the way, I hope you are using 'filepattern-*2009*' and not filepattern-*2009* as the stars will be expanded by the shell without quotes.

ruffrey ,Nov 20, 2018 at 17:13

There is a utility for this called tarsplitter .
tarsplitter -m archive -i folder/*.json -o archive.tar -p 8

will use 8 threads to archive the files matching "folder/*.json" into an output archive of "archive.tar"

https://github.com/AQUAOSOTech/tarsplitter

syneticon-dj ,Jul 22, 2013 at 8:47

Simplest (also removes each file after adding it to the archive):
find *.1  -exec tar czf '{}.tgz' '{}' --remove-files \;

[Aug 06, 2019] backup - Fastest way combine many files into one (tar czf is too slow) - Unix Linux Stack Exchange

Aug 06, 2019 | unix.stackexchange.com



Gilles ,Nov 5, 2013 at 0:05

Currently I'm running tar czf to combine backup files. The files are in a specific directory.

But the number of files is growing. Using tar czf takes too much time (more than 20 minutes and counting).

I need to combine the files more quickly and in a scalable fashion.

I've found genisoimage , readom and mkisofs . But I don't know which is fastest and what the limitations are for each of them.

Rufo El Magufo ,Aug 24, 2017 at 7:56

You should check whether most of your time is being spent on CPU or in I/O. Either way, there are ways to improve it:

A: don't compress

You didn't mention "compression" in your list of requirements so try dropping the "z" from your arguments list: tar cf . This might be speed up things a bit.

There are other techniques to speed up the process, like using -N to skip files you have already backed up before.

B: backup the whole partition with dd

Alternatively, if you're backing up an entire partition, take a copy of the whole disk image instead. This would save processing and a lot of disk head seek time. tar and any other program working at a higher level have the overhead of reading and processing directory entries and inodes to find where the file content is, and of doing more disk head seeks, reading each file from a different place on the disk.

To backup the underlying data much faster, use:

dd bs=16M if=/dev/sda1 of=/another/filesystem

(This assumes you're not using RAID, which may change things a bit)

,

To repeat what others have said: we need to know more about the files that are being backed up. I'll go with some assumptions here.

Append to the tar file

If files are only being added to the directories (that is, no file is being deleted), make sure you are appending to the existing tar file rather than re-creating it every time. You can do this by specifying the existing archive filename in your tar command instead of a new one (or deleting the old one).
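A minimal sketch of the append approach, assuming GNU tar and an already existing backup.tar (the path is hypothetical; with -u only files newer than their copy already in the archive are added):

tar -uvf backup.tar /data/incoming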

Write to a different disk

Reading from the same disk you are writing to may be killing performance. Try writing to a different disk to spread the I/O load. If the archive file needs to be on the same disk as the original files, move it afterwards.

Don't compress

Just repeating what @Yves said. If your backup files are already compressed, there's not much need to compress again. You'll just be wasting CPU cycles.

[Aug 02, 2019] linux - How to tar directory and then remove originals including the directory - Super User

Aug 02, 2019 | superuser.com



mit ,Dec 7, 2016 at 1:22

I'm trying to tar a collection of files in a directory called 'my_directory' and remove the originals by using the command:
tar -cvf files.tar my_directory --remove-files

However it is only removing the individual files inside the directory and not the directory itself (which is what I specified in the command). What am I missing here?

EDIT:

Yes, I suppose the 'remove-files' option is fairly literal, although I too found the man page unclear on that point. (In Linux I tend not to distinguish much between directories and files, and sometimes forget that they are not the same thing.) It looks like the consensus is that it doesn't remove directories.

However, my major prompting point for asking this question stems from tar's handling of absolute paths. Because you must specify a relative path to the file(s) to be compressed, you therefore must change to the parent directory to tar it properly. As I see it, using any kind of follow-on 'rm' command is potentially dangerous in that situation. Thus I was hoping to simplify things by making tar itself do the removal.

For example, imagine a backup script where the directory to back up (i.e. tar) is included as a shell variable. If that shell variable value was badly entered, the result could be deleted files in whatever directory you happened to be in last.

Arjan ,Feb 13, 2016 at 13:08

You are missing the part which says the --remove-files option removes files after adding them to the archive.

You could follow the archive and file-removal operation with a command like,

find /path/to/be/archived/ -depth -type d -empty -exec rmdir {} \;


Update: You may be interested in reading this short Debian discussion:
Bug 424692: --remove-files complains that directories "changed as we read it".

Kim ,Feb 13, 2016 at 13:08

Since the --remove-files option only removes files , you could try
tar -cvf files.tar my_directory && rm -R my_directory

so that the directory is removed only if the tar returns an exit status of 0

redburn ,Feb 13, 2016 at 13:08

Have you tried putting the --remove-files directive after the archive name? It works for me.
tar -cvf files.tar --remove-files my_directory

shellking ,Oct 4, 2010 at 19:58

source={directory argument}

e.g.

source={FULL ABSOLUTE PATH}/my_directory
parent={parent directory of argument}

e.g.

parent={ABSOLUTE PATH of the parent of 'my_directory'}
logFile={path to a run log that captures status messages}

Then you could execute something along the lines of:

cd ${parent}

tar cvf Tar_File.`date +%Y%m%d_%H%M%S` ${source}

if [ $? != 0 ]
then
    echo "Backup FAILED for ${source} at `date`" >> ${logFile}
else
    echo "Backup SUCCESS for ${source} at `date`" >> ${logFile}
    rm -rf ${source}
fi

mit ,Nov 14, 2011 at 13:21

This was probably a bug.

Also the word "file" is ambigous in this case. But because this is a command line switch I would it expect to mean also directories, because in unix/lnux everything is a file, also a directory. (The other interpretation is of course also valid, but It makes no sense to keep directories in such a case. I would consider it unexpected and confusing behavior.)

But I have found that on some distributions GNU tar actually removes the directory tree. Another indication that keeping the tree was a bug, or at least some workaround until they fixed it.

This is what I tried out on an ubuntu 10.04 console:

mit:/var/tmp$ mkdir tree1                                                                                               
mit:/var/tmp$ mkdir tree1/sub1                                                                                          
mit:/var/tmp$ > tree1/sub1/file1                                                                                        

mit:/var/tmp$ ls -la                                                                                                    
drwxrwxrwt  4 root root 4096 2011-11-14 15:40 .                                                                              
drwxr-xr-x 16 root root 4096 2011-02-25 03:15 ..
drwxr-xr-x  3 mit  mit  4096 2011-11-14 15:40 tree1

mit:/var/tmp$ tar -czf tree1.tar.gz tree1/ --remove-files

# AS YOU CAN SEE THE TREE IS GONE NOW:

mit:/var/tmp$ ls -la
drwxrwxrwt  3 root root 4096 2011-11-14 15:41 .
drwxr-xr-x 16 root root 4096 2011-02-25 03:15 ..
-rw-r--r--  1 mit   mit    159 2011-11-14 15:41 tree1.tar.gz                                                                   


mit:/var/tmp$ tar --version                                                                                             
tar (GNU tar) 1.22                                                                                                           
Copyright © 2009 Free Software Foundation, Inc.

If you want to see it on your machine, paste this into a console at your own risk:

tar --version                                                                                             
cd /var/tmp
mkdir -p tree1/sub1                                                                                          
> tree1/sub1/file1                                                                                        
tar -czf tree1.tar.gz tree1/ --remove-files
ls -la

[Jul 31, 2019] Is Ruby moving toward extinction?

Jul 31, 2019 | developers.slashdot.org

timeOday ( 582209 ) , Monday July 29, 2019 @03:44PM ( #59007686 )

Re:ORLY ( Score: 5 , Insightful)

This is what it feels like to actually learn from an article instead of simply having it confirm your existing beliefs.

Here is what it says:

An analysis of Dice job-posting data over the past year shows a startling dip in the number of companies looking for technology professionals who are skilled in Ruby. In 2018, the number of Ruby jobs declined 56 percent. That's a huge warning sign that companies are turning away from Ruby - and if that's the case, the language's user-base could rapidly erode to almost nothing.

Well, what's your evidence-based rebuttal to that?

Wdomburg ( 141264 ) writes:
Re: ( Score: 2 )

If you actually look at the TIOBE rankings, it's #11 (not #12 as claimed in the article), and back on the upswing. If you look at RedMonk, which they say they looked at but don't reference with respect to Ruby, it is a respectable #8, being one of the top languages on GitHub and Stack Overflow.

We are certainly past the glory days of Ruby, when it was the Hot New Thing and everyone was deploying Rails, but to suggest that it is "probably doomed" seems a somewhat hysterical prediction.

OrangeTide ( 124937 ) , Tuesday July 30, 2019 @01:52AM ( #59010348 ) Homepage Journal
Re:ORLY ( Score: 4 , Funny)
How do they know how many Ruby jobs there are? Maybe how many Ruby job openings announced, but not the actual number of jobs. Or maybe they are finding Ruby job-applicants and openings via other means.

Maybe there is a secret list of Ruby job postings only available to the coolest programmers? Man! I never get to hang out with the cool kids.

jellomizer ( 103300 ) , Monday July 29, 2019 @03:48PM ( #59007714 )
Re:ORLY ( Score: 5 , Insightful)

Perhaps devops/web programming is a dying field.

But to be fair, Ruby had its peak about 10 years ago, with Ruby on Rails. However, the problem is that Rails started to look very dated, and Python and Node.JS have taken its place.

whitelabrat ( 469237 ) , Monday July 29, 2019 @03:57PM ( #59007778 )
Re:ORLY ( Score: 5 , Insightful)

I don't see Ruby dying anytime soon, but I do get the feeling that Python is the go-to scripting language for all the things now. I learned Ruby and wish I spent that time learning Python.

Perl is perl. It will live on, but anybody writing new things with it probably needs to have a talkin' to.

phantomfive ( 622387 ) , Monday July 29, 2019 @07:32PM ( #59009188 ) Journal
Re:ORLY ( Score: 4 , Insightful)
I learned Ruby and wish I spent that time learning Python.

Ruby and Python are basically the same thing. With a little google, you can literally start programming in Python today. Search for "print python" and you can easily write a hello world. search for 'python for loop' and suddenly you can do repetitious tasks. Search for "define function python" and you can organize your code.

After that do a search for hash tables and lists in Python and you'll be good enough to pass a coding interview in the language.

[Jul 31, 2019] 5 Programming Languages That Are Probably Doomed

The article is clickbait; entrenched languages seldom die. But some Slashdot comments are interesting.
Jul 31, 2019 | developers.slashdot.org

NoNonAlphaCharsHere ( 2201864 ) , Monday July 29, 2019 @03:39PM ( #59007638 )

Re:ORLY ( Score: 5 , Funny)

Perl has been "doomed" for over 30 years now, hasn't stopped it.

geekoid ( 135745 ) writes:
Re: ( Score: 2 )

OTOH, it's not exactly what it once was.

IMO: if you can't write good readable code in PERL, you should find a new business to work in.

Anonymous Coward writes:
check the job description ( Score: 3 , Funny)

Writing unreadable perl is the business.

ShanghaiBill ( 739463 ) writes:
Re: ( Score: 3 )
Perl has been "doomed" for over 30 years now, hasn't stopped it.

I love Perl, but today it is mostly small throw-away scripts and maintaining legacy apps.

It makes little sense to use Perl for a new project.

Perl won't disappear, but the glory days are in the past.

Anonymous Coward , Monday July 29, 2019 @03:59PM ( #59007794 )
Re:ORLY ( Score: 4 , Interesting)

I write new code in perl all the time. Cleanly written, well formatted and completely maintainable. Simply because YOU can't write perl in such a manner, that doesn't mean others can't.

Anonymous Coward writes:
Re: ORLY ( Score: 2 , Insightful)

Do you have someone else who is saying that about your code or is that your own opinion?

Sarten-X ( 1102295 ) , Monday July 29, 2019 @05:53PM ( #59008624 ) Homepage
Re: ORLY ( Score: 4 , Insightful)

I happen to read a lot of Perl in my day job, involving reverse-engineering a particular Linux-based appliance for integration purposes. I seldom come across scripts that are actually all that bad.

It's important to understand that Perl has a different concept of readability. It's more like reading a book than reading a program, because there are so many ways to write any given task. A good Perl programmer will incorporate that flexibility into their style, so intent can be inferred not just from the commands used, but also how the code is arranged. For example, a large block describing a complex function would be written verbosely for detailed clarity.

A trivial statement could be used, if it resolves an edge case.

Conversely, a good Perl reader will be familiar enough with the language to understand the idioms and shorthand used, so they can understand the story as written without being distracted by the ugly bits. Once viewed from that perspective, a Perl program can condense incredible amounts of description into just a few lines, and still be as readily-understood as any decent novel.

Sarten-X ( 1102295 ) writes: on Monday July 29, 2019 @07:06PM ( #59009056 ) Homepage
Re: ORLY ( Score: 4 , Insightful)

Since you brought it up...

In building several dev teams, I have never tried to hire everyone with any particular skill. I aim to have at least two people with each skill, but won't put effort to having more than that at first. After the initial startup of the team, I try to run projects in pairs, with an expert starting the project, then handing it to a junior (in that particular skill) for completion. After a few rounds of that, the junior is close enough to an expert, and somebody else takes the junior role. That way, even with turnover, expertise is shared among the team, and there's always someone who can be the expert.

Back to the subject at hand, though...

My point is that Perl is a more conversational language that others, and its structure reflects that. It is unreasonable to simply look at Perl code, see the variety of structures, and declare it "unreadable" simply because the reader doesn't understand the language.

As an analogy, consider the structural differences between Lord of the Rings and The Cat in the Hat. A reader who is only used to The Cat in the Hat would find Lord of the Rings to be ridiculously complex to the point of being unreadable, when Lord of the Rings is simply making use of structures and capabilities that are not permitted in the language of young children's books.

This is not to say that other languages are wrong to have a more limited grammar. They are simply different, and learning to read a more flexible language is a skill to be developed like any other. Similar effort must be spent to learn other languages with sufficiently-different structure, like Lisp or Haskell.

phantomfive ( 622387 ) , Monday July 29, 2019 @07:24PM ( #59009128 ) Journal
Re:ORLY ( Score: 3 )

FWIW DuckDuckGo is apparently written primarily in Perl.

fahrbot-bot ( 874524 ) , Monday July 29, 2019 @03:46PM ( #59007696 )
If your career is based on ... ( Score: 3 , Interesting)

From TFA:

Perl: Even if RedMonk has Perl's popularity declining, it's still going to take a long time for the language to flatten out completely, given the sheer number of legacy websites that still feature its code. Nonetheless, a lack of active development, and widespread developer embrace of other languages for things like building websites, means that Perl is going to just fall into increasing disuse.

First, Perl is used for many, many more things than websites -- and the focus in TFA is short-sighted. Second, I've written a LOT of Perl in my many years, but wouldn't say my (or most people's) career is based on it. Yes, I have written applications in Perl, but more often used it for utility, glue and other things that help get things done, monitor and (re)process data. Nothing (or very few things) can beat Perl for a quick knock-off script to do something or another.

Perl's not going anywhere and it will be a useful language to know for quite a while. Languages like Perl (and Python) are great tools to have in your toolbox, ones that you know how to wield well when you need them. Knowing when you need them, and not something else, is important.

TimHunter ( 174406 ) , Monday July 29, 2019 @05:22PM ( #59008400 )
Career based on *a* programming language? ( Score: 4 , Insightful)

Anybody whose career is based on a single programming language is doomed already. Programmers know how to write code. The language they use is beside the point. A good programmer can write code in whatever language is asked of them.

bobbied ( 2522392 ) , Monday July 29, 2019 @04:23PM ( #59007966 )
Re:Diversifying ( Score: 5 , Insightful)
The writer of this article should consider diversifying his skillset at some point, as not all bloggers endure forever and his popularity ranking on Slashdot has recently tanked.

I'd suggest that this writer quit their day job and take up stand up...

Old languages never really die until the platform dies. Languages may fall out of favor, but they don't usually die until the platform they are running on disappears and then the people who used them die. So, FORTRAN, C, C++, and COBOL and more are here to pretty much stay.

Specifically, Perl isn't going anywhere, being fundamental on Linux, and neither is Ruby; the rest have, to varying degrees, been out of favor for a while now, but none of the languages in the article are dead. They are, however, falling out of favor, and because of that it might be a good idea to add other tools to your programmer's toolbox if your livelihood depends on one of them.

[Jul 30, 2019] Python is overrated

Notable quotes:
"... R commits a substantial scale crime by being so dependent on memory-resident objects. Python commits major scale crime with its single-threaded primary interpreter loop. ..."
Jul 29, 2019 | developers.slashdot.org

epine ( 68316 ), Monday July 29, 2019 @05:48PM ( #59008600 ) ( Score: 3 )

I had this naive idea that Python might substantially displace R until I learned more about the Python internals, which are pretty nasty. This is the new generation's big data language? If true, sure sucks to be young again.

Python isn't even really used to do big data. It's mainly used to orchestrate big data flows on top of other libraries or facilities. It has more or less become the lingua franca of high-level hand waving. Any real grunt is far below.

R commits a substantial scale crime by being so dependent on memory-resident objects. Python commits major scale crime with its single-threaded primary interpreter loop.

If I move away from R, it will definitely be Julia for any real work (as Julia matures, if it matures well), and not Python.

[Jul 30, 2019] The difference between tar and tar.gz archives

With tar.gz, to extract a single file the archiver first creates an intermediate x.tar from x.tar.gz by uncompressing the whole archive, and then unpacks the requested files from this intermediate tarball. If the tar.gz archive is large, unpacking can take several hours or even days.
Jul 30, 2019 | askubuntu.com
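For reference, extracting a single member uses the same syntax in both cases; the cost difference comes from having to decompress the gzip stream to reach that member (the archive and member names here are hypothetical):

tar -xzf big.tar.gz some/dir/file.txt
tar -xf big.tar some/dir/file.txt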

[Jul 29, 2019] How do I tar a directory of files and folders without including the directory itself - Stack Overflow

Jan 05, 2017 | stackoverflow.com



tvanfosson ,Jan 5, 2017 at 12:29

I typically do:
tar -czvf my_directory.tar.gz my_directory

What if I just want to include everything (including any hidden system files) in my_directory, but not the directory itself? I don't want:

my_directory
   --- my_file
   --- my_file
   --- my_file

I want:

my_file
my_file
my_file

PanCrit ,Feb 19 at 13:04

cd my_directory/ && tar -zcvf ../my_dir.tgz . && cd -

should do the job in one line. It works well for hidden files as well: "*" doesn't match hidden files in pathname expansion, at least in bash. Below is my experiment:

$ mkdir my_directory
$ touch my_directory/file1
$ touch my_directory/file2
$ touch my_directory/.hiddenfile1
$ touch my_directory/.hiddenfile2
$ cd my_directory/ && tar -zcvf ../my_dir.tgz . && cd ..
./
./file1
./file2
./.hiddenfile1
./.hiddenfile2
$ tar ztf my_dir.tgz
./
./file1
./file2
./.hiddenfile1
./.hiddenfile2

JCotton ,Mar 3, 2015 at 2:46

Use the -C switch of tar:
tar -czvf my_directory.tar.gz -C my_directory .

The -C my_directory tells tar to change the current directory to my_directory , and then . means "add the entire current directory" (including hidden files and sub-directories).

Make sure you do -C my_directory before you do . or else you'll get the files in the current directory.

Digger ,Mar 23 at 6:52

You can also create the archive as usual and extract it with:
tar --strip-components 1 -xvf my_directory.tar.gz

jwg ,Mar 8, 2017 at 12:56

Have a look at --transform / --xform , it gives you the opportunity to massage the file name as the file is added to the archive:
% mkdir my_directory
% touch my_directory/file1
% touch my_directory/file2
% touch my_directory/.hiddenfile1
% touch my_directory/.hiddenfile2
% tar -v -c -f my_dir.tgz --xform='s,my_directory/,,' $(find my_directory -type f)
my_directory/file2
my_directory/.hiddenfile1
my_directory/.hiddenfile2
my_directory/file1
% tar -t -f my_dir.tgz 
file2
.hiddenfile1
.hiddenfile2
file1

The transform expression is similar to that of sed, and we can use separators other than / (a comma in the above example).
https://www.gnu.org/software/tar/manual/html_section/tar_52.html

Alex ,Mar 31, 2017 at 15:40

TL;DR
find /my/dir/ -printf "%P\n" | tar -czf mydir.tgz --no-recursion -C /my/dir/ -T -

With some conditions (archive only files, dirs and symlinks):

find /my/dir/ -printf "%P\n" -type f -o -type l -o -type d | tar -czf mydir.tgz --no-recursion -C /my/dir/ -T -
Explanation

The below unfortunately includes a parent directory ./ in the archive:

tar -czf mydir.tgz -C /my/dir .

You can move all the files out of that directory by using the --transform configuration option, but that doesn't get rid of the . directory itself. It becomes increasingly difficult to tame the command.

You could use $(find ...) to add a file list to the command (like in magnus' answer ), but that potentially causes a "file list too long" error. The best way is to combine it with tar's -T option, like this:

find /my/dir/ -printf "%P\n" -type f -o -type l -o -type d | tar -czf mydir.tgz --no-recursion -C /my/dir/ -T -

Basically what it does is list all files ( -type f ), links ( -type l ) and subdirectories ( -type d ) under your directory, make all filenames relative using -printf "%P\n" , and then pass that to the tar command (it takes filenames from STDIN using -T - ). The -C option is needed so tar knows where the files with relative names are located. The --no-recursion flag is so that tar doesn't recurse into folders it is told to archive (causing duplicate files).

If you need to do something special with filenames (filtering, following symlinks etc), the find command is pretty powerful, and you can test it by just removing the tar part of the above command:

$ find /my/dir/ -printf "%P\n" -type f -o -type l -o -type d
> textfile.txt
> documentation.pdf
> subfolder2
> subfolder
> subfolder/.gitignore

For example if you want to filter PDF files, add ! -name '*.pdf'

$ find /my/dir/ -printf "%P\n" -type f ! -name '*.pdf' -o -type l -o -type d
> textfile.txt
> subfolder2
> subfolder
> subfolder/.gitignore
Non-GNU find

The command uses printf (available in GNU find ) which tells find to print its results with relative paths. However, if you don't have GNU find , this works to make the paths relative (removes parents with sed ):

find /my/dir/ -type f -o -type l -o -type d | sed s,^/my/dir/,, | tar -czf mydir.tgz --no-recursion -C /my/dir/ -T -

BrainStone ,Dec 21, 2016 at 22:14

This Answer should work in most situations. Notice however how the filenames are stored in the tar file as, for example, ./file1 rather than just file1 . I found that this caused problems when using this method to manipulate tarballs used as package files in BuildRoot .

One solution is to use some Bash globs to list all files except for . and .., like this:

tar -C my_dir -zcvf my_dir.tar.gz .[^.]* ..?* *

This is a trick I learnt from this answer .

Now tar will return an error if there are no files matching ..?* or .[^.]* , but it will still work. If the error is a problem (you are checking for success in a script), this works:

shopt -s nullglob
tar -C my_dir -zcvf my_dir.tar.gz .[^.]* ..?* *
shopt -u nullglob

Though now we are messing with shell options, we might decide that it is neater to have * match hidden files:

shopt -s dotglob
tar -C my_dir -zcvf my_dir.tar.gz *
shopt -u dotglob

This might not work where your shell globs * in the current directory, so alternatively, use:

shopt -s dotglob
cd my_dir
tar -zcvf ../my_dir.tar.gz *
cd ..
shopt -u dotglob

PanCrit ,Jun 14, 2010 at 6:47

cd my_directory
tar zcvf ../my_directory.tar.gz *

anion ,May 11, 2018 at 14:10

If it's a Unix/Linux system, and you care about hidden files (which will be missed by *), you need to do:
cd my_directory
tar zcvf ../my_directory.tar.gz * .??*

I don't know what hidden files look like under Windows.

gpz500 ,Feb 27, 2014 at 10:46

I would propose the following Bash function (first argument is the path to the dir, second argument is the basename of resulting archive):
function tar_dir_contents ()
{
    local DIRPATH="$1"
    local TARARCH="$2.tar.gz"
    local ORGIFS="$IFS"
    IFS=$'\n'
    tar -C "$DIRPATH" -czf "$TARARCH" $( ls -a "$DIRPATH" | grep -v '\(^\.$\)\|\(^\.\.$\)' )
    IFS="$ORGIFS"
}

You can run it in this way:

$ tar_dir_contents /path/to/some/dir my_archive

and it will generate the archive my_archive.tar.gz in the current directory. It works with hidden ( .* ) entries and with entries that have spaces in their filenames.

med ,Feb 9, 2017 at 17:19

cd my_directory && tar -czvf ../my_directory.tar.gz $(ls -A) && cd ..

This one worked for me, and it includes all hidden files without putting all files under a root directory named "." as in tomoe's answer .

Breno Salgado ,Apr 16, 2016 at 15:42

Use pax.

Pax is a deprecated package but does the job perfectly and in a simple fashion.

pax -w > mydir.tar mydir

asynts ,Jun 26 at 16:40

Simplest way I found:

cd my_dir && tar -czvf ../my_dir.tar.gz *

marcingo ,Aug 23, 2016 at 18:04

# tar all files within and deeper in a given directory
# with no prefixes ( neither <directory>/ nor ./ )
# parameters: <source directory> <target archive file>
function tar_all_in_dir {
    { cd "$1" && find -type f -print0; } \
    | cut --zero-terminated --characters=3- \
    | tar --create --file="$2" --directory="$1" --null --files-from=-
}

Safely handles filenames with spaces or other unusual characters. You can optionally add a -name '*.sql' or similar filter to the find command to limit the files included.
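For example, a hypothetical variant of the function above that archives only .sql files just adds the filter to find; the cut/tar plumbing stays the same:

function tar_sql_in_dir {
    # only regular files ending in .sql, NUL-separated so unusual names survive
    { cd "$1" && find . -type f -name '*.sql' -print0; } \
    | cut --zero-terminated --characters=3- \
    | tar --create --file="$2" --directory="$1" --null --files-from=-
}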

user1456599 ,Feb 13, 2013 at 21:37

 tar -cvzf  tarlearn.tar.gz --remove-files mytemp/*

If the folder is mytemp , then applying the above will compress and remove all the files in the folder but leave the folder itself alone.

 tar -cvzf  tarlearn.tar.gz --remove-files --exclude='*12_2008*' --no-recursion mytemp/*

You can give exclude patterns and also specify not to look into subfolders:

Aaron Digulla ,Jun 2, 2009 at 15:33

tar -C my_dir -zcvf my_dir.tar.gz `ls my_dir`

[Jun 26, 2019] 7,000 Developers Report Their Top Languages: Java, JavaScript, and Python

The article mixes apples and oranges and demonstrates complete ignorance of language classification.
Two of the three top languages are scripting languages. This is a huge victory. But Python has problems with efficiency (not that they matter everywhere) and is far from being an elegant language. It entered the mainstream via its adoption at universities as the first programming language, displacing Java (which I think might be a mistake -- I think teaching should start with assembler and replicate the history of development: assembler -- compiled languages -- scripting languages).
Perl, which essentially heralded the era of scripting languages, is now losing its audience and shrinking back to its initial purpose -- a tool for Unix system administrators. But I think that in such surveys its use is underreported for obvious reasons -- it is not fashionable. But please note that Fortran is still widely used.
Go is just a variant of a "better C" -- a statically typed, compiled language. Rust is an attempt to improve C++. Both belong to the class of compiled languages. So compiled languages still hold their own and remain an important part of the ecosystem. See also How Rust Compares to Other Programming Languages - The New Stack
Jun 26, 2019 | developers.slashdot.org
The report surveyed about 7,000 developers worldwide, and revealed Python is the most studied programming language, the most loved language , and the third top primary programming language developers are using... The top use cases developers are using Python for include data analysis, web development, machine learning and writing automation scripts, according to the JetBrains report . More developers are also beginning to move over to Python 3, with 9 out of 10 developers using the current version.

The JetBrains report also found while Go is still a young language, it is the most promising programming language. "Go started out with a share of 8% in 2017 and now it has reached 18%. In addition, the biggest number of developers (13%) chose Go as a language they would like to adopt or migrate to," the report stated...

Seventy-three percent of JavaScript developers use TypeScript, which is up from 17 percent last year. Seventy-one percent of Kotlin developers use Kotlin for work. Java 8 is still the most popular programming language, but developers are beginning to migrate to Java 10 and 11.
JetBrains (which designed Kotlin in 2011) also said that 60% of their survey's respondents identified themselves as professional web back-end developers (while 46% said they did web front-end, and 23% developed mobile applications). 41% said they hadn't contributed to open source projects "but I would like to," while 21% said they contributed "several times a year."

"16% of developers don't have any tests in their projects. Among fully-employed senior developers though, that statistic is just 8%. Like last year, about 30% of developers still don't have unit tests in their projects." Other interesting statistics: 52% say they code in their dreams. 57% expect AI to replace developers "partially" in the future. "83% prefer the Dark theme for their editor or IDE. This represents a growth of 6 percentage points since last year for each environment. 47% take public transit to work.

And 97% of respondents using Rust "said they have been using Rust for less than a year. With only 14% using it for work, it's much more popular as a language for personal/side projects." And more than 90% of the Rust developers who responded worked with codebases with less than 300 files.

[Jun 23, 2019] Utilizing multi core for tar+gzip-bzip compression-decompression

Highly recommended!
Notable quotes:
"... There is effectively no CPU time spent tarring, so it wouldn't help much. The tar format is just a copy of the input file with header blocks in between files. ..."
"... You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use. ..."
Jun 23, 2019 | stackoverflow.com

user1118764 , Sep 7, 2012 at 6:58

I normally compress using tar zcvf and decompress using tar zxvf (using gzip due to habit).

I've recently gotten a quad core CPU with hyperthreading, so I have 8 logical cores, and I notice that many of the cores are unused during compression/decompression.

Is there any way I can utilize the unused cores to make it faster?

Warren Severin , Nov 13, 2017 at 4:37

The solution proposed by Xiong Chiamiov above works beautifully. I had just backed up my laptop with .tar.bz2 and it took 132 minutes using only one cpu thread. Then I compiled and installed tar from source: gnu.org/software/tar I included the options mentioned in the configure step: ./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip I ran the backup again and it took only 32 minutes. That's better than 4X improvement! I watched the system monitor and it kept all 4 cpus (8 threads) flatlined at 100% the whole time. THAT is the best solution. – Warren Severin Nov 13 '17 at 4:37

Mark Adler , Sep 7, 2012 at 14:48

You can use pigz instead of gzip, which does gzip compression on multiple cores. Instead of using the -z option, you would pipe it through pigz:
tar cf - paths-to-archive | pigz > archive.tar.gz

By default, pigz uses the number of available cores, or eight if it could not query that. You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can request better compression with -9. E.g.

tar cf - paths-to-archive | pigz -9 -p 32 > archive.tar.gz

user788171 , Feb 20, 2013 at 12:43

How do you use pigz to decompress in the same fashion? Or does it only work for compression?

Mark Adler , Feb 20, 2013 at 16:18

pigz does use multiple cores for decompression, but only with limited improvement over a single core. The deflate format does not lend itself to parallel decompression.

The decompression portion must be done serially. The other cores for pigz decompression are used for reading, writing, and calculating the CRC. When compressing on the other hand, pigz gets close to a factor of n improvement with n cores.
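So for extraction you can still route the stream through pigz and at least offload the reading, writing and CRC work (a minimal sketch; archive.tar.gz is a placeholder name):

pigz -dc archive.tar.gz | tar xf -

or, equivalently, let tar invoke pigz itself:

tar -I pigz -xf archive.tar.gz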

Garrett , Mar 1, 2014 at 7:26

The hyphen here is stdout (see this page ).

Mark Adler , Jul 2, 2014 at 21:29

Yes. 100% compatible in both directions.

Mark Adler , Apr 23, 2015 at 5:23

There is effectively no CPU time spent tarring, so it wouldn't help much. The tar format is just a copy of the input file with header blocks in between files.

Jen , Jun 14, 2013 at 14:34

You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use.

For example use:

tar -c --use-compress-program=pigz -f tar.file dir_to_zip

Valerio Schiavoni , Aug 5, 2014 at 22:38

Unfortunately by doing so the concurrent feature of pigz is lost. You can see for yourself by executing that command and monitoring the load on each of the cores. – Valerio Schiavoni Aug 5 '14 at 22:38

bovender , Sep 18, 2015 at 10:14

@ValerioSchiavoni: Not here, I get full load on all 4 cores (Ubuntu 15.04 'Vivid'). – bovender Sep 18 '15 at 10:14

Valerio Schiavoni , Sep 28, 2015 at 23:41

On compress or on decompress ? – Valerio Schiavoni Sep 28 '15 at 23:41

Offenso , Jan 11, 2017 at 17:26

I prefer tar cf - dir_to_zip | pv | pigz > tar.file . pv helps me estimate progress; you can skip it. But it is still easier to write and remember. – Offenso Jan 11 '17 at 17:26

Maxim Suslov , Dec 18, 2014 at 7:31

Common approach

There is option for tar program:

-I, --use-compress-program PROG
      filter through PROG (must accept -d)

You can use multithread version of archiver or compressor utility.

Most popular multithread archivers are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:

$ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
$ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive

The archiver must accept -d . If your replacement utility doesn't have this parameter and/or you need to specify additional parameters, then use pipes (add parameters if necessary):

$ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.bz2
$ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz

Input and output of singlethread and multithread are compatible. You can compress using multithread version and decompress using singlethread version and vice versa.

p7zip

For compression with p7zip you need a small wrapper shell script like the following:

#!/bin/sh
case $1 in
  -d) 7za -txz -si -so e;;
   *) 7za -txz -si -so a .;;
esac 2>/dev/null

Save it as 7zhelper.sh. Here is an example of usage:

$ tar -I 7zhelper.sh -cf OUTPUT_FILE.tar.7z paths_to_archive
$ tar -I 7zhelper.sh -xf OUTPUT_FILE.tar.7z
xz

Regarding multithreaded XZ support: if you are running version 5.2.0 or above of XZ Utils, you can utilize multiple cores for compression by setting -T or --threads to an appropriate value via the environment variable XZ_DEFAULTS (e.g. XZ_DEFAULTS="-T 0" ).
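For instance (a small sketch; the archive name and paths are placeholders), you can set the variable just for one invocation and let tar's -J switch pick it up:

XZ_DEFAULTS="-T 0" tar -cJf archive.tar.xz paths_to_archive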

This is a fragment of the man page for the 5.1.0alpha version:

Multithreaded compression and decompression are not implemented yet, so this option has no effect for now.

However, this will not work for decompression of files that haven't been compressed with threading enabled. From the man page for version 5.2.2:

Threaded decompression hasn't been implemented yet. It will only work on files that contain multiple blocks with size information in block headers. All files compressed in multi-threaded mode meet this condition, but files compressed in single-threaded mode don't even if --block-size=size is used.

Recompiling with replacement

If you build tar from sources, then you can recompile with parameters

--with-gzip=pigz
--with-bzip2=lbzip2
--with-lzip=plzip

After recompiling tar with these options you can check the output of tar's help:

$ tar --help | grep "lbzip2\|plzip\|pigz"
  -j, --bzip2                filter the archive through lbzip2
      --lzip                 filter the archive through plzip
  -z, --gzip, --gunzip, --ungzip   filter the archive through pigz

mpibzip2 , Apr 28, 2015 at 20:57

I just found pbzip2 and mpibzip2 . mpibzip2 looks very promising for clusters or if you have a laptop and a multicore desktop computer for instance. – user1985657 Apr 28 '15 at 20:57

oᴉɹǝɥɔ , Jun 10, 2015 at 17:39

Processing STDIN may in fact be slower. – oᴉɹǝɥɔ Jun 10 '15 at 17:39

selurvedu , May 26, 2016 at 22:13

Plus 1 for the xz option. It's the simplest, yet effective approach. – selurvedu May 26 '16 at 22:13

panticz.de , Sep 1, 2014 at 15:02

You can use the shortcut -I for tar's --use-compress-program switch, and invoke pbzip2 for bzip2 compression on multiple cores:
tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 DIRECTORY_TO_COMPRESS/

einpoklum , Feb 11, 2017 at 15:59

A nice TL;DR for @MaximSuslov's answer . – einpoklum Feb 11 '17 at 15:59
If you want to have more flexibility with filenames and compression options, you can use:
find /my/path/ -type f -name "*.sql" -o -name "*.log" -exec \
tar -P --transform='s@/my/path/@@g' -cf - {} + | \
pigz -9 -p 4 > myarchive.tar.gz
Step 1: find

find /my/path/ -type f -name "*.sql" -o -name "*.log" -exec

This command will look for the files you want to archive, in this case /my/path/*.sql and /my/path/*.log . Add as many -o -name "pattern" as you want.

-exec will execute the next command using the results of find : tar

Step 2: tar

tar -P --transform='s@/my/path/@@g' -cf - {} +

--transform is a simple string replacement parameter. It strips the path prefix from the files in the archive, so the tarball's root becomes the current directory when extracting. Note that you can't use the -C option to change directory, as you would lose the benefit of find : all files of the directory would be included.

-P tells tar to use absolute paths, so it doesn't trigger the warning "Removing leading `/' from member names". The leading '/' will be removed by --transform anyway.

-cf - tells tar to create the archive and write it to standard output; the actual file name comes from the shell redirection at the end.

{} + passes every file that find found to the command.

Step 3: pigz

pigz -9 -p 4

Use as many parameters as you want. In this case -9 is the compression level and -p 4 is the number of cores dedicated to compression. If you run this on a heavily loaded web server, you probably don't want to use all available cores.

Step 4: archive name

> myarchive.tar.gz

Finally.

[May 24, 2019] How to send keystrokes from one computer to another by USB?

Notable quotes:
"... On a different note, have you considered a purely software/network solution such as TightVNC ? ..."
Aug 05, 2018 | stackoverflow.com


Yehonatan ,Aug 5, 2018 at 6:34

Is there a way to use one computer to send keystrokes to another by usb ?

What I'm looking to do is to capture the USB signal used by a keyboard (with USBTrace, for example) and use PC-1 to send it to PC-2, so that PC-2 recognizes it as regular keyboard input.

Some leads on how to do this would be much appreciated.

Lucas ,Jan 16, 2011 at 19:18

What you essentially need is a USB port on PC-1 that will act as a USB device for PC-2.

That is not possible for the vast majority of PC systems because USB is an asymmetric bus, with a host/device (or master/slave, if you wish) architecture. USB controllers (and their ports) on most PCs can only work in host mode and cannot simulate a device.

That is the reason that you cannot network computers through USB without a special cable with specialized electronics.

The only exception is if you somehow have a PC that supports the USB On-The-Go standard that allows for a USB port to act in both host and device mode. USB-OTG devices do exist, but they are usually embedded devices (smartphones etc). I don't know if there is a way to add a USB-OTG port to a commodity PC.

EDIT:

If you do not need a keyboard before the OS on PC-2 boots, you might be able to use a pair of USB Bluetooth dongles - one on each PC. You'd have to use specialised software on PC-1, but it is definitely possible - I've already seen a possible implementation on Linux , and I am reasonably certain that there must be one for Windows. You will also need Bluetooth HID drivers on PC-2, if they are not already installed.

On a different note, have you considered a purely software/network solution such as TightVNC ?

bebbo ,Sep 20, 2017 at 18:14

There is a solution: https://github.com/Flowm/etherkey

This uses a network connection from your computer to the raspi which is connected to a teensy (usb developer board) to send the key strokes.

This solution is not an out-of-the-box product. The required skill is similar to programming some other devices like arduino. But it's a complete and working setup.

Yehonatan ,Jan 25, 2011 at 5:51

The cheapest options are commercial microcontrollers (eg arduino platform, pic, etc) or ready built usb keyboard controllers (eg i-pac, arcade controllers,etc)

Benoit-Pierre DEMAINE ,Oct 27, 2017 at 17:17

SEARCH THIS PROGRAM:

TWedge: Keyboard Wedge Software (RS232, Serial, TCP, Bluetooth)

then, MAKE YOUR OWN CONNECTION CABLE WITH:

(usb <-> rs232) + (NULL MODEM) + (rs232 <-> usb)

Connect the 2 computers, write your own program to send signals to your (usb <-> rs232) unit, then you can control the other computer with the help of TWedge.

> ,

The above-mentioned https://github.com/Flowm/etherkey is one way. The keyboard is emulated from an rPi, but the principle can be used from PC to PC (or Mac to whatever). The core answer to your question is to use an OTG-capable chip, and then you control this chip via a USB-serial adapter.

https://euer.krebsco.de/a-software-kvm-switch.html uses a very similar method, using an Arduino instead of the Teensy.

The generic answer is: you need an OTG capable, or slave capable device: Arduino, Teensy, Pi 0 (either from Rapberry or Orange brands, both work; only the ZERO models are OTG capable), or, an rPi-A with heavy customisation (since it does not include USB hub, it can theoretically be converted into a slave; never found any public tutorial to do it), or any smartphone (Samsung, Nokia, HTC, Oukitel ... most smartphones are OTG capable). If you go for a Pi or a phone, then, you want to dig around USB Gadget. Cheaper solutions (Arduino/Teensy) need custom firmware.

[May 03, 2019] RedHat local repository and offline updates

Aug 03, 2018 | stackoverflow.com

My company just bought two Red Hat licenses for two physical machines. The machines won't be accessible via the Internet, so we have an issue regarding updates, patches, etc.

I am thinking of configuring a local repository that is accessible via the Internet and holds all the necessary updates, but the problem is that I have only two licenses. Is it workable if I activate the local repository for the updates plus one of my two service machines, or is there another way -- for instance, some sort of offline package that I can download separately from Red Hat and use to update my machines without Internet access?

thanks in advance

XXX

You have several options:

See How can we regularly update a disconnected system (A system without internet connection)? for details.
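As a rough illustration of the local-repository route mentioned in the question (a sketch only -- it assumes a connected RHEL host with the yum-utils and createrepo packages installed, and all repo ids and paths are placeholders):

# on the connected host: mirror the subscribed channel locally
reposync --repoid=rhel-7-server-rpms -p /srv/mirror
# generate repo metadata so yum can consume the directory
createrepo /srv/mirror/rhel-7-server-rpms
# copy /srv/mirror to the offline machine (USB disk, rsync over the LAN, ...)
# and point yum at it with a file:// definition, e.g. /etc/yum.repos.d/local.repo:
#   [local-rhel]
#   name=Local RHEL mirror
#   baseurl=file:///srv/mirror/rhel-7-server-rpms
#   enabled=1
#   gpgcheck=0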

[Mar 20, 2019] How do I troubleshoot a yum repository problem that has an error No package available error?

Mar 20, 2019 | unix.stackexchange.com

Kiran ,Jan 2, 2017 at 23:57

I have three RHEL 6.6 servers. One has a yum repository that I know works. The other two servers I will refer to as "yum clients." These two are configured to use the same yum repository (the first server described). When I do yum install httpd on each of these two yum client servers, I get two different results. One server prepares for the installation as normal and prompts me with a y/n prompt. The second server says

No package httpd available.

The /etc/yum.conf files on the two servers are identical. The /etc/yum.repos.d/ directories have the same .repo files. Why does one yum client not see the httpd package? I use httpd only as an example: one yum client cannot install any package, while the other can install anything. Neither has access to the Internet, and neither has access to servers that the other one does not.

XXX

If /etc/yum.conf is identical on all servers, and the package is not listed on an exclude line there, check whether the repo is enabled on all the servers.

Do

grep enabled /etc/yum.repos.d/filename.repo

and see if it is set to 0 or 1.

The value of enabled needs to be set to 1 for yum to use that repo.

If the repo is not enabled, you can edit the repo file and change enabled to 1, or run yum with the --enablerepo switch to enable it for that operation.

Try to run yum like this.

yum --enablerepo=repo_name install package_name
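If you want the change to stick instead of passing --enablerepo every time (a small sketch; repo_name and package_name are placeholders, and yum-config-manager comes from the yum-utils package):

yum-config-manager --enable repo_name
yum install package_name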

[Mar 20, 2019] How to I print to STDERR only if STDOUT is a different destination?

Mar 14, 2013 | stackoverflow.com

squiguy, Mar 14, 2013 at 19:06

I would like Perl to write to STDERR only if STDOUT is not the same destination. For example, if both STDOUT and STDERR would send their output to the terminal, then I don't want STDERR to be printed.

Consider the following example (outerr.pl):

#!/usr/bin/perl

use strict;
use warnings;

print STDOUT "Hello standard output!\n";
print STDERR "Hello standard error\n" if ($someMagicalFlag);
exit 0

Now consider this (this is what I would like to achieve):

bash $ outerr.pl
Hello standard output!

However, if I redirect out to a file, I'd like to get:

bash $ outerr.pl > /dev/null
Hello standard error

and similary the other way round:

bash $ outerr.pl 2> /dev/null
Hello standard output!

If I re-direct both out/err to the same file, then only stdout should be displayed:

bash $ outerr.pl > foo.txt 2>&1
bash $ cat foo.txt
Hello standard output!

So is there a way to evaluate / determine whether OUT and ERR are pointing to the same "thing" (descriptor?)?

tchrist ,Mar 15, 2013 at 5:07

On Unix-style systems, you should be able to do:
my @stat_err = stat STDERR;
my @stat_out = stat STDOUT;

my $stderr_is_not_stdout = (($stat_err[0] != $stat_out[0]) ||
                            ($stat_err[1] != $stat_out[1]));

But that won't work on Windows, which doesn't have real inode numbers. It gives both false positives (thinks they're different when they aren't) and false negatives (thinks they're the same when they aren't).

Jim Stewart ,Mar 14, 2013 at 20:59

You can do that (almost) with -t:
-t STDERR

will be true if it is a terminal, and likewise for STDOUT.

This still would not tell you which terminal, and if you redirect to the same file, you may still get both.

Hence, if

-t STDERR && ! (-t STDOUT) || -t STDOUT && !(-t STDERR)

or shorter

-t STDOUT ^ -t STDERR  # thanks to @mob

you know you're okay.

EDIT: Solutions for the case that both STDERR and STDOUT are regular files:

Tom Christiansen suggested to stat and compare the dev and ino fields. This will work in UNIX, but, as @cjm pointed out, not in Windows.

If you can guarantee that no other program will write to the file, you could do the following both in Windows and UNIX:

  1. check the position the file descriptors for STDOUT and STDERR are at, if they are not equal, you redirected one of them with >> to a nonempty file.
  2. Otherwise, write 42 bytes to file descriptor 2
  3. Seek to the end of file descriptor 1. If it is 42 more than before, chances are high that both are redirected to the same file. If it is unchanged, files are different. If it is changed, but not by 42, someone else is writing there, all bets are off (but then, you're not in Windows, so the stat method will work).

[Mar 17, 2019] Translating Perl to Python

Mar 17, 2019 | stackoverflow.com


John Kugelman ,Jul 1, 2009 at 3:29

I found this Perl script while migrating my SQLite database to mysql

I was wondering (since I don't know Perl) how could one rewrite this in Python?

Bonus points for the shortest (code) answer :)

edit : sorry I meant shortest code, not strictly shortest answer

#! /usr/bin/perl

while ($line = <>){
    if (($line !~  /BEGIN TRANSACTION/) && ($line !~ /COMMIT/) && ($line !~ /sqlite_sequence/) && ($line !~ /CREATE UNIQUE INDEX/)){

        if ($line =~ /CREATE TABLE \"([a-z_]*)\"(.*)/){
                $name = $1;
                $sub = $2;
                $sub =~ s/\"//g; #"
                $line = "DROP TABLE IF EXISTS $name;\nCREATE TABLE IF NOT EXISTS $name$sub\n";
        }
        elsif ($line =~ /INSERT INTO \"([a-z_]*)\"(.*)/){
                $line = "INSERT INTO $1$2\n";
                $line =~ s/\"/\\\"/g; #"
                $line =~ s/\"/\'/g; #"
        }else{
                $line =~ s/\'\'/\\\'/g; #'
        }
        $line =~ s/([^\\'])\'t\'(.)/$1THIS_IS_TRUE$2/g; #'
        $line =~ s/THIS_IS_TRUE/1/g;
        $line =~ s/([^\\'])\'f\'(.)/$1THIS_IS_FALSE$2/g; #'
        $line =~ s/THIS_IS_FALSE/0/g;
        $line =~ s/AUTOINCREMENT/AUTO_INCREMENT/g;
        print $line;
    }
}

Some additional code was necessary to successfully migrate the sqlite database (it handles one-line CREATE TABLE statements and foreign keys, and fixes a bug in the original program that converted empty fields '' to \' ).

I posted the code on the migrating my SQLite database to mysql Question

Jiaaro ,Jul 2, 2009 at 10:15

Here's a pretty literal translation with just the minimum of obvious style changes (putting all code into a function, using string rather than re operations where possible).
import re, fileinput

def main():
  for line in fileinput.input():
    process = False
    for nope in ('BEGIN TRANSACTION','COMMIT',
                 'sqlite_sequence','CREATE UNIQUE INDEX'):
      if nope in line: break
    else:
      process = True
    if not process: continue
    m = re.search('CREATE TABLE "([a-z_]*)"(.*)', line)
    if m:
      name, sub = m.groups()
      line = '''DROP TABLE IF EXISTS %(name)s;
CREATE TABLE IF NOT EXISTS %(name)s%(sub)s
'''
      line = line % dict(name=name, sub=sub)
    else:
      m = re.search('INSERT INTO "([a-z_]*)"(.*)', line)
      if m:
        line = 'INSERT INTO %s%s\n' % m.groups()
        line = line.replace('"', r'\"')
        line = line.replace('"', "'")
    line = re.sub(r"([^'])'t'(.)", r"\1THIS_IS_TRUE\2", line)
    line = line.replace('THIS_IS_TRUE', '1')
    line = re.sub(r"([^'])'f'(.)", r"\1THIS_IS_FALSE\2", line)
    line = line.replace('THIS_IS_FALSE', '0')
    line = line.replace('AUTOINCREMENT', 'AUTO_INCREMENT')
    print line,

main()

dr jimbob ,May 20, 2018 at 0:54

Alex Martelli's solution above works well, but needs some fixes and additions:

In the lines using regular expression substitution, the insertion of the matched groups must be double-escaped OR the replacement string must be prefixed with r to mark it as a raw string:

line = re.sub(r"([^'])'t'(.)", "\\1THIS_IS_TRUE\\2", line)

or

line = re.sub(r"([^'])'f'(.)", r"\1THIS_IS_FALSE\2", line)

Also, this line should be added before print:

line = line.replace('AUTOINCREMENT', 'AUTO_INCREMENT')

Last, the column names in create statements should be quoted with backticks in MySQL. Add this in line 15:

  sub = sub.replace('"','`')

Here's the complete script with modifications:

import re, fileinput

def main():
  for line in fileinput.input():
    process = False
    for nope in ('BEGIN TRANSACTION','COMMIT',
                 'sqlite_sequence','CREATE UNIQUE INDEX'):
      if nope in line: break
    else:
      process = True
    if not process: continue
    m = re.search('CREATE TABLE "([a-z_]*)"(.*)', line)
    if m:
      name, sub = m.groups()
      sub = sub.replace('"','`')
      line = '''DROP TABLE IF EXISTS %(name)s;
CREATE TABLE IF NOT EXISTS %(name)s%(sub)s
'''
      line = line % dict(name=name, sub=sub)
    else:
      m = re.search('INSERT INTO "([a-z_]*)"(.*)', line)
      if m:
        line = 'INSERT INTO %s%s\n' % m.groups()
        line = line.replace('"', r'\"')
        line = line.replace('"', "'")
    line = re.sub(r"([^'])'t'(.)", "\\1THIS_IS_TRUE\\2", line)
    line = line.replace('THIS_IS_TRUE', '1')
    line = re.sub(r"([^'])'f'(.)", "\\1THIS_IS_FALSE\\2", line)
    line = line.replace('THIS_IS_FALSE', '0')
    line = line.replace('AUTOINCREMENT', 'AUTO_INCREMENT')
    if re.search('^CREATE INDEX', line):
        line = line.replace('"','`')
    print line,

main()

Brad Gilbert ,Jul 1, 2009 at 18:43

Here is a slightly better version of the original.
#! /usr/bin/perl
use strict;
use warnings;
use 5.010; # for s/\K//;

while( <> ){
  next if m'
    BEGIN TRANSACTION   |
    COMMIT              |
    sqlite_sequence     |
    CREATE UNIQUE INDEX
  'x;

  if( my($name,$sub) = m'CREATE TABLE \"([a-z_]*)\"(.*)' ){
    # remove "
    $sub =~ s/\"//g; #"
    $_ = "DROP TABLE IF EXISTS $name;\nCREATE TABLE IF NOT EXISTS $name$sub\n";

  }elsif( /INSERT INTO \"([a-z_]*)\"(.*)/ ){
    $_ = "INSERT INTO $1$2\n";

    # " => \"
    s/\"/\\\"/g; #"
    # " => '
    s/\"/\'/g; #"

  }else{
    # '' => \'
    s/\'\'/\\\'/g; #'
  }

  # 't' => 1
  s/[^\\']\K\'t\'/1/g; #'

  # 'f' => 0
  s/[^\\']\K\'f\'/0/g; #'

  s/AUTOINCREMENT/AUTO_INCREMENT/g;
  print;
}

Mickey Mouse ,Jun 14, 2011 at 15:48

All of the scripts on this page can't deal with simple sqlite3 output like this:
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE Filename (
  FilenameId INTEGER,
  Name TEXT DEFAULT '',
  PRIMARY KEY(FilenameId) 
  );
INSERT INTO "Filename" VALUES(1,'');
INSERT INTO "Filename" VALUES(2,'bigfile1');
INSERT INTO "Filename" VALUES(3,'%gconf-tree.xml');

None were able to reformat "table_name" into MySQL's proper `table_name` . Some messed up the empty string value.

Sinan Ünür ,Jul 1, 2009 at 3:24

I am not sure what is so hard to understand about this that it requires a snide remark as in your comment above. Note that <> is called the diamond operator. s/// is the substitution operator and // is the match operator m// .

Ken_g6 ,Jul 1, 2009 at 3:22

Based on http://docs.python.org/dev/howto/regex.html ...
  1. Replace $line =~ /.*/ with re.search(r".*", line) .
  2. $line !~ /.*/ is just !($line =~ /.*/) .
  3. Replace $line =~ s/.*/x/g with line=re.sub(r".*", "x", line) .
  4. Replace $1 through $9 inside re.sub with \1 through \9 respectively.
  5. Outside a sub, save the return value, i.e. m=re.search() , and replace $1 with the return value of m.group(1) .
  6. For "INSERT INTO $1$2\n" specifically, you can do "INSERT INTO %s%s\n" % (m.group(1), m.group(2)) .

hpavc ,Jul 1, 2009 at 12:33

The real issue is: do you actually know how to migrate the database? What is presented is merely a search-and-replace loop.

> ,

Shortest? The tilde signifies a regex in perl. "import re" and go from there. The only key differences are that you'll be using \1 and \2 instead of $1 and $2 when you assign values, and you'll be using %s for when you're replacing regexp matches inside strings.

[Mar 16, 2019] Regex translation from Perl to Python - Stack Overflow

Mar 16, 2019 | stackoverflow.com


royskatt ,Jan 30, 2014 at 14:45

I would like to rewrite a small Perl program in Python. I am processing text files with it as follows:

Input:

00000001;Root;;
00000002;  Documents;;
00000003;    oracle-advanced_plsql.zip;file;
00000004;  Public;;
00000005;  backup;;
00000006;    20110323-JM-F.7z.001;file;
00000007;    20110426-JM-F.7z.001;file;
00000008;    20110603-JM-F.7z.001;file;
00000009;    20110701-JM-F-via-summer_school;;
00000010;      20110701-JM-F-yyy.7z.001;file;

Desired output:

00000001;;Root;;
00000002;  ;Documents;;
00000003;    ;oracle-advanced_plsql.zip;file;
00000004;  ;Public;;
00000005;  ;backup;;
00000006;    ;20110323-JM-F.7z.001;file;
00000007;    ;20110426-JM-F.7z.001;file;
00000008;    ;20110603-JM-F.7z.001;file;
00000009;    ;20110701-JM-F-via-summer_school;;
00000010;      ;20110701-JM-F-yyy.7z.001;file;

Here is the working Perl code:

#filename: perl_regex.pl
#!/usr/bin/perl -w
while(<>) {                                                           
  s/^(.*?;.*?)(\w)/$1;$2/;                                            
  print $_;                                                           
}

Call it from the command line: perl_regex.pl input.txt

Explanation of the Perl-style regex:

s/        # start search-and-replace regexp
  ^       # start at the beginning of this line
  (       # save the matched characters until ')' in $1
    .*?;  # go forward until finding the first semicolon
    .*?   # go forward until finding... (to be continued below)
  )
  (       # save the matched characters until ')' in $2
    \w    # ... the next alphanumeric character.
  )
/         # continue with the replace part
  $1;$2   # write all characters found above, but insert a ; before $2
/         # finish the search-and-replace regexp.

Could anyone tell me how to get the same result in Python? Especially for the $1 and $2 variables, I couldn't find anything similar.

royskatt ,Jan 31, 2014 at 6:18

Python regular expressions are very similar to Perl's, except:

Use re.sub to replace.

import re
import sys

for line in sys.stdin: # Explicitly iterate standard input line by line
    # `line` contains trailing newline!
    line = re.sub(r'^(.*?;.*?)(\w)', r'\1;\2', line)
    #print(line) # This print trailing newline
    sys.stdout.write(line) # Print the replaced string back.

royskatt ,Jan 31, 2014 at 16:36

The replace instruction for s/pattern/replace/ in Python regexes is the re.sub(pattern, replace, string) function, or re.compile(pattern).sub(replace, string). In your case, you would do it like so:
_re_pattern = re.compile(r"^(.*?;.*?)(\w)")
result = _re_pattern.sub(r"\1;\2", line)

Note that $1 becomes \1 . As in Perl, you need to iterate over your lines the way you want to (open, fileinput, splitlines, ...).

[Mar 13, 2019] amp html - Convert img to amp-img - Stack Overflow

Mar 13, 2019 | stackoverflow.com

> ,

What is the default way to convert an <img> to an <amp-img> ?

Let me explain: in the site that I'm converting to AMP I have a lot of images without width and height, e.g.:

<img src="/img/image.png" alt="My image">

If I don't specify the layout, layout="container" is set by default and most of the images throw the following error:

amp-img error: Layout not supported for: container

On the other hand, most of the images don't fit the responsive layout, which is recommended by Google for most cases.

I have been checking the types of layout in the documentation:

But none of them seems to fit an image that has to be shown at its real size, without specifying width or height.

So, in that case, what is the equivalent in AMP?

,

Since you say you have multiple images, it's better to use layout="responsive" ; with that, you will at least make your images responsive.

Now regarding the Width and Height: they are a must.

If you read about the purpose of AMP, one of its goals is to make pages free of jumping/flickering content, which happens when no width is specified for images.

By specifying the width, the (mobile) browser can calculate the precise space to reserve for that image and lay out the content around it. That way, there won't be any flickering of the content as the page and images load.

Regarding the rewriting of your HTML, one tip I can offer: you can write a small utility in PHP, Python or Node.js that reads each source image, calculates its dimensions and replaces your IMG tags.

Hope this helps and wish you good luck for your AMP powered site :-)
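One possible shape of such a utility, sketched here in shell rather than PHP/Python/Node (it assumes ImageMagick's identify is available; the image path is the one from the question and the sed pattern is deliberately simplistic, just to show the idea):

# read the real pixel dimensions of the source image
w=$(identify -format '%w' img/image.png)
h=$(identify -format '%h' img/image.png)
# rewrite the plain <img> tag as an <amp-img> with explicit dimensions
sed -i "s|<img src=\"/img/image.png\" alt=\"My image\">|<amp-img src=\"/img/image.png\" alt=\"My image\" width=\"$w\" height=\"$h\" layout=\"responsive\"></amp-img>|" page.html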

[Mar 10, 2019] How do I detach a process from Terminal, entirely?

Mar 10, 2019 | superuser.com

stackoverflow.com, Aug 25, 2016 at 17:24

I use Tilda (drop-down terminal) on Ubuntu as my "command central" - pretty much the way others might use GNOME Do, Quicksilver or Launchy.

However, I'm struggling with how to completely detach a process (e.g. Firefox) from the terminal it's been launched from - i.e. prevent that such a (non-)child process

For example, in order to start Vim in a "proper" terminal window, I have tried a simple script like the following:

exec gnome-terminal -e "vim $@" &> /dev/null &

However, that still causes pollution (also, passing a file name doesn't seem to work).

lhunath, Sep 23, 2016 at 19:08

First of all; once you've started a process, you can background it by first stopping it (hit Ctrl - Z ) and then typing bg to let it resume in the background. It's now a "job", and its stdout / stderr / stdin are still connected to your terminal.

You can start a process as backgrounded immediately by appending a "&" to the end of it:

firefox &

To run it in the background silenced, use this:

firefox </dev/null &>/dev/null &

Some additional info:

nohup is a program you can use to run your application with such that its stdout/stderr can be sent to a file instead and such that closing the parent script won't SIGHUP the child. However, you need to have had the foresight to have used it before you started the application. Because of the way nohup works, you can't just apply it to a running process .

disown is a bash builtin that removes a shell job from the shell's job list. What this basically means is that you can't use fg , bg on it anymore, but more importantly, when you close your shell it won't hang or send a SIGHUP to that child anymore. Unlike nohup , disown is used after the process has been launched and backgrounded.

What you can't do, is change the stdout/stderr/stdin of a process after having launched it. At least not from the shell. If you launch your process and tell it that its stdout is your terminal (which is what you do by default), then that process is configured to output to your terminal. Your shell has no business with the processes' FD setup, that's purely something the process itself manages. The process itself can decide whether to close its stdout/stderr/stdin or not, but you can't use your shell to force it to do so.

To manage a background process' output, you have plenty of options from scripts, "nohup" probably being the first to come to mind. But for interactive processes you start but forgot to silence ( firefox < /dev/null &>/dev/null & ) you can't do much, really.

I recommend you get GNU screen . With screen you can just close your running shell when the process' output becomes a bother and open a new one ( ^Ac ).


Oh, and by the way, don't use " $@ " where you're using it.

$@ means, $1 , $2 , $3 ..., which would turn your command into:

gnome-terminal -e "vim $1" "$2" "$3" ...

That's probably not what you want because -e only takes one argument. Use $1 to show that your script can only handle one argument.

It's really difficult to get multiple arguments working properly in the scenario that you gave (with the gnome-terminal -e ) because -e takes only one argument, which is a shell command string. You'd have to encode your arguments into one. The best and most robust, but rather cludgy, way is like so:

gnome-terminal -e "vim $(printf "%q " "$@")"

Limited Atonement ,Aug 25, 2016 at 17:22

nohup cmd &

nohup detaches the process completely (daemonizes it)

Randy Proctor ,Sep 13, 2016 at 23:00

If you are using bash , try disown [ jobspec ] ; see bash(1) .

Another approach you can try is at now . If you're not superuser, your permission to use at may be restricted.

Stephen Rosen ,Jan 22, 2014 at 17:08

Reading these answers, I was under the initial impression that issuing nohup <command> & would be sufficient. Running zsh in gnome-terminal, I found that nohup <command> & did not prevent my shell from killing child processes on exit. Although nohup is useful, especially with non-interactive shells, it only guarantees this behavior if the child process does not reset its handler for the SIGHUP signal.

In my case, nohup should have prevented hangup signals from reaching the application, but the child application (VMWare Player in this case) was resetting its SIGHUP handler. As a result when the terminal emulator exits, it could still kill your subprocesses. This can only be resolved, to my knowledge, by ensuring that the process is removed from the shell's jobs table. If nohup is overridden with a shell builtin, as is sometimes the case, this may be sufficient, however, in the event that it is not...


disown is a shell builtin in bash , zsh , and ksh93 ,

<command> &
disown

or

<command> & disown

if you prefer one-liners. This has the generally desirable effect of removing the subprocess from the jobs table. This allows you to exit the terminal emulator without accidentally signaling the child process at all. No matter what the SIGHUP handler looks like, this should not kill your child process.

After the disown, the process is still a child of your terminal emulator (play with pstree if you want to watch this in action), but after the terminal emulator exits, you should see it attached to the init process. In other words, everything is as it should be, and as you presumably want it to be.

What to do if your shell does not support disown ? I'd strongly advocate switching to one that does, but in the absence of that option, you have a few choices.

  1. screen and tmux can solve this problem, but they are much heavier weight solutions, and I dislike having to run them for such a simple task. They are much more suitable for situations in which you want to maintain a tty, typically on a remote machine.
  2. For many users, it may be desirable to see if your shell supports a capability like zsh's setopt nohup . This can be used to specify that SIGHUP should not be sent to the jobs in the jobs table when the shell exits. You can either apply this just before exiting the shell, or add it to shell configuration like ~/.zshrc if you always want it on.
  3. Find a way to edit the jobs table. I couldn't find a way to do this in tcsh or csh , which is somewhat disturbing.
  4. Write a small C program to fork off and exec() . This is a very poor solution, but the source should only consist of a couple dozen lines. You can then pass commands as commandline arguments to the C program, and thus avoid a process specific entry in the jobs table.

Sheljohn ,Jan 10 at 10:20

  1. nohup $COMMAND &
  2. $COMMAND & disown
  3. setsid command

I've been using number 2 for a very long time, but number 3 works just as well. Also, disown has a 'nohup' flag of '-h', can disown all processes with '-a', and can disown all running processes with '-ar'.

Silencing is accomplished by '$COMMAND &>/dev/null'.

Hope this helps!

dunkyp ,Mar 25, 2009 at 1:51

I think screen might solve your problem

Nathan Fellman ,Mar 23, 2009 at 14:55

in tcsh (and maybe in other shells as well), you can use parentheses to detach the process.

Compare this:

> jobs # shows nothing
> firefox &
> jobs
[1]  + Running                       firefox

To this:

> jobs # shows nothing
> (firefox &)
> jobs # still shows nothing
>

This removes firefox from the jobs listing, but it is still tied to the terminal; if you logged in to this node via 'ssh', trying to log out will still hang the ssh process.

,

To dissociate from the controlling tty, run the command through a sub-shell, e.g.:

(command)&

When you exit, the terminal is closed but the process is still alive.

Check:

(sleep 100) & exit

Open other terminal

ps aux | grep sleep

Process is still alive.

[Mar 10, 2019] How to run tmux/screen with systemd 230 ?

Aug 02, 2018 | askubuntu.com

MvanGeest ,May 10, 2017 at 20:59

I run 16.04 and systemd now kills tmux when the user disconnects ( summary of the change ).

Is there a way to run tmux or screen (or any similar program) with systemd 230? I read all the heated discussion about the pros and cons of the behaviour, but no solution was suggested.

(I see the behaviour in 229 as well)

WoJ ,Aug 2, 2016 at 20:30

RemainAfterExit=

Takes a boolean value that specifies whether the service shall be considered active even when all its processes exited. Defaults to no.

jpath ,Feb 13 at 12:29

The proper solution is to disable the offending systemd behavior system-wide.

Edit /etc/systemd/logind.conf ( you must sudo , of course) and set

KillUserProcesses=no

You can also put this setting in a separate file, e.g. /etc/systemd/logind.conf.d/99-dont-kill-user-processes.conf .

Then restart systemd-logind.service .

sudo systemctl restart systemd-logind

sarnold ,Dec 9, 2016 at 11:59

Based on @Rinzwind's answer and inspired by a unit description, the best I could find is to use TaaS (Tmux as a Service) -- a generic detached instance of tmux that one reattaches to.
# cat /etc/systemd/system/tmux@.service

[Unit]
Description=tmux default session (detached)
Documentation=man:tmux(1)

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/tmux new-session -d -s %I
ExecStop=/usr/bin/tmux kill-server
KillMode=none

[Install]
WantedBy=multiplexer.target

# systemctl start tmux@instanceone.service
# systemctl start tmux@instancetwo.service
# tmux list-sessions

instanceone: 1 windows (created Sun Jul 24 00:52:15 2016) [193x49]
instancetwo: 1 windows (created Sun Jul 24 00:52:19 2016) [193x49]

# tmux attach-session -t instanceone

(instanceone)#

Robin Hartmann ,Aug 2, 2018 at 20:23

You need to set the Type of the service to forking , as explained here .

Let's assume the service you want to run in screen is called minecraft . Then you would open minecraft.service in a text editor and add or edit the entry Type=forking under the section [Service] .
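A minimal sketch of what such a unit could look like (hypothetical paths, user and script names -- adjust them to your actual installation):

[Unit]
Description=Minecraft server in a detached screen session

[Service]
Type=forking
User=minecraft
ExecStart=/usr/bin/screen -dmS minecraft /opt/minecraft/start.sh
ExecStop=/usr/bin/screen -S minecraft -X quit

[Install]
WantedBy=multi-user.target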

> ,

According to https://unix.stackexchange.com/a/287282/117599 invoking tmux using
systemd-run --user --scope tmux

should also do the trick.

[Mar 10, 2019] linux - How to attach terminal to detached process

Mar 10, 2019 | unix.stackexchange.com


Gilles ,Feb 16, 2012 at 21:39

I have detached a process from my terminal, like this:
$ process &

That terminal is now long closed, but process is still running and I want to send some commands to that process's stdin. Is that possible?

Samuel Edwin Ward ,Dec 22, 2018 at 13:34

Yes, it is. First, create a pipe: mkfifo /tmp/fifo . Use gdb to attach to the process: gdb -p PID

Then close stdin: call close (0) ; and open it again: call open ("/tmp/fifo", 0600)

Finally, write away (from a different terminal, as gdb will probably hang):

echo blah > /tmp/fifo

NiKiZe ,Jan 6, 2017 at 22:52

When the original terminal is no longer accessible...

reptyr might be what you want, see https://serverfault.com/a/284795/187998

Quote from there:

Have a look at reptyr , which does exactly that. The github page has all the information.
reptyr - A tool for "re-ptying" programs.

reptyr is a utility for taking an existing running program and attaching it to a new terminal. Started a long-running process over ssh, but have to leave and don't want to interrupt it? Just start a screen, use reptyr to grab it, and then kill the ssh session and head on home.

USAGE

reptyr PID

"reptyr PID" will grab the process with id PID and attach it to your current terminal.

After attaching, the process will take input from and write output to the new terminal, including ^C and ^Z. (Unfortunately, if you background it, you will still have to run "bg" or "fg" in the old terminal. This is likely impossible to fix in a reasonable way without patching your shell.)

manatwork ,Nov 20, 2014 at 22:59

I am quite sure you can not.

Check using ps x . If a process has a ? as controlling tty , you can not send input to it any more.

9942 ?        S      0:00 tail -F /var/log/messages
9947 pts/1    S      0:00 tail -F /var/log/messages

In this example, you can send input to 9947 doing something like echo "test" > /dev/pts/1 . The other process ( 9942 ) is not reachable.

Next time, you could use screen or tmux to avoid this situation.

Stéphane Gimenez ,Feb 16, 2012 at 16:16

EDIT : As Stephane Gimenez said, it's not that simple. It's only allowing you to print to a different terminal.

You can try to write to this process using /proc . It should be located in /proc/ pid /fd/0 , so a simple :

echo "hello" > /proc/PID/fd/0

should do it. I have not tried it, but it should work, as long as this process still has a valid stdin file descriptor. You can check it with ls -l on /proc/ pid /fd/ .

See nohup for more details about how to keep processes running.

Stéphane Gimenez ,Nov 20, 2015 at 5:08

Just ending the command line with & will not completely detach the process, it will just run it in the background. (With zsh you can use &! to actually detach it; otherwise you have to disown it later).

When a process runs in the background, it won't receive input from its controlling terminal anymore. But you can send it back into the foreground with fg and then it will read input again.

Otherwise, it's not possible to externally change its filedescriptors (including stdin) or to reattach a lost controlling terminal unless you use debugging tools (see Ansgar's answer , or have a look at the retty command).

[Mar 10, 2019] linux - Preventing tmux session created by systemd from automatically terminating on Ctrl+C - Stack Overflow

Mar 10, 2019 | stackoverflow.com


Jim Stewart ,Nov 10, 2018 at 12:55

For a few days now I've been successfully running the new Minecraft Bedrock Edition dedicated server on my Ubuntu 18.04 LTS home server. Because it should be available 24/7 and automatically start up after boot, I created a systemd service for a detached tmux session:

tmux.minecraftserver.service

[Unit]
Description=tmux minecraft_server detached

[Service]
Type=forking
WorkingDirectory=/home/mine/minecraftserver
ExecStart=/usr/bin/tmux new -s minecraftserver -d "LD_LIBRARY_PATH=. /home/mine/minecraftser$
User=mine

[Install]
WantedBy=multi-user.target

Everything works as expected but there's one tiny thing that keeps bugging me:

How can I prevent tmux from terminating its whole session when I press Ctrl+C? I just want to terminate the Minecraft server process itself instead of the whole tmux session. When starting the server from the command line in a manually created tmux session this does work (the session stays alive), but not when the session was brought up by systemd.

FlKo ,Nov 12, 2018 at 6:21

When starting the server from the command line in a manually created tmux session this does work (session stays alive) but not when the session was brought up by systemd .

The difference between these situations is actually unrelated to systemd. In one case, you're starting the server from a shell within the tmux session, and when the server terminates, control returns to the shell. In the other case, you're starting the server directly within the tmux session, and when it terminates there's no shell to return to, so the tmux session also dies.

tmux has an option to keep the session alive after the process inside it dies (look for remain-on-exit in the manpage), but that's probably not what you want: you want to be able to return to an interactive shell, to restart the server, investigate why it died, or perform maintenance tasks, for example. So it's probably better to change your command to this:

'LD_LIBRARY_PATH=. /home/mine/minecraftserver/ ; exec bash'

That is, first run the server, and then, after it terminates, replace the process (the shell which tmux implicitly spawns to run the command, but which will then exit) with another, interactive shell. (For some other ways to get an interactive shell after the command exits, see e.g. this question – but note that the <(echo commands) syntax suggested in the top answer is not available in systemd unit files.)
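Under that suggestion, the ExecStart line might look roughly like this (a sketch only; the bedrock_server binary name is taken from the accepted fix below, and WorkingDirectory is assumed to be set as in the original unit):

ExecStart=/usr/bin/tmux new -s minecraftserver -d 'LD_LIBRARY_PATH=. ./bedrock_server ; exec bash'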

FlKo ,Nov 12, 2018 at 6:21

I was able to solve this by using systemd's ExecStartPost and tmux's send-keys like this:
[Unit]
Description=tmux minecraft_server detached

[Service]
Type=forking
WorkingDirectory=/home/mine/minecraftserver
ExecStart=/usr/bin/tmux new -d -s minecraftserver
ExecStartPost=/usr/bin/tmux send-keys -t minecraftserver "cd /home/mine/minecraftserver/" Enter "LD_LIBRARY_PATH=. ./bedrock_server" Enter

User=mine

[Install]
WantedBy=multi-user.target
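With the session running under systemd you can still attach to it interactively, for example:

tmux attach -t minecraftserver    # run this as the same user (mine) that owns the tmux server
# detach again with the default prefix: Ctrl+b, then d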

[Mar 01, 2019] Creating symlinks instead of /bin /sbin /lib and /lib64 directories in RHEL7

That change essentially means that /usr should be on the root partition, not on a separate partition, which with the current sizes of hard drives is a reasonable requirement.
Notable quotes:
"... On Linux /bin and /usr/bin are still separate because it is common to have /usr on a separate partition (although this configuration breaks in subtle ways, sometimes). In /bin is all the commands that you will need if you only have / mounted. ..."
Mar 01, 2019 | unix.stackexchange.com

balki ,May 2, 2015 at 6:17

What? No, /bin is not a symlink to /usr/bin on any FHS-compliant system. Note that there are still popular Unixes and Linuxes that ignore this - for example, /bin and /sbin are symlinked to /usr/bin on Arch Linux (the reasoning being that you don't need /bin for rescue/single-user mode, since you'd just boot a live CD).

/bin

contains commands that may be used by both the system administrator and by users, but which are required when no other filesystems are mounted (e.g. in single user mode). It may also contain commands which are used indirectly by scripts

/usr/bin/

This is the primary directory of executable commands on the system.

essentially, /bin contains executables which are required by the system for emergency repairs, booting, and single user mode. /usr/bin contains any binaries that aren't required.

I will note that while they can be on separate disks/partitions, /bin must be on the same disk as /; /usr/bin can be on another disk - although note that this configuration has been kind of broken for a while (this is why e.g. systemd warns about this configuration on boot).

For full correctness, some Unices may ignore the FHS, as I believe it is only a Linux standard; I'm not aware that it has yet been included in SUS, POSIX or any other UNIX standard, though IMHO it should be. It is a part of the LSB standard though.

LawrenceC ,Jan 13, 2015 at 16:12

/sbin - Binaries needed for booting, low-level system repair, or maintenance (run level 1 or S)

/bin - Binaries needed for normal/standard system functioning at any run level.

/usr/bin - Application/distribution binaries meant to be accessed by locally logged in users

/usr/sbin - Application/distribution binaries that support or configure stuff in /sbin.

/usr/share/bin - Application/distribution binaries or scripts meant to be accessed via the web, i.e. Apache web applications

*local* - Binaries not part of a distribution; locally compiled or manually installed. There's usually never a /local/bin but always a /usr/local/bin and /usr/local/share/bin .

JonnyJD ,Jan 3, 2013 at 0:17

Some kind of "update" on this issue:

Recently some Linux distributions are merging /bin into /usr/bin and relatedly /lib into /usr/lib . Sometimes also (/usr)/sbin to /usr/bin (Arch Linux). So /usr is expected to be available at the same time as / .

The distinction between the two hierarchies is taken to be unnecessary complexity now. The idea was once having only /bin available at boot, but having an initial ramdisk makes this obsolete.

I know of Fedora Linux (2011) and Arch Linux (2012) going this way, and Solaris has been doing this for a long time (> 15 years).

xenoterracide ,Jan 17, 2011 at 16:23

On Linux /bin and /usr/bin are still separate because it is common to have /usr on a separate partition (although this configuration breaks in subtle ways, sometimes). In /bin is all the commands that you will need if you only have / mounted.

On Solaris and Arch Linux (and probably others) /bin is a symlink to /usr/bin . Arch also has /sbin and /usr/sbin symlinked to /usr/bin .

Of particular note, the statement that /bin is for "system administrator" commands and /usr/bin is for user commands is not true (unless you think that bash and ls are for admins only, in which case you have a lot to learn). Administrator commands are in /sbin and /usr/sbin .
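A quick, hedged way to check which layout your own distribution uses:

ls -ld /bin /sbin /lib /lib64
# on a merged-/usr system these show up as symlinks, e.g. "/bin -> usr/bin";
# on a traditional layout they are ordinary directories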

[Feb 21, 2019] The rm='rm -i' alias is a horror

Feb 21, 2019 | superuser.com

The rm='rm -i' alias is a horror because, after a while of using it, you will expect rm to prompt you by default before removing files. Of course, one day you'll run it with an account that doesn't have that alias set, and before you understand what's going on, it will be too late.

... ... ...

If you want safe aliases, but don't want to risk getting used to commands working differently on your system than on others, you can disable rm like this:
alias rm='echo "rm is disabled, use remove or trash or /bin/rm instead."'

Then you can create your own safe alias, e.g.

alias remove='/bin/rm -irv'

or use trash instead.
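For instance, assuming the trash-cli package is installed (an assumption; it provides the trash-put, trash-list and trash-restore commands), the safe alias could be:

alias remove='trash-put'      # moves files to the freedesktop.org trash instead of unlinking them
remove oldfile.txt            # recoverable later with trash-restore, inspectable with trash-list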

[Feb 21, 2019] What is the minimum I have to do to create an RPM file?

Feb 21, 2019 | stackoverflow.com

webwesen ,Jan 29, 2016 at 6:42

I just want to create an RPM file to distribute my Linux binary "foobar", with only a couple of dependencies. It has a config file, /etc/foobar.conf and should be installed in /usr/bin/foobar.

Unfortunately the documentation for RPM is 27 chapters long and I really don't have a day to sit down and read this, because I am also busy making .deb and EXE installers for other platforms.

What is the absolute minimum I have to do to create an RPM? Assume the foobar binary and foobar.conf are in the current working directory.

icasimpan ,Apr 10, 2018 at 13:33

I often build binary RPMs to package proprietary apps - even monsters like WebSphere - on Linux, so my experience may be useful to you, though it would be better to build a true (from source) RPM if you can. But I digress.

So the basic steps for packaging your (binary) program are as follows. Suppose the program is toybinprog, version 1.0, with a config file to be installed as /etc/toybinprog/toybinprog.conf and a binary to be installed in /usr/bin called toybinprog:

1. create your rpm build env for RPM < 4.6,4.7
mkdir -p ~/rpmbuild/{RPMS,SRPMS,BUILD,SOURCES,SPECS,tmp}

cat <<EOF >~/.rpmmacros
%_topdir   %(echo $HOME)/rpmbuild
%_tmppath  %{_topdir}/tmp
EOF

cd ~/rpmbuild
2. create the tarball of your project
mkdir toybinprog-1.0
mkdir -p toybinprog-1.0/usr/bin
mkdir -p toybinprog-1.0/etc/toybinprog
install -m 755 toybinprog toybinprog-1.0/usr/bin
install -m 644 toybinprog.conf toybinprog-1.0/etc/toybinprog/

tar -zcvf toybinprog-1.0.tar.gz toybinprog-1.0/
3. Copy to the sources dir
cp toybinprog-1.0.tar.gz SOURCES/

cat <<EOF > SPECS/toybinprog.spec
# Don't try fancy stuff like debuginfo, which is useless on binary-only
# packages. Don't strip binary too
# Be sure buildpolicy set to do nothing
%define        __spec_install_post %{nil}
%define          debug_package %{nil}
%define        __os_install_post %{_dbpath}/brp-compress

Summary: A very simple toy bin rpm package
Name: toybinprog
Version: 1.0
Release: 1
License: GPL+
Group: Development/Tools
SOURCE0 : %{name}-%{version}.tar.gz
URL: http://toybinprog.company.com/

BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root

%description
%{summary}

%prep
%setup -q

%build
# Empty section.

%install
rm -rf %{buildroot}
mkdir -p  %{buildroot}

# in builddir
cp -a * %{buildroot}


%clean
rm -rf %{buildroot}


%files
%defattr(-,root,root,-)
%config(noreplace) %{_sysconfdir}/%{name}/%{name}.conf
%{_bindir}/*

%changelog
* Thu Apr 24 2009  Elia Pinto <devzero2000@rpm5.org> 1.0-1
- First Build

EOF
4. build the source and the binary rpm
rpmbuild -ba SPECS/toybinprog.spec

And that's all.

Hope this helps.
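As a quick sanity check of the result (a sketch; the x86_64 subdirectory depends on your build host's architecture):

ls ~/rpmbuild/RPMS/x86_64/ ~/rpmbuild/SRPMS/                      # built binary and source packages
rpm -qpl ~/rpmbuild/RPMS/x86_64/toybinprog-1.0-1.x86_64.rpm       # list the files it would install
sudo rpm -ivh ~/rpmbuild/RPMS/x86_64/toybinprog-1.0-1.x86_64.rpm  # install it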


As an application distributor, you may find that fpm fits your needs perfectly. There is an example here which shows how to package an app from source. FPM can produce both deb files and RPM files.

[Feb 21, 2019] perl - How to prompt for input and exit if the user entered an empty string - Stack Overflow

Feb 20, 2019 | stackoverflow.com

NewLearner ,Mar 12, 2012 at 3:22

I'm new to Perl and I'm writing a program where I want to force the user to enter a word. If the user enters an empty string then the program should exit.

This is what I have so far:

print "Enter a word to look up: ";

chomp ($usrword = <STDIN>);

DVK , Nov 19, 2015 at 19:11

You're almost there.
print "Enter a word to look up: ";
my $userword = <STDIN>; # I moved chomp to a new line to make it more readable
chomp $userword; # Get rid of newline character at the end
exit 0 if ($userword eq ""); # If empty string, exit.

Pondy , Jul 6 '16 at 22:11

Output to STDOUT is buffered by default. Since the prompt is short and does not end in a newline, it may still be sitting in the output buffer when the program waits for input. You can disable buffering on STDOUT by adding this line of code before printing...
select((select(STDOUT), $|=1)[0]);

[Feb 11, 2019] Resuming rsync on a interrupted transfer

May 15, 2013 | stackoverflow.com

Glitches , May 15, 2013 at 18:06

I am trying to back up my file server to a remote file server using rsync. Rsync is not successfully resuming when a transfer is interrupted. I used the partial option, but rsync doesn't find the file it already started, because it renames it to a temporary file, and when resumed it creates a new file and starts from the beginning.

Here is my command:

rsync -avztP -e "ssh -p 2222" /volume1/ myaccont@backup-server-1:/home/myaccount/backup/ --exclude "@spool" --exclude "@tmp"

When this command is run, a backup file named OldDisk.dmg from my local machine gets created on the remote machine as something like .OldDisk.dmg.SjDndj23.

Now when the internet connection gets interrupted and I have to resume the transfer, I have to find where rsync left off by finding the temp file like .OldDisk.dmg.SjDndj23 and rename it to OldDisk.dmg so that it sees there already exists a file that it can resume.

How do I fix this so I don't have to manually intervene each time?

Richard Michael , Nov 6, 2013 at 4:26

TL;DR : Use --timeout=X (X in seconds) to change the default rsync server timeout, not --inplace .

The issue is the rsync server processes (of which there are two, see rsync --server ... in ps output on the receiver) continue running, to wait for the rsync client to send data.

If the rsync server processes do not receive data for a sufficient time, they will indeed time out, self-terminate and clean up by moving the temporary file to its "proper" name (e.g., no temporary suffix). You'll then be able to resume.

If you don't want to wait for the long default timeout to cause the rsync server to self-terminate, then when your internet connection returns, log into the server and clean up the rsync server processes manually. However, you must politely terminate rsync -- otherwise, it will not move the partial file into place; but rather, delete it (and thus there is no file to resume). To politely ask rsync to terminate, do not SIGKILL (e.g., -9 ), but SIGTERM (e.g., pkill -TERM -x rsync - only an example, you should take care to match only the rsync processes concerned with your client).

Fortunately there is an easier way: use the --timeout=X (X in seconds) option; it is passed to the rsync server processes as well.

For example, if you specify rsync ... --timeout=15 ... , both the client and server rsync processes will cleanly exit if they do not send/receive data in 15 seconds. On the server, this means moving the temporary file into position, ready for resuming.
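Applied to the original command above, that would be something like this (a sketch, keeping the OP's paths and port; the 15-second value is only illustrative):

rsync -avztP --timeout=15 -e "ssh -p 2222" /volume1/ myaccont@backup-server-1:/home/myaccount/backup/ --exclude "@spool" --exclude "@tmp"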

I'm not sure how long the various rsync processes will wait to send/receive data before they die (the default timeout might vary with operating system). In my testing, the server rsync processes remain running longer than the local client. On a "dead" network connection, the client terminates with a broken pipe (e.g., no network socket) after about 30 seconds; you could experiment or review the source code. Meaning, you could try to "ride out" the bad internet connection for 15-20 seconds.

If you do not clean up the server rsync processes (or wait for them to die), but instead immediately launch another rsync client process, two additional server processes will launch (for the other end of your new client process). Specifically, the new rsync client will not re-use/reconnect to the existing rsync server processes. Thus, you'll have two temporary files (and four rsync server processes) -- though, only the newer, second temporary file has new data being written (received from your new rsync client process).

Interestingly, if you then clean up all rsync server processes (for example, stop your client, which will stop the new rsync servers, then SIGTERM the older rsync servers), it appears to merge (assemble) all the partial files into the new, properly named file. So, imagine a long-running partial copy which dies (and you think you've "lost" all the copied data), and a short-running re-launched rsync (oops!): you can stop the second client, SIGTERM the first servers, it will merge the data, and you can resume.

Finally, a few short remarks:

JamesTheAwesomeDude , Dec 29, 2013 at 16:50

Just curious: wouldn't SIGINT (aka ^C ) be 'politer' than SIGTERM ? – JamesTheAwesomeDude Dec 29 '13 at 16:50

Richard Michael , Dec 29, 2013 at 22:34

I didn't test how the server-side rsync handles SIGINT, so I'm not sure it will keep the partial file - you could check. Note that this doesn't have much to do with Ctrl-c ; it happens that your terminal sends SIGINT to the foreground process when you press Ctrl-c , but the server-side rsync has no controlling terminal. You must log in to the server and use kill . The client-side rsync will not send a message to the server (for example, after the client receives SIGINT via your terminal Ctrl-c ) - might be interesting though. As for anthropomorphizing, not sure what's "politer". :-) – Richard Michael Dec 29 '13 at 22:34

d-b , Feb 3, 2015 at 8:48

I just tried this timeout argument rsync -av --delete --progress --stats --human-readable --checksum --timeout=60 --partial-dir /tmp/rsync/ rsync://$remote:/ /src/ but then it timed out during the "receiving file list" phase (which in this case takes around 30 minutes). Setting the timeout to half an hour kind of defeats the purpose. Any workaround for this? – d-b Feb 3 '15 at 8:48

Cees Timmerman , Sep 15, 2015 at 17:10

@user23122 --checksum reads all data when preparing the file list, which is great for many small files that change often, but should be done on-demand for large files. – Cees Timmerman Sep 15 '15 at 17:10

[Jan 29, 2019] Do journaling filesystems guarantee against corruption after a power failure

Jan 29, 2019 | unix.stackexchange.com

Nathan Osman ,May 6, 2011 at 1:50

I am asking this question on behalf of another user who raised the issue in the Ubuntu chat room.

Do journaling filesystems guarantee that no corruption will occur if a power failure occurs?

If this answer depends on the filesystem, please indicate which ones do protect against corruption and which ones don't.

Andrew Lambert ,May 6, 2011 at 2:51

There are no guarantees. A Journaling File System is more resilient and is less prone to corruption, but not immune.

A journal is just a list of operations which have recently been done to the file system. The crucial part is that the journal entry is made before the operations take place. Most operations have multiple steps. Deleting a file, for example, might entail deleting the file's entry in the file system's table of contents and then marking the sectors on the drive as free. If something happens between the two steps, a journaled file system can tell immediately and perform the necessary cleanup to keep everything consistent. This is not the case with a non-journaled file system, which has to look at the entire contents of the volume to find errors.

While this journaling is much less prone to corruption than not journaling, corruption can still occur. For example, if the hard drive is mechanically malfunctioning or if writes to the journal itself are failing or interrupted.

The basic premise of journaling is that writing a journal entry is much quicker, usually, than the actual transaction it describes will be. So, the period between the OS ordering a (journal) write and the hard drive fulfilling it is much shorter than for a normal write: a narrower window for things to go wrong in, but there's still a window.

Further reading

Nathan Osman ,May 6, 2011 at 2:57

Could you please elaborate a little bit on why this is true? Perhaps you could give an example of how corruption would occur in a certain scenario. – Nathan Osman May 6 '11 at 2:57

Andrew Lambert ,May 6, 2011 at 3:21

@George Edison See my expanded answer. – Andrew Lambert May 6 '11 at 3:21

psusi ,May 6, 2011 at 17:58

That last bit is incorrect; there is no window for things to go wrong. Since it records what it is about to do before it starts doing it, the operation can be restarted after the power failure, no matter at what point it occurs during the operation. It is a matter of ordering, not timing. – psusi May 6 '11 at 17:58

Andrew Lambert ,May 6, 2011 at 21:23

@psusi there is still a window for the write to the journal to be interrupted. Journal writes may appear atomic to the OS but they're still writes to the disk. – Andrew Lambert May 6 '11 at 21:23

psusi ,May 7, 2011 at 1:57

@Amazed they are atomic because they have sequence numbers and/or checksums, so the journal entry is either written entirely, or not. If it is not written entirely, it is simply ignored after the system restarts, and no further changes were made to the fs so it remains consistent. – psusi May 7 '11 at 1:57

Mikel ,May 6, 2011 at 6:03

No.

The most common type of journaling, called metadata journaling, only protects the integrity of the file system, not of data. This includes xfs , and ext3 / ext4 in the default data=ordered mode.

If a non-journaling file system suffers a crash, it will be checked using fsck on the next boot. fsck scans every inode on the file system, looking for blocks that are marked as used but are not reachable (i.e. have no file name), and marks those blocks as unused. Doing this takes a long time.

With a metadata journaling file system, instead of doing an fsck , it knows which blocks it was in the middle of changing, so it can mark them as free without searching the whole partition for them.

There is a less common type of journaling, called data journaling, which is what ext3 does if you mount it with the data=journal option.

It attempts to protect all your data by writing not just a list of logical operations, but also the entire contents of each write to the journal. But because it's writing your data twice, it can be much slower.

As others have pointed out, even this is not a guarantee, because the hard drive might have told the operating system it had stored the data, when in fact it was still in the hard drive's cache.

For more information, take a look at the Wikipedia Journaling File System article and the Data Mode section of the ext4 documentation .
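For example, a hedged sketch of how one might enable full data journaling on a non-root ext4 filesystem (/dev/sdb1 and the mount point are hypothetical):

mount -o data=journal /dev/sdb1 /mnt/data
# the persistent equivalent is an /etc/fstab entry such as:
# /dev/sdb1  /mnt/data  ext4  defaults,data=journal  0  2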

SplinterReality ,May 6, 2011 at 8:03

+1 for the distinction between file system corruption and data corruption. That little distinction is quite the doozy in practice. – SplinterReality May 6 '11 at 8:03

boehj ,May 6, 2011 at 10:57

Excuse my utter ignorance, but doesn't data=journal as a feature make no sense at all? – boehj May 6 '11 at 10:57

psusi ,May 6, 2011 at 18:11

Again, the OS knows when the drive caches data and forces it to flush it when needed in order to maintain a coherent fs. Your data file of course, can be lost or corrupted if the application that was writing it when the power failed was not doing so carefully, and that applies whether or not you use data=journal. – psusi May 6 '11 at 18:11

user3338098 ,Aug 1, 2016 at 16:30

@psusi doesn't matter how careful the program is in writing the data, plenty of hard drives silently corrupt the data on READING stackoverflow.com/q/34141117/3338098user3338098 Aug 1 '16 at 16:30

psusi ,Aug 21, 2016 at 3:22

@user3338098, drives that silently corrupt data are horribly broken and should not ever be used, and are an entirely different conversation than corruption caused by software doing the wrong thing. – psusi Aug 21 '16 at 3:22

camh ,May 6, 2011 at 3:26

A filesystem cannot guarantee the consistency of its filesystem if a power failure occurs, because it does not know what the hardware will do.

If a hard drive buffers data for write but tells the OS that it has written the data and does not support the appropriate write barriers, then out-of-order writes can occur where an earlier write has not hit the platter, but a later one has. See this serverfault answer for more details.

Also, the position of the head on a magnetic HDD is controlled with electro-magnets. If power fails in the middle of a write, it is possible for some data to continue to be written while the heads move, corrupting data on blocks that the filesystem never intended to be written.

Nathan Osman ,May 6, 2011 at 6:43

Isn't the drive's firmware smart enough to suspend writing when retracting the head? – Nathan Osman May 6 '11 at 6:43

camh ,May 6, 2011 at 7:54

@George: It's going to depend on the drive. There's a lot out there and you don't know how well your (cheap) drive does things. – camh May 6 '11 at 7:54

psusi ,May 6, 2011 at 18:05

The hard drive tells the OS if it uses a write behind cache, and the OS takes measures to ensure they are flushed in the correct order. Also drives are designed so that when the power fails, they stop writing. I have seen some cases where the sector being written at the time of power loss becomes corrupt because it did not finish updating the ecc ( but can be easily re-written correctly ), but never heard of random sectors being corrupted on power loss. – psusi May 6 '11 at 18:05

jlliagre ,May 6, 2011 at 8:35

ZFS, which is close to but not exactly a journaling filesystem, guarantees by design against corruption after a power failure.

It doesn't matter if an ongoing write is interrupted in the middle: in such a case its checksum will certainly be incorrect, so the block will be ignored. As the file system is copy-on-write, the previous correct data (or metadata) is still on disk and will be used instead.

sakisk ,May 6, 2011 at 10:13

The answer is in most cases no:

Nathan Osman ,May 6, 2011 at 16:35

What events could lead to a corrupt journal? The only thing I could think of was bad sectors - is there anything else? – Nathan Osman May 6 '11 at 16:35

sakisk ,May 7, 2011 at 13:21

That's right, hardware failures are the usual case. – sakisk May 7 '11 at 13:21

[Jan 29, 2019] Split string into an array in Bash

May 14, 2012 | stackoverflow.com

Lgn ,May 14, 2012 at 15:15

In a Bash script I would like to split a line into pieces and store them in an array.

The line:

Paris, France, Europe

I would like to have them in an array like this:

array[0] = Paris
array[1] = France
array[2] = Europe

I would like to use simple code, the command's speed doesn't matter. How can I do it?

antak ,Jun 18, 2018 at 9:22

This is #1 Google hit but there's controversy in the answer because the question unfortunately asks about delimiting on , (comma-space) and not a single character such as comma. If you're only interested in the latter, answers here are easier to follow: stackoverflow.com/questions/918886/ – antak Jun 18 '18 at 9:22

Dennis Williamson ,May 14, 2012 at 15:16

IFS=', ' read -r -a array <<< "$string"

Note that the characters in $IFS are treated individually as separators so that in this case fields may be separated by either a comma or a space rather than the sequence of the two characters. Interestingly though, empty fields aren't created when comma-space appears in the input because the space is treated specially.

To access an individual element:

echo "${array[0]}"

To iterate over the elements:

for element in "${array[@]}"
do
    echo "$element"
done

To get both the index and the value:

for index in "${!array[@]}"
do
    echo "$index ${array[index]}"
done

The last example is useful because Bash arrays are sparse. In other words, you can delete an element or add an element and then the indices are not contiguous.

unset "array[1]"
array[42]=Earth

To get the number of elements in an array:

echo "${#array[@]}"

As mentioned above, arrays can be sparse so you shouldn't use the length to get the last element. Here's how you can in Bash 4.2 and later:

echo "${array[-1]}"

in any version of Bash (from somewhere after 2.05b):

echo "${array[@]: -1:1}"

Larger negative offsets select farther from the end of the array. Note the space before the minus sign in the older form. It is required.

l0b0 ,May 14, 2012 at 15:24

Just use IFS=', ' , then you don't have to remove the spaces separately. Test: IFS=', ' read -a array <<< "Paris, France, Europe"; echo "${array[@]}" – l0b0 May 14 '12 at 15:24

Dennis Williamson ,May 14, 2012 at 16:33

@l0b0: Thanks. I don't know what I was thinking. I like to use declare -p array for test output, by the way. – Dennis Williamson May 14 '12 at 16:33

Nathan Hyde ,Mar 16, 2013 at 21:09

@Dennis Williamson - Awesome, thorough answer. – Nathan Hyde Mar 16 '13 at 21:09

dsummersl ,Aug 9, 2013 at 14:06

MUCH better than multiple cut -f calls! – dsummersl Aug 9 '13 at 14:06

caesarsol ,Oct 29, 2015 at 14:45

Warning: the IFS variable means split by one of these characters , so it's not a sequence of chars to split by. IFS=', ' read -a array <<< "a,d r s,w" => ${array[*]} == a d r s w – caesarsol Oct 29 '15 at 14:45

Jim Ho ,Mar 14, 2013 at 2:20

Here is a way without setting IFS:
string="1:2:3:4:5"
set -f                      # avoid globbing (expansion of *).
array=(${string//:/ })
for i in "${!array[@]}"
do
    echo "$i=>${array[i]}"
done

The idea is using string replacement:

${string//substring/replacement}

to replace all matches of $substring with white space and then using the substituted string to initialize an array:

(element1 element2 ... elementN)

Note: this answer makes use of the split+glob operator . Thus, to prevent expansion of some characters (such as * ) it is a good idea to pause globbing for this script.
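In other words, a hedged safe form of this approach is to bracket the unquoted expansion with set -f and set +f (assuming globbing was enabled to begin with):

string="1:2:3:4:5"
set -f                      # pause globbing so a field such as "*" is not expanded
array=(${string//:/ })
set +f                      # re-enable globbing for the rest of the script
declare -p array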

Werner Lehmann ,May 4, 2013 at 22:32

Used this approach... until I came across a long string to split. 100% CPU for more than a minute (then I killed it). It's a pity because this method allows splitting by a string, not just some character in IFS. – Werner Lehmann May 4 '13 at 22:32

Dieter Gribnitz ,Sep 2, 2014 at 15:46

WARNING: Just ran into a problem with this approach. If you have an element named * you will get all the elements of your cwd as well. thus string="1:2:3:4:*" will give some unexpected and possibly dangerous results depending on your implementation. Did not get the same error with (IFS=', ' read -a array <<< "$string") and this one seems safe to use. – Dieter Gribnitz Sep 2 '14 at 15:46

akostadinov ,Nov 6, 2014 at 14:31

not reliable for many kinds of values, use with care – akostadinov Nov 6 '14 at 14:31

Andrew White ,Jun 1, 2016 at 11:44

quoting ${string//:/ } prevents shell expansion – Andrew White Jun 1 '16 at 11:44

Mark Thomson ,Jun 5, 2016 at 20:44

I had to use the following on OSX: array=(${string//:/ }) – Mark Thomson Jun 5 '16 at 20:44

bgoldst ,Jul 19, 2017 at 21:20

All of the answers to this question are wrong in one way or another.

Wrong answer #1

IFS=', ' read -r -a array <<< "$string"

1: This is a misuse of $IFS . The value of the $IFS variable is not taken as a single variable-length string separator, rather it is taken as a set of single-character string separators, where each field that read splits off from the input line can be terminated by any character in the set (comma or space, in this example).

Actually, for the real sticklers out there, the full meaning of $IFS is slightly more involved. From the bash manual :

The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline> , the default, then sequences of <space> , <tab> , and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters <space> , <tab> , and <newline> are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.

Basically, for non-default non-null values of $IFS , fields can be separated with either (1) a sequence of one or more characters that are all from the set of "IFS whitespace characters" (that is, whichever of <space> , <tab> , and <newline> ("newline" meaning line feed (LF) ) are present anywhere in $IFS ), or (2) any non-"IFS whitespace character" that's present in $IFS along with whatever "IFS whitespace characters" surround it in the input line.

For the OP, it's possible that the second separation mode I described in the previous paragraph is exactly what he wants for his input string, but we can be pretty confident that the first separation mode I described is not correct at all. For example, what if his input string was 'Los Angeles, United States, North America' ?

IFS=', ' read -ra a <<<'Los Angeles, United States, North America'; declare -p a;
## declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")

2: Even if you were to use this solution with a single-character separator (such as a comma by itself, that is, with no following space or other baggage), if the value of the $string variable happens to contain any LFs, then read will stop processing once it encounters the first LF. The read builtin only processes one line per invocation. This is true even if you are piping or redirecting input only to the read statement, as we are doing in this example with the here-string mechanism, and thus unprocessed input is guaranteed to be lost. The code that powers the read builtin has no knowledge of the data flow within its containing command structure.

You could argue that this is unlikely to cause a problem, but still, it's a subtle hazard that should be avoided if possible. It is caused by the fact that the read builtin actually does two levels of input splitting: first into lines, then into fields. Since the OP only wants one level of splitting, this usage of the read builtin is not appropriate, and we should avoid it.

3: A non-obvious potential issue with this solution is that read always drops the trailing field if it is empty, although it preserves empty fields otherwise. Here's a demo:

string=', , a, , b, c, , , '; IFS=', ' read -ra a <<<"$string"; declare -p a;
## declare -a a=([0]="" [1]="" [2]="a" [3]="" [4]="b" [5]="c" [6]="" [7]="")

Maybe the OP wouldn't care about this, but it's still a limitation worth knowing about. It reduces the robustness and generality of the solution.

This problem can be solved by appending a dummy trailing delimiter to the input string just prior to feeding it to read , as I will demonstrate later.


Wrong answer #2

string="1:2:3:4:5"
set -f                     # avoid globbing (expansion of *).
array=(${string//:/ })

Similar idea:

t="one,two,three"
a=($(echo $t | tr ',' "\n"))

(Note: I added the missing parentheses around the command substitution which the answerer seems to have omitted.)

Similar idea:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)

These solutions leverage word splitting in an array assignment to split the string into fields. Funnily enough, just like read , general word splitting also uses the $IFS special variable, although in this case it is implied that it is set to its default value of <space><tab><newline> , and therefore any sequence of one or more IFS characters (which are all whitespace characters now) is considered to be a field delimiter.

This solves the problem of two levels of splitting committed by read , since word splitting by itself constitutes only one level of splitting. But just as before, the problem here is that the individual fields in the input string can already contain $IFS characters, and thus they would be improperly split during the word splitting operation. This happens to not be the case for any of the sample input strings provided by these answerers (how convenient...), but of course that doesn't change the fact that any code base that used this idiom would then run the risk of blowing up if this assumption were ever violated at some point down the line. Once again, consider my counterexample of 'Los Angeles, United States, North America' (or 'Los Angeles:United States:North America' ).

Also, word splitting is normally followed by filename expansion ( aka pathname expansion aka globbing), which, if done, would potentially corrupt words containing the characters * , ? , or [ followed by ] (and, if extglob is set, parenthesized fragments preceded by ? , * , + , @ , or ! ) by matching them against file system objects and expanding the words ("globs") accordingly. The first of these three answerers has cleverly undercut this problem by running set -f beforehand to disable globbing. Technically this works (although you should probably add set +f afterward to reenable globbing for subsequent code which may depend on it), but it's undesirable to have to mess with global shell settings in order to hack a basic string-to-array parsing operation in local code.

Another issue with this answer is that all empty fields will be lost. This may or may not be a problem, depending on the application.

Note: If you're going to use this solution, it's better to use the ${string//:/ } "pattern substitution" form of parameter expansion , rather than going to the trouble of invoking a command substitution (which forks the shell), starting up a pipeline, and running an external executable ( tr or sed ), since parameter expansion is purely a shell-internal operation. (Also, for the tr and sed solutions, the input variable should be double-quoted inside the command substitution; otherwise word splitting would take effect in the echo command and potentially mess with the field values. Also, the $(...) form of command substitution is preferable to the old `...` form since it simplifies nesting of command substitutions and allows for better syntax highlighting by text editors.)
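For reference (this variant still suffers from the word-splitting issues described above), a sketch of the tr approach written with the quoting and $( ) form recommended in the note:

t="one,two,three"
a=($(echo "$t" | tr ',' '\n'))   # "$t" double-quoted inside the $(...) command substitution
declare -p a                     # declare -a a=([0]="one" [1]="two" [2]="three")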


Wrong answer #3

str="a, b, c, d"  # assuming there is a space after ',' as in Q
arr=(${str//,/})  # delete all occurrences of ','

This answer is almost the same as #2 . The difference is that the answerer has made the assumption that the fields are delimited by two characters, one of which being represented in the default $IFS , and the other not. He has solved this rather specific case by removing the non-IFS-represented character using a pattern substitution expansion and then using word splitting to split the fields on the surviving IFS-represented delimiter character.

This is not a very generic solution. Furthermore, it can be argued that the comma is really the "primary" delimiter character here, and that stripping it and then depending on the space character for field splitting is simply wrong. Once again, consider my counterexample: 'Los Angeles, United States, North America' .

Also, again, filename expansion could corrupt the expanded words, but this can be prevented by temporarily disabling globbing for the assignment with set -f and then set +f .

Also, again, all empty fields will be lost, which may or may not be a problem depending on the application.


Wrong answer #4

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

This is similar to #2 and #3 in that it uses word splitting to get the job done, only now the code explicitly sets $IFS to contain only the single-character field delimiter present in the input string. It should be repeated that this cannot work for multicharacter field delimiters such as the OP's comma-space delimiter. But for a single-character delimiter like the LF used in this example, it actually comes close to being perfect. The fields cannot be unintentionally split in the middle as we saw with previous wrong answers, and there is only one level of splitting, as required.

One problem is that filename expansion will corrupt affected words as described earlier, although once again this can be solved by wrapping the critical statement in set -f and set +f .

Another potential problem is that, since LF qualifies as an "IFS whitespace character" as defined earlier, all empty fields will be lost, just as in #2 and #3 . This would of course not be a problem if the delimiter happens to be a non-"IFS whitespace character", and depending on the application it may not matter anyway, but it does vitiate the generality of the solution.

So, to sum up, assuming you have a one-character delimiter, and it is either a non-"IFS whitespace character" or you don't care about empty fields, and you wrap the critical statement in set -f and set +f , then this solution works, but otherwise not.

(Also, for information's sake, assigning a LF to a variable in bash can be done more easily with the $'...' syntax, e.g. IFS=$'\n'; .)
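Putting those remarks together, a hedged rewrite of this answer for a newline delimiter would be:

string=$'first line\nsecond line\nthird line'
oldIFS=$IFS
IFS=$'\n'                   # single-character delimiter written with the $'...' syntax
set -f                      # pause globbing while the unquoted expansion runs
lines=( $string )
set +f
IFS=$oldIFS
declare -p lines            # declare -a lines=([0]="first line" [1]="second line" [2]="third line")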


Wrong answer #5

countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"

Similar idea:

IFS=', ' eval 'array=($string)'

This solution is effectively a cross between #1 (in that it sets $IFS to comma-space) and #2-4 (in that it uses word splitting to split the string into fields). Because of this, it suffers from most of the problems that afflict all of the above wrong answers, sort of like the worst of all worlds.

Also, regarding the second variant, it may seem like the eval call is completely unnecessary, since its argument is a single-quoted string literal, and therefore is statically known. But there's actually a very non-obvious benefit to using eval in this way. Normally, when you run a simple command which consists of a variable assignment only , meaning without an actual command word following it, the assignment takes effect in the shell environment:

IFS=', '; ## changes $IFS in the shell environment

This is true even if the simple command involves multiple variable assignments; again, as long as there's no command word, all variable assignments affect the shell environment:

IFS=', ' array=($countries); ## changes both $IFS and $array in the shell environment

But, if the variable assignment is attached to a command name (I like to call this a "prefix assignment") then it does not affect the shell environment, and instead only affects the environment of the executed command, regardless whether it is a builtin or external:

IFS=', ' :; ## : is a builtin command, the $IFS assignment does not outlive it
IFS=', ' env; ## env is an external command, the $IFS assignment does not outlive it

Relevant quote from the bash manual :

If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.

It is possible to exploit this feature of variable assignment to change $IFS only temporarily, which allows us to avoid the whole save-and-restore gambit like that which is being done with the $OIFS variable in the first variant. But the challenge we face here is that the command we need to run is itself a mere variable assignment, and hence it would not involve a command word to make the $IFS assignment temporary. You might think to yourself, well why not just add a no-op command word to the statement like the : builtin to make the $IFS assignment temporary? This does not work because it would then make the $array assignment temporary as well:

IFS=', ' array=($countries) :; ## fails; new $array value never escapes the : command

So, we're effectively at an impasse, a bit of a catch-22. But, when eval runs its code, it runs it in the shell environment, as if it was normal, static source code, and therefore we can run the $array assignment inside the eval argument to have it take effect in the shell environment, while the $IFS prefix assignment that is prefixed to the eval command will not outlive the eval command. This is exactly the trick that is being used in the second variant of this solution:

IFS=', ' eval 'array=($string)'; ## $IFS does not outlive the eval command, but $array does

So, as you can see, it's actually quite a clever trick, and accomplishes exactly what is required (at least with respect to assignment effectation) in a rather non-obvious way. I'm actually not against this trick in general, despite the involvement of eval ; just be careful to single-quote the argument string to guard against security threats.

But again, because of the "worst of all worlds" agglomeration of problems, this is still a wrong answer to the OP's requirement.


Wrong answer #6

IFS=', '; array=(Paris, France, Europe)

IFS=' ';declare -a array=(Paris France Europe)

Um... what? The OP has a string variable that needs to be parsed into an array. This "answer" starts with the verbatim contents of the input string pasted into an array literal. I guess that's one way to do it.

It looks like the answerer may have assumed that the $IFS variable affects all bash parsing in all contexts, which is not true. From the bash manual:

IFS The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is <space><tab><newline> .

So the $IFS special variable is actually only used in two contexts: (1) word splitting that is performed after expansion (meaning not when parsing bash source code) and (2) for splitting input lines into words by the read builtin.

Let me try to make this clearer. I think it might be good to draw a distinction between parsing and execution . Bash must first parse the source code, which obviously is a parsing event, and then later it executes the code, which is when expansion comes into the picture. Expansion is really an execution event. Furthermore, I take issue with the description of the $IFS variable that I just quoted above; rather than saying that word splitting is performed after expansion , I would say that word splitting is performed during expansion, or, perhaps even more precisely, word splitting is part of the expansion process. The phrase "word splitting" refers only to this step of expansion; it should never be used to refer to the parsing of bash source code, although unfortunately the docs do seem to throw around the words "split" and "words" a lot. Here's a relevant excerpt from the linux.die.net version of the bash manual:

Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion , tilde expansion , parameter and variable expansion , command substitution , arithmetic expansion , word splitting , and pathname expansion .

The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.

You could argue the GNU version of the manual does slightly better, since it opts for the word "tokens" instead of "words" in the first sentence of the Expansion section:

Expansion is performed on the command line after it has been split into tokens.

The important point is, $IFS does not change the way bash parses source code. Parsing of bash source code is actually a very complex process that involves recognition of the various elements of shell grammar, such as command sequences, command lists, pipelines, parameter expansions, arithmetic substitutions, and command substitutions. For the most part, the bash parsing process cannot be altered by user-level actions like variable assignments (actually, there are some minor exceptions to this rule; for example, see the various compatxx shell settings , which can change certain aspects of parsing behavior on-the-fly). The upstream "words"/"tokens" that result from this complex parsing process are then expanded according to the general process of "expansion" as broken down in the above documentation excerpts, where word splitting of the expanded (expanding?) text into downstream words is simply one step of that process. Word splitting only touches text that has been spit out of a preceding expansion step; it does not affect literal text that was parsed right off the source bytestream.


Wrong answer #7

string='first line
        second line
        third line'

while read -r line; do lines+=("$line"); done <<<"$string"

This is one of the best solutions. Notice that we're back to using read . Didn't I say earlier that read is inappropriate because it performs two levels of splitting, when we only need one? The trick here is that you can call read in such a way that it effectively only does one level of splitting, specifically by splitting off only one field per invocation, which necessitates the cost of having to call it repeatedly in a loop. It's a bit of a sleight of hand, but it works.

But there are problems. First: When you provide at least one NAME argument to read , it automatically ignores leading and trailing whitespace in each field that is split off from the input string. This occurs whether $IFS is set to its default value or not, as described earlier in this post. Now, the OP may not care about this for his specific use-case, and in fact, it may be a desirable feature of the parsing behavior. But not everyone who wants to parse a string into fields will want this. There is a solution, however: A somewhat non-obvious usage of read is to pass zero NAME arguments. In this case, read will store the entire input line that it gets from the input stream in a variable named $REPLY , and, as a bonus, it does not strip leading and trailing whitespace from the value. This is a very robust usage of read which I've exploited frequently in my shell programming career. Here's a demonstration of the difference in behavior:

string=$'  a  b  \n  c  d  \n  e  f  '; ## input string

a=(); while read -r line; do a+=("$line"); done <<<"$string"; declare -p a;
## declare -a a=([0]="a  b" [1]="c  d" [2]="e  f") ## read trimmed surrounding whitespace

a=(); while read -r; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="  a  b  " [1]="  c  d  " [2]="  e  f  ") ## no trimming

The second issue with this solution is that it does not actually address the case of a custom field separator, such as the OP's comma-space. As before, multicharacter separators are not supported, which is an unfortunate limitation of this solution. We could try to at least split on comma by specifying the separator to the -d option, but look what happens:

string='Paris, France, Europe';
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France")

Predictably, the unaccounted surrounding whitespace got pulled into the field values, and hence this would have to be corrected subsequently through trimming operations (this could also be done directly in the while-loop). But there's another obvious error: Europe is missing! What happened to it? The answer is that read returns a failing return code if it hits end-of-file (in this case we can call it end-of-string) without encountering a final field terminator on the final field. This causes the while-loop to break prematurely and we lose the final field.

Technically this same error afflicted the previous examples as well; the difference there is that the field separator was taken to be LF, which is the default when you don't specify the -d option, and the <<< ("here-string") mechanism automatically appends a LF to the string just before it feeds it as input to the command. Hence, in those cases, we sort of accidentally solved the problem of a dropped final field by unwittingly appending an additional dummy terminator to the input. Let's call this solution the "dummy-terminator" solution. We can apply the dummy-terminator solution manually for any custom delimiter by concatenating it against the input string ourselves when instantiating it in the here-string:

a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; declare -p a;
declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

There, problem solved. Another solution is to only break the while-loop if both (1) read returned failure and (2) $REPLY is empty, meaning read was not able to read any characters prior to hitting end-of-file. Demo:

a=(); while read -rd,|| [[ -n "$REPLY" ]]; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

This approach also reveals the secretive LF that automatically gets appended to the here-string by the <<< redirection operator. It could of course be stripped off separately through an explicit trimming operation as described a moment ago, but obviously the manual dummy-terminator approach solves it directly, so we could just go with that. The manual dummy-terminator solution is actually quite convenient in that it solves both of these two problems (the dropped-final-field problem and the appended-LF problem) in one go.

So, overall, this is quite a powerful solution. Its only remaining weakness is a lack of support for multicharacter delimiters, which I will address later.


Wrong answer #8

string='first line
        second line
        third line'

readarray -t lines <<<"$string"

(This is actually from the same post as #7 ; the answerer provided two solutions in the same post.)

The readarray builtin, which is a synonym for mapfile , is ideal. It's a builtin command which parses a bytestream into an array variable in one shot; no messing with loops, conditionals, substitutions, or anything else. And it doesn't surreptitiously strip any whitespace from the input string. And (if -O is not given) it conveniently clears the target array before assigning to it. But it's still not perfect, hence my criticism of it as a "wrong answer".

First, just to get this out of the way, note that, just like the behavior of read when doing field-parsing, readarray drops the trailing field if it is empty. Again, this is probably not a concern for the OP, but it could be for some use-cases. I'll come back to this in a moment.

Second, as before, it does not support multicharacter delimiters. I'll give a fix for this in a moment as well.

Third, the solution as written does not parse the OP's input string, and in fact, it cannot be used as-is to parse it. I'll expand on this momentarily as well.

For the above reasons, I still consider this to be a "wrong answer" to the OP's question. Below I'll give what I consider to be the right answer.


Right answer

Here's a naïve attempt to make #8 work by just specifying the -d option:

string='Paris, France, Europe';
readarray -td, a <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

We see the result is identical to the result we got from the double-conditional approach of the looping read solution discussed in #7 . We can almost solve this with the manual dummy-terminator trick:

readarray -td, a <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe" [3]=$'\n')

The problem here is that readarray preserved the trailing field, since the <<< redirection operator appended the LF to the input string, and therefore the trailing field was not empty (otherwise it would've been dropped). We can take care of this by explicitly unsetting the final array element after-the-fact:

readarray -td, a <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

The only two problems that remain, which are actually related, are (1) the extraneous whitespace that needs to be trimmed, and (2) the lack of support for multicharacter delimiters.

The whitespace could of course be trimmed afterward (for example, see How to trim whitespace from a Bash variable? ). But if we can hack a multicharacter delimiter, then that would solve both problems in one shot.

Unfortunately, there's no direct way to get a multicharacter delimiter to work. The best solution I've thought of is to preprocess the input string to replace the multicharacter delimiter with a single-character delimiter that will be guaranteed not to collide with the contents of the input string. The only character that has this guarantee is the NUL byte . This is because, in bash (though not in zsh, incidentally), variables cannot contain the NUL byte. This preprocessing step can be done inline in a process substitution. Here's how to do it using awk :

readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; }' <<<"$string, "); unset 'a[-1]';
declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

There, finally! This solution will not erroneously split fields in the middle, will not cut out prematurely, will not drop empty fields, will not corrupt itself on filename expansions, will not automatically strip leading and trailing whitespace, will not leave a stowaway LF on the end, does not require loops, and does not settle for a single-character delimiter.


Trimming solution

Lastly, I wanted to demonstrate my own fairly intricate trimming solution using the obscure -C callback option of readarray . Unfortunately, I've run out of room against Stack Overflow's draconian 30,000 character post limit, so I won't be able to explain it. I'll leave that as an exercise for the reader.

function mfcb { local val="$4"; "$1"; eval "$2[$3]=\$val;"; };
function val_ltrim { if [[ "$val" =~ ^[[:space:]]+ ]]; then val="${val:${#BASH_REMATCH[0]}}"; fi; };
function val_rtrim { if [[ "$val" =~ [[:space:]]+$ ]]; then val="${val:0:${#val}-${#BASH_REMATCH[0]}}"; fi; };
function val_trim { val_ltrim; val_rtrim; };
readarray -c1 -C 'mfcb val_trim a' -td, <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

fbicknel ,Aug 18, 2017 at 15:57

It may also be helpful to note (though understandably you had no room to do so) that the -d option to readarray first appears in Bash 4.4. – fbicknel Aug 18 '17 at 15:57

Cyril Duchon-Doris ,Nov 3, 2017 at 9:16

You should add a "TL;DR : scroll 3 pages to see the right solution at the end of my answer" – Cyril Duchon-Doris Nov 3 '17 at 9:16

dawg ,Nov 26, 2017 at 22:28

Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,"\0"); print }' and eliminate that concatenation of the final ", " then you don't have to go through the gymnastics on eliminating the final record. So: readarray -td '' a < <(awk '{ gsub(/,[ ]+/,"\0"); print; }' <<<"$string") on Bash that supports readarray . Note your method is Bash 4.4+ I think because of the -d in readarray – dawg Nov 26 '17 at 22:28

datUser ,Feb 22, 2018 at 14:54

Looks like readarray is not an available builtin on OSX. – datUser Feb 22 '18 at 14:54

bgoldst ,Feb 23, 2018 at 3:37

@datUser That's unfortunate. Your version of bash must be too old for readarray . In this case, you can use the second-best solution built on read . I'm referring to this: a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; (with the awk substitution if you need multicharacter delimiter support). Let me know if you run into any problems; I'm pretty sure this solution should work on fairly old versions of bash, back to version 2-something, released like two decades ago. – bgoldst Feb 23 '18 at 3:37

Jmoney38 ,Jul 14, 2015 at 11:54

t="one,two,three"
a=($(echo "$t" | tr ',' '\n'))
echo "${a[2]}"

Prints three

shrimpwagon ,Oct 16, 2015 at 20:04

I actually prefer this approach. Simple. – shrimpwagon Oct 16 '15 at 20:04

Ben ,Oct 31, 2015 at 3:11

I copied and pasted this and it did not work with echo, but did work when I used it in a for loop. – Ben Oct 31 '15 at 3:11

Pinaki Mukherjee ,Nov 9, 2015 at 20:22

This is the simplest approach. thanks – Pinaki Mukherjee Nov 9 '15 at 20:22

abalter ,Aug 30, 2016 at 5:13

This does not work as stated. @Jmoney38 or shrimpwagon if you can paste this in a terminal and get the desired output, please paste the result here. – abalter Aug 30 '16 at 5:13

leaf ,Jul 17, 2017 at 16:28

@abalter Works for me with a=($(echo $t | tr ',' "\n")) . Same result with a=($(echo $t | tr ',' ' ')) . – leaf Jul 17 '17 at 16:28

Luca Borrione ,Nov 2, 2012 at 13:44

Sometimes the method described in the accepted answer didn't work for me, especially if the separator is a carriage return.
In those cases I solved it this way:
string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

for line in "${lines[@]}"
    do
        echo "--> $line"
done

Stefan van den Akker ,Feb 9, 2015 at 16:52

+1 This completely worked for me. I needed to put multiple strings, divided by a newline, into an array, and read -a arr <<< "$strings" did not work with IFS=$'\n' . – Stefan van den Akker Feb 9 '15 at 16:52

Stefan van den Akker ,Feb 10, 2015 at 13:49

Here is the answer to make the accepted answer work when the delimiter is a newline . – Stefan van den Akker Feb 10 '15 at 13:49

,Jul 24, 2015 at 21:24

The accepted answer works for values in one line.
If the variable has several lines:
string='first line
        second line
        third line'

We need a very different command to get all lines:

while read -r line; do lines+=("$line"); done <<<"$string"

Or the much simpler bash readarray :

readarray -t lines <<<"$string"

Printing all lines is very easy taking advantage of a printf feature:

printf ">[%s]\n" "${lines[@]}"

>[first line]
>[        second line]
>[        third line]

Mayhem ,Dec 31, 2015 at 3:13

While not every solution works for every situation, your mention of readarray... replaced my last two hours with 5 minutes... you got my vote – Mayhem Dec 31 '15 at 3:13

Derek 朕會功夫 ,Mar 23, 2018 at 19:14

readarray is the right answer. – Derek 朕會功夫 Mar 23 '18 at 19:14

ssanch ,Jun 3, 2016 at 15:24

This is similar to the approach by Jmoney38, but using sed:
string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)
echo ${array[0]}

Prints 1

dawg ,Nov 26, 2017 at 19:59

The key to splitting your string into an array is the multi character delimiter of ", " . Any solution using IFS for multi character delimiters is inherently wrong since IFS is a set of those characters, not a string.

If you assign IFS=", " then the string will break on EITHER "," OR " " or any combination of them which is not an accurate representation of the two character delimiter of ", " .
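
A small demonstration of that point (my own illustrative example, not from the answer): with IFS=', ' even a space inside a field splits, because IFS is a set of single characters rather than a string:

s='New York, USA'
IFS=', ' read -r -a parts <<< "$s"    ## the space inside "New York" also acts as a delimiter
declare -p parts
## declare -a parts=([0]="New" [1]="York" [2]="USA")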

You can use awk or sed to split the string, with process substitution:

#!/bin/bash

str="Paris, France, Europe"
array=()
while read -r -d $'\0' each; do   # use a NUL terminated field separator 
    array+=("$each")
done < <(printf "%s" "$str" | awk '{ gsub(/,[ ]+|$/,"\0"); print }')
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output

It is more efficient to use a regex directly in Bash:

#!/bin/bash

str="Paris, France, Europe"

array=()
while [[ $str =~ ([^,]+)(,[ ]+|$) ]]; do
    array+=("${BASH_REMATCH[1]}")   # capture the field
    i=${#BASH_REMATCH}              # length of field + delimiter
    str=${str:i}                    # advance the string by that length
done                                # the loop deletes $str, so make a copy if needed

declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output...

With the second form, there is no subshell and it will be inherently faster.


Edit by bgoldst: Here are some benchmarks comparing my readarray solution to dawg's regex solution, and I also included the read solution for the heck of it (note: I slightly modified the regex solution for greater harmony with my solution) (also see my comments below the post):

## competitors
function c_readarray { readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); unset 'a[-1]'; };
function c_read { a=(); local REPLY=''; while read -r -d ''; do a+=("$REPLY"); done < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); };
function c_regex { a=(); local s="$1, "; while [[ $s =~ ([^,]+),\  ]]; do a+=("${BASH_REMATCH[1]}"); s=${s:${#BASH_REMATCH}}; done; };

## helper functions
function rep {
    local -i i=-1;
    for ((i = 0; i<$1; ++i)); do
        printf %s "$2";
    done;
}; ## end rep()

function testAll {
    local funcs=();
    local args=();
    local func='';
    local -i rc=-1;
    while [[ "$1" != ':' ]]; do
        func="$1";
        if [[ ! "$func" =~ ^[_a-zA-Z][_a-zA-Z0-9]*$ ]]; then
            echo "bad function name: $func" >&2;
            return 2;
        fi;
        funcs+=("$func");
        shift;
    done;
    shift;
    args=("$@");
    for func in "${funcs[@]}"; do
        echo -n "$func ";
        { time $func "${args[@]}" >/dev/null 2>&1; } 2>&1| tr '\n' '/';
        rc=${PIPESTATUS[0]}; if [[ $rc -ne 0 ]]; then echo "[$rc]"; else echo; fi;
    done| column -ts/;
}; ## end testAll()

function makeStringToSplit {
    local -i n=$1; ## number of fields
    if [[ $n -lt 0 ]]; then echo "bad field count: $n" >&2; return 2; fi;
    if [[ $n -eq 0 ]]; then
        echo;
    elif [[ $n -eq 1 ]]; then
        echo 'first field';
    elif [[ "$n" -eq 2 ]]; then
        echo 'first field, last field';
    else
        echo "first field, $(rep $[$1-2] 'mid field, ')last field";
    fi;
}; ## end makeStringToSplit()

function testAll_splitIntoArray {
    local -i n=$1; ## number of fields in input string
    local s='';
    echo "===== $n field$(if [[ $n -ne 1 ]]; then echo 's'; fi;) =====";
    s="$(makeStringToSplit "$n")";
    testAll c_readarray c_read c_regex : "$s";
}; ## end testAll_splitIntoArray()

## results
testAll_splitIntoArray 1;
## ===== 1 field =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.000s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 10;
## ===== 10 fields =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.001s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 100;
## ===== 100 fields =====
## c_readarray   real  0m0.069s   user 0m0.000s   sys  0m0.062s
## c_read        real  0m0.065s   user 0m0.000s   sys  0m0.046s
## c_regex       real  0m0.005s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 1000;
## ===== 1000 fields =====
## c_readarray   real  0m0.084s   user 0m0.031s   sys  0m0.077s
## c_read        real  0m0.092s   user 0m0.031s   sys  0m0.046s
## c_regex       real  0m0.125s   user 0m0.125s   sys  0m0.000s
##
testAll_splitIntoArray 10000;
## ===== 10000 fields =====
## c_readarray   real  0m0.209s   user 0m0.093s   sys  0m0.108s
## c_read        real  0m0.333s   user 0m0.234s   sys  0m0.109s
## c_regex       real  0m9.095s   user 0m9.078s   sys  0m0.000s
##
testAll_splitIntoArray 100000;
## ===== 100000 fields =====
## c_readarray   real  0m1.460s   user 0m0.326s   sys  0m1.124s
## c_read        real  0m2.780s   user 0m1.686s   sys  0m1.092s
## c_regex       real  17m38.208s   user 15m16.359s   sys  2m19.375s
##

bgoldst ,Nov 27, 2017 at 4:28

Very cool solution! I never thought of using a loop on a regex match, nifty use of $BASH_REMATCH . It works, and does indeed avoid spawning subshells. +1 from me. However, by way of criticism, the regex itself is a little non-ideal, in that it appears you were forced to duplicate part of the delimiter token (specifically the comma) so as to work around the lack of support for non-greedy multipliers (also lookarounds) in ERE ("extended" regex flavor built into bash). This makes it a little less generic and robust. – bgoldst Nov 27 '17 at 4:28

bgoldst ,Nov 27, 2017 at 4:28

Secondly, I did some benchmarking, and although the performance is better than the other solutions for smallish strings, it worsens exponentially due to the repeated string-rebuilding, becoming catastrophic for very large strings. See my edit to your answer. – bgoldst Nov 27 '17 at 4:28

dawg ,Nov 27, 2017 at 4:46

@bgoldst: What a cool benchmark! In defense of the regex, for 10's or 100's of thousands of fields (what the regex is splitting) there would probably be some form of record (like \n delimited text lines) comprising those fields so the catastrophic slow-down would likely not occur. If you have a string with 100,000 fields -- maybe Bash is not ideal ;-) Thanks for the benchmark. I learned a thing or two. – dawg Nov 27 '17 at 4:46

Geoff Lee ,Mar 4, 2016 at 6:02

Try this
IFS=', '; array=(Paris, France, Europe)
for item in ${array[@]}; do echo $item; done

It's simple. If you want, you can also add a declare (and also remove the commas):

IFS=' ';declare -a array=(Paris France Europe)

The IFS is added to undo the above but it works without it in a fresh bash instance

MrPotatoHead ,Nov 13, 2018 at 13:19

Pure bash multi-character delimiter solution.

As others have pointed out in this thread, the OP's question gave an example of a comma delimited string to be parsed into an array, but did not indicate if he/she was only interested in comma delimiters, single character delimiters, or multi-character delimiters.

Since Google tends to rank this answer at or near the top of search results, I wanted to provide readers with a strong answer to the question of multiple character delimiters, since that is also mentioned in at least one response.

If you're in search of a solution to a multi-character delimiter problem, I suggest reviewing Mallikarjun M 's post, in particular the response from gniourf_gniourf who provides this elegant pure BASH solution using parameter expansion:

#!/bin/bash
str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
    array+=( "${s%%"$delimiter"*}" );
    s=${s#*"$delimiter"};
done;
declare -p array

Link to cited comment/referenced post

Link to cited question: Howto split a string on a multi-character delimiter in bash?

Eduardo Cuomo ,Dec 19, 2016 at 15:27

Use this:
countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"

#${array[1]} == Paris
#${array[2]} == France
#${array[3]} == Europe

gniourf_gniourf ,Dec 19, 2016 at 17:22

Bad: subject to word splitting and pathname expansion. Please don't revive old questions with good answers to give bad answers. – gniourf_gniourf Dec 19 '16 at 17:22

Scott Weldon ,Dec 19, 2016 at 18:12

This may be a bad answer, but it is still a valid answer. Flaggers / reviewers: For incorrect answers such as this one, downvote, don't delete! – Scott Weldon Dec 19 '16 at 18:12

George Sovetov ,Dec 26, 2016 at 17:31

@gniourf_gniourf Could you please explain why it is a bad answer? I really don't understand when it fails. – George Sovetov Dec 26 '16 at 17:31

gniourf_gniourf ,Dec 26, 2016 at 18:07

@GeorgeSovetov: As I said, it's subject to word splitting and pathname expansion. More generally, splitting a string into an array as array=( $string ) is a (sadly very common) antipattern: word splitting occurs: string='Prague, Czech Republic, Europe' ; Pathname expansion occurs: string='foo[abcd],bar[efgh]' will fail if you have a file named, e.g., food or barf in your directory. The only valid usage of such a construct is when string is a glob. – gniourf_gniourf Dec 26 '16 at 18:07
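
A hedged demonstration of the pathname-expansion failure described above (the temp directory and the file name food are purely illustrative):

cd "$(mktemp -d)"                 ## empty scratch directory
touch food                        ## a file whose name happens to match the glob foo[abcd]
string='foo[abcd],bar[efgh]'
IFS=,
array=($string)                   ## unquoted: word splitting, then pathname expansion
unset IFS
declare -p array
## declare -a array=([0]="food" [1]="bar[efgh]")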

user1009908 ,Jun 9, 2015 at 23:28

UPDATE: Don't do this, due to problems with eval.

With slightly less ceremony:

IFS=', ' eval 'array=($string)'

e.g.

string="foo, bar,baz"
IFS=', ' eval 'array=($string)'
echo ${array[1]} # -> bar

caesarsol ,Oct 29, 2015 at 14:42

eval is evil! don't do this. – caesarsol Oct 29 '15 at 14:42

user1009908 ,Oct 30, 2015 at 4:05

Pfft. No. If you're writing scripts large enough for this to matter, you're doing it wrong. In application code, eval is evil. In shell scripting, it's common, necessary, and inconsequential. – user1009908 Oct 30 '15 at 4:05

caesarsol ,Nov 2, 2015 at 18:19

put a $ in your variable and you'll see... I write many scripts and I never ever had to use a single eval – caesarsol Nov 2 '15 at 18:19

Dennis Williamson ,Dec 2, 2015 at 17:00

Eval command and security issues – Dennis Williamson Dec 2 '15 at 17:00

user1009908 ,Dec 22, 2015 at 23:04

You're right, this is only usable when the input is known to be clean. Not a robust solution. – user1009908 Dec 22 '15 at 23:04

Eduardo Lucio ,Jan 31, 2018 at 20:45

Here's my hack!

Splitting strings by strings is a pretty boring thing to do using bash. What happens is that we have limited approaches that only work in a few cases (split by ";", "/", "." and so on) or we have a variety of side effects in the outputs.

The approach below has required a number of maneuvers, but I believe it will work for most of our needs!

#!/bin/bash

# --------------------------------------
# SPLIT FUNCTION
# ----------------

F_SPLIT_R=()
f_split() {
    : 'It does a "split" into a given string and returns an array.

    Args:
        TARGET_P (str): Target string to "split".
        DELIMITER_P (Optional[str]): Delimiter used to "split". If not 
    informed the split will be done by spaces.

    Returns:
        F_SPLIT_R (array): Array with the provided string separated by the 
    informed delimiter.
    '

    F_SPLIT_R=()
    TARGET_P=$1
    DELIMITER_P=$2
    if [ -z "$DELIMITER_P" ] ; then
        DELIMITER_P=" "
    fi

    REMOVE_N=1
    if [ "$DELIMITER_P" == "\n" ] ; then
        REMOVE_N=0
    fi

    # NOTE: This was the only parameter that has been a problem so far! 
    # By Questor
    # [Ref.: https://unix.stackexchange.com/a/390732/61742]
    if [ "$DELIMITER_P" == "./" ] ; then
        DELIMITER_P="[.]/"
    fi

    if [ ${REMOVE_N} -eq 1 ] ; then

        # NOTE: Due to bash limitations we have some problems getting the 
        # output of a split by awk inside an array and so we need to use 
        # "line break" (\n) to succeed. Seen this, we remove the line breaks 
        # momentarily afterwards we reintegrate them. The problem is that if 
        # there is a line break in the "string" informed, this line break will 
        # be lost, that is, it is erroneously removed in the output! 
        # By Questor
        TARGET_P=$(awk 'BEGIN {RS="dn"} {gsub("\n", "3F2C417D448C46918289218B7337FCAF"); printf $0}' <<< "${TARGET_P}")

    fi

    # NOTE: The replace of "\n" by "3F2C417D448C46918289218B7337FCAF" results 
    # in more occurrences of "3F2C417D448C46918289218B7337FCAF" than the 
    # amount of "\n" that there was originally in the string (one more 
    # occurrence at the end of the string)! We can not explain the reason for 
    # this side effect. The line below corrects this problem! By Questor
    TARGET_P=${TARGET_P%????????????????????????????????}

    SPLIT_NOW=$(awk -F"$DELIMITER_P" '{for(i=1; i<=NF; i++){printf "%s\n", $i}}' <<< "${TARGET_P}")

    while IFS= read -r LINE_NOW ; do
        if [ ${REMOVE_N} -eq 1 ] ; then

            # NOTE: We use "'" to prevent blank lines with no other characters 
            # in the sequence being erroneously removed! We do not know the 
            # reason for this side effect! By Questor
            LN_NOW_WITH_N=$(awk 'BEGIN {RS="dn"} {gsub("3F2C417D448C46918289218B7337FCAF", "\n"); printf $0}' <<< "'${LINE_NOW}'")

            # NOTE: We use the commands below to revert the intervention made 
            # immediately above! By Questor
            LN_NOW_WITH_N=${LN_NOW_WITH_N%?}
            LN_NOW_WITH_N=${LN_NOW_WITH_N#?}

            F_SPLIT_R+=("$LN_NOW_WITH_N")
        else
            F_SPLIT_R+=("$LINE_NOW")
        fi
    done <<< "$SPLIT_NOW"
}

# --------------------------------------
# HOW TO USE
# ----------------

STRING_TO_SPLIT="
 * How do I list all databases and tables using psql?

\"
sudo -u postgres /usr/pgsql-9.4/bin/psql -c \"\l\"
sudo -u postgres /usr/pgsql-9.4/bin/psql <DB_NAME> -c \"\dt\"
\"

\"
\list or \l: list all databases
\dt: list all tables in the current database
\"

[Ref.: https://dba.stackexchange.com/questions/1285/how-do-i-list-all-databases-and-tables-using-psql]


"

f_split "$STRING_TO_SPLIT" "bin/psql -c"

# --------------------------------------
# OUTPUT AND TEST
# ----------------

ARR_LENGTH=${#F_SPLIT_R[*]}
for (( i=0; i<=$(( $ARR_LENGTH -1 )); i++ )) ; do
    echo " > -----------------------------------------"
    echo "${F_SPLIT_R[$i]}"
    echo " < -----------------------------------------"
done

if [ "$STRING_TO_SPLIT" == "${F_SPLIT_R[0]}bin/psql -c${F_SPLIT_R[1]}" ] ; then
    echo " > -----------------------------------------"
    echo "The strings are the same!"
    echo " < -----------------------------------------"
fi

sel-en-ium ,May 31, 2018 at 5:56

Another way to do it without modifying IFS:
read -r -a myarray <<< "${string//, /$IFS}"

Rather than changing IFS to match our desired delimiter, we can replace all occurrences of our desired delimiter ", " with contents of $IFS via "${string//, /$IFS}" .

Maybe this will be slow for very large strings though?

This is based on Dennis Williamson's answer.

rsjethani ,Sep 13, 2016 at 16:21

Another approach can be:
str="a, b, c, d"  # assuming there is a space after ',' as in Q
arr=(${str//,/})  # delete all occurrences of ','

After this, 'arr' is an array with four strings. This doesn't require dealing with IFS or read or any other special stuff, hence it's much simpler and more direct.

gniourf_gniourf ,Dec 26, 2016 at 18:12

Same (sadly common) antipattern as other answers: subject to word splitting and filename expansion. – gniourf_gniourf Dec 26 '16 at 18:12

Safter Arslan ,Aug 9, 2017 at 3:21

Another way would be:
string="Paris, France, Europe"
IFS=', ' arr=(${string})

Now your elements are stored in "arr" array. To iterate through the elements:

for i in ${arr[@]}; do echo $i; done

bgoldst ,Aug 13, 2017 at 22:38

I cover this idea in my answer ; see Wrong answer #5 (you might be especially interested in my discussion of the eval trick). Your solution leaves $IFS set to the comma-space value after-the-fact. – bgoldst Aug 13 '17 at 22:38

[Jan 28, 2019] regex - Safe rm -rf function in shell script

Jan 28, 2019 | stackoverflow.com

community wiki
5 revs
,May 23, 2017 at 12:26

This question is similar to What is the safest way to empty a directory in *nix?

I'm writing bash script which defines several path constants and will use them for file and directory manipulation (copying, renaming and deleting). Often it will be necessary to do something like:

rm -rf "/${PATH1}"
rm -rf "${PATH2}/"*

While developing this script I'd want to protect myself from mistyping names like PATH1 and PATH2 and avoid situations where they are expanded to empty string, thus resulting in wiping whole disk. I decided to create special wrapper:

rmrf() {
    if [[ $1 =~ "regex" ]]; then
        echo "Ignoring possibly unsafe path ${1}"
        exit 1
    fi

    shopt -s dotglob
    rm -rf -- $1
    shopt -u dotglob
}

Which will be called as:

rmrf "/${PATH1}"
rmrf "${PATH2}/"*

Regex (or sed expression) should catch paths like "*", "/*", "/**/", "///*" etc. but allow paths like "dir", "/dir", "/dir1/dir2/", "/dir1/dir2/*". Also I don't know how to enable shell globbing in case like "/dir with space/*". Any ideas?

EDIT: this is what I came up with so far:

rmrf() {
    local RES
    local RMPATH="${1}"
    SAFE=$(echo "${RMPATH}" | sed -r 's:^((\.?\*+/+)+.*|(/+\.?\*+)+.*|[\.\*/]+|.*/\.\*+)$::g')
    if [ -z "${SAFE}" ]; then
        echo "ERROR! Unsafe deletion of ${RMPATH}"
        return 1
    fi

    shopt -s dotglob
    if [ '*' == "${RMPATH: -1}" ]; then
        echo rm -rf -- "${RMPATH/%\*/}"*
        RES=$?
    else
        echo rm -rf -- "${RMPATH}"
        RES=$?
    fi
    shopt -u dotglob

    return $RES
}

Intended use is (note an asterisk inside quotes):

rmrf "${SOMEPATH}"
rmrf "${SOMEPATH}/*"

where $SOMEPATH is not system or /home directory (in my case all such operations are performed on filesystem mounted under /scratch directory).

CAVEATS:

SpliFF ,Jun 14, 2009 at 13:45

I've found a big danger with rm in bash is that bash usually doesn't stop for errors. That means that:
cd $SOMEPATH
rm -rf *

Is a very dangerous combination if the change directory fails. A safer way would be:

cd $SOMEPATH && rm -rf *

Which will ensure the rf won't run unless you are really in $SOMEPATH. This doesn't protect you from a bad $SOMEPATH but it can be combined with the advice given by others to help make your script safer.

EDIT: @placeybordeaux makes a good point that if $SOMEPATH is undefined or empty, cd doesn't treat it as an error and returns 0. In light of that, this answer should be considered unsafe unless $SOMEPATH is validated as existing and non-empty first. I believe cd with no args should be an illegal command, since at best it performs a no-op and at worst it can lead to unexpected behaviour, but it is what it is.
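
A minimal sketch of the combined advice (the function name and messages are my own, hypothetical choices; this is illustrative rather than a hardened solution):

rm_contents_safely() {
    local dir="${1:?usage: rm_contents_safely <dir>}"      ## abort loudly if $1 is unset or empty
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    ( cd "$dir" && rm -rf -- ./* )    ## subshell keeps the caller's cwd; add shopt -s dotglob to catch dotfiles
}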

Sazzad Hissain Khan ,Jul 6, 2017 at 11:45

nice trick, I am one stupid victim. – Sazzad Hissain Khan Jul 6 '17 at 11:45

placeybordeaux ,Jun 21, 2018 at 22:59

If $SOMEPATH is empty won't this rm -rf the user's home directory? – placeybordeaux Jun 21 '18 at 22:59

SpliFF ,Jun 27, 2018 at 4:10

@placeybordeaux The && only runs the second command if the first succeeds - so if cd fails rm never runs – SpliFF Jun 27 '18 at 4:10

placeybordeaux ,Jul 3, 2018 at 18:46

@SpliFF at least in ZSH the return value of cd $NONEXISTANTVAR is 0 – placeybordeaux Jul 3 '18 at 18:46

ruakh ,Jul 13, 2018 at 6:46

Instead of cd $SOMEPATH , you should write cd "${SOMEPATH?}" . The ${varname?} notation ensures that the expansion fails with a warning-message if the variable is unset or empty (such that the && ... part is never run); the double-quotes ensure that special characters in $SOMEPATH , such as whitespace, don't have undesired effects. – ruakh Jul 13 '18 at 6:46

community wiki
2 revs
,Jul 24, 2009 at 22:36

There is a set -u bash directive that will cause the shell to exit when an uninitialized variable is used. I read about it here, with rm -rf as an example. I think that's what you're looking for. And here is set's manual.
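
A two-line illustration (my own, with the rm neutralized by echo as suggested elsewhere in this thread):

set -u
echo rm -rf "/${PATH1}"    ## if PATH1 is unset, bash aborts with "PATH1: unbound variable" before anything runs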

,Jun 14, 2009 at 12:38

I think "rm" command has a parameter to avoid the deleting of "/". Check it out.

Max ,Jun 14, 2009 at 12:56

Thanks! I didn't know about such option. Actually it is named --preserve-root and is not mentioned in the manpage. – Max Jun 14 '09 at 12:56

Max ,Jun 14, 2009 at 13:18

On my system this option is on by default, but it can't help in cases like rm -ri /* – Max Jun 14 '09 at 13:18

ynimous ,Jun 14, 2009 at 12:42

I would recommend using realpath(1) rather than the command argument directly, so that you can avoid things like /A/B/../ or symbolic links.
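
A brief sketch of that suggestion (hedged: the wrapper name and the /scratch check are my own, based on the question's mention that everything lives under /scratch):

rmrf_real() {
    local target
    target=$(realpath -- "$1") || return 1        ## canonicalize: resolves symlinks and ".." components
    case "$target" in
        /scratch/*) rm -rf -- "$target" ;;        ## only ever delete under the intended mount point
        *) echo "refusing to delete: $target" >&2; return 1 ;;
    esac
}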

Max ,Jun 14, 2009 at 13:30

Useful but non-standard command. I've found possible bash replacement: archlinux.org/pipermail/pacman-dev/2009-February/008130.html – Max Jun 14 '09 at 13:30

Jonathan Leffler ,Jun 14, 2009 at 12:47

Generally, when I'm developing a command with operations such as ' rm -fr ' in it, I will neutralize the remove during development. One way of doing that is:
RMRF="echo rm -rf"
...
$RMRF "/${PATH1}"

This shows me what should be deleted - but does not delete it. I will do a manual clean up while things are under development - it is a small price to pay for not running the risk of screwing up everything.

The notation ' "/${PATH1}" ' is a little unusual; normally, you would ensure that PATH1 simply contains an absolute pathname.

Using the metacharacter with ' "${PATH2}/"* ' is unwise and unnecessary. The only difference between using that and using just ' "${PATH2}" ' is that if the directory specified by PATH2 contains any files or directories with names starting with dot, then those files or directories will not be removed. Such a design is unlikely and is rather fragile. It would be much simpler just to pass PATH2 and let the recursive remove do its job. Adding the trailing slash is not necessarily a bad idea; the system would have to ensure that $PATH2 contains a directory name, not just a file name, but the extra protection is rather minimal.

Using globbing with ' rm -fr ' is usually a bad idea. You want to be precise and restrictive and limiting in what it does - to prevent accidents. Of course, you'd never run the command (shell script you are developing) as root while it is under development - that would be suicidal. Or, if root privileges are absolutely necessary, you neutralize the remove operation until you are confident it is bullet-proof.

Max ,Jun 14, 2009 at 13:09

To delete subdirectories and files starting with dot I use "shopt -s dotglob". Using rm -rf "${PATH2}" is not appropriate because in my case PATH2 can be only removed by superuser and this results in error status for "rm" command (and I verify it to track other errors). – Max Jun 14 '09 at 13:09

Jonathan Leffler ,Jun 14, 2009 at 13:37

Then, with due respect, you should use a private sub-directory under $PATH2 that you can remove. Avoid glob expansion with commands like 'rm -rf' like you would avoid the plague (or should that be A/H1N1?). – Jonathan Leffler Jun 14 '09 at 13:37

Max ,Jun 14, 2009 at 14:10

Meanwhile I've found this perl project: http://code.google.com/p/safe-rm/

community wiki
too much php
,Jun 15, 2009 at 1:55

If it is possible, you should try and put everything into a folder with a hard-coded name which is unlikely to be found anywhere else on the filesystem, such as ' foofolder '. Then you can write your rmrf() function as:
rmrf() {
    rm -rf "foofolder/$PATH1"
    # or
    rm -rf "$PATH1/foofolder"
}

There is no way that function can delete anything but the files you want it to.

vadipp ,Jan 13, 2017 at 11:37

Actually there is a way: if PATH1 is something like ../../someotherdir – vadipp Jan 13 '17 at 11:37

community wiki
btop
,Jun 15, 2009 at 6:34

You may use
set -f    # cf. help set

to disable filename generation (*).
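
A small demonstration of what that changes (illustrative only):

set -f
echo /tmp/*     ## prints the literal string /tmp/* -- no filename generation
set +f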

community wiki
Howard Hong
,Oct 28, 2009 at 19:56

You don't need to use regular expressions.
Just assign the directories you want to protect to a variable and then iterate over the variable. eg:
protected_dirs="/ /bin /usr/bin /home $HOME"
for d in $protected_dirs; do
    if [ "$1" = "$d" ]; then
        rm=0
        break;
    fi
done
if [ ${rm:-1} -eq 1 ]; then
    rm -rf $1
fi

,

Add the following codes to your ~/.bashrc
# safe delete
move_to_trash () { now="$(date +%Y%m%d_%H%M%S)"; mv "$@" ~/.local/share/Trash/files/"$@_$now"; }
alias del='move_to_trash'

# safe rm
alias rmi='rm -i'

Every time you need to rm something, first consider del , you can change the trash folder. If you do need to rm something, you could go to the trash folder and use rmi .

One small bug with del is that when deleting a folder, for example my_folder , it should be del my_folder and not del my_folder/ , since in order to allow a possible later restore I attach the time information at the end ( "$@_$now" ). For files, it works fine.

[Jan 17, 2019] How do I launch the default web browser in Perl on any operating system

Jan 17, 2019 | stackoverflow.com

The second hit on "open url" at search.cpan brings up Browser::Open:

use Browser::Open qw( open_browser );

my $url = 'http://www.google.com/';
open_browser($url);

If your OS isn't supported, send a patch or a bug report.

--cjm

More at Stack Overflow

[Jan 10, 2019] linux - How does cat EOF work in bash - Stack Overflow

Notable quotes:
"... The $sql variable now holds the new-line characters too. You can verify with echo -e "$sql" . ..."
"... The print.sh file now contains: ..."
"... The b.txt file contains bar and baz lines. The same output is printed to stdout . ..."
Jan 10, 2019 | stackoverflow.com

How does "cat << EOF" work in bash? Ask Question 454


hasen ,Mar 23, 2010 at 13:57

I needed to write a script to enter multi-line input to a program ( psql ).

After a bit of googling, I found the following syntax works:

cat << EOF | psql ---params
BEGIN;

`pg_dump ----something`

update table .... statement ...;

END;
EOF

This correctly constructs the multi-line string (from BEGIN; to END; , inclusive) and pipes it as an input to psql .

But I have no idea how/why it works, can some one please explain?

I'm referring mainly to cat << EOF , I know > outputs to a file, >> appends to a file, < reads input from file.

What does << exactly do?

And is there a man page for it?

Dennis Williamson ,Mar 23, 2010 at 18:28

That's probably a useless use of cat . Try psql ... << EOF ... See also "here strings". mywiki.wooledge.org/BashGuide/InputAndOutput?#Here_StringsDennis Williamson Mar 23 '10 at 18:28

hasen ,Mar 23, 2010 at 18:54

@Dennis: good point, and thanks for the link! – hasen Mar 23 '10 at 18:54

Alex ,Mar 23, 2015 at 23:31

I'm surprised it works with cat but not with echo. cat should expect a file name as stdin, not a char string. psql << EOF sounds logical, but not otherwise. Works with cat but not with echo. Strange behaviour. Any clue about that? – Alex Mar 23 '15 at 23:31

Alex ,Mar 23, 2015 at 23:39

Answering to myself: cat without parameters executes and replicates to the output whatever send via input (stdin), hence using its output to fill the file via >. In fact a file name read as a parameter is not a stdin stream. – Alex Mar 23 '15 at 23:39

The-null-Pointer- ,Jan 1, 2018 at 18:03

@Alex echo just prints its command line arguments, while cat reads stdin (when piped to it) or reads a file that corresponds to its command line args – The-null-Pointer- Jan 1 '18 at 18:03

kennytm ,Mar 23, 2010 at 13:58

This is called heredoc format to provide a string into stdin. See https://en.wikipedia.org/wiki/Here_document#Unix_shells for more details.

From man bash :

Here Documents

This type of redirection instructs the shell to read input from the current source until a line containing only word (with no trailing blanks) is seen.

All of the lines read up to that point are then used as the standard input for a command.

The format of here-documents is:

          <<[-]word
                  here-document
          delimiter

No parameter expansion, command substitution, arithmetic expansion, or pathname expansion is performed on word . If any characters in word are quoted, the delimiter is the result of quote removal on word , and the lines in the here-document are not expanded. If word is unquoted, all lines of the here-document are subjected to parameter expansion, command substitution, and arithmetic expansion. In the latter case, the character sequence \<newline> is ignored, and \ must be used to quote the characters \ , $ , and ` .

If the redirection operator is <<- , then all leading tab characters are stripped from input lines and the line containing delimiter . This allows here-documents within shell scripts to be indented in a natural fashion.

Xeoncross ,May 26, 2011 at 22:51

I was having the hardest time disabling variable/parameter expansion. All I needed to do was use "double-quotes" and that fixed it! Thanks for the info! – Xeoncross May 26 '11 at 22:51

trkoch ,Nov 10, 2015 at 17:23

Concerning <<- please note that only leading tab characters are stripped -- not soft tab characters. This is one of those rare cases when you actually need the tab character. If the rest of your document uses soft tabs, make sure to show invisible characters and (e.g.) copy and paste a tab character. If you do it right, your syntax highlighting should correctly catch the ending delimiter. – trkoch Nov 10 '15 at 17:23

BrDaHa ,Jul 13, 2017 at 19:01

I don't see how this answer is more helpful than the ones below. It merely regurgitates information that can be found in other places (that have likely already been checked) – BrDaHa Jul 13 '17 at 19:01

Vojtech Vitek ,Feb 4, 2014 at 10:28

The cat <<EOF syntax is very useful when working with multi-line text in Bash, eg. when assigning multi-line string to a shell variable, file or a pipe. Examples of cat <<EOF syntax usage in Bash: 1. Assign multi-line string to a shell variable
$ sql=$(cat <<EOF
SELECT foo, bar FROM db
WHERE foo='baz'
EOF
)

The $sql variable now holds the new-line characters too. You can verify with echo -e "$sql" .

2. Pass multi-line string to a file in Bash
$ cat <<EOF > print.sh
#!/bin/bash
echo \$PWD
echo $PWD
EOF

The print.sh file now contains:

#!/bin/bash
echo $PWD
echo /home/user
3. Pass multi-line string to a pipe in Bash
$ cat <<EOF | grep 'b' | tee b.txt
foo
bar
baz
EOF

The b.txt file contains bar and baz lines. The same output is printed to stdout .

edelans ,Aug 22, 2014 at 8:48

In your case, "EOF" is known as a "Here Tag". Basically <<Here tells the shell that you are going to enter a multiline string until the "tag" Here . You can name this tag as you want, it's often EOF or STOP .

Some rules about the Here tags:

  1. The tag can be any string, uppercase or lowercase, though most people use uppercase by convention.
  2. The tag will not be considered as a Here tag if there are other words in that line. In this case, it will merely be considered part of the string. The tag should be by itself on a separate line, to be considered a tag.
  3. The tag should have no leading or trailing spaces in that line to be considered a tag. Otherwise it will be considered as part of the string.

example:

$ cat >> test <<HERE
> Hello world HERE <-- Not by itself on a separate line -> not considered end of string
> This is a test
>  HERE <-- Leading space, so not considered end of string
> and a new line
> HERE <-- Now we have the end of the string

oemb1905 ,Feb 22, 2017 at 7:17

this is the best actual answer ... you define both and clearly state the primary purpose of the use instead of related theory ... which is important but not necessary ... thanks - super helpful – oemb1905 Feb 22 '17 at 7:17

The-null-Pointer- ,Jan 1, 2018 at 18:05

@edelans you must add that when <<- is used leading tab will not prevent the tag from being recognized – The-null-Pointer- Jan 1 '18 at 18:05

JawSaw ,Oct 28, 2018 at 13:44

your answer clicked me on "you are going to enter a multiline string" – JawSaw Oct 28 '18 at 13:44

Ciro Santilli 新疆改造中心 六四事件 法轮功 ,Jun 9, 2015 at 9:41

POSIX 7

kennytm quoted man bash , but most of that is also POSIX 7: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04 :

The redirection operators "<<" and "<<-" both allow redirection of lines contained in a shell input file, known as a "here-document", to the input of a command.

The here-document shall be treated as a single word that begins after the next <newline> and continues until there is a line containing only the delimiter and a <newline>, with no <blank> characters in between. Then the next here-document starts, if there is one. The format is as follows:

[n]<<word
    here-document
delimiter

where the optional n represents the file descriptor number. If the number is omitted, the here-document refers to standard input (file descriptor 0).

If any character in word is quoted, the delimiter shall be formed by performing quote removal on word, and the here-document lines shall not be expanded. Otherwise, the delimiter shall be the word itself.

If no characters in word are quoted, all lines of the here-document shall be expanded for parameter expansion, command substitution, and arithmetic expansion. In this case, the <backslash> in the input behaves as the <backslash> inside double-quotes (see Double-Quotes). However, the double-quote character ( '"' ) shall not be treated specially within a here-document, except when the double-quote appears within "$()", "``", or "${}".

If the redirection symbol is "<<-", all leading <tab> characters shall be stripped from input lines and the line containing the trailing delimiter. If more than one "<<" or "<<-" operator is specified on a line, the here-document associated with the first operator shall be supplied first by the application and shall be read first by the shell.

When a here-document is read from a terminal device and the shell is interactive, it shall write the contents of the variable PS2, processed as described in Shell Variables, to standard error before reading each line of input until the delimiter has been recognized.

Examples

Some examples not yet given.

Quotes prevent parameter expansion

Without quotes:

a=0
cat <<EOF
$a
EOF

Output:

0

With quotes:

a=0
cat <<'EOF'
$a
EOF

or (ugly but valid):

a=0
cat <<E"O"F
$a
EOF

Outputs:

$a
Hyphen removes leading tabs

Without hyphen:

cat <<EOF
<tab>a
EOF

where <tab> is a literal tab, and can be inserted with Ctrl + V <tab>

Output:

<tab>a

With hyphen:

cat <<-EOF
<tab>a
<tab>EOF

Output:

a

This exists of course so that you can indent your cat like the surrounding code, which is easier to read and maintain. E.g.:

if true; then
    cat <<-EOF
    a
    EOF
fi

Unfortunately, this does not work for space characters: POSIX favored tab indentation here. Yikes.

David C. Rankin ,Aug 12, 2015 at 7:10

In your last example discussing <<- and <tab>a , it should be noted that the purpose was to allow normal indentation of code within the script while allowing heredoc text presented to the receiving process to begin in column 0. It is a not too commonly seen feature and a bit more context may prevent a good deal of head-scratching... – David C. Rankin Aug 12 '15 at 7:10

Ciro Santilli 新疆改造中心 六四事件 法轮功 ,Aug 12, 2015 at 8:22

@DavidC.Rankin updated to clarify that, thanks. – Ciro Santilli 新疆改造中心 六四事件 法轮功 Aug 12 '15 at 8:22

Jeanmichel Cote ,Sep 23, 2015 at 19:58

How should I escape expansion if some of the content between my EOF tags needs to be expanded and some doesn't? – Jeanmichel Cote Sep 23 '15 at 19:58

Jeanmichel Cote ,Sep 23, 2015 at 20:00

...just use the backslash in front of the $Jeanmichel Cote Sep 23 '15 at 20:00

Ciro Santilli 新疆改造中心 六四事件 法轮功 ,Sep 23, 2015 at 20:01

@JeanmichelCote I don't see a better option :-) With regular strings you can also consider mixing up quotes like "$a"'$b'"$c" , but there is no analogue here AFAIK. – Ciro Santilli 新疆改造中心 六四事件 法轮功 Sep 23 '15 at 20:01
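
A tiny illustration of that mix (my own example): leave the delimiter unquoted and backslash-escape only the expansions you want kept literal:

a=hello
cat <<EOF
expanded: $a
literal:  \$a
EOF
## expanded: hello
## literal:  $a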

Andreas Maier ,Feb 13, 2017 at 12:14

Using tee instead of cat

Not exactly as an answer to the original question, but I wanted to share this anyway: I had the need to create a config file in a directory that required root rights.

The following does not work for that case:

$ sudo cat <<EOF >/etc/somedir/foo.conf
# my config file
foo=bar
EOF

because the redirection is handled outside of the sudo context.

I ended up using this instead:

$ sudo tee <<EOF /etc/somedir/foo.conf >/dev/null
# my config file
foo=bar
EOF

user9048395 ,Jun 6, 2018 at 0:15

This isn't necessarily an answer to the original question, but a sharing of some results from my own testing. This:
<<test > print.sh
#!/bin/bash
echo \$PWD
echo $PWD
test

will produce the same file as:

cat <<test > print.sh
#!/bin/bash
echo \$PWD
echo $PWD
test

So, I don't see the point of using the cat command.

> ,Dec 19, 2013 at 21:40

Worth noting that here docs work in bash loops too. This example shows how-to get the column list of table:
export postgres_db_name='my_db'
export table_name='my_table_name'

# start copy 
while read -r c; do test -z "$c" || echo $table_name.$c , ; done < <(cat << EOF | psql -t -q -d $postgres_db_name -v table_name="${table_name:-}"
SELECT column_name
FROM information_schema.columns
WHERE 1=1
AND table_schema = 'public'
AND table_name   =:'table_name'  ;
EOF
)
# stop copy , now paste straight into the bash shell ...

output: 
my_table_name.guid ,
my_table_name.id ,
my_table_name.level ,
my_table_name.seq ,

or even without the new line

while read -r c; do test -z "$c" || echo $table_name.$c , | perl -ne 's/\n//gm;print' ; done < <(cat << EOF | psql -t -q -d $postgres_db_name -v table_name="${table_name:-}"
 SELECT column_name
 FROM information_schema.columns
 WHERE 1=1
 AND table_schema = 'public'
 AND table_name   =:'table_name'  ;
 EOF
 )

 # output: daily_issues.guid ,daily_issues.id ,daily_issues.level ,daily_issues.seq ,daily_issues.prio ,daily_issues.weight ,daily_issues.status ,daily_issues.category ,daily_issues.name ,daily_issues.description ,daily_issues.type ,daily_issues.owner

[Jan 03, 2019] Using Lua for working with excel - Stack Overflow

Jan 03, 2019 | stackoverflow.com

Using Lua for working with excel


Animesh ,Oct 14, 2009 at 12:04

I am planning to learn Lua for my desktop scripting needs. I want to know if there is any documentation available and also whether the standard library has everything needed.

uroc ,Oct 14, 2009 at 12:09

You should check out Lua for Windows -- a 'batteries included environment' for the Lua scripting language on Windows

http://luaforwindows.luaforge.net/

It includes the LuaCOM library, from which you can access the Excel COM object.

Try looking at the LuaCOM documentation, there are some Excel examples in that:

http://www.tecgraf.puc-rio.br/~rcerq/luacom/pub/1.3/luacom-htmldoc/

I've only ever used this for very simplistic things. Here is a sample to get you started:

-- test.lua
require('luacom')
excel = luacom.CreateObject("Excel.Application")
excel.Visible = true
wb = excel.Workbooks:Add()
ws = wb.Worksheets(1)

for i=1, 20 do
    ws.Cells(i,1).Value2 = i
end

Animesh ,Oct 14, 2009 at 12:26

Thanks uroc for your quick response. If possible, please let me know of any beginner tutorial or atleast some sample code for using COM programming via Lua. :) – Animesh Oct 14 '09 at 12:26

sagasw ,Oct 16, 2009 at 1:02

More complex code example for lua working with excel:
require "luacom"

excel = luacom.CreateObject("Excel.Application")

local book  = excel.Workbooks:Add()
local sheet = book.Worksheets(1)

excel.Visible = true

for row=1, 30 do
  for col=1, 30 do
    sheet.Cells(row, col).Value2 = math.floor(math.random() * 100)
  end
end


local range = sheet:Range("A1")

for row=1, 30 do
  for col=1, 30 do
    local v = sheet.Cells(row, col).Value2

    if v > 50 then
        local cell = range:Offset(row-1, col-1)

        cell:Select()
        excel.Selection.Interior.Color = 65535
    end
  end
end

excel.DisplayAlerts = false
excel:Quit()
excel = nil

Another example, could add a graph chart.

require "luacom"

excel = luacom.CreateObject("Excel.Application")

local book  = excel.Workbooks:Add()
local sheet = book.Worksheets(1)

excel.Visible = true

for row=1, 30 do
  sheet.Cells(row, 1).Value2 = math.floor(math.random() * 100)
end

local chart = excel.Charts:Add()
chart.ChartType = 4  --  xlLine

local range = sheet:Range("A1:A30")
chart:SetSourceData(range)

Incredulous Monk ,Oct 19, 2009 at 4:17

A quick suggestion: fragments of code will look better if you format them as code (use the little "101 010" button). – Incredulous Monk Oct 19 '09 at 4:17

[Jan 01, 2019] mc - How can I set the default (user defined) listing mode in Midnight Commander- - Unix Linux Stack Exchange

Jan 01, 2019 | unix.stackexchange.com


papaiatis ,Jul 14, 2016 at 11:51

I defined my own listing mode and I'd like to make it permanent so that on the next mc start my defined listing mode will be set. I found no configuration file for mc.

,

You probably have Auto save setup turned off in the Options->Configuration menu.

You can save the configuration manually by Options->Save setup .

Panels setup is saved to ~/.config/mc/panels.ini .

[Dec 05, 2018] How to make putty ssh connection never to timeout when user is idle?

Dec 05, 2018 | askubuntu.com

David MZ ,Feb 13, 2013 at 18:07

I have an Ubuntu 12.04 server that I bought. If I connect with PuTTY using ssh and a sudoer user, PuTTY gets disconnected by the server after some time when I am idle. How do I configure Ubuntu to keep this connection alive indefinitely?

das Keks ,Feb 13, 2013 at 18:24

If you go to your putty settings -> Connection and set the value of "Seconds between keepalives" to 30 seconds this should solve your problem.

kokbira ,Feb 19 at 11:42

"0 to turn off" or 30 to turn off? I think he must put 0 instead of 30! – kokbira Feb 19 at 11:42

das Keks ,Feb 19 at 11:46

No, it's the time between keepalives. If you set it to 0, no keepalives are sent but you want putty to send keepalives to keep the connection alive. – das Keks Feb 19 at 11:46

Aaron ,Mar 19 at 20:39

I did this but still it drops.. – Aaron Mar 19 at 20:39

0xC0000022L ,Feb 13, 2013 at 19:29

In addition to the answer from "das Keks" there is at least one other aspect that can affect this behavior. Bash (usually the default shell on Ubuntu) has a variable TMOUT which governs (as a decimal value in seconds) how long an idle shell session may last before it times out and the user is logged out, leading to a disconnect in an SSH session.
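
A quick way to check for that (a hedged sketch; TMOUT may have been made readonly by the administrator, in which case unset will fail):

echo "${TMOUT:-unset}"    ## current idle timeout in seconds, or "unset"
unset TMOUT               ## disable the idle logout for this session, if permitted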

In addition I would strongly recommend that you do something else entirely. Set up byobu (or even just tmux alone as it's superior to GNU screen ) and always log in and attach to a preexisting session (that's GNU screen and tmux terminology). This way even if you get forcibly disconnected - let's face it, a power outage or network interruption can always happen - you can always resume your work where you left. And that works across different machines. So you can connect to the same session from another machine (e.g. from home). The possibilities are manifold and it's a true productivity booster. And not to forget, terminal multiplexers overcome one of the big disadvantages of PuTTY: no tabbed interface. Now you get "tabs" in the form of windows and panes inside GNU screen and tmux .

apt-get install tmux
apt-get install byobu

Byobu is a nice frontend to both terminal multiplexers, but tmux is so comfortable that in my opinion it obsoletes byobu to a large extent. So my recommendation would be tmux .

Also search for "dotfiles", in particular tmux.conf and .tmux.conf on the web for many good customizations to get you started.

Rajesh ,Mar 19, 2015 at 15:10

Go to PuTTy options --> Connection
  1. Change the default value for "Seconds between keepalives(0 to turn off)" : from 0 to 600 (10 minutes) --This varies...reduce if 10 minutes doesn't help
  2. Check the "Enable TCP_keepalives (SO_KEEPALIVE option)" check box.
  3. Finally save setting for session

,

I keep my PuTTY sessions alive by monitoring the cron logs
tail -f /var/log/cron

I want the PuTTY session alive because I'm proxying through socks.

[Dec 05, 2018] How can I scroll up to see the past output in PuTTY?

Dec 05, 2018 | superuser.com


user1721949 ,Dec 12, 2012 at 8:32

I have a script which, when I run it from PuTTY, scrolls the screen. Now I want to go back to see the errors, but when I scroll up, I can see the past commands but not the output of the command.

How can I see the past output?

Rico ,Dec 13, 2012 at 8:24

Shift+Pgup/PgDn should work for scrolling without using the scrollbar.

> ,Jul 12, 2017 at 21:45

If shift pageup/pagedown fails, try this command: "reset", which seems to correct the display. – user530079 Jul 12 '17 at 21:45

RedGrittyBrick ,Dec 12, 2012 at 9:31

If you don't pipe the output of your commands into something like less , you will be able to use Putty's scroll-bars to view earlier output.

Putty has settings for how many lines of past output it retains in its buffer.


(screenshots: before scrolling / after scrolling back (upwards))

If you use something like less the output doesn't get into Putty's scroll buffer

(screenshot: after using less)

David Dai ,Dec 14, 2012 at 3:31

why is putty different from the native linux console on this point? – David Dai Dec 14 '12 at 3:31

konradstrack ,Dec 12, 2012 at 9:52

I would recommend using screen if you want to have good control over the scroll buffer on a remote shell.

You can change the scroll buffer size to suit your needs by setting:

defscrollback 4000

in ~/.screenrc , which will specify the number of lines you want to be buffered (4000 in this case).

Then you should run your script in a screen session, e.g. by executing screen ./myscript.sh or first executing screen and then ./myscript.sh inside the session.

It's also possible to enable logging of the console output to a file. You can find more info on the screen's man page .

,

From your description, it sounds like the "problem" is that you are using screen, tmux, or another window manager dependent on them (byobu). Normally you should be able to scroll back in putty with no issue. Exceptions include if you are in an application like less or nano that creates its own "window" on the terminal.

With screen and tmux you can generally scroll back with SHIFT + PGUP (same as you could from the physical terminal of the remote machine). They also both have a "copy" mode that frees the cursor from the prompt and lets you use arrow keys to move it around (for selecting text to copy with just the keyboard). It also lets you scroll up and down with the PGUP and PGDN keys. Copy mode under byobu using screen or tmux backends is accessed by pressing F7 (careful, F6 disconnects the session). To do so directly under screen you press CTRL + a then ESC or [ . You can use ESC to exit copy mode. Under tmux you press CTRL + b then [ to enter copy mode and ] to exit.

The simplest solution, of course, is not to use either. I've found both to be quite a bit more trouble than they are worth. If you would like to use multiple different terminals on a remote machine simply connect with multiple instances of putty and manage your windows using, er... Windows. Now forgive me but I must flee before I am burned at the stake for my heresy.

EDIT: almost forgot, some keys may not be received correctly by the remote terminal if putty has not been configured correctly. In your putty config check Terminal -> Keyboard . You probably want the function keys and keypad set to be either Linux or Xterm R6 . If you are seeing strange characters on the terminal when attempting the above this is most likely the problem.

[Dec 01, 2018] Lua editors WoWWiki FANDOM powered by Wikia

Dec 01, 2018 | wowwiki.wikia.com

[Nov 13, 2018] Resuming rsync partial (-P/--partial) on a interrupted transfer

Notable quotes:
"... should ..."
May 15, 2013 | stackoverflow.com

Glitches , May 15, 2013 at 18:06

I am trying to back up my file server to a remote file server using rsync. Rsync is not successfully resuming when a transfer is interrupted. I used the partial option, but rsync doesn't find the file it already started because it renames it to a temporary file, and when resumed it creates a new file and starts from the beginning.

Here is my command:

rsync -avztP -e "ssh -p 2222" /volume1/ myaccont@backup-server-1:/home/myaccount/backup/ --exclude "@spool" --exclude "@tmp"

When this command is ran, a backup file named OldDisk.dmg from my local machine get created on the remote machine as something like .OldDisk.dmg.SjDndj23 .

Now when the internet connection gets interrupted and I have to resume the transfer, I have to find where rsync left off by finding the temp file like .OldDisk.dmg.SjDndj23 and renaming it to OldDisk.dmg so that rsync sees there already exists a file that it can resume.

How do I fix this so I don't have to manually intervene each time?

Richard Michael , Nov 6, 2013 at 4:26

TL;DR : Use --timeout=X (X in seconds) to change the default rsync server timeout, not --inplace .

The issue is the rsync server processes (of which there are two, see rsync --server ... in ps output on the receiver) continue running, to wait for the rsync client to send data.

If the rsync server processes do not receive data for a sufficient time, they will indeed time out, self-terminate and clean up by moving the temporary file to its "proper" name (e.g., no temporary suffix). You'll then be able to resume.

If you don't want to wait for the long default timeout to cause the rsync server to self-terminate, then when your internet connection returns, log into the server and clean up the rsync server processes manually. However, you must politely terminate rsync -- otherwise, it will not move the partial file into place; but rather, delete it (and thus there is no file to resume). To politely ask rsync to terminate, do not SIGKILL (e.g., -9 ), but SIGTERM (e.g., pkill -TERM -x rsync - only an example, you should take care to match only the rsync processes concerned with your client).

Fortunately there is an easier way: use the --timeout=X (X in seconds) option; it is passed to the rsync server processes as well.

For example, if you specify rsync ... --timeout=15 ... , both the client and server rsync processes will cleanly exit if they do not send/receive data in 15 seconds. On the server, this means moving the temporary file into position, ready for resuming.
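
Applied to the command from the question, that might look like the following (the 60-second value is just an illustrative choice):

rsync -avztP --timeout=60 -e "ssh -p 2222" /volume1/ myaccont@backup-server-1:/home/myaccount/backup/ --exclude "@spool" --exclude "@tmp"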

I'm not sure how long, by default, the various rsync processes will try to send/receive data before they die (it might vary with the operating system). In my testing, the server rsync processes remain running longer than the local client. On a "dead" network connection, the client terminates with a broken pipe (e.g., no network socket) after about 30 seconds; you could experiment or review the source code. Meaning, you could try to "ride out" the bad internet connection for 15-20 seconds.

If you do not clean up the server rsync processes (or wait for them to die), but instead immediately launch another rsync client process, two additional server processes will launch (for the other end of your new client process). Specifically, the new rsync client will not re-use/reconnect to the existing rsync server processes. Thus, you'll have two temporary files (and four rsync server processes) -- though, only the newer, second temporary file has new data being written (received from your new rsync client process).

Interestingly, if you then clean up all rsync server processes (for example, stop your client which will stop the new rsync servers, then SIGTERM the older rsync servers, it appears to merge (assemble) all the partial files into the new proper named file. So, imagine a long running partial copy which dies (and you think you've "lost" all the copied data), and a short running re-launched rsync (oops!).. you can stop the second client, SIGTERM the first servers, it will merge the data, and you can resume.

Finally, a few short remarks:

JamesTheAwesomeDude , Dec 29, 2013 at 16:50

Just curious: wouldn't SIGINT (aka ^C ) be 'politer' than SIGTERM ? – JamesTheAwesomeDude Dec 29 '13 at 16:50

Richard Michael , Dec 29, 2013 at 22:34

I didn't test how the server-side rsync handles SIGINT, so I'm not sure it will keep the partial file - you could check. Note that this doesn't have much to do with Ctrl-c ; it happens that your terminal sends SIGINT to the foreground process when you press Ctrl-c , but the server-side rsync has no controlling terminal. You must log in to the server and use kill . The client-side rsync will not send a message to the server (for example, after the client receives SIGINT via your terminal Ctrl-c ) - might be interesting though. As for anthropomorphizing, not sure what's "politer". :-) – Richard Michael Dec 29 '13 at 22:34

d-b , Feb 3, 2015 at 8:48

I just tried this timeout argument rsync -av --delete --progress --stats --human-readable --checksum --timeout=60 --partial-dir /tmp/rsync/ rsync://$remote:/ /src/ but then it timed out during the "receiving file list" phase (which in this case takes around 30 minutes). Setting the timeout to half an hour so kind of defers the purpose. Any workaround for this? – d-b Feb 3 '15 at 8:48

Cees Timmerman , Sep 15, 2015 at 17:10

@user23122 --checksum reads all data when preparing the file list, which is great for many small files that change often, but should be done on-demand for large files. – Cees Timmerman Sep 15 '15 at 17:10

[Nov 08, 2018] How to find which process is regularly writing to disk?

Notable quotes:
"... tick...tick...tick...trrrrrr ..."
"... /var/log/syslog ..."
Nov 08, 2018 | unix.stackexchange.com

Cedric Martin , Jul 27, 2012 at 4:31

How can I find which process is constantly writing to disk?

I like my workstation to be close to silent and I just built a new system (P8B75-M + Core i5 3450s -- the 's' because it has a lower max TDP) with quiet fans etc. and installed Debian Wheezy 64-bit on it.

And something is getting on my nerves: I can hear some kind of pattern as if the hard disk was writing or seeking something ( tick...tick...tick...trrrrrr rinse and repeat every second or so).

I had a similar issue in the past (many, many years ago) and it turned out it was some CUPS log or something, and I simply redirected that one (not important) log to a (real) RAM disk.

But here I'm not sure.

I tried the following:

ls -lR /var/log > /tmp/a.tmp && sleep 5 && ls -lR /var/log > /tmp/b.tmp && diff /tmp/?.tmp

but nothing is changing there.

Now the strange thing is that I also hear the pattern when the prompt asking me to enter my LVM decryption passphrase is showing.

Could it be something in the kernel/system I just installed, or do I have a faulty hard disk?

hdparm -tT /dev/sda reports a correct HD speed (about 130 MB/s non-cached, SATA 6 Gb/s) and I've already installed and compiled from big sources (Emacs) without issue, so I don't think the system is bad.

(HD is a Seagate Barracuda 500GB)

Mat , Jul 27, 2012 at 6:03

Are you sure it's a hard drive making that noise, and not something else? (Check the fans, including PSU fan. Had very strange clicking noises once when a very thin cable was too close to a fan and would sometimes very slightly touch the blades and bounce for a few "clicks"...) – Mat Jul 27 '12 at 6:03

Cedric Martin , Jul 27, 2012 at 7:02

@Mat: I'll take the hard drive outside of the case (the connectors should be long enough) to be sure and I'll report back ; ) – Cedric Martin Jul 27 '12 at 7:02

camh , Jul 27, 2012 at 9:48

Make sure your disk filesystems are mounted relatime or noatime. File reads can be causing writes to inodes to record the access time; see the sketch below. – camh Jul 27 '12 at 9:48
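
As an illustration of that suggestion (the device, mount point and filesystem are placeholders), the relevant /etc/fstab option looks like this, and it can also be applied to a live system via remount:

# /etc/fstab -- "noatime" (or "relatime") avoids an inode write on every file read
/dev/sda2   /home   ext4   defaults,noatime   0   2
# apply to an already-mounted filesystem without rebooting
mount -o remount,noatime /home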

mnmnc , Jul 27, 2012 at 8:27

Did you try to examine what programs like iotop are showing? It will tell you exactly which process is currently writing to the disk.

example output:

Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
    6 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
    7 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
    8 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
 1033 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [flush-8:0]
   10 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]

Cedric Martin , Aug 2, 2012 at 15:56

thanks for that tip. I didn't know about iotop . On Debian I did an apt-cache search iotop to find out that I had to apt-get install iotop . Very cool command! – Cedric Martin Aug 2 '12 at 15:56

ndemou , Jun 20, 2016 at 15:32

I use iotop -o -b -d 10 which every 10secs prints a list of processes that read/wrote to disk and the amount of IO bandwidth used. – ndemou Jun 20 '16 at 15:32

scai , Jul 27, 2012 at 10:48

You can enable IO debugging via echo 1 > /proc/sys/vm/block_dump and then watch the debugging messages in /var/log/syslog . This has the advantage of obtaining some type of log file with past activities whereas iotop only shows the current activity.
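
A minimal sketch of that workflow, assuming a kernel that still provides /proc/sys/vm/block_dump (it has been removed from recent kernels) and following the advice in the comments below to read the messages with dmesg rather than through syslog:

# run as root: enable block-level I/O debugging
echo 1 > /proc/sys/vm/block_dump
# block_dump entries look like "foo(1234): WRITE block ... on sda1"
dmesg -w | grep "WRITE block"
# disable it again when done
echo 0 > /proc/sys/vm/block_dump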

dan3 , Jul 15, 2013 at 8:32

It is absolutely crazy to leave syslogging enabled when block_dump is active. Logging causes disk activity, which causes logging, which causes disk activity, etc. Better to stop syslog before enabling this (and use dmesg to read the messages) – dan3 Jul 15 '13 at 8:32

scai , Jul 16, 2013 at 6:32

You are absolutely right, although the effect isn't as dramatic as you describe it. If you just want to have a short peek at the disk activity there is no need to stop the syslog daemon. – scai Jul 16 '13 at 6:32

dan3 , Jul 16, 2013 at 7:22

I've tried it about 2 years ago and it brought my machine to a halt. One of these days when I have nothing important running I'll try it again :) – dan3 Jul 16 '13 at 7:22

scai , Jul 16, 2013 at 10:50

I tried it, nothing really happened. Especially because of file system buffering. A write to syslog doesn't immediately trigger a write to disk. – scai Jul 16 '13 at 10:50

Volker Siegel , Apr 16, 2014 at 22:57

I would assume there is general rate limiting in place for the log messages, which handles this case too(?) – Volker Siegel Apr 16 '14 at 22:57

Gilles , Jul 28, 2012 at 1:34

Assuming that the disk noises are due to a process causing a write and not to some disk spindown problem , you can use the audit subsystem (install the auditd package ). Put a watch on the sync calls and its friends:
auditctl -S sync -S fsync -S fdatasync -a exit,always

Watch the logs in /var/log/audit/audit.log . Be careful not to do this if the audit logs themselves are flushed! Check in /etc/auditd.conf that the flush option is set to none .

If files are being flushed often, a likely culprit is the system logs. For example, if you log failed incoming connection attempts and someone is probing your machine, that will generate a lot of entries; this can cause a disk to emit machine-gun-style noises. With the basic log daemon sysklogd, check /etc/syslog.conf : if a log file name is not preceded by - , then that log is flushed to disk after each write.
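
For example, in sysklogd's /etc/syslog.conf the leading dash controls this behaviour (the log files here are just illustrative):

# no "-": flushed (sync'd) after every message -- can produce constant disk noise
auth,authpriv.*         /var/log/auth.log
# leading "-": buffered writes, no sync per message
mail.*                  -/var/log/mail.log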

Gilles , Mar 23 at 18:24

@StephenKitt Huh. No. The asker mentioned Debian so I've changed it to a link to the Debian package. – Gilles Mar 23 at 18:24

cas , Jul 27, 2012 at 9:40

It might be your drives automatically spinning down, lots of consumer-grade drives do that these days. Unfortunately on even a lightly loaded system, this results in the drives constantly spinning down and then spinning up again, especially if you're running hddtemp or similar to monitor the drive temperature (most drives stupidly don't let you query the SMART temperature value without spinning up the drive - cretinous!).

This is not only annoying, it can wear out the drives faster as many drives have only a limited number of park cycles. e.g. see https://bugs.launchpad.net/ubuntu/+source/hdparm/+bug/952556 for a description of the problem.

I disable idle-spindown on all my drives with the following bit of shell code. You could put it in an /etc/rc.boot script, or in /etc/rc.local or similar.

for disk in /dev/sd? ; do
  # "$disk" already holds the full /dev/sdX path from the glob;
  # -S 0 disables the idle spin-down timer, -q keeps hdparm quiet
  /sbin/hdparm -q -S 0 "$disk"
done
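
To check afterwards whether a given drive is spun down or still active, hdparm can report the power state ( /dev/sda is just an example device):

/sbin/hdparm -C /dev/sda
# prints e.g. "drive state is:  active/idle" or "standby"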

Cedric Martin , Aug 2, 2012 at 16:03

that you can't query SMART readings without spinning up the drive leaves me speechless :-/ Now obviously the "spinning down" issue can become quite complicated. Regarding disabling the spinning down: wouldn't that in itself cause the HD to wear out faster? I mean: it's never ever "resting" as long as the system is on then? – Cedric Martin Aug 2 '12 at 16:03

cas , Aug 2, 2012 at 21:42

IIRC you can query some SMART values without causing the drive to spin up, but temperature isn't one of them on any of the drives i've tested (incl models from WD, Seagate, Samsung, Hitachi). Which is, of course, crazy because concern over temperature is one of the reasons for idling a drive. re: wear: AIUI 1. constant velocity is less wearing than changing speed. 2. the drives have to park the heads in a safe area and a drive is only rated to do that so many times (IIRC up to a few hundred thousand - easily exceeded if the drive is idling and spinning up every few seconds) – cas Aug 2 '12 at 21:42

Micheal Johnson , Mar 12, 2016 at 20:48

It's a long debate regarding whether it's better to leave drives running or to spin them down. Personally I believe it's best to leave them running - I turn my computer off at night and when I go out but other than that I never spin my drives down. Some people prefer to spin them down, say, at night if they're leaving the computer on or if the computer's idle for a long time, and in such cases the advantage of spinning them down for a few hours versus leaving them running is debatable. What's never good though is when the hard drive repeatedly spins down and up again in a short period of time. – Micheal Johnson Mar 12 '16 at 20:48

Micheal Johnson , Mar 12, 2016 at 20:51

Note also that spinning the drive down after it's been idle for a few hours is a bit silly, because if it's been idle for a few hours then it's likely to be used again within an hour. In that case, it would seem better to spin the drive down promptly if it's idle (like, within 10 minutes), but it's also possible for the drive to be idle for a few minutes when someone is using the computer and is likely to need the drive again soon. – Micheal Johnson Mar 12 '16 at 20:51

,

I just found that S.M.A.R.T. was causing an external USB disk to spin up again and again on my Raspberry Pi. Although SMART is generally a good thing, I decided to disable it, and since then it seems that the unwanted disk activity has stopped.

[Nov 08, 2018] Determining what process is bound to a port

Mar 14, 2011 | unix.stackexchange.com
I know that using the command:
lsof -i TCP

(or some variant of parameters with lsof) I can determine which process is bound to a particular port. This is useful, say, if I'm trying to start something that wants to bind to 8080 and something else is already using that port, but I don't know what.

Is there an easy way to do this without using lsof? I spend time working on many systems and lsof is often not installed.

Cakemox , Mar 14, 2011 at 20:48

netstat -lnp will list the pid and process name next to each listening port. This will work under Linux, but not all others (like AIX.) Add -t if you want TCP only.
# netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:24800           0.0.0.0:*               LISTEN      27899/synergys
tcp        0      0 0.0.0.0:8000            0.0.0.0:*               LISTEN      3361/python
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      2264/mysqld
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      22964/apache2
tcp        0      0 192.168.99.1:53         0.0.0.0:*               LISTEN      3389/named
tcp        0      0 192.168.88.1:53         0.0.0.0:*               LISTEN      3389/named

etc.

xxx , Mar 14, 2011 at 21:01

Cool, thanks. Looks like that works under RHEL, but not under Solaris (as you indicated). Anybody know if there's something similar for Solaris? – user5721 Mar 14 '11 at 21:01

Rich Homolka , Mar 15, 2011 at 19:56

netstat -p above is my vote. also look at lsof . – Rich Homolka Mar 15 '11 at 19:56

Jonathan , Aug 26, 2014 at 18:50

As an aside, for windows it's similar: netstat -aon | more – Jonathan Aug 26 '14 at 18:50

sudo , May 25, 2017 at 2:24

What about for SCTP? – sudo May 25 '17 at 2:24

frielp , Mar 15, 2011 at 13:33

On AIX, netstat & rmsock can be used to determine process binding:
[root@aix] netstat -Ana|grep LISTEN|grep 80
f100070000280bb0 tcp4       0      0  *.37               *.*        LISTEN
f1000700025de3b0 tcp        0      0  *.80               *.*        LISTEN
f1000700002803b0 tcp4       0      0  *.111              *.*        LISTEN
f1000700021b33b0 tcp4       0      0  127.0.0.1.32780    *.*        LISTEN

# Port 80 maps to f1000700025de3b0 above, so we type:
[root@aix] rmsock f1000700025de3b0 tcpcb
The socket 0x25de008 is being held by process 499790 (java).

Olivier Dulac , Sep 18, 2013 at 4:05

Thanks for this! Is there a way, however, to just display what process listens on the socket (instead of using rmsock which attempts to remove it)? – Olivier Dulac Sep 18 '13 at 4:05

Vitor Py , Sep 26, 2013 at 14:18

@OlivierDulac: "Unlike what its name implies, rmsock does not remove the socket, if it is being used by a process. It just reports the process holding the socket." ( ibm.com/developerworks/community/blogs/cgaix/entry/ ) – Vitor Py Sep 26 '13 at 14:18

Olivier Dulac , Sep 26, 2013 at 16:00

@vitor-braga: Ah thx! I thought it was trying but just said which process holds it when it couldn't remove it. Apparently it doesn't even try to remove it when a process holds it. That's cool! Thx! – Olivier Dulac Sep 26 '13 at 16:00

frielp , Mar 15, 2011 at 13:27

Another tool available on Linux is ss . From the ss man page on Fedora:
NAME
       ss - another utility to investigate sockets
SYNOPSIS
       ss [options] [ FILTER ]
DESCRIPTION
       ss is used to dump socket statistics. It allows showing information 
       similar to netstat. It can display more TCP and state informations  
       than other tools.

Example output below - the final column shows the process binding:

[root@box] ss -ap
State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
LISTEN     0      128                    :::http                    :::*        users:(("httpd",20891,4),("httpd",20894,4),("httpd",20895,4),("httpd",20896,4)
LISTEN     0      128             127.0.0.1:munin                    *:*        users:(("munin-node",1278,5))
LISTEN     0      128                    :::ssh                     :::*        users:(("sshd",1175,4))
LISTEN     0      128                     *:ssh                      *:*        users:(("sshd",1175,3))
LISTEN     0      10              127.0.0.1:smtp                     *:*        users:(("sendmail",1199,4))
LISTEN     0      128             127.0.0.1:x11-ssh-offset                  *:*        users:(("sshd",25734,8))
LISTEN     0      128                   ::1:x11-ssh-offset                 :::*        users:(("sshd",25734,7))

Eugen Constantin Dinca , Mar 14, 2011 at 23:47

For Solaris you can use pfiles and then grep by sockname: or port: .

A sample (from here ):

pfiles `ptree | awk '{print $1}'` | egrep '^[0-9]|port:'

rickumali , May 8, 2011 at 14:40

I was once faced with trying to determine what process was behind a particular port (this time it was 8000). I tried a variety of lsof and netstat, but then took a chance and tried hitting the port via a browser (i.e. http://hostname:8000/ ). Lo and behold, a splash screen greeted me, and it became obvious what the process was (for the record, it was Splunk ).

One more thought: "ps -e -o pid,args" (YMMV) may sometimes show the port number in the arguments list. Grep is your friend! See the example below.
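
For example, with the port 8000 mentioned above (the brackets in the pattern just keep grep from matching its own process entry):

ps -e -o pid,args | grep '[8]000'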

Gilles , Oct 8, 2015 at 21:04

In the same vein, you could telnet hostname 8000 and see if the server prints a banner. However, that's mostly useful when the server is running on a machine where you don't have shell access, and then finding the process ID isn't relevant. – Gilles May 8 '11 at 14:45

[Nov 08, 2018] How to split one string into multiple variables in bash shell? [duplicate]

Nov 08, 2018 | stackoverflow.com

Rob I , May 9, 2012 at 19:22

For your second question, see @mkb's comment to my answer below - that's definitely the way to go! – Rob I May 9 '12 at 19:22

Dennis Williamson , Jul 4, 2012 at 16:14

See my edited answer for one way to read individual characters into an array. – Dennis Williamson Jul 4 '12 at 16:14

Nick Weedon , Dec 31, 2015 at 11:04

Here is the same thing in a more concise form: var1=$(cut -f1 -d- <<<$STR) – Nick Weedon Dec 31 '15 at 11:04

Rob I , May 9, 2012 at 17:00

If your solution doesn't have to be general, i.e. only needs to work for strings like your example, you could do:
var1=$(echo $STR | cut -f1 -d-)
var2=$(echo $STR | cut -f2 -d-)

I chose cut here because you could simply extend the code for a few more variables...

crunchybutternut , May 9, 2012 at 17:40

Can you look at my post again and see if you have a solution for the followup question? thanks! – crunchybutternut May 9 '12 at 17:40

mkb , May 9, 2012 at 17:59

You can use cut to cut characters too! cut -c1 for example. – mkb May 9 '12 at 17:59

FSp , Nov 27, 2012 at 10:26

Although this is very simple to read and write, it is a very slow solution because it forces you to read the same data ($STR) twice ... if you care about your script's performance, the @anubhava solution is much better – FSp Nov 27 '12 at 10:26

tripleee , Jan 25, 2016 at 6:47

Apart from being an ugly last-resort solution, this has a bug: You should absolutely use double quotes in echo "$STR" unless you specifically want the shell to expand any wildcards in the string as a side effect. See also stackoverflow.com/questions/10067266/ – tripleee Jan 25 '16 at 6:47

Rob I , Feb 10, 2016 at 13:57

You're right about double quotes of course, though I did point out this solution wasn't general. However I think your assessment is a bit unfair - for some people this solution may be more readable (and hence extensible etc) than some others, and doesn't completely rely on arcane bash feature that wouldn't translate to other shells. I suspect that's why my solution, though less elegant, continues to get votes periodically... – Rob I Feb 10 '16 at 13:57

Dennis Williamson , May 10, 2012 at 3:14

read with IFS are perfect for this:
$ IFS=- read var1 var2 <<< ABCDE-123456
$ echo "$var1"
ABCDE
$ echo "$var2"
123456

Edit:

Here is how you can read each individual character into array elements:

$ read -a foo <<<"$(echo "ABCDE-123456" | sed 's/./& /g')"

Dump the array:

$ declare -p foo
declare -a foo='([0]="A" [1]="B" [2]="C" [3]="D" [4]="E" [5]="-" [6]="1" [7]="2" [8]="3" [9]="4" [10]="5" [11]="6")'

If there are spaces in the string:

$ IFS=$'\v' read -a foo <<<"$(echo "ABCDE 123456" | sed 's/./&\v/g')"
$ declare -p foo
declare -a foo='([0]="A" [1]="B" [2]="C" [3]="D" [4]="E" [5]=" " [6]="1" [7]="2" [8]="3" [9]="4" [10]="5" [11]="6")'

insecure , Apr 30, 2014 at 7:51

Great, the elegant bash-only way, without unnecessary forks. – insecure Apr 30 '14 at 7:51

Martin Serrano , Jan 11 at 4:34

this solution also has the benefit that if delimiter is not present, the var2 will be empty – Martin Serrano Jan 11 at 4:34

mkb , May 9, 2012 at 17:02

If you know it's going to be just two fields, you can skip the extra subprocesses like this:
var1=${STR%-*}
var2=${STR#*-}

What does this do? ${STR%-*} deletes the shortest substring of $STR that matches the pattern -* starting from the end of the string. ${STR#*-} does the same, but with the *- pattern and starting from the beginning of the string. They each have counterparts %% and ## which find the longest anchored pattern match. If anyone has a helpful mnemonic to remember which does which, let me know! I always have to try both to remember.
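
A quick illustration of the shortest vs. longest match forms, using a sample string with two delimiters:

STR="ABCDE-123-456"
echo "${STR%-*}"     # ABCDE-123  (%  : shortest suffix match removed)
echo "${STR%%-*}"    # ABCDE      (%% : longest suffix match removed)
echo "${STR#*-}"     # 123-456    (#  : shortest prefix match removed)
echo "${STR##*-}"    # 456        (## : longest prefix match removed)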

Jens , Jan 30, 2015 at 15:17

Plus 1 For knowing your POSIX shell features, avoiding expensive forks and pipes, and the absence of bashisms. – Jens Jan 30 '15 at 15:17

Steven Lu , May 1, 2015 at 20:19

Dunno about "absence of bashisms" considering that this is already moderately cryptic .... if your delimiter is a newline instead of a hyphen, then it becomes even more cryptic. On the other hand, it works with newlines , so there's that. – Steven Lu May 1 '15 at 20:19

mkb , Mar 9, 2016 at 17:30

@KErlandsson: done – mkb Mar 9 '16 at 17:30

mombip , Aug 9, 2016 at 15:58

I've finally found documentation for it: Shell-Parameter-Expansion – mombip Aug 9 '16 at 15:58

DS. , Jan 13, 2017 at 19:56

Mnemonic: "#" is to the left of "%" on a standard keyboard, so "#" removes a prefix (on the left), and "%" removes a suffix (on the right). – DS. Jan 13 '17 at 19:56

tripleee , May 9, 2012 at 17:57

Sounds like a job for set with a custom IFS .
IFS=-
set $STR
var1=$1
var2=$2

(You will want to do this in a function with a local IFS so you don't mess up other parts of your script where you require IFS to be what you expect.)
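
A minimal sketch of that advice (the function and variable names are purely illustrative):

split_pair() {
  local IFS=-      # the IFS change is confined to this function
  set -- $1        # deliberately unquoted so word-splitting on "-" happens
  var1=$1
  var2=$2
}
split_pair "ABCDE-123456"
echo "$var1"   # ABCDE
echo "$var2"   # 123456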

Rob I , May 9, 2012 at 19:20

Nice - I knew about $IFS but hadn't seen how it could be used. – Rob I May 9 '12 at 19:20

Sigg3.net , Jun 19, 2013 at 8:08

I used triplee's example and it worked exactly as advertised! Just change the last two lines to myvar1=`echo $1` && myvar2=`echo $2` if you need to store them throughout a script with several "thrown" variables. – Sigg3.net Jun 19 '13 at 8:08

tripleee , Jun 19, 2013 at 13:25

No, don't use a useless echo in backticks . – tripleee Jun 19 '13 at 13:25

Daniel Andersson , Mar 27, 2015 at 6:46

This is a really sweet solution if we need to write something that is not Bash specific. To handle IFS troubles, one can add OLDIFS=$IFS at the beginning before overwriting it, and then add IFS=$OLDIFS just after the set line. – Daniel Andersson Mar 27 '15 at 6:46

tripleee , Mar 27, 2015 at 6:58

FWIW the link above is broken. I was lazy and careless. The canonical location still works: iki.fi/era/unix/award.html#echo – tripleee Mar 27 '15 at 6:58

anubhava , May 9, 2012 at 17:09

Using bash regex capabilities:
re="^([^-]+)-(.*)$"
[[ "ABCDE-123456" =~ $re ]] && var1="${BASH_REMATCH[1]}" && var2="${BASH_REMATCH[2]}"
echo $var1
echo $var2

OUTPUT

ABCDE
123456

Cometsong , Oct 21, 2016 at 13:29

Love pre-defining the re for later use(s)! – Cometsong Oct 21 '16 at 13:29

Archibald , Nov 12, 2012 at 11:03

string="ABCDE-123456"
IFS=- # use "local IFS=-" inside the function
set $string
echo $1 # >>> ABCDE
echo $2 # >>> 123456

tripleee , Mar 27, 2015 at 7:02

Hmmm, isn't this just a restatement of my answer ? – tripleee Mar 27 '15 at 7:02

Archibald , Sep 18, 2015 at 12:36

Actually yes. I just clarified it a bit. – Archibald Sep 18 '15 at 12:36

[Nov 08, 2018] How to split a string in shell and get the last field

Nov 08, 2018 | stackoverflow.com

cd1 , Jul 1, 2010 at 23:29

Suppose I have the string 1:2:3:4:5 and I want to get its last field ( 5 in this case). How do I do that using Bash? I tried cut , but I don't know how to specify the last field with -f .

Stephen , Jul 2, 2010 at 0:05

You can use string operators :
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5

This trims everything from the front until a ':', greedily.

${foo  <-- from variable foo
  ##   <-- greedy front trim
  *    <-- matches anything
  :    <-- until the last ':'
 }

eckes , Jan 23, 2013 at 15:23

While this is working for the given problem, the answer of William below ( stackoverflow.com/a/3163857/520162 ) also returns 5 if the string is 1:2:3:4:5: (while using the string operators yields an empty result). This is especially handy when parsing paths that could contain (or not) a finishing / character. – eckes Jan 23 '13 at 15:23

Dobz , Jun 25, 2014 at 11:44

How would you then do the opposite of this? to echo out '1:2:3:4:'? – Dobz Jun 25 '14 at 11:44

Mihai Danila , Jul 9, 2014 at 14:07

And how does one keep the part before the last separator? Apparently by using ${foo%:*} . # - from beginning; % - from end. # , % - shortest match; ## , %% - longest match. – Mihai Danila Jul 9 '14 at 14:07

Putnik , Feb 11, 2016 at 22:33

If i want to get the last element from path, how should I use it? echo ${pwd##*/} does not work. – Putnik Feb 11 '16 at 22:33

Stan Strum , Dec 17, 2017 at 4:22

@Putnik that command sees pwd as a variable. Try dir=$(pwd); echo ${dir##*/} . Works for me! – Stan Strum Dec 17 '17 at 4:22

a3nm , Feb 3, 2012 at 8:39

Another way is to reverse before and after cut :
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef

This makes it very easy to get the last but one field, or any range of fields numbered from the end.
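
For example, the last-but-one field with the same pattern:

$ echo ab:cd:ef | rev | cut -d: -f2 | rev
cd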

Dannid , Jan 14, 2013 at 20:50

This answer is nice because it uses 'cut', which the author is (presumably) already familiar. Plus, I like this answer because I am using 'cut' and had this exact question, hence finding this thread via search. – Dannid Jan 14 '13 at 20:50

funroll , Aug 12, 2013 at 19:51

Some cut-and-paste fodder for people using spaces as delimiters: echo "1 2 3 4" | rev | cut -d " " -f1 | rev – funroll Aug 12 '13 at 19:51

EdgeCaseBerg , Sep 8, 2013 at 5:01

the rev | cut -d -f1 | rev is so clever! Thanks! Helped me a bunch (my use case was rev | -d ' ' -f 2- | rev – EdgeCaseBerg Sep 8 '13 at 5:01

Anarcho-Chossid , Sep 16, 2015 at 15:54

Wow. Beautiful and dark magic. – Anarcho-Chossid Sep 16 '15 at 15:54

shearn89 , Aug 17, 2017 at 9:27

I always forget about rev , was just what I needed! cut -b20- | rev | cut -b10- | revshearn89 Aug 17 '17 at 9:27

William Pursell , Jul 2, 2010 at 7:09

It's difficult to get the last field using cut, but here's (one set of) solutions in awk and perl
$ echo 1:2:3:4:5 | awk -F: '{print $NF}'
5
$ echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
5

eckes , Jan 23, 2013 at 15:20

great advantage of this solution over the accepted answer: it also matches paths that contain or do not contain a finishing / character: /a/b/c/d and /a/b/c/d/ yield the same result ( d ) when processing pwd | awk -F/ '{print $NF}' . The accepted answer results in an empty result in the case of /a/b/c/d/ – eckes Jan 23 '13 at 15:20

stamster , May 21 at 11:52

@eckes In case of AWK solution, on GNU bash, version 4.3.48(1)-release that's not true, as it matters whenever you have trailing slash or not. Simply put AWK will use / as delimiter, and if your path is /my/path/dir/ it will use value after last delimiter, which is simply an empty string. So it's best to avoid trailing slash if you need to do such a thing like I do. – stamster May 21 at 11:52

Nicholas M T Elliott , Jul 1, 2010 at 23:39

Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5

Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.

Dennis Williamson , Jul 2, 2010 at 0:05

One way:
var1="1:2:3:4:5"
var2=${var1##*:}

Another, using an array:

var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}

Yet another with an array:

var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}

Using Bash (version >= 3.2) regular expressions:

var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}

liuyang1 , Mar 24, 2015 at 6:02

Thanks so much for the array style, as I need this feature but don't have cut, awk or these utils. – liuyang1 Mar 24 '15 at 6:02

user3133260 , Dec 24, 2013 at 19:04

$ echo "a b c d e" | tr ' ' '\n' | tail -1
e

Simply translate the delimiter into a newline and choose the last entry with tail -1 .

Yajo , Jul 30, 2014 at 10:13

It will fail if the last item contains a \n , but for most cases it is the most readable solution. – Yajo Jul 30 '14 at 10:13

Rafael , Nov 10, 2016 at 10:09

Using sed :
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5

$ echo '' | sed 's/.*://' # => (empty)

$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c

$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c

Ab Irato , Nov 13, 2013 at 16:10

If your last field is a single character, you could do this:
a="1:2:3:4:5"

echo ${a: -1}
echo ${a:(-1)}

Check string manipulation in bash .

gniourf_gniourf , Nov 13, 2013 at 16:15

This doesn't work: it gives the last character of a , not the last field . – gniourf_gniourf Nov 13 '13 at 16:15

Ab Irato , Nov 25, 2013 at 13:25

True, that's the idea, if you know the length of the last field it's good. If not you have to use something else... – Ab Irato Nov 25 '13 at 13:25

sphakka , Jan 25, 2016 at 16:24

Interesting, I didn't know of these particular Bash string manipulations. It also resembles to Python's string/array slicing . – sphakka Jan 25 '16 at 16:24

ghostdog74 , Jul 2, 2010 at 1:16

Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo  \$${#}
0

Sopalajo de Arrierez , Dec 24, 2014 at 5:04

I would buy some details about this method, please :-) . – Sopalajo de Arrierez Dec 24 '14 at 5:04

Rafa , Apr 27, 2017 at 22:10

Could have used echo ${!#} instead of eval echo \$${#} . – Rafa Apr 27 '17 at 22:10

Crytis , Dec 7, 2016 at 6:51

echo "a:b:c:d:e"|xargs -d : -n1|tail -1

First use xargs to split the string using ":"; -n1 means each output line has only one part. Then print the last part with tail -1 .

BDL , Dec 7, 2016 at 13:47

Although this might solve the problem, one should always add an explanation to it. – BDL Dec 7 '16 at 13:47

Crytis , Jun 7, 2017 at 9:13

already added.. – Crytis Jun 7 '17 at 9:13

021 , Apr 26, 2016 at 11:33

There are many good answers here, but still I want to share this one using basename :
 basename $(echo "a:b:c:d:e" | tr ':' '/')

However it will fail if there are already some '/' in your string . If slash / is your delimiter then you just have to (and should) use basename.

It's not the best answer but it just shows how you can be creative using bash commands.

Nahid Akbar , Jun 22, 2012 at 2:55

for x in `echo $str | tr ";" "\n"`; do echo $x; done

chepner , Jun 22, 2012 at 12:58

This runs into problems if there is whitespace in any of the fields. Also, it does not directly address the question of retrieving the last field. – chepner Jun 22 '12 at 12:58

Christoph Böddeker , Feb 19 at 15:50

For those that comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'

From the pythonpy help: -x treat each row of stdin as x .

With that tool, it is easy to write python code that gets applied to the input.

baz , Nov 24, 2017 at 19:27

a solution using the read builtin
IFS=':' read -a field <<< "1:2:3:4:5"
echo ${field[4]}

[Nov 08, 2018] How do I split a string on a delimiter in Bash?

Notable quotes:
"... Bash shell script split array ..."
"... associative array ..."
"... pattern substitution ..."
"... Debian GNU/Linux ..."
Nov 08, 2018 | stackoverflow.com

stefanB , May 28, 2009 at 2:03

I have this string stored in a variable:
IN="bla@some.com;john@home.com"

Now I would like to split the strings by ; delimiter so that I have:

ADDR1="bla@some.com"
ADDR2="john@home.com"

I don't necessarily need the ADDR1 and ADDR2 variables. If they are elements of an array that's even better.


After suggestions from the answers below, I ended up with the following which is what I was after:

#!/usr/bin/env bash

IN="bla@some.com;john@home.com"

mails=$(echo $IN | tr ";" "\n")

for addr in $mails
do
    echo "> [$addr]"
done

Output:

> [bla@some.com]
> [john@home.com]

There was a solution involving setting Internal_field_separator (IFS) to ; . I am not sure what happened with that answer, how do you reset IFS back to default?

RE: IFS solution, I tried this and it works, I keep the old IFS and then restore it:

IN="bla@some.com;john@home.com"

OIFS=$IFS
IFS=';'
mails2=$IN
for x in $mails2
do
    echo "> [$x]"
done

IFS=$OIFS

BTW, when I tried

mails2=($IN)

I only got the first string when printing it in loop, without brackets around $IN it works.

Brooks Moses , May 1, 2012 at 1:26

With regards to your "Edit2": You can simply "unset IFS" and it will return to the default state. There's no need to save and restore it explicitly unless you have some reason to expect that it's already been set to a non-default value. Moreover, if you're doing this inside a function (and, if you aren't, why not?), you can set IFS as a local variable and it will return to its previous value once you exit the function. – Brooks Moses May 1 '12 at 1:26

dubiousjim , May 31, 2012 at 5:21

@BrooksMoses: (a) +1 for using local IFS=... where possible; (b) -1 for unset IFS , this doesn't exactly reset IFS to its default value, though I believe an unset IFS behaves the same as the default value of IFS ($' \t\n'), however it seems bad practice to be assuming blindly that your code will never be invoked with IFS set to a custom value; (c) another idea is to invoke a subshell: (IFS=$custom; ...) when the subshell exits IFS will return to whatever it was originally. – dubiousjim May 31 '12 at 5:21

nicooga , Mar 7, 2016 at 15:32

I just want to have a quick look at the paths to decide where to throw an executable, so I resorted to run ruby -e "puts ENV.fetch('PATH').split(':')" . If you want to stay pure bash won't help but using any scripting language that has a built-in split is easier. – nicooga Mar 7 '16 at 15:32

Jeff , Apr 22 at 17:51

This is kind of a drive-by comment, but since the OP used email addresses as the example, has anyone bothered to answer it in a way that is fully RFC 5322 compliant, namely that any quoted string can appear before the @ which means you're going to need regular expressions or some other kind of parser instead of naive use of IFS or other simplistic splitter functions. – Jeff Apr 22 at 17:51

user2037659 , Apr 26 at 20:15

for x in $(IFS=';';echo $IN); do echo "> [$x]"; done – user2037659 Apr 26 at 20:15

Johannes Schaub - litb , May 28, 2009 at 2:23

You can set the internal field separator (IFS) variable, and then let it parse into an array. When this happens in a command, then the assignment to IFS only takes place to that single command's environment (to read ). It then parses the input according to the IFS variable value into an array, which we can then iterate over.
IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
    # process "$i"
done

It will parse one line of items separated by ; , pushing it into an array. Stuff for processing whole of $IN , each time one line of input separated by ; :

 while IFS=';' read -ra ADDR; do
      for i in "${ADDR[@]}"; do
          # process "$i"
      done
 done <<< "$IN"

Chris Lutz , May 28, 2009 at 2:25

This is probably the best way. How long will IFS persist in it's current value, can it mess up my code by being set when it shouldn't be, and how can I reset it when I'm done with it? – Chris Lutz May 28 '09 at 2:25

Johannes Schaub - litb , May 28, 2009 at 3:04

now after the fix applied, only within the duration of the read command :) – Johannes Schaub - litb May 28 '09 at 3:04

lhunath , May 28, 2009 at 6:14

You can read everything at once without using a while loop: read -r -d '' -a addr <<< "$in" # The -d '' is key here, it tells read not to stop at the first newline (which is the default -d) but to continue until EOF or a NULL byte (which only occur in binary data). – lhunath May 28 '09 at 6:14

Charles Duffy , Jul 6, 2013 at 14:39

@LucaBorrione Setting IFS on the same line as the read with no semicolon or other separator, as opposed to in a separate command, scopes it to that command -- so it's always "restored"; you don't need to do anything manually. – Charles Duffy Jul 6 '13 at 14:39

chepner , Oct 2, 2014 at 3:50

@imagineerThis There is a bug involving herestrings and local changes to IFS that requires $IN to be quoted. The bug is fixed in bash 4.3. – chepner Oct 2 '14 at 3:50

palindrom , Mar 10, 2011 at 9:00

Taken from Bash shell script split array :
IN="bla@some.com;john@home.com"
arrIN=(${IN//;/ })

Explanation:

This construction replaces all occurrences of ';' (the initial // means global replace) in the string IN with ' ' (a single space), then interprets the space-delimited string as an array (that's what the surrounding parentheses do).

The syntax used inside of the curly braces to replace each ';' character with a ' ' character is called Parameter Expansion .

There are some common gotchas:

  1. If the original string has spaces, you will need to use IFS :
    • IFS=':'; arrIN=($IN); unset IFS;
  2. If the original string has spaces and the delimiter is a new line, you can set IFS with:
    • IFS=$'\n'; arrIN=($IN); unset IFS;

Oz123 , Mar 21, 2011 at 18:50

I just want to add: this is the simplest of all, you can access array elements with ${arrIN[1]} (starting from zeros of course) – Oz123 Mar 21 '11 at 18:50

KomodoDave , Jan 5, 2012 at 15:13

Found it: the technique of modifying a variable within a ${} is known as 'parameter expansion'. – KomodoDave Jan 5 '12 at 15:13

qbolec , Feb 25, 2013 at 9:12

Does it work when the original string contains spaces? – qbolec Feb 25 '13 at 9:12

Ethan , Apr 12, 2013 at 22:47

No, I don't think this works when there are also spaces present... it's converting the ',' to ' ' and then building a space-separated array. – Ethan Apr 12 '13 at 22:47

Charles Duffy , Jul 6, 2013 at 14:39

This is a bad approach for other reasons: For instance, if your string contains ;*; , then the * will be expanded to a list of filenames in the current directory. -1 – Charles Duffy Jul 6 '13 at 14:39

Chris Lutz , May 28, 2009 at 2:09

If you don't mind processing them immediately, I like to do this:
for i in $(echo $IN | tr ";" "\n")
do
  # process
done

You could use this kind of loop to initialize an array, but there's probably an easier way to do it. Hope this helps, though.

Chris Lutz , May 28, 2009 at 2:42

You should have kept the IFS answer. It taught me something I didn't know, and it definitely made an array, whereas this just makes a cheap substitute. – Chris Lutz May 28 '09 at 2:42

Johannes Schaub - litb , May 28, 2009 at 2:59

I see. Yeah i find doing these silly experiments, i'm going to learn new things each time i'm trying to answer things. I've edited stuff based on #bash IRC feedback and undeleted :) – Johannes Schaub - litb May 28 '09 at 2:59

lhunath , May 28, 2009 at 6:12

-1, you're obviously not aware of wordsplitting, because it's introducing two bugs in your code. one is when you don't quote $IN and the other is when you pretend a newline is the only delimiter used in wordsplitting. You are iterating over every WORD in IN, not every line, and DEFINATELY not every element delimited by a semicolon, though it may appear to have the side-effect of looking like it works. – lhunath May 28 '09 at 6:12

Johannes Schaub - litb , May 28, 2009 at 17:00

You could change it to echo "$IN" | tr ';' '\n' | while read -r ADDY; do # process "$ADDY"; done to make him lucky, i think :) Note that this will fork, and you can't change outer variables from within the loop (that's why i used the <<< "$IN" syntax) then – Johannes Schaub - litb May 28 '09 at 17:00

mklement0 , Apr 24, 2013 at 14:13

To summarize the debate in the comments: Caveats for general use : the shell applies word splitting and expansions to the string, which may be undesired; just try it with. IN="bla@some.com;john@home.com;*;broken apart" . In short: this approach will break, if your tokens contain embedded spaces and/or chars. such as * that happen to make a token match filenames in the current folder. – mklement0 Apr 24 '13 at 14:13

F. Hauri , Apr 13, 2013 at 14:20

Compatible answer

For this SO question, there are already a lot of different ways to do this in bash. But bash has many special features, so-called bashisms, that work well but won't work in any other shell.

In particular, arrays, associative arrays, and pattern substitution are pure bashisms and may not work under other shells.

On my Debian GNU/Linux, there is a standard shell called dash, but I know many people who like to use ksh.

Finally, in very small environments, there is a special tool called busybox with its own shell interpreter (ash).

Requested string

The string sample in SO question is:

IN="bla@some.com;john@home.com"

As whitespace could be present and could modify the result of the routine, I prefer to use this sample string:

 IN="bla@some.com;john@home.com;Full Name <fulnam@other.org>"
Split string based on delimiter in bash (version >=4.2)

Under pure bash, we may use arrays and IFS :

var="bla@some.com;john@home.com;Full Name <fulnam@other.org>"
oIFS="$IFS"
IFS=";"
declare -a fields=($var)
IFS="$oIFS"
unset oIFS

IFS=\; read -a fields <<<"$var"

Using this syntax under recent bash doesn't change $IFS for the current session, but only for the current command:

set | grep ^IFS=
IFS=$' \t\n'

Now the string var is split and stored into an array (named fields ):

set | grep ^fields=\\\|^var=
fields=([0]="bla@some.com" [1]="john@home.com" [2]="Full Name <fulnam@other.org>")
var='bla@some.com;john@home.com;Full Name <fulnam@other.org>'

We could request for variable content with declare -p :

declare -p var fields
declare -- var="bla@some.com;john@home.com;Full Name <fulnam@other.org>"
declare -a fields=([0]="bla@some.com" [1]="john@home.com" [2]="Full Name <fulnam@other.org>")

read is the quickest way to do the split, because there are no forks and no external resources called.

From there, you could use the syntax you already know for processing each field:

for x in "${fields[@]}";do
    echo "> [$x]"
    done
> [bla@some.com]
> [john@home.com]
> [Full Name <fulnam@other.org>]

or drop each field after processing (I like this shifting approach):

while [ "$fields" ] ;do
    echo "> [$fields]"
    fields=("${fields[@]:1}")
    done
> [bla@some.com]
> [john@home.com]
> [Full Name <fulnam@other.org>]

or even for simple printout (shorter syntax):

printf "> [%s]\n" "${fields[@]}"
> [bla@some.com]
> [john@home.com]
> [Full Name <fulnam@other.org>]
Split string based on delimiter in shell

But if you want to write something usable under many shells, you have to avoid bashisms .

There is a syntax, used in many shells, for splitting a string across first or last occurrence of a substring:

${var#*SubStr}  # will drop begin of string up to first occur of `SubStr`
${var##*SubStr} # will drop begin of string up to last occur of `SubStr`
${var%SubStr*}  # will drop part of string from last occur of `SubStr` to the end
${var%%SubStr*} # will drop part of string from first occur of `SubStr` to the end

(The absence of this is the main reason I published my answer ;)

As pointed out by Score_Under :

# and % delete the shortest possible matching string, and

## and %% delete the longest possible.

This little sample script works well under bash , dash , ksh , and busybox , and was tested under macOS's bash too:

var="bla@some.com;john@home.com;Full Name <fulnam@other.org>"
while [ "$var" ] ;do
    iter=${var%%;*}
    echo "> [$iter]"
    [ "$var" = "$iter" ] && \
        var='' || \
        var="${var#*;}"
  done
> [bla@some.com]
> [john@home.com]
> [Full Name <fulnam@other.org>]

Have fun!

Score_Under , Apr 28, 2015 at 16:58

The # , ## , % , and %% substitutions have what is IMO an easier explanation to remember (for how much they delete): # and % delete the shortest possible matching string, and ## and %% delete the longest possible. – Score_Under Apr 28 '15 at 16:58

sorontar , Oct 26, 2016 at 4:36

The IFS=\; read -a fields <<<"$var" fails on newlines and add a trailing newline. The other solution removes a trailing empty field. – sorontar Oct 26 '16 at 4:36

Eric Chen , Aug 30, 2017 at 17:50

The shell delimiter is the most elegant answer, period. – Eric Chen Aug 30 '17 at 17:50

sancho.s , Oct 4 at 3:42

Could the last alternative be used with a list of field separators set somewhere else? For instance, I mean to use this as a shell script, and pass a list of field separators as a positional parameter. – sancho.s Oct 4 at 3:42

F. Hauri , Oct 4 at 7:47

Yes, in a loop: for sep in "#" "ł" "@" ; do ... var="${var#*$sep}" ... – F. Hauri Oct 4 at 7:47

DougW , Apr 27, 2015 at 18:20

I've seen a couple of answers referencing the cut command, but they've all been deleted. It's a little odd that nobody has elaborated on that, because I think it's one of the more useful commands for doing this type of thing, especially for parsing delimited log files.

In the case of splitting this specific example into a bash script array, tr is probably more efficient, but cut can be used, and is more effective if you want to pull specific fields from the middle.

Example:

$ echo "bla@some.com;john@home.com" | cut -d ";" -f 1
bla@some.com
$ echo "bla@some.com;john@home.com" | cut -d ";" -f 2
john@home.com

You can obviously put that into a loop, and iterate the -f parameter to pull each field independently, as in the sketch below.
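
A small sketch of such a loop (awk is only used here to count the fields; the variable names are illustrative):

IN="bla@some.com;john@home.com"
n=$(echo "$IN" | awk -F';' '{print NF}')
for i in $(seq 1 "$n"); do
    echo "$IN" | cut -d ';' -f "$i"
done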

This gets more useful when you have a delimited log file with rows like this:

2015-04-27|12345|some action|an attribute|meta data

cut is very handy to be able to cat this file and select a particular field for further processing.

MisterMiyagi , Nov 2, 2016 at 8:42

Kudos for using cut , it's the right tool for the job! Much cleared than any of those shell hacks. – MisterMiyagi Nov 2 '16 at 8:42

uli42 , Sep 14, 2017 at 8:30

This approach will only work if you know the number of elements in advance; you'd need to program some more logic around it. It also runs an external tool for every element. – uli42 Sep 14 '17 at 8:30

Louis Loudog Trottier , May 10 at 4:20

Exactly what I was looking for, trying to avoid empty strings in a csv. Now I can point to the exact 'column' value as well. Works with IFS already used in a loop. Better than expected for my situation. – Louis Loudog Trottier May 10 at 4:20

, May 28, 2009 at 10:31

How about this approach:
IN="bla@some.com;john@home.com" 
set -- "$IN" 
IFS=";"; declare -a Array=($*) 
echo "${Array[@]}" 
echo "${Array[0]}" 
echo "${Array[1]}"

Source

Yzmir Ramirez , Sep 5, 2011 at 1:06

+1 ... but I wouldn't name the variable "Array" ... pet peev I guess. Good solution. – Yzmir Ramirez Sep 5 '11 at 1:06

ata , Nov 3, 2011 at 22:33

+1 ... but the "set" and declare -a are unnecessary. You could as well have used just IFS=";" && Array=($IN) – ata Nov 3 '11 at 22:33

Luca Borrione , Sep 3, 2012 at 9:26

+1 Only a side note: shouldn't it be recommendable to keep the old IFS and then restore it? (as shown by stefanB in his edit3) people landing here (sometimes just copying and pasting a solution) might not think about this – Luca Borrione Sep 3 '12 at 9:26

Charles Duffy , Jul 6, 2013 at 14:44

-1: First, @ata is right that most of the commands in this do nothing. Second, it uses word-splitting to form the array, and doesn't do anything to inhibit glob-expansion when doing so (so if you have glob characters in any of the array elements, those elements are replaced with matching filenames). – Charles Duffy Jul 6 '13 at 14:44

John_West , Jan 8, 2016 at 12:29

Suggest to use $'...' : IN=$'bla@some.com;john@home.com;bet <d@\ns* kl.com>' . Then echo "${Array[2]}" will print a string with newline. set -- "$IN" is also necessary in this case. Yes, to prevent glob expansion, the solution should include set -f . – John_West Jan 8 '16 at 12:29

Steven Lizarazo , Aug 11, 2016 at 20:45

This worked for me:
string="1;2"
echo $string | cut -d';' -f1 # output is 1
echo $string | cut -d';' -f2 # output is 2

Pardeep Sharma , Oct 10, 2017 at 7:29

this is short and sweet :) – Pardeep Sharma Oct 10 '17 at 7:29

space earth , Oct 17, 2017 at 7:23

Thanks...Helped a lot – space earth Oct 17 '17 at 7:23

mojjj , Jan 8 at 8:57

cut works only with a single char as delimiter. – mojjj Jan 8 at 8:57

lothar , May 28, 2009 at 2:12

echo "bla@some.com;john@home.com" | sed -e 's/;/\n/g'
bla@some.com
john@home.com

Luca Borrione , Sep 3, 2012 at 10:08

-1 what if the string contains spaces? for example IN="this is first line; this is second line" arrIN=( $( echo "$IN" | sed -e 's/;/\n/g' ) ) will produce an array of 8 elements in this case (an element for each word space separated), rather than 2 (an element for each line semi colon separated) – Luca Borrione Sep 3 '12 at 10:08

lothar , Sep 3, 2012 at 17:33

@Luca No the sed script creates exactly two lines. What creates the multiple entries for you is when you put it into a bash array (which splits on white space by default) – lothar Sep 3 '12 at 17:33

Luca Borrione , Sep 4, 2012 at 7:09

That's exactly the point: the OP needs to store entries into an array to loop over it, as you can see in his edits. I think your (good) answer missed mentioning the use of arrIN=( $( echo "$IN" | sed -e 's/;/\n/g' ) ) to achieve that, and advising to change IFS to IFS=$'\n' for those who land here in the future and need to split a string containing spaces (and to restore it afterwards). :) – Luca Borrione Sep 4 '12 at 7:09

lothar , Sep 4, 2012 at 16:55

@Luca Good point. However the array assignment was not in the initial question when I wrote up that answer. – lothar Sep 4 '12 at 16:55

Ashok , Sep 8, 2012 at 5:01

This also works:
IN="bla@some.com;john@home.com"
echo ADD1=`echo $IN | cut -d \; -f 1`
echo ADD2=`echo $IN | cut -d \; -f 2`

Be careful, this solution is not always correct. In case you pass "bla@some.com" only, it will assign it to both ADD1 and ADD2.

fersarr , Mar 3, 2016 at 17:17

You can use -s to avoid the mentioned problem: superuser.com/questions/896800/ "-f, --fields=LIST select only these fields; also print any line that contains no delimiter character, unless the -s option is specified" – fersarr Mar 3 '16 at 17:17

Tony , Jan 14, 2013 at 6:33

I think awk is the best and most efficient tool for this problem. awk is available by default in almost every Linux distribution.
echo "bla@some.com;john@home.com" | awk -F';' '{print $1,$2}'

will give

bla@some.com john@home.com

Of course you can store each email address by redefining the awk print field.
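One hedged way to capture those fields into shell variables is to have awk print one field per line and read the result into an array (mapfile assumes bash 4+; the array name addrs is just illustrative):

IN="bla@some.com;john@home.com"
# print one field per line, then collect the lines into a bash array
mapfile -t addrs < <(echo "$IN" | awk -F';' '{for (i = 1; i <= NF; i++) print $i}')
echo "${addrs[0]}"   # bla@some.com
echo "${addrs[1]}"   # john@home.com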

Jaro , Jan 7, 2014 at 21:30

Or even simpler: echo "bla@some.com;john@home.com" | awk 'BEGIN{RS=";"} {print}' – Jaro Jan 7 '14 at 21:30

Aquarelle , May 6, 2014 at 21:58

@Jaro This worked perfectly for me when I had a string with commas and needed to reformat it into lines. Thanks. – Aquarelle May 6 '14 at 21:58

Eduardo Lucio , Aug 5, 2015 at 12:59

It worked in this scenario -> "echo "$SPLIT_0" | awk -F' inode=' '{print $1}'"! I had problems when trying to use strings (" inode=") instead of characters (";"). $1, $2, $3, $4 are set as positions in an array! If there is a way of setting an array... better! Thanks! – Eduardo Lucio Aug 5 '15 at 12:59

Tony , Aug 6, 2015 at 2:42

@EduardoLucio, what I'm thinking is that maybe you can first replace your delimiter inode= with ;, for example with sed -i 's/inode\=/\;/g' your_file_to_process, and then use -F';' when applying awk. Hope that helps. – Tony Aug 6 '15 at 2:42

nickjb , Jul 5, 2011 at 13:41

A different take on Darron's answer , this is how I do it:
IN="bla@some.com;john@home.com"
read ADDR1 ADDR2 <<<$(IFS=";"; echo $IN)

ColinM , Sep 10, 2011 at 0:31

This doesn't work. – ColinM Sep 10 '11 at 0:31

nickjb , Oct 6, 2011 at 15:33

I think it does! Run the commands above and then "echo $ADDR1 ... $ADDR2" and I get "bla@some.com ... john@home.com" as output – nickjb Oct 6 '11 at 15:33

Nick , Oct 28, 2011 at 14:36

This worked REALLY well for me... I used it to iterate over an array of strings which contained comma-separated DB,SERVER,PORT data to use with mysqldump. – Nick Oct 28 '11 at 14:36

dubiousjim , May 31, 2012 at 5:28

Diagnosis: the IFS=";" assignment exists only in the $(...; echo $IN) subshell; this is why some readers (including me) initially think it won't work. I assumed that all of $IN was getting slurped up by ADDR1. But nickjb is correct; it does work. The reason is that echo $IN command parses its arguments using the current value of $IFS, but then echoes them to stdout using a space delimiter, regardless of the setting of $IFS. So the net effect is as though one had called read ADDR1 ADDR2 <<< "bla@some.com john@home.com" (note the input is space-separated not ;-separated). – dubiousjim May 31 '12 at 5:28

sorontar , Oct 26, 2016 at 4:43

This fails on spaces and newlines, and also expands wildcards * in the echo $IN with an unquoted variable expansion. – sorontar Oct 26 '16 at 4:43

gniourf_gniourf , Jun 26, 2014 at 9:11

In Bash, a bullet-proof way that will work even if your variable contains newlines:
IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")

Look:

$ in=$'one;two three;*;there is\na newline\nin this field'
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two three" [2]="*" [3]="there is
a newline
in this field")'

The trick for this to work is to use the -d option of read (delimiter) with an empty delimiter, so that read is forced to read everything it's fed. And we feed read with exactly the content of the variable in, with no trailing newline, thanks to printf. Note that we're also putting the delimiter in printf to ensure that the string passed to read has a trailing delimiter. Without it, read would trim potential trailing empty fields:

$ in='one;two;three;'    # there's an empty field
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two" [2]="three" [3]="")'

Here the trailing empty field is preserved.


Update for Bash≥4.4

Since Bash 4.4, the builtin mapfile (aka readarray ) supports the -d option to specify a delimiter. Hence another canonical way is:

mapfile -d ';' -t array < <(printf '%s;' "$in")
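A quick check of that variant (illustrative session; printf is used instead of declare -p so the output does not depend on the exact Bash version):

$ in='one;two three;*;four'
$ mapfile -d ';' -t array < <(printf '%s;' "$in")
$ printf '[%s]\n' "${array[@]}"
[one]
[two three]
[*]
[four]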

John_West , Jan 8, 2016 at 12:10

I found it to be the rare solution on that list that works correctly with \n, spaces and * simultaneously. Also, no loops; the array variable is accessible in the shell after execution (contrary to the highest-upvoted answer). Note the in=$'...'; it does not work with double quotes. I think it needs more upvotes. – John_West Jan 8 '16 at 12:10

Darron , Sep 13, 2010 at 20:10

How about this one liner, if you're not using arrays:
IFS=';' read ADDR1 ADDR2 <<<$IN

dubiousjim , May 31, 2012 at 5:36

Consider using read -r ... to ensure that, for example, the two characters "\t" in the input end up as the same two characters in your variables (instead of a single tab char). – dubiousjim May 31 '12 at 5:36

Luca Borrione , Sep 3, 2012 at 10:07

-1 This is not working here (ubuntu 12.04). Adding echo "ADDR1 $ADDR1"\n echo "ADDR2 $ADDR2" to your snippet will output ADDR1 bla@some.com john@home.com\nADDR2 (\n is newline) – Luca Borrione Sep 3 '12 at 10:07

chepner , Sep 19, 2015 at 13:59

This is probably due to a bug involving IFS and here strings that was fixed in bash 4.3. Quoting $IN should fix it. (In theory, $IN is not subject to word splitting or globbing after it expands, meaning the quotes should be unnecessary. Even in 4.3, though, there's at least one bug remaining--reported and scheduled to be fixed--so quoting remains a good idea.) – chepner Sep 19 '15 at 13:59

sorontar , Oct 26, 2016 at 4:55

This breaks if $in contains newlines even if $IN is quoted. And it adds a trailing newline. – sorontar Oct 26 '16 at 4:55

kenorb , Sep 11, 2015 at 20:54

Here is a clean 3-liner:
in="foo@bar;bizz@buzz;fizz@buzz;buzz@woof"
IFS=';' list=($in)
for item in "${list[@]}"; do echo $item; done

where IFS delimits words based on the separator and () is used to create an array. Then [@] is used to return each item as a separate word.

If you have any code after that, you also need to restore $IFS, e.g. unset IFS.
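A sketch of the save-and-restore variant suggested here (oldIFS is an arbitrary name; if IFS was originally unset, unset IFS is the more faithful restore):

in="foo@bar;bizz@buzz;fizz@buzz;buzz@woof"
oldIFS=$IFS          # remember the current IFS
IFS=';' list=($in)   # split on ';' (wildcards in the data would still be expanded)
IFS=$oldIFS          # restore it so the rest of the script is unaffected
printf '%s\n' "${list[@]}"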

sorontar , Oct 26, 2016 at 5:03

The use of $in unquoted allows wildcards to be expanded. – sorontar Oct 26 '16 at 5:03

user2720864 , Sep 24 at 13:46

+ for the unset command – user2720864 Sep 24 at 13:46

Emilien Brigand , Aug 1, 2016 at 13:15

Without setting the IFS

If you just have one colon you can do that:

a="foo:bar"
b=${a%:*}
c=${a##*:}

you will get:

b = foo
c = bar
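The same parameter-expansion idea can be extended to an arbitrary number of fields with a loop; a minimal sketch (no IFS changes and no external tools, though trailing empty fields are dropped):

s='a:b:c:d'
while [ -n "$s" ]; do
    field=${s%%:*}               # everything before the first ':'
    printf '%s\n' "$field"
    if [ "$s" = "$field" ]; then
        s=''                     # no delimiter left: that was the last field
    else
        s=${s#*:}                # drop the first field and its ':'
    fi
done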

Victor Choy , Sep 16, 2015 at 3:34

There is a simple and smart way like this:
echo "add:sfff" | xargs -d: -i  echo {}

But you must use GNU xargs; BSD xargs doesn't support -d delim. If you use an Apple Mac like me, you can install GNU xargs:

brew install findutils

then

echo "add:sfff" | gxargs -d: -i  echo {}

Halle Knast , May 24, 2017 at 8:42

The following Bash/zsh function splits its first argument on the delimiter given by the second argument:
split() {
    local string="$1"
    local delimiter="$2"
    if [ -n "$string" ]; then
        local part
        while read -d "$delimiter" part; do
            echo $part
        done <<< "$string"
        echo $part
    fi
}

For instance, the command

$ split 'a;b;c' ';'

yields

a
b
c

This output may, for instance, be piped to other commands. Example:

$ split 'a;b;c' ';' | cat -n
1   a
2   b
3   c

Compared to the other solutions given, this one has the following advantages:

If desired, the function may be put into a script as follows:

#!/usr/bin/env bash

split() {
    # ...
}

split "$@"

sandeepkunkunuru , Oct 23, 2017 at 16:10

works and neatly modularized. – sandeepkunkunuru Oct 23 '17 at 16:10

Prospero , Sep 25, 2011 at 1:09

This is the simplest way to do it.
spo='one;two;three'
OIFS=$IFS
IFS=';'
spo_array=($spo)
IFS=$OIFS
echo ${spo_array[*]}

rashok , Oct 25, 2016 at 12:41

IN="bla@some.com;john@home.com"
IFS=';'
read -a IN_arr <<< "${IN}"
for entry in "${IN_arr[@]}"
do
    echo $entry
done

Output

bla@some.com
john@home.com

System : Ubuntu 12.04.1

codeforester , Jan 2, 2017 at 5:37

IFS is not being set in the specific context of read here, and hence it can upset the rest of the code, if any. – codeforester Jan 2 '17 at 5:37

shuaihanhungry , Jan 20 at 15:54

You can apply awk in many situations:
echo "bla@some.com;john@home.com"|awk -F';' '{printf "%s\n%s\n", $1, $2}'

Also, you can use this:

echo "bla@some.com;john@home.com"|awk -F';' '{print $1,$2}' OFS="\n"

ghost , Apr 24, 2013 at 13:13

If there are no spaces, why not this?
IN="bla@some.com;john@home.com"
arr=(`echo $IN | tr ';' ' '`)

echo ${arr[0]}
echo ${arr[1]}

eukras , Oct 22, 2012 at 7:10

There are some cool answers here (errator esp.), but for something analogous to split in other languages -- which is what I took the original question to mean -- I settled on this:
IN="bla@some.com;john@home.com"
declare -a a="(${IN/;/ })";

Now ${a[0]} , ${a[1]} , etc, are as you would expect. Use ${#a[*]} for number of terms. Or to iterate, of course:

for i in ${a[*]}; do echo $i; done

IMPORTANT NOTE:

This works in cases where there are no spaces to worry about, which solved my problem, but may not solve yours. Go with the $IFS solution(s) in that case.

olibre , Oct 7, 2013 at 13:33

Does not work when IN contains more than two e-mail addresses. Please refer to the same idea (but fixed) in palindrom's answer – olibre Oct 7 '13 at 13:33

sorontar , Oct 26, 2016 at 5:14

Better use ${IN//;/ } (double slash) to make it also work with more than two values. Beware that any wildcard ( *?[ ) will be expanded. And a trailing empty field will be discarded. – sorontar Oct 26 '16 at 5:14
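A hedged sketch of that corrected form, with globbing suspended as sorontar warns (it still assumes the fields themselves contain no spaces; the third address is added only for illustration):

IN="bla@some.com;john@home.com;third@example.com"
set -f                   # suspend globbing so * ? [ ] in the data are not expanded
a=(${IN//;/ })           # '//' replaces every ';' with a space before word splitting
set +f
printf '%s\n' "${a[@]}"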

jeberle , Apr 30, 2013 at 3:10

Use the set built-in to load up the $@ array:
IN="bla@some.com;john@home.com"
IFS=';'; set $IN; IFS=$' \t\n'

Then, let the party begin:

echo $#
for a; do echo $a; done
ADDR1=$1 ADDR2=$2

sorontar , Oct 26, 2016 at 5:17

Better use set -- $IN to avoid some issues with "$IN" starting with dash. Still, the unquoted expansion of $IN will expand wildcards ( *?[ ). – sorontar Oct 26 '16 at 5:17

NevilleDNZ , Sep 2, 2013 at 6:30

Two Bourne-ish alternatives, neither of which requires bash arrays:

Case 1 : Keep it nice and simple: Use a NewLine as the Record-Separator... eg.

IN="bla@some.com
john@home.com"

while read i; do
  # process "$i" ... eg.
    echo "[email:$i]"
done <<< "$IN"

Note: in this first case no sub-process is forked to assist with list manipulation.

Idea: Maybe it is worth using NL extensively internally, and only converting to a different RS when generating the final result externally.

Case 2 : Using a ";" as a record separator... eg.

NL="
" IRS=";" ORS=";"

conv_IRS() {
  exec tr "$1" "$NL"
}

conv_ORS() {
  exec tr "$NL" "$1"
}

IN="bla@some.com;john@home.com"
IN="$(conv_IRS ";" <<< "$IN")"

while read i; do
  # process "$i" ... eg.
    echo -n "[email:$i]$ORS"
done <<< "$IN"

In both cases a sub-list can be composed within the loop and remains available after the loop has completed, since the here-string keeps the loop in the current shell. This is useful when manipulating lists in memory instead of storing them in files. {p.s. keep calm and carry on B-) }
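A minimal sketch of composing such a sub-list inside the loop (the [email:...] formatting is reused from the examples above; the accumulation survives because no sub-process is involved):

NL='
'
IN="bla@some.com
john@home.com"

sublist=""
while read -r i; do
    sublist="${sublist}[email:$i]$NL"   # accumulate inside the loop
done <<< "$IN"
printf '%s' "$sublist"                  # still set after the loop has completed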

fedorqui , Jan 8, 2015 at 10:21

Apart from the fantastic answers that were already provided, if it is just a matter of printing out the data you may consider using awk :
awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"

This sets the field separator to ; , so that it can loop through the fields with a for loop and print accordingly.

Test
$ IN="bla@some.com;john@home.com"
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
> [bla@some.com]
> [john@home.com]

With another input:

$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "a;b;c   d;e_;f"
> [a]
> [b]
> [c   d]
> [e_]
> [f]

18446744073709551615 , Feb 20, 2015 at 10:49

In Android shell, most of the proposed methods just do not work:
$ IFS=':' read -ra ADDR <<<"$PATH"                             
/system/bin/sh: can't create temporary file /sqlite_stmt_journals/mksh.EbNoR10629: No such file or directory

What does work is:

$ for i in ${PATH//:/ }; do echo $i; done
/sbin
/vendor/bin
/system/sbin
/system/bin
/system/xbin

where // means global replacement.

sorontar , Oct 26, 2016 at 5:08

Fails if any part of $PATH contains spaces (or newlines). Also expands wildcards (asterisk *, question mark ? and braces [ ]). – sorontar Oct 26 '16 at 5:08

Eduardo Lucio , Apr 4, 2016 at 19:54

Okay guys!

Here's my answer!

DELIMITER_VAL='='

read -d '' F_ABOUT_DISTRO_R <<"EOF"
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"
NAME="Ubuntu"
VERSION="14.04.4 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.4 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
EOF

SPLIT_NOW=$(awk -F$DELIMITER_VAL '{for(i=1;i<=NF;i++){printf "%s\n", $i}}' <<<"${F_ABOUT_DISTRO_R}")
while read -r line; do
   SPLIT+=("$line")
done <<< "$SPLIT_NOW"
for i in "${SPLIT[@]}"; do
    echo "$i"
done

Why is this approach "the best" for me?

For two reasons:

  1. You do not need to escape the delimiter;
  2. You will not have problems with blank spaces. The values will be properly separated in the array!

[]'s

gniourf_gniourf , Jan 30, 2017 at 8:26

FYI, /etc/os-release and /etc/lsb-release are meant to be sourced, and not parsed. So your method is really wrong. Moreover, you're not quite answering the question about splitting a string on a delimiter. – gniourf_gniourf Jan 30 '17 at 8:26

Michael Hale , Jun 14, 2012 at 17:38

A one-liner to split a string separated by ';' into an array is:
IN="bla@some.com;john@home.com"
ADDRS=( $(IFS=";" echo "$IN") )
echo ${ADDRS[0]}
echo ${ADDRS[1]}

This only sets IFS in a subshell, so you don't have to worry about saving and restoring its value.

Luca Borrione , Sep 3, 2012 at 10:04

-1 this doesn't work here (ubuntu 12.04). It prints only the first echo with the whole $IN value in it, while the second is empty. You can see it if you put echo "0: "${ADDRS[0]}\n echo "1: "${ADDRS[1]} ; the output is 0: bla@some.com;john@home.com\n 1: (\n is a newline) – Luca Borrione Sep 3 '12 at 10:04

Luca Borrione , Sep 3, 2012 at 10:05

please refer to nickjb's answer for a working alternative to this idea: stackoverflow.com/a/6583589/1032370 – Luca Borrione Sep 3 '12 at 10:05

Score_Under , Apr 28, 2015 at 17:09

-1, 1. IFS isn't being set in that subshell (it's being passed to the environment of "echo", which is a builtin, so nothing is happening anyway). 2. $IN is quoted so it isn't subject to IFS splitting. 3. The process substitution is split by whitespace, but this may corrupt the original data. – Score_Under Apr 28 '15 at 17:09

ajaaskel , Oct 10, 2014 at 11:33

IN='bla@some.com;john@home.com;Charlie Brown <cbrown@acme.com;!"#$%&/()[]{}*? are no problem;simple is beautiful :-)'
set -f
oldifs="$IFS"
IFS=';'; arrayIN=($IN)
IFS="$oldifs"
for i in "${arrayIN[@]}"; do
echo "$i"
done
set +f

Output:

bla@some.com
john@home.com
Charlie Brown <cbrown@acme.com
!"#$%&/()[]{}*? are no problem
simple is beautiful :-)

Explanation: A simple assignment using parentheses () converts a semicolon-separated list into an array, provided you have the correct IFS while doing that. A standard for loop then handles the individual items in that array as usual. Notice that the list given for the IN variable must be "hard" quoted, that is, with single quotes.

IFS must be saved and restored, since Bash does not treat an assignment the same way as a command. An alternate workaround is to wrap the assignment inside a function and call that function with a modified IFS. In that case separate saving/restoring of IFS is not needed. Thanks to "Bize" for pointing that out.
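A sketch of that function workaround (split_in and arrayIN are illustrative names; a local IFS keeps the caller's IFS untouched, so no save/restore is needed):

split_in() {
    local IFS=';'        # scoped to the function; the caller's IFS is untouched
    set -f               # keep globbing off, as in the answer above
    arrayIN=($1)         # split the first argument on ';'
    set +f
}

IN='bla@some.com;john@home.com;simple is beautiful :-)'
split_in "$IN"
printf '%s\n' "${arrayIN[@]}"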

gniourf_gniourf , Feb 20, 2015 at 16:45

!"#$%&/()[]{}*? are no problem well... not quite: []*? are glob characters. So what about creating this directory and file: `mkdir '!"#$%&'; touch '!"#$%&/()[]{} got you hahahaha - are no problem' and running your command? simple may be beautiful, but when it's broken, it's broken. – gniourf_gniourf Feb 20 '15 at 16:45

ajaaskel , Feb 25, 2015 at 7:20

@gniourf_gniourf The string is stored in a variable. Please see the original question. – ajaaskel Feb 25 '15 at 7:20

gniourf_gniourf , Feb 25, 2015 at 7:26

@ajaaskel you didn't fully understand my comment. Go in a scratch directory and issue these commands: mkdir '!"#$%&'; touch '!"#$%&/()[]{} got you hahahaha - are no problem' . They will only create a directory and a file, with weird looking names, I must admit. Then run your commands with the exact IN you gave: IN='bla@some.com;john@home.com;Charlie Brown <cbrown@acme.com;!"#$%&/()[]{}*? are no problem;simple is beautiful :-)' . You'll see that you won't get the output you expect. Because you're using a method subject to pathname expansions to split your string. – gniourf_gniourf Feb 25 '15 at 7:26

gniourf_gniourf , Feb 25, 2015 at 7:29

This is to demonstrate that the characters * , ? , [...] and even, if extglob is set, !(...) , @(...) , ?(...) , +(...) are problems with this method! – gniourf_gniourf Feb 25 '15 at 7:29

ajaaskel , Feb 26, 2015 at 15:26

@gniourf_gniourf Thanks for detailed comments on globbing. I adjusted the code to have globbing off. My point was however just to show that rather simple assignment can do the splitting job. – ajaaskel Feb 26 '15 at 15:26

> , Dec 19, 2013 at 21:39

Maybe not the most elegant solution, but works with * and spaces:
IN="bla@so me.com;*;john@home.com"
for i in `delims=${IN//[^;]}; seq 1 $((${#delims} + 1))`
do
   echo "> [`echo $IN | cut -d';' -f$i`]"
done

Outputs

> [bla@so me.com]
> [*]
> [john@home.com]

Other example (delimiters at beginning and end):

IN=";bla@so me.com;*;john@home.com;"
> []
> [bla@so me.com]
> [*]
> [john@home.com]
> []

Basically it removes every character other than ;, making delims a string of just the semicolons (e.g. ;;). Then it runs a for loop from 1 to the number of delimiters plus one, as counted by ${#delims}. The final step is to safely get the $i-th part using cut.

[Nov 08, 2018] Utilizing multi core for tar+gzip/bzip compression/decompression

Nov 08, 2018 | stackoverflow.com



user1118764 , Sep 7, 2012 at 6:58

I normally compress using tar zcvf and decompress using tar zxvf (using gzip due to habit).

I've recently gotten a quad core CPU with hyperthreading, so I have 8 logical cores, and I notice that many of the cores are unused during compression/decompression.

Is there any way I can utilize the unused cores to make it faster?

Warren Severin , Nov 13, 2017 at 4:37

The solution proposed by Xiong Chiamiov above works beautifully. I had just backed up my laptop with .tar.bz2 and it took 132 minutes using only one cpu thread. Then I compiled and installed tar from source: gnu.org/software/tar I included the options mentioned in the configure step: ./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip I ran the backup again and it took only 32 minutes. That's better than 4X improvement! I watched the system monitor and it kept all 4 cpus (8 threads) flatlined at 100% the whole time. THAT is the best solution. – Warren Severin Nov 13 '17 at 4:37

Mark Adler , Sep 7, 2012 at 14:48

You can use pigz instead of gzip, which does gzip compression on multiple cores. Instead of using the -z option, you would pipe it through pigz:
tar cf - paths-to-archive | pigz > archive.tar.gz

By default, pigz uses the number of available cores, or eight if it could not query that. You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can request better compression with -9. E.g.

tar cf - paths-to-archive | pigz -9 -p 32 > archive.tar.gz

user788171 , Feb 20, 2013 at 12:43

How do you use pigz to decompress in the same fashion? Or does it only work for compression? – user788171 Feb 20 '13 at 12:43

Mark Adler , Feb 20, 2013 at 16:18

pigz does use multiple cores for decompression, but only with limited improvement over a single core. The deflate format does not lend itself to parallel decompression. The decompression portion must be done serially. The other cores for pigz decompression are used for reading, writing, and calculating the CRC. When compressing on the other hand, pigz gets close to a factor of n improvement with n cores. – Mark Adler Feb 20 '13 at 16:18
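For completeness, a decompression counterpart of the pipe shown above (archive name reused from the answer; unpigz is equivalent to pigz -d):

pigz -dc archive.tar.gz | tar xf -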

Garrett , Mar 1, 2014 at 7:26

The hyphen here is stdout (see this page ). – Garrett Mar 1 '14 at 7:26

Mark Adler , Jul 2, 2014 at 21:29

Yes. 100% compatible in both directions. – Mark Adler Jul 2 '14 at 21:29

Mark Adler , Apr 23, 2015 at 5:23

There is effectively no CPU time spent tarring, so it wouldn't help much. The tar format is just a copy of the input file with header blocks in between files. – Mark Adler Apr 23 '15 at 5:23

Jen , Jun 14, 2013 at 14:34

You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use.

For example use:

tar -c --use-compress-program=pigz -f tar.file dir_to_zip

ranman , Nov 13, 2013 at 10:01

This is an awesome little nugget of knowledge and deserves more upvotes. I had no idea this option even existed and I've read the man page a few times over the years. – ranman Nov 13 '13 at 10:01

Valerio Schiavoni , Aug 5, 2014 at 22:38

Unfortunately by doing so the concurrent feature of pigz is lost. You can see for yourself by executing that command and monitoring the load on each of the cores. – Valerio Schiavoni Aug 5 '14 at 22:38

bovender , Sep 18, 2015 at 10:14

@ValerioSchiavoni: Not here, I get full load on all 4 cores (Ubuntu 15.04 'Vivid'). – bovender Sep 18 '15 at 10:14

Valerio Schiavoni , Sep 28, 2015 at 23:41

On compress or on decompress ? – Valerio Schiavoni Sep 28 '15 at 23:41

Offenso , Jan 11, 2017 at 17:26

I prefer tar cf - dir_to_zip | pv | pigz > tar.file . pv helps me estimate progress; you can skip it. But still it is easier to write and remember. – Offenso Jan 11 '17 at 17:26

Maxim Suslov , Dec 18, 2014 at 7:31

Common approach

There is an option for the tar program:

-I, --use-compress-program PROG
      filter through PROG (must accept -d)

You can use a multithreaded version of an archiver or compressor utility.

The most popular multithreaded compressors are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:

$ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
$ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive

The archiver must accept -d. If your replacement utility doesn't have this parameter and/or you need to specify additional parameters, then use pipes (add parameters if necessary):

$ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.bz2
$ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz

The input and output of the single-threaded and multithreaded versions are compatible. You can compress using the multithreaded version and decompress using the single-threaded version, and vice versa.
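For example, an archive produced by the multithreaded pipe above can be unpacked with plain single-threaded tar/gzip (a sketch reusing the names from this answer):

tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz   # compress with all cores
tar xzf OUTPUT_FILE.tar.gz                              # decompress with ordinary gzip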

p7zip

For p7zip for compression you need a small shell script like t