The command batch is one of the three commands that constitute the simple batch scheduler in Unix. The other two are at and atq.
There is also the atrun script, which allows specifying the load threshold below which batch jobs are allowed to run.
The batch utility is a primitive batch system built into all major Unixes and Linux. It is based on the at command and executes commands when system load levels permit.
How to run commands as in a queue - Super User
The at utility, best known for running commands at a specified time, also has a queue
feature and can be asked to start running commands now. It reads the command to run from standard
input.
echo 'command1 --option arg1 arg2' | at -q myqueue now
echo 'command2 ...' | at -q myqueue now
The batch command is equivalent to at -q b -m now (-m meaning the command output, if any, will be mailed to you, like cron does). Not all Unix variants support queue names (-q myqueue); you may be limited to a single queue called b.
Linux's at is limited to single-letter queue names.
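For example, a minimal batch submission reads the command from standard input (the archive command and paths here are illustrative):
echo 'tar czf /tmp/home-backup.tar.gz /home/joeuser' | batch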
By default this is when the load average drops below 0.8, or below the value specified in the invocation of atrun [-l load_avg] [-d]. The latter is a shell script containing an invocation of /usr/sbin/atd with the -s option (it exists for compatibility with other Unixes).
In Linux it is extremely primitive, and alternative batch systems should be used for anything more complex than submitting a series of jobs with equal priorities.
By default batch submits jobs into the b queue, which exists specifically for batch jobs (queue a is used for at command submissions and queue c for crontab submissions). Since queue c is reserved for cron jobs, it cannot be used with the -q option.
Execution of submitted jobs can be delayed by limits on the number of jobs allowed to run concurrently. Submitting jobs into several queues permits running several job streams in parallel.
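For instance, a sketch of two parallel job streams, one per queue (the queue letters and commands are arbitrary):
echo 'bzip2 /tmp/bigfile1' | at -q d now
echo 'bzip2 /tmp/bigfile2' | at -q e now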
batch [-V] [-q queue] [-f file] [-mv] [TIME] executes commands when system load levels permit; in other words, when the load average drops below 1.5, or below the value specified in the invocation of atd.
Options:
-V prints the version number to standard error.
-q queue uses the specified queue. A queue designation consists of a single letter; valid queue designations range from a to z and A to Z. The b queue is the default for batch. Queues with higher letters run with increased niceness. The special queue "=" is reserved for jobs which are currently running. If a job is submitted to an uppercase-letter queue, it is treated as if it had been submitted to batch at that time. Once the time is reached, the batch processing rules with respect to load average apply.
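For example (a sketch; the command is illustrative), submitting to the uppercase B queue holds the job until 5 pm and then applies the batch load rules:
echo 'make world' | at -q B 5pm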
-m Sends mail to the user when the job has completed.
-f file Reads the job from file rather than
standard input.
-v Shows the time the job will be executed. Times displayed will be in the format "1997-02-20 14:50" unless the environment variable POSIXLY_CORRECT is set; then, it will be "Thu Feb 20 14:50:00 1997".
TIME There are several important relative time frames used (see the at command for details):
now - Indicates the current day and time. Invoking at now will submit an at-job for potentially immediate execution.
today
tomorrow
midnight - Indicates the time 12:00 am (00:00).
noon - Indicates the time 12:00 pm.
The optional increment after the time specification in the at command permits specifying an offset from that time. It should be a number preceded by a plus sign (+) with one of the following suffixes:
minutes,
hours,
days,
weeks,
months,
years.
The spacing is quite flexible as long as there are no ambiguities.
For example:
0815am Jan 24
8 :15amjan24
now "+ 1day"
5 pm FRIday
'17 utc+ 30minutes'
The singular forms are also accepted, for example
now + 1 minute
The keyword next can be used as an equivalent to an increment of + 1. For example:
/usr/bin/at -q b [-p project] now
/usr/xpg4/bin/at -q b -m [-p project] now
At the same time, at is quite a different animal than cron: at preserves the environment in which it was invoked, while cron does not (it executes commands in its own "cron" environment, and you should not expect that PATH and other variables will be preserved).
The at utility is pipeable: it can read commands from standard input and submit a job to be executed immediately (as in the example below) or at a later time.
echo "perl myjob" | at now
The at-job is executed in a separate invocation of the shell, running
in a separate process group with no controlling terminal, except that the
environment variables, current working directory, file creation mask (see
umask(1)), and system resource limits (for sh and ksh
only, see
ulimit(1)) in effect when the at utility is executed are retained and used when the at-job is executed.
When the at-job is submitted, the at_job_id and scheduled
time are written to standard error. The at_job_id is an identifier
that is a string consisting solely of alphanumeric characters and the period
character. The at_job_id is assigned by the system when the job
is scheduled such that it uniquely identifies a particular job.
User notification and the processing of the job's standard output and
standard error are described under the -m option.
Permissions
As with cron, the behavior of the command is controlled by two files that list users, one per line, and are similar to the cron control files:
/usr/lib/cron/at.allow
/usr/lib/cron/at.deny
Rules
Users are permitted to use at and batch (see below) if their name appears in the file /usr/lib/cron/at.allow. If this file exists, all users that need to use the at command should be explicitly listed.
If that file does not exist, the file /usr/lib/cron/at.deny
is checked to determine if the user should be denied access to
at.
If neither file exists, only a user with the solaris.jobs.user
authorization is allowed to submit a job.
If only at.deny exists and is empty, global usage is permitted.
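For example, to restrict at and batch to two accounts, create the allow file with one user name per line (the names are illustrative; on Linux the file is usually /etc/at.allow):
printf 'alice\nbob\n' > /usr/lib/cron/at.allow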
cron and at jobs are not executed if the user's account is locked. Only accounts which are not locked as defined in shadow(4) will have their job or process executed. For example, webservd is a "locked" user under Solaris 10, and you need to unlock it before it is possible to use at for submitting jobs from this user. In such cases /var/cron/log will contain an entry like the following:
! bad user (webservd) Fri Apr 21 14:47:49
2006
That can also happen with human accounts if password aging was turned on. Apparently, if the password expires, cron jobs do not run.
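One possible fix on Solaris 10 (a hedged suggestion; check passwd(1) on your release) is to switch the account from "locked" to "no login", which still blocks interactive logins but lets cron and at jobs run:
passwd -N webservd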
The Bourne shell, sh(1), is used to execute the at-job.
-f file
Specifies the path of a file to be used as the source of the at-job,
instead of standard input.
-l
(The letter ell.) Reports all jobs scheduled for the invoking user if
no at_job_id operands are specified. If at_job_ids
are specified, reports only information for these jobs.
-m
Sends mail to the invoking user after the at-job has run, announcing
its completion. Standard output and standard error produced by the at-job
are mailed to the user as well, unless redirected elsewhere. Mail is
sent even if the job produces no output.
If -m is
not used, the job's standard output and standard error is provided to
the user by means of mail, unless they are redirected elsewhere; if
there is no such output to provide, the user is not notified of the
job's completion.
-p project
Specifies under which project the at or batch job
is run. When used with the -l option, limits the search
to that particular project. Values for project are interpreted
first as a project name, and then as a possible project ID, if entirely
numeric. By default, the user's current project is used.
-q queuename
Specifies in which queue to schedule a job for submission. When used
with the -l option, limits the search to that particular
queue. Values for queuename are limited to the lower case
letters a through z. By default, at-jobs are scheduled
in queue a. In contrast, queue b is reserved for batch
jobs. Since queue c is reserved for cron jobs, it cannot be used with the -q option.
-r at_job_id
Removes the jobs with the specified at_job_id operands that
were previously scheduled by the at utility.
-t time
Submits the job to be run at the time specified by the time
option-argument, which must have the format as specified by the
touch(1) utility.
OPERANDS
The following operands are supported:
at_job_id
The name reported by a previous invocation of the at utility
at the time the job was scheduled.
timespec
Submit the job to be run at the date and time specified. All of the
timespec operands are interpreted as if they were separated
by space characters and concatenated. The date and time are interpreted
as being in the timezone of the user (as determined by the TZ
variable), unless a timezone name appears as part of time
below.
In the "C" locale, the following describes the three parts
of the time specification string. All of the values from the LC_TIME
categories in the "C" locale are recognized in a case-insensitive manner.
time
The time can be specified as one, two or four digits.
One- and two-digit numbers are taken to be hours, four-digit numbers
to be hours and minutes. The time can alternatively be specified
as two numbers separated by a colon, meaning hour:minute.
An AM/PM indication (one of the values from the am_pm keywords
in the LC_TIME locale category) can follow the time; otherwise,
a 24-hour clock time is understood. A timezone name of GMT, UCT,
or ZULU (case insensitive) can follow to specify that the time is
in Coordinated Universal Time. Other timezones can be specified
using the TZ environment variable. The time
field can also be one of the following tokens in the "C" locale:
midnight
Indicates the time 12:00 am (00:00).
noon
Indicates the time 12:00 pm.
now
Indicates the current day and time. Invoking at now submits an at-job for potentially immediate execution
(that is, subject only to unspecified scheduling delays).
date
An optional date can be specified as either a month name
(one of the values from the mon or abmon keywords
in the LC_TIME locale category) followed by a day number
(and possibly year number preceded by a comma) or a day of the week
(one of the values from the day or abday keywords
in the LC_TIME locale category). Two special days are recognized
in the "C" locale:
today
Indicates the current day.
tomorrow
Indicates the day following the current day.
If no date is given, today is assumed if
the given time is greater than the current time, and tomorrow
is assumed if it is less. If the given month is less than the current
month (and no year is given), next year is assumed.
increment
The optional increment is a number preceded by a plus
sign (+) and suffixed by one of the following: minutes,
hours, days, weeks, months,
or years. (The singular forms are also accepted.) The keyword
next is equivalent to an increment number of + 1.
For example, the following are equivalent commands:
at 2pm + 1 week
at 2pm next week
USAGE
The format of the at command line shown here is guaranteed
only for the "C" locale. Other locales are not supported for midnight,
noon, now, mon, abmon, day,
abday, today, tomorrow, minutes,
hours, days, weeks, months, years,
and next.
Since the commands run in a separate shell invocation, running in a separate
process group with no controlling terminal, open file descriptors, traps
and priority inherited from the invoking environment are lost.
This sequence, which demonstrates redirecting standard error to a pipe,
is useful in a command procedure (the sequence of output redirection specifications
is significant):
$ at now + 1 hour <<!
diff file1 file2 2>&1 >outfile | mailx mygroup
!
batch
This sequence can be used at a terminal:
$ batch
sort <file >outfile
<EOT>
Use the Bash shell in Linux to manage foreground and background processes. You can use Bash's job control functions and signals to give you more flexibility in how you run commands. We show you how.
All About Processes
Whenever a program is executed in a Linux or Unix-like operating system, a process is started. "Process" is the name for the internal representation of the executing program in the computer's memory. There is a process for every active program. In fact, there is a process for nearly everything that is running on your computer. That includes the components of your graphical desktop environment (GDE) such as GNOME or KDE, and system daemons that are launched at start-up.
Why nearly everything that is running? Well, Bash built-ins such as cd, pwd, and alias do not need to have a process launched (or "spawned") when they are run. Bash executes these commands within the instance of the Bash shell that is running in your terminal window. These commands are fast precisely because they don't need to have a process launched for them to execute. (You can type help in a terminal window to see the list of Bash built-ins.)
Processes can be running in the foreground, in which case they take
over your terminal until they have completed, or they can be run in the background. Processes that run in the background
don't dominate the terminal window and you can continue to work in it. Or at least, they don't dominate the terminal
window if they don't generate screen output.
A Messy Example
We'll start a simple ping trace running. We're going to ping the How-To Geek domain. This will execute as a foreground process.
ping www.howtogeek.com
We get the expected results, scrolling down the terminal window. We can't do anything else in the terminal window while ping is running. To terminate the command hit Ctrl+C.
Ctrl+C
The visible effect of the Ctrl+C is highlighted in the screenshot. ping gives a short summary and then stops.
Let's repeat that. But this time we'll hit Ctrl+Z instead of Ctrl+C. The task won't be terminated. It will become a background task. We get control of the terminal window returned to us.
ping www.howtogeek.com
Ctrl+Z
The visible effect of hitting Ctrl+Z is highlighted in the screenshot.
This time we are told the process is stopped. Stopped doesn't mean terminated. It's like a car at a stop sign. We haven't scrapped it and thrown it away. It's still on the road, stationary, waiting to go. The process is now a background job.
The jobs command will list the jobs that have been started in the current terminal session. And because jobs are (inevitably) processes, we can also use the ps command to see them. Let's use both commands and compare their outputs. We'll use the T (terminal) option to only list the processes that are running in this terminal window. Note that there is no need to use a hyphen - with the T option.
jobs
ps T
The jobs command tells us:
[1]: The number in square brackets is the job number. We can use this to refer to the job when we need to control it with job control commands.
+: The plus sign shows that this is the job that will be acted upon if we use a job control command without a specific job number. It is called the default job. The default job is always the one most recently added to the list of jobs.
Stopped: The process is not running.
ping www.howtogeek.com: The command line that launched the process.
The ps command tells us:
PID: The process ID of the process. Each process has a unique ID.
TTY: The pseudo-teletype (terminal window) that the process was executed from.
STAT: The status of the process.
TIME: The amount of CPU time consumed by the process.
COMMAND: The command that launched the process.
These are common values for the STAT column:
D: Uninterruptible sleep. The process is in a waiting state, usually waiting for input or output, and cannot be interrupted.
I: Idle.
R: Running.
S: Interruptible sleep.
T: Stopped by a job control signal.
Z: A zombie process. The process has been terminated but hasn't been "cleaned down" by its parent process.
The value in the STAT column can be followed by one of these extra indicators:
<: High-priority task (not nice to other processes).
N: Low-priority (nice to other processes).
L: The process has pages locked into memory (typically used by real-time processes).
s: A session leader. A session leader is a process that has launched process groups. A shell is a session leader.
l: A multi-thread process.
+: A foreground process.
We can see that Bash has a state of Ss. The uppercase "S" tells us the Bash shell is sleeping, and it is interruptible. As soon as we need it, it will respond. The lowercase "s" tells us that the shell is a session leader.
The ping command has a state of T. This tells us that ping has been stopped by a job control signal. In this example, that was the Ctrl+Z we used to put it into the background.
The ps T command has a state of R, which stands for running. The + indicates that this process is a member of the foreground group. So the ps T command is running in the foreground.
The bg Command
The bg command is used to resume a background process. It can be used with or without a job number. If you use it without a job number the default job is resumed. The process still runs in the background. You cannot send any input to it.
If we issue the bg command, we will resume our ping command:
bg
The ping command resumes and we see the scrolling output in the terminal window once more. The name of the command that has been restarted is displayed for you. This is highlighted in the screenshot.
But we have a problem. The task is running in the background and won't accept input. So how do we stop it? Ctrl+C doesn't do anything. We can see it when we type it but the background task doesn't receive those keystrokes so it keeps pinging merrily away.
In fact, we're now in a strange blended mode. We can type in the terminal window but what we type is quickly swept away by the scrolling output from the ping command. Anything we type takes effect in the foreground.
To stop our background task we need to bring it to the foreground and then stop it.
The fg Command
The fg command will bring a background task into the foreground. Just like the bg command, it can be used with or without a job number. Using it with a job number means it will operate on a specific job. If it is used without a job number the last command that was sent to the background is used.
If we type fg our ping command will be brought to the foreground. The characters we type are mixed up with the output from the ping command, but they are operated on by the shell as if they had been entered on the command line as usual. And in fact, from the Bash shell's point of view, that is exactly what has happened.
fg
And now that we have the ping command running in the foreground once more, we can use Ctrl+C to kill it.
Ctrl+C
We Need to Send the Right Signals
That wasn't exactly pretty. Evidently running a process in the
background works best when the process doesn't produce output and doesn't require input.
But, messy or not, our example did accomplish:
Putting a process into the background.
Restoring the process to a running state in the background.
Returning the process to the foreground.
Terminating the process.
When you use Ctrl+C and Ctrl+Z, you are sending signals to the process. These are shorthand ways of using the kill command. There are 64 different signals that kill can send. Use kill -l at the command line to list them. kill isn't the only source of these signals; some of them are raised automatically by other processes within the system.
Here are some of the commonly used ones.
SIGHUP: Signal 1. Automatically sent to a process when the terminal it is running in is closed.
SIGINT: Signal 2. Sent to a process when you hit Ctrl+C. The process is interrupted and told to terminate.
SIGQUIT: Signal 3. Sent to a process if the user sends a quit signal, Ctrl+\.
SIGKILL: Signal 9. The process is immediately killed and will not attempt to close down cleanly. The process does not go down gracefully.
SIGTERM: Signal 15. This is the default signal sent by kill. It is the standard program termination signal.
SIGTSTP: Signal 20. Sent to a process when you use Ctrl+Z. It stops the process and puts it in the background.
We must use the kill command to issue signals that do not have key combinations assigned to them.
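For example, SIGHUP has no keyboard shortcut, so to send it to job 1 we would use kill directly:
kill -HUP %1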
Further Job Control
A process moved into the background by using Ctrl+Z is placed in the stopped state. We have to use the bg command to start it running again. To launch a program as a running background process is simple. Append an ampersand & to the end of the command line.
Although it is best that background processes do not write to the terminal window, we're going to use examples that do. We need to have something in the screenshots that we can refer to. This command will start an endless loop as a background process:
while true; do echo "How-To Geek Loop Process"; sleep 3; done &
We are told the job number and process ID of the process. Our job number is 1, and the process ID is 1979. We can use these identifiers to control the process.
The output from our endless loop starts to appear in the terminal window. As before, we can use the command line but any commands we issue are interspersed with the output from the loop process.
ls
To stop our process we can use jobs to remind ourselves what the job number is, and then use kill. jobs reports that our process is job number 1. To use that number with kill we must precede it with a percent sign %.
jobs
kill %1
kill sends the SIGTERM signal, signal number 15, to the process and it is terminated. When the Enter key is next pressed, a status of the job is shown. It lists the process as "terminated." If the process does not respond to the kill command you can take it up a notch. Use kill with SIGKILL, signal number 9. Just put -9 between the kill command and the job number.
kill -9 %1
Things We've Covered
Ctrl+C: Sends SIGINT, signal 2, to the process -- if it is accepting input -- and tells it to terminate.
Ctrl+\: Sends SIGQUIT, signal 3, to the process -- if it is accepting input -- and tells it to quit.
Ctrl+Z: Sends SIGTSTP, signal 20, to the process and tells it to stop (suspend) and become a background process.
jobs: Lists the background jobs and shows their job number.
bg job_number: Restarts a background process. If you don't provide a job number the last process that was turned into a background task is used.
fg job_number: Brings a background process into the foreground and restarts it. If you don't provide a job number the last process that was turned into a background task is used.
commandline &: Adding an ampersand & to the end of a command line executes that command as a running background task.
kill %job_number: Sends SIGTERM, signal 15, to the process to terminate it.
kill -9 %job_number: Sends SIGKILL, signal 9, to the process and terminates it abruptly.
In this quick tutorial, I want to look at the jobs command and a few of the ways that we can manipulate the jobs running on our systems. In short, controlling jobs lets you suspend and resume processes started in your Linux shell.
Jobs
The jobs command will list all jobs on the system; active, stopped, or otherwise. Before I explore the command and output, I'll create a job on my system. I will use the sleep job as it won't change my system in any meaningful way.
First, I issued the sleep command, and then I received the job number [1]. I then immediately stopped the job by using Ctrl+Z. Next, I run the jobs command to view the newly created job:
[tcarrigan@rhel ~]$ jobs
[1]+ Stopped sleep 500
You can see that I have a single stopped job identified by the job number [1].
Other options to know for this command
include:
-l - list PIDs in addition to default info
-n - list only processes that have changed since the last notification
-p - list PIDs only
-r - show only running jobs
-s - show only stopped jobs
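For example, -l adds the process ID to each entry (the PID shown here is illustrative):
[tcarrigan@rhel ~]$ jobs -l
[1]+  5227 Stopped                 sleep 500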
Background
Next, I'll resume the sleep job in the background. To do this, I use the bg command. Now, the bg command has a pretty simple syntax, as seen here:
bg [JOB_SPEC]
Where JOB_SPEC can be any of the following:
%n - where n is the job number
%abc - refers to a job started by a command beginning with abc
%?abc - refers to a job started by a command containing abc
%- - specifies the previous job
NOTE: bg and fg operate on the current job if no JOB_SPEC is provided.
I can move this job to the background by using the job number [1].
[tcarrigan@rhel ~]$ bg %1
[1]+ sleep 500 &
You can see now that I have a single running job in the background.
[tcarrigan@rhel ~]$ jobs
[1]+ Running sleep 500 &
Foreground
Now, let's look at how to move a background
job into the foreground. To do this, I use the
fg
command.
The command syntax is the same for the foreground command as with the background command.
fg [JOB_SPEC]
Refer to the above bullets for details on
JOB_SPEC.
I have started a new sleep in the background:
[tcarrigan@rhel ~]$ sleep 500 &
[2] 5599
Now, I'll move it to the foreground by using the following command:
[tcarrigan@rhel ~]$ fg %2
sleep 500
The fg command has now brought the sleep job back into the foreground.
The end
While I realize that the jobs presented here were trivial, these concepts can be applied to more than just the sleep command. If you run into a situation that requires it, you now have the knowledge to move running or stopped jobs from the foreground to background and back again.
The Task Spooler project allows you to queue up tasks from the shell for batch execution. Task Spooler is simple to use and requires no configuration. You can view and edit queued commands, and you can view the output of queued commands at any time.
Task Spooler has some similarities with other delayed and batch execution projects, such as "at". While both Task Spooler and at handle multiple queues and allow the execution of commands at a later point, the at project handles output from commands by emailing the results to the user who queued the command, while Task Spooler allows you to get at the results from the command line instead. Another major difference is that Task Spooler is not aimed at executing commands at a specific time, but rather at simply adding to and executing commands from queues.
The main repositories for Fedora, openSUSE, and Ubuntu do not contain packages for Task Spooler. There are packages for some versions of Debian, Ubuntu, and openSUSE 10.x available along with the source code on the project's homepage. In this article I'll use a 64-bit Fedora 9 machine and install version 0.6 of Task Spooler from source. Task Spooler does not use autotools to build, so to install it, simply run make; sudo make install. This will install the main Task Spooler command ts and its manual page into /usr/local.
A simple interaction with Task Spooler is shown below. First I add a new job to the queue and check the status. As the command is a very simple one, it is likely to have been executed immediately. Executing ts by itself with no arguments shows the executing queue, including tasks that have completed. I then use ts -c to get at the stdout of the executed command. The -c option uses cat to display the output file for a task. Using ts -i shows you information about the job. To clear finished jobs from the queue, use the ts -C command, not shown in the example.
$ ts echo "hello world"
6
$ ts
ID State Output E-Level Times(r/u/s) Command [run=0/1]
6 finished /tmp/ts-out.QoKfo9 0 0.00/0.00/0.00 echo hello world
The -t option operates like tail -f, showing you the last few lines of output and continuing to show you any new output from the task. If you would like to be notified when a task has completed, you can use the -m option to have the results mailed to you, or you can queue another command to be executed that just performs the notification. For example, I might add a tar command and want to know when it has completed. The below commands will create a tarball and use libnotify commands to create an unobtrusive popup window on my desktop when the tarball creation is complete. The popup will be dismissed automatically after a timeout.
$ ts tar czvf /tmp/mytarball.tar.gz liberror-2.1.80011
11
$ ts notify-send "tarball creation" "the long running tar creation process is complete."
12
$ ts
ID State Output E-Level Times(r/u/s) Command [run=0/1]
11 finished /tmp/ts-out.O6epsS 0 4.64/4.31/0.29 tar czvf /tmp/mytarball.tar.gz liberror-2.1.80011
12 finished /tmp/ts-out.4KbPSE 0 0.05/0.00/0.02 notify-send tarball creation the long... is complete.
Notice in the output above, toward the far right of the header information, the run=0/1 line. This tells you that Task Spooler is executing nothing, and can possibly execute one task. Task Spooler allows you to execute multiple tasks at once from your task queue to take advantage of multicore CPUs. The -S option allows you to set how many tasks can be executed in parallel from the queue, as shown below.
$ ts -S 2
$ ts
ID State Output E-Level Times(r/u/s) Command [run=0/2]
6 finished /tmp/ts-out.QoKfo9 0 0.00/0.00/0.00 echo hello world
If you have two tasks that you want to execute with Task Spooler but one depends on the other having already been executed (and perhaps that the previous job has succeeded too) you can handle this by having one task wait for the other to complete before executing. This becomes more important on a quad core machine when you might have told Task Spooler that it can execute three tasks in parallel. The commands shown below create an explicit dependency, making sure that the second command is executed only if the first has completed successfully, even when the queue allows multiple tasks to be executed. The first command is queued normally using ts. I use a subshell to execute the commands by having ts explicitly start a new bash shell. The second command uses the -d option, which tells ts to execute the command only after the successful completion of the last command that was appended to the queue. When I first inspect the queue I can see that the first command (28) is executing. The second command is queued but has not been added to the list of executing tasks because Task Spooler is aware that it cannot execute until task 28 is complete. The second time I view the queue, both tasks have completed.
$ ts bash -c "sleep 10; echo hi"
28
$ ts -d echo there
29
$ ts
ID State Output E-Level Times(r/u/s) Command [run=1/2]
28 running /tmp/ts-out.hKqDva bash -c sleep 10; echo hi
29 queued (file) && echo there
$ ts
ID State Output E-Level Times(r/u/s) Command [run=0/2]
28 finished /tmp/ts-out.hKqDva 0 10.01/0.00/0.01 bash -c sleep 10; echo hi
29 finished /tmp/ts-out.VDtVp7 0 0.00/0.00/0.00 && echo there
$ cat /tmp/ts-out.hKqDva
hi
$ cat /tmp/ts-out.VDtVp7
there
You can also explicitly set dependencies on other tasks as shown below. Because the ts command prints the ID of a new task to the console, the first command puts that ID into a shell variable for use in the second command. The second command passes the task ID of the first task to ts, telling it to wait for the task with that ID to complete before returning. Because this is joined with the command we wish to execute with the && operation, the second command will execute only if the first one has finished and succeeded.
The first time we view the queue you can see that both tasks are running. The first task will be in the sleep command that we used explicitly to slow down its execution. The second command will be executing ts, which will be waiting for the first task to complete. One downside of tracking dependencies this way is that the second command is added to the running queue even though it cannot do anything until the first task is complete.
$ FIRST_TASKID=`ts bash -c "sleep 10; echo hi"`
$ ts sh -c "ts -w $FIRST_TASKID && echo there"
25
$ ts
ID State Output E-Level Times(r/u/s) Command [run=2/2]
24 running /tmp/ts-out.La9Gmz bash -c sleep 10; echo hi
25 running /tmp/ts-out.Zr2n5u sh -c ts -w 24 && echo there
$ ts
ID State Output E-Level Times(r/u/s) Command [run=0/2]
24 finished /tmp/ts-out.La9Gmz 0 10.01/0.00/0.00 bash -c sleep 10; echo hi
25 finished /tmp/ts-out.Zr2n5u 0 9.47/0.00/0.01 sh -c ts -w 24 && echo there
$ ts -c 24
hi
$ ts -c 25
there
Wrap-up
Task Spooler allows you to convert a shell command to a queued command by simply prepending ts to the command line. One major advantage of using ts over something like the at command is that you can effectively run tail -f on the output of a running task and also get at the output of completed tasks from the command line. The utility's ability to execute multiple tasks in parallel is very handy if you are running on a multicore CPU. Because you can explicitly wait for a task, you can set up very complex interactions where you might have several tasks running at once and have jobs that depend on multiple other tasks to complete successfully before they can execute.
Because you can make explicitly dependent tasks take up slots in the actively running task queue, you can effectively delay the execution of the queue until a time of your choosing. For example, if you queue up a task that waits for a specific time before returning successfully and have a small group of other tasks that are dependent on this first task to complete, then no tasks in the queue will execute until the first task completes.
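The at invocation the next description refers to was lost in formatting; it was presumably of this shape (a sketch: -m mails the output, and the job file is fed on standard input):
at -m 1:35am < my-at-jobs.txt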
Run the commands listed in the 'my-at-jobs.txt' file at 1:35 AM. All output from the job will be mailed to the user running the task. When this command has been successfully entered you should receive a prompt similar to the example below:
commands will be executed using /bin/sh
job 1 at Wed Dec 24 00:22:00 2014
at -l
This command will list each of the scheduled jobs in a format like the following:
1 Wed Dec 24 00:22:00 2014
...this is the same as running the command atq.
at -r 1
Deletes job 1. This command is the same as running the command atrm 1.
atrm 23
Deletes job 23. This command is the same as running the command at -r 23.
But processing each line until the command is finished then moving to the next one is very time consuming; I want to process for instance 20 lines at once, and when they're finished another 20 lines are processed.
I thought of wget LINK1 >/dev/null 2>&1 & to send the command to the background and carry on, but there are 4000 lines here; this means I will have performance issues, not to mention being limited in how many processes I should start at the same time, so this is not a good idea.
One solution that I'm thinking of right now is checking whether one of the commands is still running or not; for instance, after 20 lines I could add such a check loop. Of course in this case I will need to append & to the end of the line! But I'm feeling this is not the right way to do it.
So how do I actually group each 20 lines together and wait for them to finish before going to the next 20 lines? This script is dynamically generated so I can do whatever math I want on it while it's being generated, but it DOES NOT have to use wget; it was just an example, so any solution that is wget-specific is not gonna do me any good.
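A minimal sketch of the batch-of-20 approach the asker describes (wget and the 20-line grouping come from the question; the file name links.txt is an assumption):
#!/bin/bash
count=0
while IFS= read -r link; do
    wget "$link" >/dev/null 2>&1 &
    (( ++count % 20 == 0 )) && wait   # block until the current batch of 20 finishes
done < links.txt
wait   # wait for the final partial batch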
wait is the right answer here, but your while [ $(ps would be much better written as while pkill -0 $KEYWORD – using proctools, that is, for legitimate reasons to check if a process with a specific name is still running. – kojiro
Oct 23 '13 at 13:46
I think this question should be re-opened. The "possible duplicate" QA is all about running a
finite number of programs in parallel. Like 2-3 commands. This question, however, is
focused on running commands in e.g. a loop. (see "but there are 4000 lines"). –
VasyaNovikov
Jan 11 at 19:01
@VasyaNovikov Have you read all the answers to both this question and the
duplicate? Every single answer to this question here, can also be found in the answers to the
duplicate question. That is precisely the definition of a duplicate question. It makes
absolutely no difference whether or not you are running the commands in a loop. –
robinCTS
Jan 11 at 23:08
@robinCTS there are intersections, but questions themselves are different. Also, 6 of the
most popular answers on the linked QA deal with 2 processes only. – VasyaNovikov
Jan 12 at 4:09
I recommend reopening this question because its answer is clearer, cleaner, better, and much
more highly upvoted than the answer at the linked question, though it is three years more
recent. – Dan Nissenbaum
Apr 20 at 15:35
For the above example, 4 processes process1 .. process4 would be started in the background, and the shell would wait until those are completed before starting the next set.
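The example this answer refers to did not survive formatting; it was presumably along these lines:
process1 &
process2 &
process3 &
process4 &
wait    # returns once all four background processes have exited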
Wait until the child process specified by each process ID pid or job specification
jobspec exits and return the exit status of the last command waited for. If a job spec is
given, all processes in the job are waited for. If no arguments are given, all currently
active child processes are waited for, and the return status is zero. If neither jobspec
nor pid specifies an active child process of the shell, the return status is 127.
So basically i=0; waitevery=4; for link in "${links[@]}"; do wget "$link" & ((
i++%waitevery==0 )) && wait; done >/dev/null 2>&1 – kojiro
Oct 23 '13 at 13:48
Unless you're sure that each process will finish at the exact same time, this is a bad idea.
You need to start up new jobs to keep the current total jobs at a certain cap .... parallel is the answer.
– rsaw
Jul 18 '14 at 17:26
I've tried this but it seems that variable assignments done in one block are not available in
the next block. Is this because they are separate processes? Is there a way to communicate
the variables back to the main process? – Bobby
Apr 27 '17 at 7:55
This is better than using wait , since it takes care of starting new jobs as old
ones complete, instead of waiting for an entire batch to finish before starting the next.
– chepner
Oct 23 '13 at 14:35
For example, if you have the list of links in a file, you can do cat list_of_links.txt | parallel -j 4 wget {} which will keep four wgets running at a time. – Mr. Llama
Aug 13 '15 at 19:30
I am using xargs to call a python script to process about 30 million small files. I hope to use xargs to parallelize the process. The command I am using is:
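The command itself was lost in formatting. Judging by the description, and by the later comment noting that log.txt is overwritten on each call, it was presumably something like this sketch (the find expression is an assumption; Convert.py and -P 40 come from the question):
find . -name '*.json' -print0 | xargs -0 -P 40 -I{} sh -c 'python Convert.py "{}" > log.txt'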
Basically, Convert.py will read in a small json file (4kb), do some processing and write to another 4kb file. I am running on a server with 40 CPU cores, and no other CPU-intense process is running on this server.
By monitoring htop (btw, is there any other good way to monitor the CPU performance?), I find that -P 40 is not as fast as expected. Sometimes all cores will freeze and decrease almost to zero for 3-4 seconds, then recover to 60-70%. Then I try to decrease the number of parallel processes to -P 20-30, but it's still not very fast. The ideal behavior should be linear speed-up. Any suggestions for the parallel usage of xargs?
You are most likely hit by I/O: The system cannot read the files fast enough. Try starting
more than 40: This way it will be fine if some of the processes have to wait for I/O. –
Ole Tange
Apr 19 '15 at 8:45
I second @OleTange. That is the expected behavior if you run as many processes as you have
cores and your tasks are IO bound. First the cores will wait on IO for their task (sleep),
then they will process, and then repeat. If you add more processes, then the additional
processes that currently aren't running on a physical core will have kicked off parallel IO
operations, which will, when finished, eliminate or at least reduce the sleep periods on your
cores. – PSkocik
Apr 19 '15 at 11:41
1- Do you have hyperthreading enabled? 2- in what you have up there, log.txt is actually
overwritten with each call to convert.py ... not sure if this is the intended behavior or
not. – Bichoy
Apr 20 '15 at 3:32
I'd be willing to bet that your problem is python. You didn't say what kind of processing is being done on each file, but assuming you are just doing in-memory processing of the data, the running time will be dominated by starting up 30 million python virtual machines (interpreters).
If you can restructure your python program to take a list of files, instead of just one,
you will get a huge improvement in performance. You can then still use xargs to further
improve performance. For example, 40 processes, each processing 1000 files:
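A hedged sketch of that idea (it assumes Convert.py has been changed to accept many file names per invocation; the find expression is illustrative):
find . -name '*.json' -print0 | xargs -0 -n 1000 -P 40 python Convert.py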
This isn't to say that python is a bad/slow language; it's just not optimized for startup
time. You'll see this with any virtual machine-based or interpreted language. Java, for
example, would be even worse. If your program was written in C, there would still be a cost
of starting a separate operating system process to handle each file, but it would be much
less.
From there you can fiddle with -P to see if you can squeeze out a bit more
speed, perhaps by increasing the number of processes to take advantage of idle processors
while data is being read/written.
What is the constraint on each job? If it's I/O you can probably get away with multiple jobs per CPU core up till you hit the limit of I/O, but if it's CPU intensive, it's going to be worse than pointless running more jobs concurrently than you have CPU cores.
My understanding of these things is that GNU Parallel would give you better control over
the queue of jobs etc.
As others said, check whether you're I/O-bound. Also, xargs' man page suggests using -n with -P; you don't mention the number of Convert.py processes you see running in parallel.
As a suggestion, if you're I/O-bound, you might try using an SSD block device, or try
doing the processing in a tmpfs (of course, in this case you should check for enough memory,
avoiding swap due to tmpfs pressure (I think), and the overhead of copying the data to it in
the first place).
I want the ability to schedule commands to be run in a FIFO queue. I DON'T want them to be
run at a specified time in the future as would be the case with the "at" command. I want them
to start running now, but not simultaneously. The next scheduled command in the queue should
be run only after the first command finishes executing. Alternatively, it would be nice if I
could specify a maximum number of commands from the queue that could be run simultaneously;
for example if the maximum number of simultaneous commands is 2, then only at most 2 commands
scheduled in the queue would be taken from the queue in a FIFO manner to be executed, the
next command in the remaining queue being started only when one of the currently 2 running
commands finishes.
I've heard task-spooler could do something like this, but this package doesn't appear to
be well supported/tested and is not in the Ubuntu standard repositories (Ubuntu being what
I'm using). If that's the best alternative then let me know and I'll use task-spooler,
otherwise, I'm interested to find out what's the best, easiest, most tested, bug-free,
canonical way to do such a thing with bash.
UPDATE:
Simple solutions like ; or && from bash do not work. I need to schedule these
commands from an external program, when an event occurs. I just don't want to have hundreds
of instances of my command running simultaneously, hence the need for a queue. There's an
external program that will trigger events where I can run my own commands. I want to handle
ALL triggered events, I don't want to miss any event, but I also don't want my system to
crash, so that's why I want a queue to handle my commands triggered from the external
program.
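The command this answer dissects was lost in formatting; from the description it was evidently of this form:
ls; touch test; ls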
That will list the directory. Only after ls has run will it run touch test, which will create a file named test. And only after that has finished will it run the next command. (In this case another ls, which will show the old contents and the newly created file.)
Similar commands are || and &&.
; will always run the next command.
&& will only run the next command if the first returned success. Example: rm -rf *.mp3 && echo "Success! All MP3s deleted!"
|| will only run the next command if the first command returned a failure (non-zero) return value. Example: rm -rf *.mp3 || echo "Error! Some files could not be deleted! Check permissions!"
If you want to run a command in the background, append an ampersand (&).
Example:
make bzimage &
mp3blaster sound.mp3 &
make mytestsoftware ; ls ; firefox ; make clean
This will run two commands in the background (in this case a kernel build which will take some time, and a program to play some music). And in the foreground it runs another compile job and, once that is finished, ls, firefox and a make clean (all sequentially).
For more details, see man bash
[Edit after comment]
in pseudo code, something like this?
Program run_queue:
While(true)
{
Wait_for_a_signal();
While( queue not empty )
{
run next command from the queue.
remove this command from the queue.
// If commands were added to the queue during execution then
// the queue is not empty, keep processing them all.
}
// Queue is now empty, returning to wait_for_a_signal
}
//
// Wait forever on commands and add them to a queue
// Signal run_queue when something gets added.
//
program add_to_queue()
{
While(true)
{
Wait_for_event();
Append command to queue
signal run_queue
}
}
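A runnable bash sketch of the same design, using a named pipe as the queue (the path is arbitrary). The consumer loop plays the role of run_queue; anything that writes a line to the pipe plays the role of add_to_queue:
#!/bin/bash
QUEUE=/tmp/cmdqueue.fifo
[ -p "$QUEUE" ] || mkfifo "$QUEUE"

# Consumer: opening the FIFO blocks until a writer appears; each line is a command.
while true; do
    while IFS= read -r cmd; do
        eval "$cmd"          # run the queued command to completion (FIFO order)
    done < "$QUEUE"          # inner loop ends when all writers close the pipe
done

# Producer (from any other shell or program), for example:
#   echo 'tar czf /tmp/backup.tgz /home/user/docs' > /tmp/cmdqueue.fifo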
The easiest way would be to simply run the commands sequentially:
cmd1; cmd2; cmd3; cmdN
If you want the next command to run only if the previous command exited
successfully, use && :
cmd1 && cmd2 && cmd3 && cmdN
That is the only bash native way I know of doing what you want. If you need job control
(setting a number of parallel jobs etc), you could try installing a queue manager such as
TORQUE but that
seems like overkill if all you want to do is launch jobs sequentially.
You are looking for at's twin brother: batch. It uses the same daemon but instead of scheduling a specific time, the jobs are queued and will be run whenever the system load average is low.
Apart from dedicated queuing systems (like the Sun Grid Engine) which you can also use locally on one machine and which offer dozens of possibilities, you can use something like
command1 && command2 && command3
which is the other extreme -- a very simple approach. The latter neither provides multiple simultaneous processes nor gradual filling of the "queue".
task spooler is a Unix batch system where the tasks spooled run one after the other. The amount of jobs to run at once can be set at any time. Each user in each system has his own job queue. The tasks are run in the correct context (that of enqueue) from any shell/process, and their output/results can be easily watched. It is very useful when you know that your commands depend on a lot of RAM, a lot of disk use, give a lot of output, or for whatever reason it's better not to run them all at the same time, while you want to keep your resources busy for maximum benefit. Its interface allows using it easily in scripts.
For your first contact, you can read an article at linux.com, which I like as an overview, guide and examples (original url). On more advanced usage, don't neglect the TRICKS file in the package.
Features
I wrote Task Spooler because I didn't have any comfortable way of running batch jobs in my linux computer. I wanted to:
Queue jobs from different terminals.
Use it locally in my machine (not as in network queues).
Have a good way of seeing the output of the processes (tail, errorlevels, ...).
Easy use: almost no configuration.
Easy to use in scripts.
At the end, after some time using and developing ts, it can do something more:
It works in most systems I use and some others, like GNU/Linux, Darwin, Cygwin, and FreeBSD.
No configuration at all for a simple queue.
Good integration with renice, kill, etc. (through `ts -p` and process groups).
Have any amount of queues identified by name, writing a simple wrapper script for each (I use ts2, tsio, tsprint, etc.; see the sketch after this list).
Control how many jobs may run at once in any queue (taking profit of multicores).
It never removes the result files, so they can be reached even after we've lost the ts task list.
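A sketch of such a wrapper (TS_SOCKET is documented in the help text below; the socket path is arbitrary):
#!/bin/sh
# ts2: a second, independent queue; the socket path identifies the queue
TS_SOCKET=/tmp/socket-ts2 exec ts "$@"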
I created a GoogleGroup for the program. You can find the archive and how to join on the taskspooler google group page.
Alessandro Öhler once maintained a mailing list for discussing newer functionalities and interchanging use experiences. I think this doesn't work anymore, but you can look at the old archive or even try to subscribe.
How it works
The queue is maintained by a server process. This server process is started if it isn't there already. The communication goes through a unix socket, usually in /tmp/. When the user requests a job (using a ts client), the client waits for the server message to know when it can start. When the server allows starting, this client usually forks, and runs the command with the proper environment, because the client runs the job and not the server, unlike in 'at' or 'cron'. So, the ulimits, environment, pwd, etc. apply.
When the job finishes, the client notifies the server. At this time, the server may notify any waiting client, and stores the output and the errorlevel of the finished job. Moreover the client can take advantage of much information from the server: when a job finishes, where the job output goes to, etc.
Download
Download the latest version (GPLv2+ licensed): ts-1.0.tar.gz - v1.0 (2016-10-19) - Changelog
Look at the version repository if you are interested in its development.
Андрей Пантюхин (Andrew Pantyukhin) maintains the BSD port.
Eric Keller wrote a nodejs web server showing the status of the task spooler queue (github project).
Manual
Look at its manpage (v0.6.1). Here you also have a copy of the help for the same version:
usage: ./ts [action] [-ngfmd] [-L <lab>] [cmd...]
Env vars:
TS_SOCKET the path to the unix socket used by the ts command.
TS_MAILTO where to mail the result (on -m). Local user by default.
TS_MAXFINISHED maximum finished jobs in the queue.
TS_ONFINISH binary called on job end (passes jobid, error, outfile, command).
TS_ENV command called on enqueue. Its output determines the job information.
TS_SAVELIST filename which will store the list, if the server dies.
TS_SLOTS amount of jobs which can run at once, read on server start.
Actions:
-K kill the task spooler server
-C clear the list of finished jobs
-l show the job list (default action)
-S [num] set the number of max simultanious jobs of the server.
-t [id] tail -f the output of the job. Last run if not specified.
-c [id] cat the output of the job. Last run if not specified.
-p [id] show the pid of the job. Last run if not specified.
-o [id] show the output file. Of last job run, if not specified.
-i [id] show job information. Of last job run, if not specified.
-s [id] show the job state. Of the last added, if not specified.
-r [id] remove a job. The last added, if not specified.
-w [id] wait for a job. The last added, if not specified.
-u [id] put that job first. The last added, if not specified.
-U <id-id> swap two jobs in the queue.
-h show this help
-V show the program version
Options adding jobs:
-n don't store the output of the command.
-g gzip the stored output (if not -n).
-f don't fork into background.
-m send the output by e-mail (uses sendmail).
-d the job will be run only if the job before ends well
-L <lab> name this task with a label, to be distinguished on listing.
Thanks
To Raúl Salinas, for his inspiring ideas.
To Alessandro Öhler, the first non-acquaintance user, who proposed and created the mailing list.
To Андрей Пантюхин (Andrew Pantyukhin), who created the BSD port.
To the useful, although sometimes uncomfortable, UNIX interface.
To Alexander V. Inyukhin, for the debian packages.
To Pascal Bleser, for the SuSE packages.
To Sergio Ballestrero, who sent code and motivated the development of a multislot version of ts.
To GNU, an ugly but working and helpful ol' UNIX implementation.
After some research I found out that Solaris logs crontab messages to /var/cron/log (which is actually pretty predictable logging for Solaris). The log entries for the updating of the sunfreeware mirror looked something like this:
> CMD: /usr/local/bin/update-sunfreeware
> ftp 21022 c Fri Jul 30 06:00:00 2004
! bad user (ftp) Fri Jul 30 06:00:00 2004
So we are talking about a bad user here. Well, actually the user is right there in /etc/passwd and /etc/shadow. But wait: the FTP account is locked. I found that normal behaviour, but guess what: crontab expects a usable password field there, else the account is no good and is a "bad user"!