Softpanorama May the source be with you, but remember the KISS principle ;-)	Home	Switchboard	Unix Administration	Red Hat	TCP/IP Networks	Neoliberalism	Toxic Managers
	(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix

Enterprise Job schedulers

(aka Workload Automation)

News	Enterprise Unix System Administration	Recommended Links	Typical set of features	Open Source Implementations	Price-based classification	Critique of Gartner Job Schedulers review
Open Source Job Scheduler	GNU Batch	Quartz	Oracle Scheduler	Sun Grid Engine	OpenPBS	Maui Scheduler
HPC schedulers	Tivoli Workload Scheduler	Control-M from BMC Software	OpsWise Automation Center	UC4 Automation Engine	Tidal Enterprise Scheduler	APX / PCC
Cron and Crontab commands	Unix System Monitoring	Unix Configuration Management Tools	Perl Admin Tools and Scripts		Humor	Etc

Introduction
Fire and forget schedulers. Extending cron
Typical set of features
Open Source Implementations
Major proprietary players
Price-based classification of proprietary offerings
Critique of Gartner Job Schedulers review

Introduction

Enterprise job scheduling (or as it now often called Workload automation ) is the execution of a series of computer jobs/scripts with additional feedback about the results and ability to resubmit failed jobs, launch jobs in selected sequence depending of previous job ended, etc.

Enterprise job schedulers typically provide a graphical user interface and a central console for managing job streams and monitoring of job execution on remote servers including centralized viewing of remote job logs. The latter is just a hidden transfer of files from remote server to the "mothership" achievable via SSH, but still is the functionality that is valued greatly in enterprise environment.

If you need to save money, "grid engines" such as SGE (or Torque, Pbs, openlava ) can be used as enterprise job schedulers with small additional scripting of linking them to cron on the headnode. All other functions are already implemented on a very high quality level. Actually Rocks Cluster Distribution that comes with SGE can be used as "mini-datacenter" outside cluster applications. It comes with SGE. But this is quite another topic (see Hybrid Cloud ).

Increasingly job schedulers are required to be able to perform the event-based job invocation, for example launching job on appearance of disappearance of a particular file in a particular filesystem, or change of characteristics of this filesystem (i.e. low free space). The latter capability is close to the litmus test for a modern enterprise scheduler. The term "workload automation" is close to the term "scheduling" and has several distinct meanings:

A fancy way to say "a job scheduler with event based job launch capabilities".
Various, but rarely successful, attempts to integrate functions of monitoring systems within scheduler framework, blurring the distinction between job schedulers and Monitoring systems . It is true that functionality of enterprise scheduler and monitoring systems substantially intersect. Periodic running of probes by monitoring system can be considered as a special case of scheduling -- lightweight scheduling.
Integrated implementation of the tandem of Job scheduler and Monitoring system like ITM 6.x and TWS in Tivoli when monitoring software provides for event console, flow of events for event based scheduling as well as event correlation capabilities.

We will concentrate on Unix-based schedulers (which means that the server part of scheduler is on Unix/Linux; the clients can be deployed on any platform) and will touch the following topics:

Typical set of features
Open Source Implementations
Price-based classification of enterprise schedulers
Critique of Gartner Job Schedulers review

For a brief overview of Windows job scheduler market see Microsoft report Job Scheduling on Windows

For large enterprises scheduler should work across different operating system platforms and application environments such as Oracle, SAP/R3, Websphere, etc. Enterprise class schedulers also should be capable of launching jobs on event(s) occurrence, not only based on calendaring functions. For example they should be capable to launch job on file creation, deletion, etc. In a sense functionality of enterprise-level scheduler intersects greatly with functionality of monitoring system and most enterprise-class monitoring systems have rudimentary scheduling features. Two examples are Openview and old ("classic") Tivoli.

Architecturally schedulers can be subdivided into two subgroups:

Peer to peer, when each instance of the server has full capabilities and can function at least till the end of scheduling day (typically 24 houses, but nor necessary starting at midnight) autonomously if "mothership" instance dies. This is preferable solution as it increases reliability.
Client-server when client cannot function without connection to the mothership node at all and "mothership node" should be put on cluster or have a fail-over server.

Enterprise job schedulers are rather expensive class of software. According to IDC report (cited in CA Named Global Market Share Leader in Job Scheduling Software) the market for job scheduling software licenses in 2009 was $1.5 billion. Over the past 25 years, since the first multiplatform job schedulers appeared on the market, the market became pretty mature and solutions compete not so much on features, but on price and name recognition (the level of support for all major players deteriorated to the extent that it can't be considered the main consideration in the selection of enterprise scheduler)

Over the past 25 years, since the first multiplatform job schedulers appeared on the market, the market became pretty mature and solutions compete not so much on features, but on price and name recognition (the level of support for all major players deteriorated to the extent that it can't be considered the main consideration in the selection of enterprise scheduler)

We can distinguish two different but overlapping types of scheduling:

System-level job scheduling generally provide OS level alerts (like "filesystem is 80% full"; "CPU usage exceeds 200%", "swap file is almost full", etc) and OS-level maintenance type of services and does not include detailed knowledge of application-specific details. Example is clearing of a filesystem or extending of the filesystem in case filesystem became almost full.
Applications level scheduling. Most enterprise class applications require some batch jobs to be performed periodically and often contain an internal job scheduler. Examples include SAP, PeopleSoft, and Oracle. An external enterprise job scheduler can provide more uniform approach to application automation than an internal scheduler (although Oracle scheduler, paradoxically is as close to enterprise class schedulers as one can get and is available to database owners for free). It also allow integration of other enterprise application (for example backup) with the particular application.

Fire and forget schedulers. Extending cron

One reason for proliferation of various job schedulers is that Unix cron daemon and at command have a pretty limited functionality. In this sense this is a black mark for Linux kernel development team.

Corn operates on the principle of fire & forget. The same is true for Window scheduler. In the large enterprise environment the main weakness of these tools is lack of centralization (that actually much less a problem that many enterprise vendors would like to think of due to availability of ssh ;-) as well as their limited ability to monitor jobs success/failures and to provide dependency-based scheduling. The latter is the situation, when the execution of next job in a job stream depends on the success or failure of one of several previous jobs, dependent on each other (often in mainframe tradition such jobs are called steps -- the term which came from JCL -- the batch language for OS/360).

Cron can be extended to satisfy richer scheduling requirements, including controlling multiple servers, via SGE and that path is very attractive for small and medium enterprises which do not have resources to acquire and maintain complex commercial scheduler. In this case each cron job is sge submission script, which can have internal dependency checking logic. Also each additional piece of software increases overhead and waist resources. So if one can do it cleanly, it is better to enhance the use existing classic Unix tools. A simple jobstream status-based dependency mechanism is easy to implement in shell. In this case the start of each job can depend of existence of a global and unique for each jobstream "job stream status-file" that was created (or not, in case of failure) during some previous step. This is not a rocket science.

The classic PR why enterprise schedulers are indispensible is based on "backup job dependency example" when failure of a particular job needs to accounted in all subsequent steps (cleanup should be postponed, etc). Using suggested above job stream based status file mechanism (via memory mapped filesystem) this functionality can be more or less easily implemented in both cron and via at command.

Remote scheduling is really important but with the universal adoption of SSH it can be done on remote computers as easily as on local. For example, if a backup job fails it did not create the success file, then each job dependant on it should checks the existence of the file and exit if the file is not found. In more general way one can implement script "envelope" -- a special script that send messages at the beginning and at the end of each step to the monitoring system of your choice. Using at commands allow you to cancel or move to a different time all at commands dependent on successful completion of the backup.

Still some centralization of job control across multiple servers is very beneficial. There should be some messaging (event) mechanism that create the console were you can see "the whole picture". Also a central console is badly needed to view and correlate the events that happened on various job streams if there are many of them. The possibility to cancel/reschedule jobs from central server is also very beneficial -- but it can be implemented via ssh.

Another typical complain about classic cron is that its calendaring function is not sophisticated enough to understand national holidays, furlough, closure, maintenance periods, planned shut downs, etc. Avoiding running workload on holidays or specific days (e.g. an inventory day) is relatively easy to implement. One way is to use the concept of "scheduling day" with particular start time and length -- typically 24 hours (in some cases it is beneficial to limit length of scheduling day to just one shift or extend it to 48 hours period).

This "start of scheduling day" script generates a sequence of at commands that need to be run during the particular day. So schedule is recreated each scheduling day from the central location.

That allows to centralize all calendaring checks at the "beginning of the scheduling day" script which can be run on the central location and propagate sequences of at command via ssh on all servers. After this one time "feed" servers become autonomous and execute the generated sequence of jobs, providing built-in redundancy mechanism based on independence of local cron/at daemons from each other: failure of the central server does not affect execution of jobs of satellite servers until the current scheduling day ends.

For backups is also important to spread the load across the timeslot available (usually the second half of the second shift and night shift, say till 5 AM). This requires complex dependencies and can greatly benefit from the integration with network monitoring system as you need to understand how bandwidth is used. Badly scheduled backup oven overflow to the beginning of the next day and create network bottlenecks. This is the most typical problem for large enterprises were backup is usually very poorly organized and understood and nobody cares if redundant data are backups. If some small sites use remote backup additional software is often needed (typically Aspera or similar UDP-based transfere packages are used) to speed up the transmitting over WAN (with its increased latency that slow down TCP/IP)

Again, I would like to stress that for small companies a modest multi-server central scheduling functionality it is not difficult to achieve using classic Unix tools and any suitable HPC scheduler such as Son of Grid Engine, Torque, Slurm, etc

Social issues in enterprise job scheduler implementation

Job scheduler is a tool that should be a part of each respectable Unix distribution and should be maintained and operated by Unix group. This is the most effient way to have sophistcated acheduling as system administrators understand both OS and enterprise architecture on the level rarely found on other IT groups. But for large organizations there are social issues which make this scenario (that is often successfully used in small startups) impossible. Social environment in large enterprises pushes scheduling into domain of specific monitoring and scduling group with a separate management. This is an unfortunate trend, if judges on its technical merits alone.

But in large organizations traditionally scheduling of jobs is the domain of so called "operations manager", the person who often has "mainframe" roots and who manages data center operators which provide 24x7 coverage. With operators typically working in three shifts (sometimes in 2 shifts and the first shift is covered by sysadmin group). Often this staff, especially on night shifts, is not very technical and need crutches to modify scheduling or resolve even minor problem (typically backup problems). If we assume that you can hire five less technically savvy operators with just basic Windows skills and networking (on the level of A+ certification) and save $20K a year for each in salary, $60K a year maintenance payment (which among other things provide 24x7 coverage from the vendor) for a complex "fool proof" enterprise scheduler leaves you with some money for the capitalization costs. Assuming five year amortization period to break even, the initial license can cost as much as $200K (5*20*5-60*5=200). The key problem here is how not to fall for a flashy brand name and get a lemon.

It should be stressed that starting from approximately 2008 growth of the open source tools provided the opportunity to implement a usable enterprise scheduler and monitoring system using much less money and pay just for adaptation and maintenance without (or with minimal) initial capitalization costs. One way to get enterprise scheduling capabilities is to rely on monitoring system (say Openview) and extend it to full enterprise scheduler capabilities with open source tools. The other way is to use grid engine such as SGE and add to it scheduling component (which actually can be custom written or adapted from some an open source package). SGE has most features of expensive enterprise schedule with the exception of scheduling capabilities (it is essentially a very powerful, extremely flexible batch processor that on Linux is available for free).

While many operators have rudimentary Linux skills, learning enterprise flavors of Unix is generally anathema to a typical datacenter operator. At the same time typical datacenter represents a complex mixture of networking equipment (often from several vendors such as Nortel and Cisco) and Windows, Linux and "legacy" Unix servers. The latter often belong to several (from two to five) distinct Unix flavors, which are often the result of previous acquisitions. Full Unix stack that includes servers running Suse, Red Hat, AIX, HP-UX and Solaris is not that uncommon, especially for enterprises engaged in frequent acquisitions/divestitures. And it is far beyond human capabilities to comprehend.

This situation created a cottage industry serving to the need of those people using nice graphical interface and ad-hoc "scheduling constrains" languages. From pure technical standpoint 80% of functionality of those "super-duper" schedulers is just a waist of money and "reinventing the wheel" in comparison with using regular scripting language, or extending database functionality for scheduling tasks (like it was done in Oracle Scheduler). But as we demonstrated technical issues is only part of the picture. Convenience of some operations (viewing jobs reports, viewing output of jobs rescheduling failed jobs, nice messaging console, etc) also have great value.

Some large companies save money entering into agreements of co-developing some promising, but yet not known on the marketplace scheduler. This is also viable solution that works perfectly well and saves the particular company company a lot of money.

Generally enterprise schedulers are just one example of the complexity of modern IT environment which has its own set of irrationalities, not that different from the situation in Alice in Wonderland which remains the best guide into datacenter psychology ;-). Often you need to run fast just to stay in the same place. And Red Queen hypothesis (which is not related to Alice in Wonderland ) is also applicable:

The Red Queen hypothesis, also referred to as Red Queen's, Red Queen's race or the Red Queen Effect, is an evolutionary hypothesis which proposes that organisms must constantly adapt, evolve, and proliferate not merely to gain reproductive advantage, but also simply to survive while pitted against ever-evolving opposing organisms in an ever-changing environment, and intends to explain two different phenomena: the constant extinction rates as observed in the paleontological record caused by co-evolution between competing species[1] and the advantage of sexual reproduction (as opposed to asexual reproduction) at the level of individuals.[2]
Leigh Van Valen proposed the hypothesis to explain the "Law of Extinction",[1] showing that, in many populations, the probability of extinction does not depend on the lifetime of the population, instead being constant over millions of years for a given population. This could be explained by the coevolution of species. Indeed, an adaptation in a population of one species (e.g. predators, parasites) may change the natural selection pressure on a population of another species (e.g. prey, hosts), giving rise to an antagonistic coevolution. If this positive feedback occurs reciprocally, a potential dynamic coevolution may result.[3]

Typical set of features

It is difficult to enumerate the set of core features that an enterprise scheduler should support, because Unix is immensely flexible and most other systems whcih already exist in enterprise environment (especially complex monitoring system like OpenView) are typically underutilized.

So any effort in enumeration of such features is actually not a very productive task. So let's leave it to Microsoft ;-). Microsoft report Job Scheduling on Windows lists the following features as a core set:

Cross-platform support. The job scheduler should be able to manages jobs and schedules on all platforms available at the enterprise datacenters (often Windows plus five or six flavors of Unix; sometimes plus mainframes (z/OS) and OS/400. Predecessor and successor jobs (in a workflow) may run on different platforms.
Varity of types of scheduling jobs invocation
- On-demand scheduling. Users and operators submit individual jobs, which will be scheduled alongside those jobs already scheduled to run, and which can act as predecessors to trigger dependent jobs.
- Calendar and cyclical scheduling. Job definition includes calendar schedules to run jobs on a regular, predetermined cycle such as daily, weekly, monthly, yearly, or (for example) the second Tuesday in the month.
- Deadline scheduling. Job definition includes calendar schedules to specify a date and time for job completion up to one year or more in the future.
- Job dependency invocation. Job execution is conditional on successful completion of other jobs (predecessors and successors), which provides simple static workflow definitions that are contained within a group (or network) of job definitions.
Workload analysis. Numerous analysis tools are available, such as queue displays, performance displays, schedule displays, schedule trace-back, configurable reports, historical reports (for auditing), and summaries. Among most important capabilities
- Event console that reports about job failures, late jobs, etc
- Ability to view jobs logs from scheduler GUI.
Dynamic resource balancing. The scheduler determines the job mix on each processing node dynamically, to balance the use of critical hardware and software resources including servers, CPUs, memory, IO subsystems, applications and databases. This is especially important for backups which tax both the server and networking infrastructure.
Integration with monitoring event streams. The scheduler should be able to use various events, for example filesystem full or task completion events to trigger jobs or schedules.
Multiple calendars and time zones. Job schedules are based on one of several customizable calendars (different public holidays, different time zones, local business deadlines).
Scheduling of enterprise applications. Jobs that run under the control of packages such as SAP, Oracle, and PeopleSoft are defined, scheduled, and monitored from the central scheduling console and integrated into cross-application workflows. Schedule, monitor, and control Java-based, SQL, or .NET-based workload
Single point of control. An operator monitors and manages workloads that run on multiple nodes of a heterogeneous processing network from a central console.

As this site is developed for organizations who need to save money some compromise might be better that adopting an extreme position: "I want it all". I think that in enterprise scheduling area the right compromise is closely repeated to the adoption of open source scheduler like Open Source Job Scheduler and integrating it with good commercial or open source monitoring system. The second avenue is finding some low cost commercial provider and entering into co-development agreement. Oracle scheduler is free if you have license for Oracle database and also can serve as a good starting point. Another promising avenue is to expend monitoring system (such as OpenView) capabilities as I mentioned above.

Please be aware that the field is crowded with a lot of old, overpriced junk with questionable functionality. Also snail oil selling is a pretty powerful factor in this business. If you are buying enterprise scheduler without trying one of two open source solution most probably you will be duped. Complexity is this area is such that you have no chanced to understand the trap you are getting into and fully understand that befits and drawbacks of the solution you are buying. Even talking with existing customers does not change this situation much although it can prevent outright blunders. I know several companies that bought junk not one but two or three times (with different top honchos) and lost million of dollars in the process. Which of course for a large enterprise is rounding error and does not change the situation much, but still contradict my sense of fairness. Many large enterprise in the past bought IBM products because of saying you can't be fired buying IBM and were royally screwed. Some did the same with CA products

Standardization is enterprise schedulers is absent: each scheduling manager implementation reinvents the wheel in their own idiosyncratic manner and use some ad-hoc language for scheduling. They unusually error of the side of overcomplexity as they try to differentiate themselves from the competitors.

For example, in addition to "pure" scheduling most provide centralized ftp/sftp distribution of jobs, some configuration management capabilities and intrude into monitoring space by providing some kind of "job monitoring console". It's really sad that most job scheduling developers even don't suspect about existence of modern and pretty powerful monitoring systems...

As with any additional software product you introduce in enterprise environment, there is no free lunch. You need to use central server (a several servers) for job scheduling and instantly you need a failover capabilities (cron is actually distributed solution by its nature, so failure of one server does not affect scheduling on other server). better solution allow failing of central scheduler for a 24 hours (scheduling period), In this case schedule loaded at at the beginning of the scheduling day (typically midnight) will be executed locally until its exhaustion.

The positive side of enterprise schedulers is that by integrating job processing on central console and using central server one can achieve much higher level of workload automation then with locally edited cron jobs. Moreover a lot of computer maintenance tasks can be reformulated in terms of periodic jobs and job streams (collection of jobs performing particular task).

Sophisticated visual presentation of job streams and summarizing the results of jobs runs in a intuitive manner also can save time and increase productivity. For several hundred jobs it is difficult to parse output without automatic tools. Operators can't read through individual logs fast enough, and this is generally a boring, error prone procedure. Automation of rerunning of failed jobs is very important for large job stream (say over 500 jobs a day).

Usually specialized job scheduling systems provide some additional useful functionality beyond scheduling. For example, more advanced job schedulers contain change management and record keeping/report generation capabilities. As a set of several hundred jobs is usually a mess, change management is very important. Along with centralized repository, change management permit tracking changes: whenever a change is made, a record related to this change is created.

It typically includes the description of the change and the name of the person who made and/or approved it. Of course it can be implemented via Subversion and similar tools.

Another standard feature of enterprise job scheduling is that it handles computer jobs off across multiple servers/workstation which can be in different time zones. You can also chain jobs so that the next job in the chain starts immediately or after short delay after the previous job finished successfully (in case of failure other job or no job can be launched or the whole job stream can be aborted). Chaining is historically one of the oldest feature of job scheduling and was present in JCL of System/360 -- a dinosaur of scripting languages.

Open Source Implementations

Among open source implementations, the following are the leaders of the pack:

Open Source Job Scheduler Open Source Job Scheduler is developed by Software-und-Organisations-Service (SOS) GmbH in Berlin, Germany. Versions are available for Linux, Solaris, HP-UX (PA-RISC, IA64), AIX, and Windows. Like all open source products it provide excellent opportunities to understand whether the product can meet the demands of the organization and get your feet wet in a new, previously unknown to you, area (which might help to prevent costly mistakes in the future). While evaluation period for most products is limited to 90 days, here it is unlimited. See Job Scheduling à la Carte " Linux Magazine
Jobs can be configured from the Job Scheduler Editor GUI or using XML files which can be edited directly.

The Job Editor GUI is a Java-based application that you can use to configure the various jobs, chains, and other aspects of the system. All of the server configuration information is stored in XML files, and all you need to do is open up the respective XML file in the Job Editor to make your changes.

The day-to-day operation is handled by the Job Scheduler operations GUI (Operations GUI), which is Web-based (AJAX) application accessible from any web browser. You can monitor jobs, start, stop, handle errors, and so on. The Job Scheduler also provides an API that allows you to manage and control jobs externally. The API supports Perl, VBScript, JavaScript, and Java.
Sun Grid Engine (now called simply Grid engine). It has an open source version (as well as expensive enterprise version by Univa which not better then the open source versions but comes with tech support). SGE is a very powerful batch system. Far more powerful than facilities provided by many enterprise schedulers. It essentially represents a new standard for Linux batch systems and is highly recommended. Several solutions provided by SGE are very elegant. That includes well thought out command like interface, excellent GUI interface, pseudo comments in submission files, return of output to the headnode, etc. It is used mainly for computational jobs, especially in computational cluster environment, but nothing prevent extending it to full enterprise scheduling capabilities using combination of cron and monitoring system (such as Nagios)
GNU Batch by Xi Software - A job scheduler, originally sold as proprietary software, released as free software in February 2009. It is implemented in ANSI C.
Quartz is written in Java and mainly integrated with J2EE applications, but less suitable as a standalone Job Scheduling system. Quartz is licensed under more flexible then GPL Apache 2.0 license. Among features listed in documentation:
Runtime Environments
- Quartz can run embedded within another free standing application
- Quartz can be instantiated within an application server (or servlet container), and participate in XA transactions
- Quartz can run as a stand-alone program (within its own Java Virtual Machine), to be used via RMI
- Quartz can be instantiated as a cluster of stand-alone programs (with load-balance and fail-over capabilities)
Job Scheduling
Jobs are scheduled to run when a given Trigger occurs. Triggers can be created with nearly any combination of the following directives:
- at a certain time of day (to the millisecond)
- on certain days of the week
- on certain days of the month
- on certain days of the year
- not on certain days listed within a registered Calendar (such as business holidays)
- repeated a specific number of times
- repeated until a specific time/date
- repeated indefinitely
- repeated with a delay interval
Jobs are given names by their creator and can also be organized into named groups. Triggers may also be given names and placed into groups, in order to easily organize them within the scheduler. Jobs can be added to the scheduler once, but registered with multiple Triggers. Within a J2EE environment, Jobs can perform their work as part of a distributed (XA) transaction.
Job Execution
- Jobs can be any Java class that implements the simple Job interface, leaving infinite possibilities for the work your Jobs can perform.
- Job class instances can be instantiated by Quartz, or by your application's framework.
- When a Trigger occurs, the scheduler notifies zero or more Java objects implementing the JobListener and TriggerListener interfaces (listeners can be simple Java objects, or EJBs, or JMS publishers, etc.). These listeners are also notified after the Job has executed.
- As Jobs are completed, they return a JobCompletionCode which informs the scheduler of success or failure. The JobCompletionCode can also instruct the scheduler of any actions it should take based on the success/fail code - such as immediate re-execution of the Job.
Job Persistence
- The design of Quartz includes a JobStore interface that can be implemented to provide various mechanisms for the storage of jobs.
- With the use of the included JDBCJobStore, all Jobs and Triggers configured as "non-volatile" are stored in a relational database via JDBC.
- With the use of the included RAMJobStore, all Jobs and Triggers are stored in RAM and therefore do not persist between program executions - but this has the advantage of not requiring an external database.
Transactions
- Quartz can participate in JTA transactions, via the use of JobStoreCMT (a subclas of JDBCJobStore).
- Quartz can manage JTA transactions (begin and commit them) around the execution of a Job, so that the work performed by the Job automatically happens within a JTA transaction.
Clustering
- Fail-over.
- Load balancing.
- Quartz's built-in clustering features rely upon database persistence via JDBCJobStore (described above).
- Terracotta extensions to Quartz provide clustering capabilities without the need for a backing database.
Listeners & Plug-Ins
- Applications can catch scheduling events to monitor or control job/trigger behavior by implementing one or more listener interfaces.
- The Plug-In mechanism can be used add functionality to Quartz, such keeping a history of job executions, or loading job and trigger definitions from a file.
- Quartz ships with a number of "factory built" plug-ins and listeners.
Condor-G is grid-oriented and requires The Globus Toolkit
Cluster schedulers are very similar to SGE, but each has specific strong and wek points that might be relevant to your particular situation:
- TORQUE Resource Manager Terascale Open-Source Resource and QUEue Manager. It is a community effort based on the original PBS project and, with more than 1,200 patches, has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the US DOE, Sandia, PNNL, UB, TeraGrid, and many other leading edge HPC organizations. TORQUE can integrate with the open source Maui Cluster Scheduler or the commercial Moab Workload Manager to improve overall utilization, scheduling and administration on a cluster.
- The Maui Job Scheduler. Is an advanced job scheduler for cluster systems. It allows site administrators extensive control over which jobs are considered eligible for for scheduling, how the jobs are prioritized, and where these jobs are run. Maui supports advance reservations, QoS levels, backfill, and allocation management.
- OAR OAR is a resource manager (or batch scheduler) for large clusters written in Perl developed in INRIA. You can download an OAR Live CD to try OAR without installing it. It is actively developed. in 2009 OAR took part to the Google Summer of Code. The program is now over, students' work is available here.
  This batch system is based on a database (MySql), a script language (Perl) and an optional scalable administrative tool (component of Taktuk framework). It is composed of modules which interact only with database and are executed as independent programs. So formally, there is no API, the system is completely defined by the database schema. This approach eases the development of specific modules. Indeed, each module (such as schedulers) may be developed in any language having a database access library.
Oddjob Oddjob is a free open source Java job scheduler. Oddjob provides 'no programming required' scheduling with a business orientated approach that brings control back to the user.

Major proprietary players

Major proprietary players (that does not mean that any of them is superior to modern implementations; actually the opposite is true to not too diverse client configuration, say, 4 flavors of Unix and Windows) include (see Gartner Magic Quadrant for Job Scheduling if you think that Gartner is not fully out of touch with the reality ;-):

Control-M from BMC Software Gartner considers them a leader. Your mileage may vary
Tidal Enterprise Scheduler
UC4 Automation Engine
Orsyp
ASG
CA-7, AutoSys, Unicenter Workload Control Center (WCC) by Computer Associates It stands pretty high on "ability to execute" in Gartner chart; please note that by the "ability to execute" Gardian means mainly the market penetration. The CA portfolio includes CA-7, AutoSys, Unicenter Workload Control Center (WCC), as well as ESP and dSeries from the Cybermation acquisition. See The_evolution_of_job_scheduling
IBM Tivoli Workload Scheduler (TWS). Outdated and overpriced. Market share might be high but so what ? Avoid it.

Price-based classification of proprietary offerings

Commercial Job Schedulers also can be classified by price ( adapted from Job Scheduling Tools):

< $10,000

Open Source Job Scheduler While product is free, the support cost money. Pricing is transparent and starts with 4500 EUR for 10 servers (48h response) and 6750 EUR for 24h response

24X7 Scheduler from SoftTree Technolgies

AutoMate from Network Automation, Inc

Event Manager for Oracle

Oracle Scheduler (for users of Oracle 11 R2 or higher it is free; all you need to pay is Oracle maintenance costs on your database). This is a pretty powerful scheduler which allowing the database to schedule external jobs which run on a remote server. There is no separate licensing for agents installed on nodes so one server license is essentially all that you need. The clients (remote servers) doesn't have to have an Oracle client or database installation. all they need is an Oracle Scheduler Agent installation. This agent is responsible for executing the jobs and communicating with the database server that initiated the job. (
http://www.oracle-base.com/articles/11g/SchedulerEnhancements_11gR1.php )
There are two interfaces for the Oracle Scheduler:

UI which is part of Oracle Enterprise Manager (free with Oracle Enterprise license)

API (DBMS_SCHEDULER package).

It is a free feature shipped with Oracle database. Oracle Scheduler is implemented via the procedures and functions in the DBMS_SCHEDULER PL/SQL package.
The Scheduler provides sophisticated, flexible enterprise scheduling functionality, which you can use to:

Run PL/SQL anonymous blocks, PL/SQL stored procedures, and Java stored procedures.

Run executables that are external to the database (external executables), such as applications, shell scripts, and batch files. You can run external executables on the local system or on remote systems. Remote systems do not require an Oracle Database installation; they require only a Scheduler agent. Scheduler agents are available for all platforms supported by Oracle Database and some additional platforms.

Schedule job execution using the following methods:

Time-based scheduling You can schedule a job to run at a particular date and time, either once or on a repeating basis. You can define complex repeat intervals, such as "every Monday and Thursday at 3:00a.m except on public holidays" or "the last Wednesday of each business quarter."

Event-based scheduling The Scheduler enables you to start jobs in response to system or business events. Your applications can detect events and then signal the Scheduler. Depending on the type of signal sent, the Scheduler starts a specific job. Examples of event-based scheduling include starting jobs when a file arrives on a system, when inventory falls below predetermined levels, or when a transaction fails.

Dependency scheduling The Scheduler can run tasks based on the outcome of one or more previous tasks. You can define complex dependency chains that include branching and nested chains.

Prioritize jobs based on business requirements.
The Scheduler enables control over resource allocation among competing jobs, thus aligning job processing with your business needs. This is accomplished in the following ways:

Jobs that share common characteristics and behavior can be grouped into larger entities called job classes. You can prioritize among the classes by controlling the resources allocated to each class. This enables you to ensure that your critical jobs have priority and have enough resources to complete. For example, if you have a critical project to load a data warehouse, then you can combine all the data warehousing jobs into one class and give priority to it over other jobs by allocating it a high percentage of the available resources.

The Scheduler takes prioritization of jobs one step further, by providing you the ability to change the prioritization based on a schedule. Because your definition of a critical job can change over time, the Scheduler enables you to also change the priority among your jobs over that time frame. For example, you may consider the extract, transfer, and load (ETL) jobs used to load a data warehouse to be critical jobs during non-peak hours but not during peak hours. However, jobs that must run during the close of a business quarter may need to take priority over the ETL jobs. In these cases, you can change the priority among the job classes by changing the resource allocated to each class. See "Creating Job Classes" and "Creating Windows" for more information.

Manage and monitor jobs
There are multiple states that a job undergoes from its creation to its completion. Scheduler activity is logged and information such as the status of the job and the last run time of the job can be easily tracked. This information is stored in views and can be easily queried using Enterprise Manager or SQL. These views provide valuable information about jobs and their execution that can help you schedule and better manage your jobs. For example, a DBA can easily track all jobs that failed for a particular user. See "Monitoring and Managing the Scheduler".

Execute and manage jobs in a clustered environment
A cluster is a set of database instances that cooperates to perform the same task. Oracle Real Application Clusters (RAC) provides scalability and reliability without any change to your applications. The Scheduler fully supports execution of jobs in such a clustered environment. To balance the load on your system and for better performance, you can also specify the database service where you want a job to run. See "Using the Scheduler in Real Application Clusters Environments" for more information.

Sun Grid Engine SGE is typically used on a computer farm or high-performance computing (HPC) cluster and is responsible for accepting, scheduling, dispatching, and managing the remote and distributed execution of large numbers of standalone, parallel or interactive user jobs. It also manages and schedules the allocation of distributed resources such as processors, memory, disk space, and software licenses. This is a highly recommended batch system that needs to be enhanced to provide full enterprise scheduling capabilities

$10,000 to $25,000

Global ECS from Vinzant Software

Single point-of-control for monitoring and managing enterprise-wide job streams.

Controller/Agent model uses the power of TCP/IP to simplify communications in a distributed enterprise environment.

Global ECS has many capabilities that allow for a 'Management by Exception' approach to automating your production environment.

Multiple Method Scheduling (MMS) allows for simple programming and management of tasks with widely varying repetition schedules.

Role based security model.

Launches and controls any command line, including graphical and text programs, batch files, command files and macros.

Captures return codes to detect job success or failure and allow the system to take appropriate actions.

Controls sequential execution and branching with sophisticated job dependencies.

Full support for file and resource dependencies.

GECS System Events to assist in scheduling and monitoring the production environment.

Full featured browser-based client for remote console access.

Active Batch from Advanced Systems Concepts (Two Speedwell Avenue, West Tower - Third Floor
Morristown, NJ 07960). Company was founded in 1981 so the product in almost 30 years old. Looks like initially it was Windows and OpenVMS biased. According to documentation:

The ActiveBatch Job Scheduler incorporates an event driven architecture that supports the triggering of jobs and job plans using a wide array of event types that include:

File Events (creation, modification, deletion or appearance)

Email's (Inclusive or Exclusive words, Subject Line, Sender, etc)

Web Services

Microsoft Message Queue's(MSMQ)

Oracle Database Triggers*

System Startup

Windows Management Instrumentation (WMI) to include over 15,000 event types including events like writing to the Event and Application Logs

SNMP Trapping

Completion Status of Predecessor Jobs and Plans (Success, Failure or Conditional)

Enterprises utilize the ActiveBatch event-driven architecture to respond swiftly by triggering jobs and job plans as the need arises rather than simply relying on date and time scheduling. As a result workflows can be designed to ensure that business processes can be initiated when specific IT events occur saving time and reducing costs.

The Job Scheduler monitors the various events that the automation designer incorporates into their jobs or job plans so they begin execution when the event is declared. Events can be calls to a Web service, the receipt of a message (email, file drop, message queue, and so on). They can be triggered by network events, WMI events, and a variety of other triggers. Once received, the plan begins execution unless a counterbalancing constraint is encountered. This permits the automation designer the ability to create a very responsive, interactive Plan, or Workflow.

ActiveBatch supports File Triggering on both Microsoft Windows as well as Non Windows systems upon the creation, modification or deletion of a specific file. Email triggers allow for actioning on inclusive or exclusive words or strings or attachments in addition to other values which are evaluated to allow the specified email to trigger a workflow. The email attachment can then be saved as a file for downstream processing. Using email triggers workflows can be initiated immediately thereby reducing wait time and making results of the workflow available for consideration for improve decision making.

The Shortcut Guide to IT Workload Automation and Job Scheduling offers clients powerful content in four chapters:

Chapter 1: Challenges of IT Workload Automation and Job Scheduling Download PDF

Chapter 2: IT Workload Automation and Job Scheduling Download PDF

Chapter 3: IT Workload Automation and Job Design and Configuration Download PDF

Chapter 4: Best Practices for IT Workload Automation and Job Scheduling Download PDF

$25,000 to $50,000

UC4 from UC4 Software GmbH privately held company. Very secretive about its product. No access to tech documentation to non-customers. No Trial.

OpsWise Automation Center
Enterprise Management Associates (EMA analysis of 13 leading vendors places OpsWise Automation Center in the top tier of Workload Autom.ation solutions. OpsWise receives top honors for "Best Interface Design" and identified as the most cost-effective solution in the radar report.

Workload Automation Startup, OpsWise Software today announced Enterprise Management Associates (EMA), a leading IT management research and consulting firm, has named OpsWise a Value Leader in its newest EMA Radar Report™ titled, "Workload Automation Q1 2010 - An EMA Radar Report™".

... ... ...

"It is very rare to find a brand-new solution so highly rated in a software segment traditionally dominated by the larger systems management vendors", states Mann "When reviewing OpsWise, we were immediately impressed by the product's ease of use, cutting-edge technology, and cost model. They are definitely a company to watch."

PRweb (Jan 20, 2010)

$50,000 to $100,000

AppWorx from AppWorx Corporation

EnterpriseSCHEDULE from ISE Inc

UNIJOB from Orsyp. Secretive company that does not release manual on the Web.

The vendor launched a new product, UNIJOB, in addition to Dollar Universe, to particularly target enterprises that do not need a full-blown enterprise job scheduler (including business applications), but need the ability to discover and import jobs from operating-system-specific utilities. This ensures that the job-scheduling environment, even outside IT production, meets compliance requirements; becomes more robust, and is easier to use and manage.

Dollar Universe has a peer-to-peer architecture that lends itself to high availability; thus, an instance of Dollar Universe is installed on each server that has applications and processes that need to be automated.

Orsyp has developed a new user console, Univiewer, that is able to administer and monitor jobs in UNIJOB. Univiewer also can view/monitor jobs in other third-party job-scheduling tools.

Cronacle. This is a product of Redwood Software, founded in 1993. Redwood lists 3100 customers, which includes clients of their document management tools. Cronacle supports workloads on OpenVMS, OS/400, UNIX, Windows, and z/OS, and integrates with SAP, PeopleSoft, Oracle, and others. Cronacle, is sold by SAP AG as "SAP Central Process Scheduling by Redwood" and is part of every SAP NetWeaver installation.^[3]

$100,000 to $200,000

Tidal Enterprise Scheduler from Tidal Software On May 20, 2009, Cisco acquired privately held Tidal Software, Inc. Based in Palo Alto, Calif., and Houston, Tidal Software is the creator of intelligent application management and automation solutions that will advance Cisco's data center strategy by enhancing product and service delivery offerings. That moved the product into higher price category.
This is a secretive company and technical documentation is not available to non-clients. Website contains only PR materials.

Dollar Universe Dollar Universe supports virtualized environments, event-based scheduling, Java and Web services environment across multiple and diverse platforms and applications. The feature set includes

Event-driven scheduling

Single point of control and real-time intervention

Real-time monitoring

Business Unit/Geographic-centric monitoring

Sophisticated error management and notification

Object-Oriented job definition

Workload balancing

Production simulation tools

Virtualization

Remote access with Web Console

Windows and Motif interfaces

Staging for safe push to production

Web services

Production data Reporting

Documentation Reporting

Audit trails

Debug command-line interface

More then $200,000

Tivoli Workload Scheduler (TWS) This is one of the oldest enterprise schedulers on the market.

Unicenter AutoSys. Also pretty old product. CA entered the job-scheduling field almost three decades ago. In a typical for CA fashion they acquired multiple competing/overlapping products: CA-7, AutoSys, Unicenter Workload Control Center (WCC), as well as ESP and dSeries from the Cybermation acquisition. See The_evolution_of_job_scheduling

Control-M from BMC Software

Critique of Gartner Job Schedulers review

In 2009 Gartner Magic Quadrant for Job Scheduling Report paper Gartner provides some rating of existing job schedulers which, of course, should be taken with a grain of salt.

Actually the lower the scheduler is graded in Gartner report the more promising it looks for large enterprise deployment, if for no other reason as more modern architecture and lower cost ;-). Here is their rating chart called Magic Quadrant. And it really has some magical, difficult to be rationally explained positioning of schedulers vendors:

Like in most Gartner reports they use pseudo-scientific terminology (like "ITWAB vision") and research report structure which actually hides the features of the products not reveal real strengths and weaknesses of products present on the market.

It looks like the axis Y ("Ability to Execute") of this quadrant is strongly correlated with the market share/installed user base. CA, Tivoli and BMC are three market leaders with CA having the biggest market share of all three.

Axis X ("completeness of vision") does not look meaningful. Gartner defines it as

On the Completeness of Vision (horizontal) axis, we evaluate how well a vendor or product will do in the future relative to Gartner's scenario for where a market is headed, which is based on our understanding of emerging technologies and the requirements of leading-edge clients. It also evaluates vendors on their ability to convincingly articulate logical statements about current and future market directions, innovations, customer needs and competitive forces, and how well they map to the Gartner position.

I doubt that the positioning presented reflects future prospects of the products. All three major vendor are pretty vulnerable in my opinion as the current economic environment puts tremendous stress on the price and "big three" are still charging "mainframe" prices for the product (BMC Control-M actually started as mainframe product).

At the same time Axis X in my opinion is somewhat correlated with the availability of monitoring solutions from the same vendor (and related integration of event streams). That may be accidental. Still BMC is a definite leader in the integration of its monitoring solution with job scheduling. IBM also is also good in this area if we assume integration with TEC.

Their criteria for leaders looks pretty unconvincing:

Vendors positioned in the Leaders quadrant have a large, satisfied installed base and a high degree of visibility in the market (for example, frequent consideration and success in competitive situations). They offer robust, highly scalable applications, and have the strategic vision to address evolving enterprise requirements in the areas of:

ITWAB

Integration with packaged applications

Support of new application and composite application architectures

Integration with BSM tools, provisioning and RBA tools

Ability to handle physical and virtual environments

Integration with service desk tools

Proven critical path analysis capability

Forecasting capability

Workload life cycle management capability

Support agent-based and agentless architectures

Ability to perform end-to-end automation, preferably with a single product

It is clear that in Gartner list No.1 is a bogus criteria. No.2 in fuzzy (does availability of open API qualified you for "integration with packaged applications" or you need explicit "bridges"). No.3 is also by-and-large bogus and reflect Gartner unhealthy infatuation with new and meaningless terms and technologies. For example, support of Web services or "cloud computing" ( what ever this means) does not automatically make a particular scheduler better.

No.4 is again fuzzy and so on as there as devil here in details (this is especially true for provisioning: a support of ftp dowloads is actually rpovisioning support, if you thing about it). Events/alerts management, the key architectural issue for schedulers is not even a separate category in Gartner's criteria for leaders.

For example they characterize IBM TWS product in the following way:

Strengths

IBM has the ability to form relationships and has access to senior executives in large organizations. This enables IBM to sell software suites or bundles that include workload automation technology.

IBM has been one of the first vendors to launch a product based on the ITWAB vision, the Tivoli Dynamic Workload Broker. It has the ability to schedule based on policies (such as CPU type and utilization goals) across virtual and physical resource groups. Furthermore, it integrates with other IBM products, such as Tivoli Provisioning Manager, to provision additional physical and virtual resources.

The workload automation vision is part of the overall dynamic infrastructure vision, in which it becomes the enabler for delivering execution services complying to specific high-level characteristics, such as the type of workload to support, the response time needed and the energy consumption to attain. To enable this, IBM has provided programmatic access to job-scheduling interfaces using Web services, and using WebSphere as the integration platform.

IBM has invested in embedding scheduling components from the workload automation technology across the IBM portfolio (for example, Tivoli Data Management and WebSphere). The first solution that will embed this technology is the Tivoli common reporting infrastructure used by products such as IBM Tivoli Monitoring. This infrastructure will be used to generate batch reports about IT services across solutions.

Cautions

Although IBM has continued investing in the workload automation area, their Workload products, on their own, do not have the ability to meet requirements of organizations that need deep functionality across a heterogeneous computing environment with a single product. Therefore, multiple and separate products may be needed for different environments.

Gartner inquiries reflect that IBM's Tivoli Workload Scheduler (TWS) is not widely adopted in Microsoft environments.

Although IBM has made some progress, it needs to improve integration and the installed base of its workload automation products with non-IBM software and applications, such as SAP and Oracle, and productize best practices to improve implementation time.

IBM has been unable to get TWS customers to move to a single Web-based console for the full range of functionality (operational control, administration, user interface). IBM started shipping this functionality in 2007, and completed its delivery in December 2008.

You can compare this assessment with the assessment I made in Tivoli Workload Scheduler page.

You can also compare them with more reasonable (although far from perfect) set of requirements recommended in Tidal evaluation guide (Job Scheduling Evaluation Guide):

This detailed evaluation guide provides a framework for benchmarking features and functions of enterprise job schedulers including:

Calendaring

Dependencies, Queues and Prioritization

Alert Management

Application and Database Support

Framework / Network Management Integration

Security and Audit Trails

Fault Tolerance and Auto Recovery

Event Driven Processing

Scalability

You can see that while detecting some elements of reality ("not widely adopted in Microsoft environments"), the review is a plain vanilla deception and obfuscation. Gartner definitely tries to be politically correct and not offend any large company.

Actually it completely ignores the fundamental weakness of TWL connected with the outdated architectural aspects of the product. Among them is outdated command line interface; very weak GUI interface, integration with TEC using named pipes (Tivoli TEC was abandoned by IBM in 2012). Very weak integration with other products in Tivoli line (TWS was an acquisition not in-house development for Tivoli). Instead of getting to the core strengths and weaknesses of the product Gartner concentrates on some superficial notions.

In reality TWS is expensive and pretty user unfriendly product burdened by legacy issues. Initial cost is ridiculous. Maintenance costs are high even for alarge company. Licensing is petty and vindictive (they count cores, not servers) . A reasonable estimate would be ~$300K in one year maintenance costs for medium size deployment with less then 500 endpoints: IBM actually counts cores so simplifying their arcade value unit structure one can assume that a typical 8 core server Intel-server (two quad CPUs) counts as 8 licensing units (with some minor discount for SPARC and Intel CPUs). Just of the base of the price TWS future looks bleak despite the installed base as sooner or later companies realize that they can pay one third of less of the price for all functionality they need.

Moreover the initial cost of TWS is just the tip of the iceberg and should probably be doubled as you need licensing of at least Tivoli Advanced Monitoring 6.x and, preferably, TCM (for delivery of TWS instances to endpoint) to make it work in large corporate environment. In no way it can be classified as challenger (unless in very specific sense because it is really a challenge to install and maintain) -- this is a real dinosaur of scheduling applications and the only thing IBM can try with TWS is to defend its turf (sometimes using tricks like automatic extension of the contract). My impression is that defections to less costly/more modern products were pretty common in 2008-2009 even among large enterprises.

Sometime language abuse in the report looks really funny:

" The workload automation vision is part of the overall dynamic infrastructure vision, in which it becomes the enabler for delivering execution services complying to specific high-level characteristics, such as the type of workload to support, the response time needed and the energy consumption to attain."

In reality this is an obscure way to say "I do not know the subject in-depth , but I want to make an impression that I am a respectable, knowledgeable analyst" ;-). There is something a little bit Orwellian about the obscurity of the phase above and it reminds me Soviet Politburo reports, or for those who never read such, Greenspan's Fed statements.

Dr. Nikolai Bezroukov

Top Visited <p>Your browser does not support iframes.</p>					Switchboard
					Latest
					Past week
					Past month

NEWS CONTENTS

20130319 : AirBNB Opensources Chronos, a Cron Replacement ( AirBNB Opensources Chronos, a Cron Replacement, Mar 19, 2013 )
20130319 : Slashdot ( Slashdot, )
20100430 : Issues in Selecting a Job Management System by Omar Hassaine ( January 2002 )
20100430 : Load Sharing Facility by Tom Bialaski ( June 30, 1999 )
20091118 : Workload Automation Differentiators ( Opswise Software )
20091118 : Cron: The Good, The Bad & The Ugly ( November 8, 2009 , OpsWise )
20091118 : Looking at new scheduling solutions (BMC, Title or UC4) (1st Response) ( Toolbox for IT Groups )
20091118 : Maui Cluster Scheduler ( Maui Cluster Scheduler, )
20090724 : Saturn network job scheduler freshmeat.net ( Saturn network job scheduler freshmeat.net, Jul 24, 2009 )
20090724 : PowerPoint presentation ( PowerPoint presentation, )
20090724 : Do we really need end-to-end automation- by Wolfgang Tonninger ( Beyond Job Scheduling )
20090724 : Document Downloads at cosbatch.com ( Document Downloads at cosbatch.com, )
20090724 : Job Scheduling Evaluation Guide ( Job Scheduling Evaluation Guide, )
20090724 : Open Source Job Schedulers in Java ( Open Source Job Schedulers in Java, )
20090724 : Vinzant Software - Global ECS ( Vinzant Software - Global ECS, )
20090724 : Open source workflow job scheduling for SuperScheduler ( Open source workflow job scheduling for SuperScheduler, )
20090724 : Introduction to Enterprise Job Scheduling The Service Level Management Learning Community by Elizabeth Ferrarini ( Introduction to Enterprise Job Scheduling The Service Level Management Learning Community, )
20090724 : Open Source Job Scheduler - Wikipedia, the free encyclopedia ( Open Source Job Scheduler - Wikipedia, the free encyclopedia, )
20090724 : Open Source Job Scheduler Overview by JAMES MOHR ( Open Source Job Scheduler Overview, )
20090724 : Job Scheduling Tools - Job Scheduler Reviews - Job Scheduler Reviews ( Job Scheduling Tools - Job Scheduler Reviews - Job Scheduler Reviews, )
20090724 : Job Schedulers - Enterprise Applications - Network Computing ( Apr 1, 2005 - By Mike DeMaria )
20090724 : Job Scheduling, Batch Job, Workload, Enterprise Jobs, Process Scheduling, etc ( Job Scheduling, Batch Job, Workload, Enterprise Jobs, Process Scheduling, etc, )
20090724 : Enterprise Job Scheduling and Application Performance by Tidal Software ( Enterprise Job Scheduling and Application Performance by Tidal Software, )
20090724 : Twitter Weekly Updates for 2009-01-25 ( Twitter Weekly Updates for 2009-01-25, )
210227 : schedulix Enterprise Job Scheduling ( schedulix Enterprise Job Scheduling, )
210227 : Sun Grid Engine for Dummies ( Nov 30, 2009 , DanT's Grid Blog )

Old News ;-)

[Mar 19, 2013] AirBNB Opensources Chronos, a Cron Replacement

It does not make any sense as cron replacement as it is a cluster software. Even so, the question is "Does it make sense in view of existence of the open source version of Sun Grid Engine ?"

Slashdot

First time accepted submitter victorhooi writes "AirBNB has open-sourced Chronos- a scheduler built around Apache Mesos (a cluster manager). The scheduler is distributed and fault-tolerant, and allows specifying jobs in ISO8601 repeating notation, as well as creating dependent jobs. There's also a snazzy web interface to track and manage jobs, as well as a RESTful API."It's under the Apache License as seems to be the fashion with businesses releasing software nowadays. It looks like it might be useful if you have to manage a lot of machines with interconnected recurring processes; I know I wish this had existed a few years ago.

kangsterizer

Re:Unnecessary. (Score:4, Insightful)

Which is exactly why its not a cron replacement. Anybody who think this == cron had NO clue of what they were doing when they were using cron, and still doesn't.

... cron on steroid ?

Would "Cron on Steroid" satisfy you?

Re:

No, but this seems to be aimed more at Control-M and other scheduling 'frameworks'. Not that these features are enough to challenge CTM, but it's still targeting that one more than it's targeting cron.

Re:

This is basic set theory.

If ProductB does *at least* everything ProductA does then it can be a ProductA replacement.

If ProductB does more things, that's not relevant to its use as a replacement.

Only if ProductA has features that ProductB cannot duplicate does ProductB fail to be a possible replacement.

So what features does cron have that this does not?

manu0601

Keep it simple, stupid (Score:5, Insightful)

This is not a replacement for cron. On an isolated machine, it would be foolish to trade cron for such a complicated beast. On many nodes, I understand it has benefits.

[Apr 30, 2010] Issues in Selecting a Job Management System by Omar Hassaine

January 2002

This article addresses the problems usually faced when selecting the most appropriate job management system (JMS) to deploy at HPC sites. The article describes the three most popular offerings available on the Sun platform and provides a classification of the most important features to use as a basis in selecting a JMS. A JMS comparison and useful set of recommendations are included.

[Apr 30, 2010] Load Sharing Facility by Tom Bialaski

June 30, 1999

How LSF can be used as a resource management tool for running technical batch applications such as simulations.

[Nov 18, 2009] Workload Automation Differentiators

Some of their points, especially critique of legacy vendors makes sense. They provide a Free Trial

Opswise Software

We hope that you will find every facet of our company different than our competition. We think we can realize this vision through our values, beliefs, and solutions. Sure, we are 100% web-based, the only true enterprise-wide solution available, and stunningly easy to use are all significant differentiators, but to better understand how we got here, and more importantly how we will continue to lead the evolution, please read on:

Attitude - while the company is comprised of Automation Experts, we believe that the answers to great software lies in the challenges our customers face on a day-to-day basis. We listen and ask questions of our customers to better understand their problems and their vision for their data center. We love to learn about challenges and help implement simple and innovative solutions.

Integrity - we are up-front when we don't understand a question and admit when our solution does not have all the necessary functionality to solve every problem. We do believe that we can show you the fastest, least expensive, and best route to success, all we ask is for a chance to prove it. Just ask for a demo and you will see the difference.

Technology - while we are quite confident that we have by far the best technology platform in our market segment, Yet, we are also acutely aware that we do not have all the functionality of products that have been around 30 years. We share our roadmap with our customers and let them prioritize what is in the next release and we do releases quarterly based on this input. No waiting 2 years for a feature that will make your life easier. Since our first release in 2008 we have added significant new features while maintaining compatibility with the older release, making upgrades as painless as possible.

Licensing - we are not going to try and force you to be "all in up front". You buy and use the software the way you want. Modern - no client/server heritage with a web front-end here. No API set with a SOAP veneer. We are native web from start to finish and of course industry compliant (AJAX, SOAP, WSDL, RSS, LDAP), but more importantly, your users will notice that the technology works - whether it is auto-complete in the browser (AJAX), integrations with other infrastructure (SOAP, JMS, JDBC, SMS, ...) or single signon in complex environments (LDAP). We firmly that you will find OpsWise Automation Center to be the most powerful, flexible, and the simplest solution available today.

Work Smarter, Not Harder

In times of change, savvy IT executives scrutinize the status quo and identify opportunities to cut costs while improving the IT organization for the long-term.

Hard dollar savings from day one

OpsWise Automation Center is intended to save you money on day one. First, our pricing is lower and more flexible than the major vendors and will typically cost half of your current maintenance fees. Second, implementation will take a fraction of the time. Third we will all but eliminate all future upgrade costs. Fourth, we are much, much easier to use and you will get more out of our solution than the competitors.

Productivity now more than ever

Legacy vendors continue to acquire companies for market share attempting to appease Wall Street. These acquisitions are rarely in the best interest of the customer. Broken integrations and multiple User Interfaces hurt customer productivity. IT organizations can't afford tools that hinder more than help. Give us a hard problem that will help you run your business better and we will help you automate it.

Implement in less time than our competitors upgrades

Our customers don't waste time installing complex software because our solution is built with standards and Web 2.0, from day one. Implementation begins typically in less than a half a day, not after weeks or months of struggling with traditional installation headaches.

Flexibility is a key business differentiator

Current economic conditions will force business and IT services to be agile. The simplicity and flexibility of Web 2.0 technology allows our customers to easily customize the user interface to meet unique business needs.

Speed of Deployment

Start your Engines
Our customers get configured and running in hours - not days and weeks. This enables you to spend your time evaluating the product and building workflows as opposed to on hold with technical support with someone who may or may not be able to fix your problem.

Upgrades should be painless

Enterprise software vendors have historically required customers to shoulder the burden and costs of upgrades and maintenance. These same vendors are forcing significant, expensive and time-consuming upgrades during lean economic times. Most IT organizations today can't absorb the loss of valuable time and resources, additional infrastructure expenditures, excessive consulting costs and staff retraining. Not to mention, the endless cycle of patches and maintenance. Opswise makes our upgrade process seamless; adding new features and improved usability – not a long drawn out process – with new fees attached.

Modern Architecture

Web 2.0 based Technology is More Intuitive
Automation Center is a pure internet based solution; all you need is a web browser and an internet connection. There's no client software to maintain and install, ever!

It's been designed from the ground up to be an internet application with the look, feel, and behavior of any other web application. Inspired by Yahoo!, Google and Amazon.com we want to provide software that empowers users with software that "just works"!

Extensive use of advanced web technologies such as AJAX means that we can offer the kind of real time interactivity that has traditionally required a fat GUI client, and we can do it all from inside of your web browser. Want to see details of a record? Try mousing over it and watch the system pull the information back as a tooltip. Need to choose a task? Try typing a few letters and watch the system start auto completing possible matches.

OpsWise believes that the future lies in open systems communicating over standard protocols such as SOAP and web services. We've designed our product from beginning with open standards in mind, supporting everything from bulk data feeds to real-time interactive SOAP operations, both into, and out of our system.

Usable

There is a radical shift occurring in IT that has been made possible by wide spread adoption of the web. Examples are all around us and have become so common that it is easy to miss how reliant we have become on the web for day-to-day activities. We often do not give the web and its underlying complexity credit for how it simplifies complex tasks.

Empowering

IT is now a key provider of competitive advantage particularly in industries like financial services, energy, retail, pharmaceuticals and government. Managing the web of services that support the business is critical for successful service management. The more information that organizations can give to IT the better IT will be. For today's customers, the line between back office and front office is quickly blurring. Our customers are using Automation Center to schedule traditional back office business processes, but also to manage front office revenue producing activities. One tool for both activities greatly reduces the learning curve and greatly increases their flexibility to move faster in this rapidly changing world. Plus, it saves them money by using one tool for both activities.

Frequent Software Updates

With OpsWise, you don't have to wait years to get upgrades. We upgrade our software quarterly, with input directly from the customer base. Participate interactively with us to build a product roadmap that gets you and your fellow customers to that next level in Workload Automation.

Our easy upgrade process eliminates the risk of upgrade.

Security & Auditing Inherent in Design

Opswise Automation Center is drastically different than legacy products. We combine our years of IT experience with world-class technology to ensure you are protected. We deliver confidence in three key areas:

Communications Security

Our comprehensive defense and response systems include firewall protection, VPN tunneling, multiple layers of encryption (SSL/TLS), and LDAP security. We provide secure integrations to prominent 3rd party systems and data sources.

Application Security

We deliver acute application security functions focused on user authentication, access control and auditing. Automation Center is governed by encrypted password protection, role-based security and contextual security. Each interaction is logged for auditing purposes.

Audit & Compliance

We deploy governance strategies designed to ensure customer privacy, meet auditing standards including SAS-70 Type II, and help support regulation initiatives.

10 Ways OpsWise is Better

Fast Installation

Get up and running in hours, not days. Drop in the agents and they auto-register with the server.

Ease of Use

Point-and-Click, drag-and-drop. It's that simple to use Automation Center. Ease of Use is a huge focus for every feature we design in OpsWise. From easy to navigate menus, to auto-fill forms; you can get up to speed and using the product within hours, not days. The money you'll save in not having to send employees for training courses will pay for the software.

100% Web-Based

Every single user and administrative task can be accessed via the web browser. No applets to download. 100% HTML and JavaScript. Access it from ALL popular web browsers and mobile devices.

Create IT Workflows

With the integrated diagramming tool, you can build Dynamic workflows and see them run in real-time. Use logic to take different paths in workflows based on task status.

Integrated Reporting Tool

Dozens of report templates to choose from or build your own. Graphical report writer enables you to build text reports, pie charts, and bar charts. Schedule reports to be distributed, and build interactive dashboard widgets to drop on your customizable home page.

Security from end-to-end

Using SSL from the browser all the way to the agent, you can secure your communication end-to-end. Firewall traversal is made easy with our architecture.

Comprehensive Auditing

A huge step forward in improving your IT compliance initiatives! Every single user command action is audited - including before and after changes. Easy to archive audit data. Use our integrated reporting tool to build Audit reports quickly!

Bring it to your next platform

100% Cross-platform. The same code runs on Linux, Windows, UNIX, or z/OS. To migrate from one platform to another isn't a conversion. Transfer your data to a new database and start up on another server. It's that simple.

Hassle-free licensing

We don't charge for platform switches, we don't charge for test and development licenses. We also provide workload-based pricing where you can deploy an unlimited amount of our software, and only pay for what you use.

Switching? Guaranteed savings

Looking to switch from your legacy scheduler? Not only will you achieve all of the advantages listed and more, but we guarantee that we can drop your maintenance bill by 50%.

Cron: The Good, The Bad & The Ugly

November 8, 2009 | OpsWise

A number of inquiries we get relating to OpsWise Automation Center, is from organizations who are growing, and as a result, it's simply time to replace cron.

For decades, administrators of UNIX and UNIX-like systems such as Linux have relied on cron, the built-in systems scheduler, in order to schedule and automate tasks. In this post, we will explore cron's strengths and weaknesses as well look at the challenges that this approach as opposed to an enterprise automation approach, such as OpsWise!

CRON the Good

There's a lot of good about cron, for instance:

As an embedded tool, it's free

The implementation of CRON is available across a large number of operating systems, and is consistently implemented

Cron is immediately available as a service, no software to install

From a security perspective, users can have their own individual crontab files and do their own scheduling

From a scheduling perspective, it provides good flexibility in terms of scheduling capabilities and especially good at running cyclic jobs (e.g. runs every 20 minutes)

CRON the Bad

Scheduling not sophisticated enough to understand holidays, so avoiding running workload on holidays or specific days (e.g. an inventory day) requires significant manual intervention

Stuck to the server cron is running on</.LI>

No workflow logic, you cannot take different processing directions based on the success, failure, or error code recieved

Stuck on time-only triggering. Much of the time, automation can be driven off of file-based activity or on RDBMS activity. With cron, you are stuck with just time

CRON the Ugly

Maintenance headache as you add more and more servers. Want to run the same scheduled task on twenty different servers? That's twenty different servers you need to log into, create a cron tab entry, test, and debug

No central point of control - users have to manually log in and look at the output of their workload in order to determine success or failure

No notifications automatically if something goes wrong, which could be hours, or days. You are blind to processing errors.

No audit trail for workload = compliance nightmare. If you have initiatives such as SOX, HIPPA, or PCI, a lack of evidence is problematic. The collection process, while possible, is very time consuming

OpsWise as a CRON replacement

OpsWise addresses the Bad and the Ugly of cron, providing you with:

Definition of tasks and workflows across all of your severs, and centralized monitoring of job status from a single web-based enterprise console

Notifications - get an email notification directly to your smartphone everytime something goes, wrong, including diagnostic output so you can pinpoint the error

Manage cross-platform dependencies across servers and operating systems through a simple drag-and-drop workflow editor

Automatically react to file activity instead of being dependent on time

Full Audit Trail, Audit Reports, and Role-based security makes being compliant and tracking compliance data a breeze

More sophisticated scheduling, including calendaring. We even provide a unique cron-compatibility mode for those used to scheduling using cron's five positional parameters. This also makes migrating your data from cron far easier than with other solutions And, of course, many, many more benefits.

Looking at new scheduling solutions (BMC, Title or UC4) (1st Response)

Toolbox for IT Groups

Its better to have third party scheduler as it gives greater control and options for running jobs. Informatica scheduler is good to some extent, but not flexible to adhoc needs. Following are few tools available in the market which you can explore...

1. Autosys - Recommended
2. Tivoli Workload Scheduler(IBM) - Recommended
3. Tidal
4. Control-M - Recommended(BMC)
5. Approwx
6. UC4 - Recommended
7. Crontab - Unix Native utility
8. Cronacle
9. Dollar Universe
10. Automan
11. Flux
12. Load Leveler(IBM)
13. Task Forest
14. PTC Scheduler
15. Quartz
16. SAP Central Process Scheduling
17. Xgrid (Apple)
18. Visual Tom
19. Indesca
20. TLOS Scheduler
21. Macro Scheduler
22. Bicsuite
23. Automation Center
24. Automation Anywhere
25. Batchman
/Rajiv

Maui Cluster Scheduler

Maui is an advanced job scheduler for use on clusters and supercomputers. It is a highly optimized and configurable tool capable of supporting a large array of scheduling policies, dynamic priorities, extensive reservations, and fairshare. It is currently in use at hundreds of leading government, academic, and commercial sites throughout the world. It improves the manageability and efficiency of machines ranging from clusters of a few processors to multi-teraflop supercomputers.
Maui is a community project* and may be downloaded, modified, and distributed. It has been made possible by the support of Cluster Resources, Inc and the contributions of many individuals and sites including the U.S. Department of Energy, PNNL, the Center for High Performance Computing at the University of Utah (CHPC), Ohio Supercomputing Center (OSC), University of Southern California (USC), SDSC, MHPCC, BYU, NCSA, and many others.

Features:
Maui extends the capabilities of base resource management systems by adding the following features:

Extensive job priority policies and configurations

Multi-resource admin and job advance reservation support

Metascheduling interface QOS support including service targets and resource and function access control

Extensive fairness policies

Multi-attribute fairshare

Configurable node allocation policies

Multiple configurable backfill policies

Detailed system diagnostic support

Allocation manager support and interface

Extensive resource utilization tracking and statistics

Non-intrusive 'Test' modes

Advanced built-in HPC simulator for analyzing workload, resource, and policy changes

Maui interfaces with numerous resource management systems supporting any of the following scheduling API's
PBS Scheduling API - TORQUE, OpenPBS and PBSPro
Loadleveler Scheduling API - Loadleveler (IBM)
SGE Scheduling API - Sun Grid Engine (Sun)*
BProc Scheduling API - BProc (Scyld)**
SSS XML Scheduling API*
LSF Scheduling API - LSF (Platform)
Wiki FlatText Scheduling API (Wiki)
*partial support or under development
**supported under Clubmask

Maui is currently supported on all known variants of Linux, AIX, OSF/Tru-64, Solaris, HP-UX, IRIX, FreeBSD, and other UNIX platforms.
The Maui scheduler is mature, fully documented, and supported. It continues to be agressively developed and possesses a very active and growing user community. Its legacy of pushing the scheduling envelope continues as we promise to deliver the best possible scheduler supporting systems software will allow.

[Jul 24, 2009] Saturn network job scheduler freshmeat.net

Some alpha-level Perl-based implementation.

Saturn is a job scheduler for networks that makes it to control local and remote job scheduling through simple commands similar to crontab, or through a graphical interface.

PowerPoint presentation Job Scheduler Development Roadmap

Do we really need end-to-end automation- by Wolfgang Tonninger

From UC4 development team blog

Beyond Job Scheduling

In our daily working lives we have numerous such business processes running constantly across our enterprise. Some of these processes are highly manual – requiring phone calls between individuals, emails, file transfers, data transferred using memory sticks etc – others are partly automated using for example some form of time and date based batch-processing tools. But in summary, they do everything but smoothly connect people, departments, applications and servers – creating islands of automation, which you may attack with specific tools in order to help manage them, and a series of manual stages.

The danger of operating in this way in today's challenging and highly competitive business environment is that the manual processes which inevitably require human intervention carry the risk of being error prone as well as costly on resources. Having some islands of automation that are disconnected from each other also decreases our business efficiency.

Document Downloads at cosbatch.com

Several PDF documents with information about the system.

Job Scheduling Evaluation Guide

Overview :

IT groups must support many applications and servers across multiple platforms that frequently operate independently of each other. However, coordinating job scheduling across all these applications and networks is often required to optimize resource utilization. The traditional approach of applying more staff, toolkits, and rudimentary scheduling software to cobble together automated batch processing solutions becomes cost-prohibitive, inefficient, and error-prone, as the number of moving parts increases and the environment becomes more heterogeneous.

Enterprise job scheduling products enable datacenters to solve this problem by simplifying both complex and routine tasks. This detailed evaluation guide provides a framework for benchmarking features and functions of enterprise job schedulers including:

Calendaring

Dependencies, Queues and Prioritization

Alert Management

Application and Database Support

Framework / Network Management Integration

Security and Audit Trails

Fault Tolerance and Auto Recovery

Event Driven Processing

Scalability

Read this evaluation guide to learn how to compare competing products and determine which is appropriate for your needs.

Open Source Job Schedulers in Java

Vinzant Software - Global ECS

Global ECS allows you to graphically schedule, automate and control complex job streams for multiple platforms in a heterogeneous distributed production environment. It delivers absolute power and control while utilizing simple to use and well-designed graphical management tools. Global ECS allows you to easily define complex jobs and batches that may require a large number of conditional parameters such as job, file and/or resource dependencies. It also supports multiple calendars, alerting and user definable recovery actions. By utilizing Global ECS technology, you will realize single point administration and real-time monitoring that will allow you to gain control of your enterprise-wide scheduling environment.

Key Features:

Single point-of-control for monitoring and managing enterprise-wide job streams.

Controller/Agent model uses the power of TCP/IP to simplify communications in a distributed enterprise environment.

Global ECS has many capabilities that allow for a 'Management by Exception' approach to automating your production environment.

Multiple Method Scheduling (MMS) allows for simple programming and management of tasks with widely varying repetition schedules.

Role based security model.

Launches and controls any command line, including graphical and text programs, batch files, command files and macros.

Captures return codes to detect job success or failure and allow the system to take appropriate actions.

Controls sequential execution and branching with sophisticated job dependencies.

Full support for file and resource dependencies.

GECS System Events to assist in scheduling and monitoring the production environment.

Full featured browser-based client for remote console access.

Benefits

Global ECS is used by some of the most demanding data centers in the world. Our solutions have been used since 1987 to automate, integrate and accelerate business application processing. GECS can be deployed quickly ensuring a very fast return on investment. It has been proven over and over again to lower the total cost of operations.

Key Benefits:

Reduces costs and maximizes productivity across your enterprise computing environment.

Eliminates costly human error in your mission critical production environment.

Higher application service levels.

Accelerated delivery of business-critical reports and information.

Frees up manpower to allocate to other important projects.

Lowers application production costs.

Provides a framework for future automation requirements for your development team.

Open source workflow job scheduling for SuperScheduler

Posted by: Wei Jiang on October 30, 2008 DIGG Workflow job scheduling is available for SuperScheduler and SuperWatchdog at http://www.acelet.com/super/SuperScheduler/index.html.
SuperScheduler is a full-featured task scheduler for all system and application job scheduling. Super Scheduler is the twin software of Super Watchdog, which is event-action task scheduler for event monitoring. Super Scheduler is entirely written in Java. It is an open source project with GPL license.

Introduction to Enterprise Job Scheduling The Service Level Management Learning Community by Elizabeth Ferrarini

www.nextslm.org A Crash Course in Cutting IT Ops Costs

Since 2001 in every industry, IT has come under intense pressure to make organizations perform more efficiently while still contributing to the bottom line. Nowhere is this more apparent in financial services where batch job scheduling has become the critical component to IT success.

The U.S. Securities and Exchange Commission has asked that all stock trades be cleared on what's called the trade day plus one or T+1 by June 2005. This requirement will force a switch from Wall Street's traditional batch processing systems to a real-time processing network that never crashes, according to a Computerworld article. The article adds that while upgrading to comply with T+1 will cost about $8 billion, the financial services industry will see savings of about $2.7 billion a year. In addition, this industry will have lower costs, lower error rates, and higher productivity while graining the ability to handle greater transaction volume.

Based on the ROI figures for the financial services industry, it comes clear that there are significant monetary benefits to be gained from implementing an automated job scheduling solution. It also becomes clear that beyond the direct addition to the bottom line through cost savings there are benefits of freeing up systems resources allowing them to be used more productively (e.g. talented human resources can be put to better use on more important IT projects).

Taken together, automating job scheduling on the surface can offer significant benefits to enterprises of every size. Since not all job schedulers are created equal and don't yield the same benefits, you need to understand how the different types of schedulers work and what attributes to look for in an automated job scheduler. This article will provide you with a crash course on the subject.

How Job Schedulers Work

Job scheduling comprises one of the most important components in a production-computing environment. Job schedulers do many things. They initiate and help manage long, complex jobs, such as payroll runs and inventory reports. They also launch and monitor applications.

Most computer environments use some kind of job scheduler. With the large distributed computing environments, some job schedulers have not scaled to meet the challenges of enterprise computing. Mainframe schedulers enjoy a reputation for power and robustness, but can be limited to working on mainframes. Unix schedulers, on the other hand, have a reputation for being severely limited in functions, but have cross-platform abilities which mainframe schedulers lack.

When beginning to manage batch workloads in open systems environments, most companies launch their first jobs using manual methods. This technique is understandable and appropriate. However, this technique quickly breaks down when the number of machines and batch jobs increases

For example, Unix and NT systems provide job launchers. These native tools allow users to launch jobs at specific times and specific dates. These commands provide a basis for scheduling, yet on their own do not deliver a solution for complex scheduling requirements. They rely on operators manually submitting jobs from a workstation. This technique is costly, and potentially unreliable and error prone.

In distributed systems, the job launchers in Unix and NT systems provide simple job launching capability. They offer the ability to start a batch job at a specific time, based upon an adequate set of time and date matching criteria. They perform simple job scheduling tasks such as kicking off a backup every Saturday.

The biggest weakness of these native tools is their inability to monitor and to correlate the execution of one job with the results of another. If a backup job fails, these tools don't know it should suspend the jobs that update the tape catalogs or deletes yesterday's old files. If the backup finishes early, these tools can't move up jobs that are to be executed upon completion of the backup.

Also, these native tools can only start jobs that are time-dependent. This procedure makes it difficult to create a job that runs when a file disappears or when a system resource has a certain threshold.

Job launching of configuration files are difficult to maintain. Even minor changes to a job's start time are time consuming and error prone. And there are no layered tools to make job creation easier. Remember, these tools are simple job launching tools designed for low volume environments. They lack all critical features required for complex, large systems.

To make up for this deficiency, many systems administrators create their own job management system. They use these native tools to initiate a job controller and create scripts that detect failure conditions, initiate other jobs, and provide some degree of checkpoint and restart capabilities.

While these solutions often work adequately for small job streams, they rarely scale to handle job loads of complex network environments. They also lack sophisticated user interfaces and reporting tools that allow users to keep audit trails of job streams.

More importantly, home-grown job schedulers quickly turn into full-time programming commitments. As dependence increases on the tool, more and more features get added. The result is usually a varied mix of scripts, programs, and Unix utilities that only a few people actually understand. This causes a situation prone to problems.

Mainframe job scheduling is the complete opposite of Unix job scheduling. Mainframe tools provide robust scheduling capabilities that handle huge, complex job streams with ease. Mainframe schedulers group jobs into collections, treating the collection as a single entity whose execution, success, or failure can be tracked and used to trigger other jobs or collections of jobs. Users start jobs and job collection using time triggers or other criteria, such as creation of a file, mounting a tape, or the shutdown of a database. The job scheduler is aware of almost all activity within the system and can respond accordingly.

Using screen-oriented user interfaces, system operators can track the status of jobs, noting which are running long and which are completing. Using this interface, operators can suspend jobs, delay execution, restart jobs, and track schedule slippage. It's possible to alert an operator if a job exceeds a maximum run time, or if a job failed to start due to not met execution criteria.

Mainframe schedulers also offer good reporting tools. They create execution logs and report job failure and success. Analyzing these reports over a period of time lets users see trends, such as accounting job streams that take longer and longer to backup jobs that begin to press against the limits of back windows.

What to Look For in an Automated Job Scheduler

With the increase in jobs in all businesses and the need to have these jobs run more quickly, it makes sense and pays dividends to automate job scheduling. Automating job scheduling yields several tangible benefits:

Reduces personnel costs while freeing up those human resources for more important and more profitable projects

Launches jobs on time, thus improving efficiency and reduces potential for human error

Optimizes resources allowing more work to be accomplished. A properly functioning job scheduling solution also allows new resources to be added or existing resources to be reconfigured with minimal impact on IT operations.

Whether it's an NT/2000, Unix, mainframe, or something else, there are specific capabilities a good automated job scheduler should have.

A good scheduler supports non-temporal job triggers such as file creation of system alerts. Users must be able to suspend job stream, slip a schedule to another time of day, and cancel a single instance of a job without affecting its overall schedule. There should be no limit to the number of jobs that can be created, and the system should be easy to use with 10 jobs as it is with 10,000 jobs.

And the job scheduler should be not only a technical asset, but a business asset. It should reduce costs, increase productivity, and maximize efficiency so that IT can fulfill its mission of adding value to the business.

Several computing job scheduling architectures have emerged for heterogeneous, distributed environments: collaborative; master and agent; and variations of master and agent which include master, submaster, agent, and console, master and agent. Because there are many similarities between master and agent and its variations, one need complete the collaboraton with the master agent architectures.

Master and Agent Architecture

The traditional architecture for job scheduling solutions is the master and agent architectures. Schedulers using this model generally evolved from mainframe concepts. This architecture involved putting a full implementation of the job schedulers on one server, the master, and putting agents on a series of other series, the agents.

In the master and agent configuration, jobs are set up, scheduled and administered from the master server. The actual work is done on the agents. The agents communicate with the master throughout the job run as the master passes parameters and other critical data to the agent. Jobs might be partitioned among agents. As the job is passed from server to server, communications must be maintained between agents and master. This makes network availability critical to successful completion of jobs.

On the one hand, the master and agent central administration allows tight control over jobs. This benefit comes at the cost of central, top-down, rather than inflexible tree structure. On the other hand, the most significant limitation of master and agent systems is the requirement for the master and agents to remain in sync. When the network or central server is interrupted, how long will it take to reconstruct your activity? The well-known volatility of distributed networks is an important consideration when considering schedulers based on master/agent architecture.

A second area of concern is performance. In master and agent environments, communication continually flows between the master and each of the agents. As the work workload increases so does the network traffic. As the traffic increases, the potential for overload expands.

Another aspect to consider is scalability. A master can only support a limited number of agents, and this depends on the number of jobs to be run. Creating a new master or instance creates a new and separate administration. The more instances you create, the more management you need. When you create a new instance, you need to recreate all jobs. The process can take days, weeks, or even months. The process itself can lead to errors and failures at any point along the way. While the new instances can be managed by the same administrator, within reason, the inability to administer the entire job scheduling environment from a single point increases complexity, and the likelihood of confusion and errors.

This lack of scalability can affect your overall costs drastically. When you create a new master, you need to add new hardware at the master and agent levels. In a large enterprise, this could quickly grow to a $1 million problem.

Collaborative Architecture

Designed for distributed environments, the collaborative architecture leverages the combined computing power of networks. In collaborative architecture environments, a full copy of the job scheduler is carried out on every server on the network. With this technique, once a server is given parameters for a job, it can run independently.

Each server runs jobs independently of all others. Communication occurs for coordination and updates. It effectively uses network resources to combine mainframe-like robustness with distributed flexibility.

Administration in collaborative environments is flexible. You can manage your job scheduling from either a central point or at the local level.

Since the collaborative architecture was designed for distributed environments, it has many benefits. With a full working copy of the software on every server, network downtime has diminished affect. Jobs continue to run even during network outages. The same applies to individual servers. If one server crashes, all other servers in the network continue their jobs. Any interdependent jobs are held until the crashed server resumes activity.

Since jobs can run locally, network communications and overhead decrease. This decrease translates into improved network and system performance.

In a collaborative environment scaling can be limited to the size of your network. Some job schedulers might be able to handle 500 servers each running 1,000 jobs for a total of 500,000 jobs. Replicating jobs is straightforward. Based on logical views of jobs and the environment, even the most complex jobs can be replicated in minutes.

Another distinct advantage is the most efficient use of hardware resources. Typically, in a collaborative architecture, your total job scheduling overhead is about one percent of central processing unit resources on each server in the network. In master and agent profiles, you need a dedicated server for the job scheduler itself plus a backup server in case the master fails. This feature is in addition to resources used on each server. Because of the limits on scalability, each time you expand to a new master configuration, you need to add hardware and software for the job scheduling server.

CIO - Measure Your ROI!

The pressure on IT to produce promised savings and efficiencies from new technologies they implement will only increase. In an era of fiscal belt tightening, these pressures increase even more. To this end, automated job scheduling can alleviate some of these pressures while adding value to the business. The time has come to measure that value in terms of return on investment.

Open Source Job Scheduler - Wikipedia, the free encyclopedia

Open Source Job Scheduler Overview by JAMES MOHR

Job Scheduler a la carte -- an article for Linux magazine

Planning and scheduling jobs can mean a lot of work, especially if they are spread across multiple machines. Here's a tool to make that task a lot easier.
The ability to perform a certain task at a specific time or at regular intervals is a necessary task for sys admins. The original cron daemon offers an easy method for job scheduling on Unix-based systems. Although cron has seen a number of improvements over the years, even the newer versions are designed for very basic scheduling. An administrator who wants to do anything unusual must either create a wrapper script or build the additional functionality into whatever script is started by cron.

Job Scheduling Tools - Job Scheduler Reviews - Job Scheduler Reviews

There are hundreds of job scheduling solutions to choose from today. Below are the top job schedulers broken out by entry level price. We are in the process of reviewing job scheduling solutions and will be publishing the results as they become available.

< $5,000

24X7 Scheduler from SoftTree Technolgies

Automate from Network Solutions

$5,000 to $15,000

Global ECS from Vinzant Software

Active Batch from Advanced Systems Concepts

$15,000 to $50,000

UC4 from UC4 Software GmbH

Tidal Enterpirse Scheduler from Tidal Software

>$50,000

Control-M from BMC Software

AppWorx from AppWorx Corporation

EnterpriseSCHEDULE from ISE Inc

Dollar Universe from ORSYP

Job Schedulers - Enterprise Applications - Network Computing

Apr 1, 2005 - By Mike DeMaria

Job Schedulers Getting the Job Done
Stop relying on homemade solutions to keep your systems running on time. The six job scheduling programs we tested automate almost every administrative, maintenance and business process an enterprise requires. Here are a few items on your to-do list:

Management wants the 15 servers' log files rotated and burned to CD the day following each business day. Make sure no backups failed.

Sales wants all orders placed over the e-commerce system automatically tallied at day's end, e-mailed to the sales VP and faxed to the distribution plant. Notify the sales managers if any of the tasks fail along the way.

The database administrator has set up his systems to start generating an extensive report at 3 a.m. so it's available by 10. If the report is going to be late, make sure he knows about it.

You could cobble together some programs to control these jobs. But your best bet is a good job-scheduling suite--one that lets you schedule, automate and monitor any number of tasks like these without your having to do the legwork. Jobs can include just about any administrative, maintenance or business processes, such as restarting services, rotating logs, backing up data, deleting temporary files, e-mailing invoices, sending notices of past-due balances and placing orders with business partners.
Of course, the tasks listed above aren't much of a challenge for an enterprise-class job scheduler. Does your company need an application to handle more complex tasks? Maybe you need to run several jobs in sequence on multiple machines. You want to run a job on the database server to query accounts, upload the output to an e-mail server and send e-mail to account holders, for instance. These job streams must run across several systems and departments. The job scheduler should handle failures midstream, even across machines. And a job failure and its resulting error code on Server A should influence which job runs next on Server B.

We asked eight vendors to send us their job-scheduling software for testing in our Syracuse University Real-World Labs®. Argent, BMC Software, Computer Associates International, Cybermation, Tidal Software and Vexus Consulting accepted our challenge. Hewlett-Packard said it doesn't have a product fitting our criteria, and IBM declined to submit its Tivoli software.

Five of the products we tested--the exception is Vexus Avatar--work similarly. A central scheduling server interacts with a database to store and schedule jobs. When it's time for a job to run, the scheduling server contacts a lightweight agent program, which signals the endpoints that will perform the job. The agent then executes a script and can return status codes and error information to the scheduling server. Additional jobs can be launched, a series of jobs can abort, or the scheduler can wait for an operator to take over. These products support failover to a backup scheduling server. Avatar works a bit differently, with each endpoint using a small, local scheduling server. As such, jobs can run independently without communicating to a central server at all.

Computer Associates and BMC make the most advanced products in this field. Although the difference in their Report Card scores is minuscule, CA Unicenter AutoSys Job Management 4.5 received our Editor's Choice award because it has slightly better access control and a simpler management interface, and it supports a few more client platforms.

Computer Associates

BMC Software

Tidal Software

Cybermation

Argent Software

Vexus Consulting

Read On

Introduction to Enterprise Job Scheduling

Job Scheduling - what's old is new again, sort of

Unix Basics: Job Scheduling with Cron

Management, job control and reporting capabilities accounted for 95 percent of each product's Report Card grade. We saved only 5 percent of the score for price: We feel the other factors are more important, and the vendors' differing business models--based on such factors as usage time, number of processors and operating systems--make a fair pricing comparison difficult. If your large organization relies on job scheduling as a critical business process, high availability and scalability matter more than price. Conversely, in small environments with only a handful of servers, factors like scalability and role-based administration may not matter at all.

Our management category covers role-based administration, job-priority scheduling, management user interface and agent platform support. Role-based administration is especially important for installing a large job-scheduling product, creating groups and users, and granting access. Tidal Enterprise Scheduler, Argent Job Scheduler and CA Unicenter AutoSys can pull users and groups from the corporate directory.

We put heavy emphasis on the main tasks of setting up schedules and prioritizing jobs. Scheduling tasks include creating and combining multiple calendars, dealing with holidays, and deciding when jobs should run. Prioritizing refers to controlling how many resources jobs use. Jobs that are time-sensitive or critical should get higher priority.

Job Scheduler Features
Click to Enlarge

All the products we tested let you configure permit and deny calendars. A job runs whenever the permit calendars dictate--every Monday night, once per quarter or every business day, for instance. Deny calendars keep jobs from running and overrule the permit calendars if a job appears on both.

We were most impressed with Tidal Enterprise Scheduler's and Vexus Avatar's management interfaces, which located existing jobs and set job parameters easily.

Agent platform support is diverse. Every vendor supports Windows NT and above, and all but Avatar support Hewlett-Packard HP-UX, IBM AIX, Linux and Sun Solaris. Smaller and niche systems, like OpenVMS, Compaq Tru64 Unix and Dequent Dynix, get some support. Only CA's Unicenter AutoSys supports Mac OS X Server, a Unix derivative. Schedulers from CA, BMC, Cybermation and Tidal all offer mainframe support.

Job Control

In testing these programs, we focused on job-control tasks, such as prerequisite checks, job creation and error recovery. Job schedulers don't actually create the batch files to run on the end nodes; that's up to the IT staff. Instead, schedulers kick off a script, batch file or executable at the designated time. These programs must check for prerequisite conditions before running the job--say, checking disk space before starting a backup--and handling error recovery if the job fails. We were disappointed with the prerequrement capabilities of our top players, CA Unicenter AutoSys and BMC Control-M. They were limited to file checks and checking the status of a previously run job. By comparison, Cybermation's ESP Espresso had the best procedures for checking prerequisites. It could detect file presence and size changes; monitor the event log, text strings, processes, services, CPU utilization and disk usage; and perform SQL queries.

Creating a single job is relatively simple. Just give it a name and specify which command to run on which server. For job creation, we graded each package's ability to create and visualize complex environments with multiple jobs running on various servers. CA's Job Visualization add-on lets you see all jobs programmed into Unicenter AutoSys. When you click on any job, you can see all possible paths to that job and all paths after it. Without Visualization, you can't graphically see relationships between jobs at all.

In the error-recovery subcategory, BMC Control-M excels. If a job fails, Control-M provides a range of options, such as rerunning the job, changing variables, sending out alerts and running other jobs. You can set multiple branching options for error conditions and make extensive use of command error codes. Argent Job Scheduler likewise has good error recovery: You can act upon a range of error codes, rather than individual codes. Argent's product also tries to rerun a job a specified number of times over a set number of minutes or hours.

Cron + Perl != Job Scheduling

Why spend a quarter-million dollars on these programs when you can use cron, Perl scripts, SSH and a little shell programming?

If you just need to run basic standalone jobs, cron might be sufficient. However, cron has limitations that a dedicated job-scheduling product can overcome. Cron is like an alarm clock. At a certain time, it wakes up, kicks off a job and goes back to sleep until the next job comes along. You can't correlate one job to the next. Cron can't tell that your log-file rotation failed at 2 a.m. and you shouldn't delete the old log files at 4. It can't tell if a job has finished early or late, or whether to move the rest of the jobs up or down. Homemade solutions, meanwhile, may involve setting up checkpoints and writing additional script files, and ultimately will pose scalability and longevity problems. It's also hard to see the status of cron jobs when you're dealing with many interdependent servers.

The job schedulers we tested are a bridge between mainframe and Unix environments. You can now get mainframe job-management functionality on your Unix, Linux or Windows server. Vendors that have mainframe backgrounds or support mainframes in their job-scheduling suite fared better than the rest.

CA entered the job-scheduling field almost three decades ago, so it's no surprise that this version of Unicenter AutoSys Job Management reflects years of enhancements and experience. The package has an easy-to-use graphical interface for Unix and Windows admins, as well as a Web client for operators and technicians. The interface is one reason Unicenter AutoSys edged out BMC's functionally similar Control-M.

Management settings are found in the administration program, a Web-based operator's site, and, optionally, eTrust Access Control. We used the Web interface to create simple jobs, kick off new jobs and check the status of scheduled events. The eTrust software provides granular read, write and execute permissions on all aspects of the job-management suite. We could control jobs, calendars, machine access and reports on a user or group basis. Credentials are provided by native Windows authentication. AutoSys includes eTrust, but configuring that program isn't simple. If you don't install it, though, you can't take advantage of group access controls under Windows.

We easily managed and scheduled related jobs from the main administration interface. To create a job, we specified a job name, the job owner, the command, dependencies and the machine. We liked the product's use of boxes for grouping jobs--a kind of batch job of batch jobs. Kicking off a box runs all jobs inside that box simultaneously, unless a job depends on the completion of a previous job.

Unicenter AutoSys has a unique method of failing over to a secondary job scheduling server: It uses a third machine for control purposes. This third server requires almost no resources--just a simple agent program that receives heartbeats from the job-scheduling servers. If you have the primary and backup scheduling server in different locations, the third machine determines if the primary actually failed, or if the secondary machine's network connection died. The three machines send one another heartbeat messages to confirm that the remote systems are up and the job scheduler is running. If the secondary machine can't reach the primary but can reach the third machine, it takes over as master. However, if the secondary can't reach the primary or third, it assumes that the problem is on the secondary's end and doesn't take over. Switching back from the secondary to the primary requires manual intervention.

You can configure Unicenter AutoSys not to use a third machine. In environments where job scheduling is critical or the backup server is separately located, the third machine is a powerful tool.

Unicenter AutoSys Job Management 4.5. Computer Associates International, (888) 423-1000, (631) 342-6000. www.ca.com

BMC and CA job schedulers are nearly equal in function and earned similar scores in our Report Card. BMC's Control-M for Distributed Systems offers excellent calendaring capabilities, late-job predictions and error recovery. However, the product is harder to administer than Unicenter AutoSys. Control-M has a Web interface for running and viewing jobs that helped raise its management score, but we prefer AutoSys' interface.

As with CA's product, we could use Control-M to bundle multiple jobs into a larger group. Creating a dependency between two jobs, such as running a log-rotate script before running a log-backup program, is as simple as dragging one job on top of another. Unfortunately, we couldn't specify file-existence, disk-usage, process-running or any other system-level dependencies. These functions must be handled within the batch scripts.

Predefined templates, which BMC calls skeletons, simplify job creation--individually or en masse--by propagating configuration fields automatically. Updates to the templates are sent to all jobs that use it.

Control-M provides multiple exit conditions for failures. The job can be rerun, a global variable modified, alerts thrown, e-mails sent or other jobs started. We specified the exit code and used text strings to define failures, and could detect and act when the Unix file copy command returned "No such file." Control-M integrates with BMC Patrol for alert management and automatic failover, though you don't need the Patrol suite for basic alerting. Unfortunately, without Patrol, failover from one scheduling server to a backup is a manual process. Built into Control-M is an alert console, similar to CA's scheduler. Alerts, which may be e-mailed, can be indicated as acknowledged and handled.

Control-M uses time heuristics to determine if a stream of jobs will be late. The software keeps tabs on how long each job is supposed to run. If a job upstream is running behind schedule, threatening to make a downstream job late past a certain threshold, the application triggers an alert. An operator can abort the job stream, halt other less vital jobs or find out why things are running late. The other products we tested determine lateness on a job-by-job basis.

Control-M. BMC Software, (800) 841-2031, (713) 918-8800. www.bmc.com

The excellent management interface on Enterprise Scheduler made this package one of our favorites to use. Tidal's suite also has the best documentation, with real examples and step-by-step instructions, but we want better reporting, job creation, visualization and error recovery.

To assign access control, Enterprise Scheduler uses security policies, which contain all the permissions you want to set. You assign users or groups to the policy from Active Directory. With this setup, it was easy to modify policies later and have the changes affect all relevant users.

As expected, Tidal's scheduler can create dependencies based on previous job status and global variables. The variables may be a string, number, date or Boolean value. A job can modify or read the variables. You also can create file dependencies, such as a log-rotation job that acts only when a log file larger than a certain size is present. Within a list of dependencies, we required all to evaluate as true, or at least one. However, we couldn't specify for at least two dependencies to be true, or create complex "and/or" statements.

The system can detect errors in three ways. Enterprise Scheduler supports exit codes, even letting you specify a range of exit codes for success or failure. A program's output can be piped to another program, and the error code of the piped program is used to determine success. Finally, you can do a pattern match on a program's output for success or failure. Although these functions can be handled inside the batch job, having it available inside the job scheduler is a bonus.

Tidal Enterprise Scheduler 5.0. Tidal Software, (877) 558-4325, 650-475-4600. www.tidalsoftware.com

A mixed bag, Cybermation ESP Espresso does the essential tasks of creating jobs and checking dependencies well, but this package needs better alerting and scheduling features. Its interface is great for some tasks and poor for others. It's easy to create jobs and visualize the job stream, for example, but the general system interface is difficult to use.

The job-creation component is put together well. Icons representing jobs can be dragged and dropped onto a canvas like a Visio diagram. This lets you see dependencies and the order of operations for a job; you can also monitor jobs in real time from this view.

ESP has the best dependency checking of the products we tested. We could monitor the event log, text strings in a file, processes running, services, CPU utilization and disk usage. We also could perform SQL queries. You can launch jobs based on the results of these monitors.

Access-control settings are available on a per-user or group basis. We copied permissions from one group to another and from one user to another. Permissions can be granted to jobs and calendars as well. For example, we let a user modify all calendars except for the "payroll" one.

ESP Espresso 4.2. Cybermation, (905) 707-4400. www.cybermation.com

With strong reporting and alerting engines, the Windows-only Argent Job Scheduler may be a good fit for small and midsize environments. To appeal to large enterprises, though, it needs better support for dependency checking, role-based administration and job-stream visualization.

The software pulls user and group information from the Active Directory domain or local computer when no domain is present. You can grant read, write, execute and control permissions per machine or per job class. A job class is a grouping of individual jobs; you can't set controls on individual jobs or calendars. This setup makes it difficult to visualize job dependencies, even though job classes can be used to organize job streams.

Argent's reporting and alerting, sent over e-mail or pagers, are better than the competition's. The alerts can play a sound for a set length of time, execute a command on the scheduling server's command line or send a Windows message alert to any Windows client. Alerts also can be sent to Argent Guardian, a separate monitoring and alerting product. You don't need Guardian for basic alerting. Reports can be created and sent on a calendar schedule as well.

The Argent Job Schftware, (860) 674-1700. www.argent.com

Vexus Avatar has two benefits: It's simple and inexpensive. Avatar doesn't offer many configuration options, and creating jobs with it is straightforward. The price runs as low as $500 per CPU. Vexus claims Avatar can compete at the same level as Unicenter AutoSys or Control-M, but we found the product's distributed setup a disadvantage in large environments. Avatar is best-suited for those looking for small-scale or simple batch management, or on machines that have limited communications with a central job scheduler.

Avatar is vastly different from the other products we tested. Instead of working with a model of dumb agents with a central scheduling server, Avatar agents operate independently. A full, yet lightweight, scheduling server is installed on every endpoint, which holds its own calendar, queues and job definitions. Agents are administered through a separate Web client, which connects to an Avatar server for administration only, not for triggering tasks. Jobs can be triggered across machines, and you can add multiple Avatar servers to a management interface, but there's no central machine for the actual job creation.

Avatar doesn't provide a mechanism for shared security settings, calendar, groups or jobs. Users authenticate against the local system password file or directory. It does not support groups, so access controls are set for individual users or for every authenticated user. Each machine can set "allow" or "deny" rules for file transfers, executing and monitoring jobs and setting dependencies on a per-host or -user basis. Jobs can be triggered across machines, but Avatar lacks intramachine visualization tools.

Avatar Job Scheduling Suite 4.5.5. Vexus Consulting Group, (416) 985-8554. www.vexus.ca

Michael J. DeMaria is an associate technology editor based at Network Computing's Syracuse University's Real-World Labs®. Write to him at [email protected].

Keeping your systems running on time requires coordination of tasks, from backing up servers to generating sales reports. Although you can concoct an application to start operations and keep tabs on each job, the six job-scheduling suites we tested at Network Computing's Syracuse University Real-World Labs® automate the process.

We tested products from Argent, BMC Software, Computer Associates International, Cybermation, Tidal Software and Vexus Consulting, and graded their job-control and -reporting capabilities, as well as the ease with which we could manage the programs.

All the suites but Vexus' Avatar require a central server and database, in which the server uses an agent to trigger the start of each job on an endpoint. Avatar stores a scheduler on each endpoint and is better suited to small-scale operations.

The CA and BMC products earned very close Report Card scores for their sophisticated scheduling abilities. But CA's Unicenter AutoSys Job Management 4.5 won out for Editor's Choice because its management interface and extensive platform support were slightly better than those of BMC's Control-M.

To test the job schedulers, we used a dual 2.4-GHz Pentium Xeon system with 1 GB of RAM, running Windows 2000 Server SP4 as our job-scheduling server. A second machine ran as a backup. A 600-MHz Pentium III system with 256 MB of RAM running Windows 2000 Server SP4 served as our clients. If the vendor didn't supply a built-in database, we installed Microsoft SQL 2000 Service Pack 3a. For Linux tests, we used a Red Hat 9 system and installed some components of Vexus Avatar.

Our scheduled jobs comprised batch files and command-line executables, and we added a sleep command to some of the batch files to make some jobs run late. To determine how well a product can handle failure conditions, we made batch files terminate with nonzero exit codes.

All Network Computing product reviews are conducted by current or former IT professionals in our Real-World Labs® or partner labs, according to our own test criteria. Vendor involvement is limited to assistance in configuration and troubleshooting. Network Computing schedules reviews based solely on our editorial judgment of reader needs, and we conduct tests and publish results without vendor influence.

Job Scheduling, Batch Job, Workload, Enterprise Jobs, Process Scheduling, etc

Replacing your scheduling solution is easier than you think. With BMC CONTROL-M's proven migration methodologies and IT Workload Automation leadership you can accomplish a whole lot more. BMC CONTROL-M, Yes it Does, Yes you Can!

Enterprise Job Scheduling and Application Performance by Tidal Software

Automating Job Scheduling and other datacenter tasks is a requirement for the complex datacenter. Yet, the integration between IT Systems using existing tools must be custom-coded or left undone, leaving the datacenter with increased security risk and a reliance on business users to monitor system health. A planned approach to job scheduling and datacenter automation should not only eliminate these risks and inefficiencies, but also reach across heterogeneous environments as a service platform for business process automation and infrastructure activities.

Tidal Software offers the easiest-to-use enterprise job scheduler – Tidal Enterprise Scheduler™ - for completely automating even the most complex batch processing across the complete enterprise. Customers consistently cite Tidal's enterprise job scheduling software's ease-of-use, support for complex heterogeneous environments and flexibility as reasons for selecting Tidal.

Tidal's job scheduling software has dramatically improved the performance of a wide range of systems for customers such as: order management, business intelligence, customer reporting, and more.

Twitter Weekly Updates for 2009-01-25

Looking for an open source replacement for TWS, either it doesn't work, or its too complicated to figure out, which is the same thing.
@myztikjenz No, no… much worse… the Tivoli Workload Scheduler… which I think I'll just replace with a handful of shell scripts. in reply to myztikjenz
@myztikjenz TWS is all that is wrong with "enterprise" software today! in reply to myztikjenz
In case you did not know: IBM Software pricing model is per cpu core.
So, $33/value unit… 50 value units per core… 4 quad-core intels… so, what… $26,400?

schedulix Enterprise Job Scheduling

schedulix offers a huge seature set, which enables you, to meet all requirements considering your IT process automation in an efficient and elegant way.
User-defined exit state model
Complex workflows with branches and loops can be realised using batch hierarchies, dependencies and triggers by means of freely definable Exit State objects and how they are interpreted.
Job and batch dependencies
You can make sure that individual steps of a workflow are performed correctly by defining Exit State dependencies
Branches
Branches can be implemented in alternative sub-workflows using dependencies that have the exit state as a condition.
Hierarchical workflow modelling
Among other benefits, hierarchical definitions for work processes facilitate the modelling of dependencies, allow sub-processes to be reused and make monitoring and operations more transparent.
Job and batch parameters
Both static and dynamic parameters can be set for Submit batches and jobs.
Job result variable
Jobs can set any result variables via the API which can then be easily visualised in the Monitoring module.
Dynamic submits
(Sub-)workflows can be dynamically submitted or paralleled by jobs using the Dynamic Submit function.
Pipelining
Local dependencies between parts of the submitted batch instances are correctly assigned when parallelising batches using the Dynamic Submit function. This hugely simplifies the processing of pipelines.
Job and batch triggers
Dynamic submits for batches and jobs can be automated using exit state-dependent triggers.
This allows notifications and other automated reactions to workflow events to be easily implemented.
Loops
Automatic reruns of sub-workflows can be implemented by using triggers.
External jobs
So-called 'pending' jobs can be defined to swap out sub-workflows to external systems without overloading the system with placeholder processes.
Folders
Job, Batch and Milestone workflow objects can be orderly organised in a folder structure.
Folder parameters
All jobs below a folder can be centrally configured by defining parameters at folder level.
Static resources
Static resources can be used to define where a job is to be run. If the requested resources are available in multiple environments, the jobs are automatically distributed by the schedulix Scheduling System.
Load control
A quantity of available units of a resource can be defined for runtime environments using system resources. A quantity can be stated in the resource requirement for a job to ensure that the load on a resource is restricted.
Job priority
The job priority can be used to define which jobs are to take priority over other jobs when there is a lack of resources.
Load balancing
The interplay of static and system resources allows jobs to be automatically distributed over different runtime environments dependent upon which resources are currently available.
Synchronizing resources
Synchronising resources can be requested with different lock modes (no lock, shared, exclusive, etc.) or assigned them for synchronising independently started workflows.
Sticky allocations
Synchronising resources can be bound to a workflow across multiple jobs with sticky allocations to protect critical areas between two or more separately started workflows.
Resource states
A state model can be assigned to synchronising resources and the resource requirement can be defined dependent upon the state.
Automatic state changes can be defined dependent upon a job's exit state.
Resource expirations
Resource requirements can define a minimum or maximum time interval in which the resource was assigned a new state. This allows actuality and queue conditions to be easily implemented.
Resource parameters
Resource parameters allow jobs to be configured dependent upon the allocated resource.
Access controlling
Authentication routines for job servers, users and jobs using IDs and passwords are effective methods of controlling access to the system.
Time scheduling
The schedulix Time Scheduling module allows workflows to be automatically run at defined times based on complex time conditions. This usually obviates the need for handwritten calendars, although they can be used whenever required.
Web interface
The schedulix web front end allows standard browsers to be used for modelling, monitoring and operating in intranets and on the internet.
This obviates the need to run client software on the workstations.
API
The full API of the schedulix Scheduling System allows the system to be completely controlled from the command line or from programs (Java, Python, Perl, etc.).
Repository
The schedulix Scheduling System stores all the information about modelled workflows and the runtime data in an RDBMS repository.
All the information in the system can be accessed via the SCI (Standard Catalog Interface) whenever required using SQL.
Back to the schedulix Homepage

Sun Grid Engine for Dummies

Nov 30, 2009 | DanT's Grid Blog
Servers tend to be used for one of two purposes: running services or processing workloads. Services tend to be long-running and don't tend to move around much. Workloads, however, such as running calculations, are usually done in a more "on demand" fashion. When a user needs something, he tells the server, and the server does it. When it's done, it's done. For the most part it doesn't matter on which particular machine the calculations are run. All that matters is that the user can get the results. This kind of work is often called batch, offline, or interactive work. Sometimes batch work is called a job. Typical jobs include processing of accounting files, rendering images or movies, running simulations, processing input data, modeling chemical or mechanical interactions, and data mining. Many organizations have hundreds, thousands, or even tens of thousands of machines devoted to running jobs.
Now, the interesting thing about jobs is that (for the most part) if you can run one job on one machine, you can run 10 jobs on 10 machines or 100 jobs on 100 machines. In fact, with today's multi-core chips, it's often the case that you can run 4, 8, or even 16 jobs on a single machine. Obviously, the more jobs you can run in parallel, the faster you can get your work done. If one job takes 10 minutes on one machine, 100 jobs still only take ten minutes when run on 100 machines. That's much better than 1000 minutes to run those 100 jobs on a single machine. But there's a problem. It's easy for one person to run one job on one machine. It's still pretty easy to run 10 jobs on 10 machines. Running 1600 jobs on 100 machines is a tremendous amount of work. Now imagine that you have 1000 machines and 100 users all trying to running 1600 jobs each. Chaos and unhappiness would ensue.

To solve the problem of organizing a large number of jobs on a set of machines, distributed resource managers (DRMs) were created. (A DRM is also sometimes called a workload manager. I will stick with the term, DRM.) The role of a DRM is to take a list of jobs to be executed and distributed them across the available machines. The DRM makes life easier for the users because they don't have to track all their jobs themselves, and it makes life easier for the administrators because they don't have to manage users' use of the machines directly. It's also better for the organization in general because a DRM will usually do a much better job of keeping the machines busy than users would on their own, resulting in much higher utilization of the machines. Higher utilization effectively means more compute power from the same set of machines, which makes everyone happy.

Here's a bit more terminology, just to make sure we're all on the same page. A cluster is a group of machines cooperating to do some work. A DRM and the machines it manages compose a cluster. A cluster is also often called a grid. There has historically been some debate about what exactly a grid is, but for most purposes grid can be used interchangeably with cluster. Cloud computing is a hot topic that builds on concepts from grid/cluster computing. One of the defining characteristics of a cloud is the ability to "pay as you go." Sun Grid Engine offers an accounting module that can track and report on fine grained usage of the system. Beyond that, Sun Grid Engine now offers deep integration to other technologies commonly being used in the cloud, such as Apache Hadoop.

One of the best ways to show Sun Grid Engine's flexibility is to take a look a some unusual use cases. These are by no means exhaustive, but they should serve to give you an idea of what can be done with the Sun Grid Engine software.

A large automotive manufacturer uses their Sun Grid Engine cluster in an interesting way. In addition to using it to process traditional batch jobs, they also use it to manage services. Service instances are submitted to the cluster as jobs. When additional service instances are needed, more jobs are submitted. When too many are running for the current workload, some of the service instances are stopped. The Sun Grid Engine cluster makes sure that the service instances are assigned to the most appropriate machines at the time.

One of the more interesting configuration techniques for Sun Grid Engine is called a transfer queue. A transfer queue is a queue that, instead of processing jobs itself, actually forwards the jobs on to another service, such as another Sun Grid Engine cluster or some other service. Because the Sun Grid Engine software allows you to configure how every aspect of a job's life cycle is managed, the behavior around starting, stopping, suspending, and resuming a job can be altered arbitrarily, such as by sending jobs off to another service to process. More information about transfer queues can be found on the open source web site.

A Sun Grid Engine cluster is great for traditional batch and parallel applications, but how can one use it with an application server cluster? There are actually two answers, and both have been prototyped as proofs of concept.
The first approach is to submit the application server instances as jobs to the Sun Grid Engine cluster. The Sun Grid Engine cluster can be configured to handle updating the load balancer automatically as part of the process of starting the application server instance. The Sun Grid Engine cluster can also be configured to monitor the application server cluster for key performance indicators (KPIs), and it can even respond to changes in the KPIs by starting additional or stopping extra application server instances.

The second approach is to use the Sun Grid Engine cluster to do work on behalf of the application server cluster. If the applications being hosted by the application servers need to execute longer-running calculations, those calculations can be sent to the Sun Grid Engine cluster, reducing the load on the application servers. Because of the overhead associated with submitting, scheduling, and launching a job, this technique is best applied to workloads that take at least several seconds to run. This technique is also applicable beyond just application servers, such as with SunRay Virtual Desktop Infrastructure.

A research group at a Canadian university uses Sun Grid Engine in conjunction with Cobbler to do automated machine profile management. Cobbler allows a machine to be rapidly reprovisioned to a pre-configured profile. By integrating Cobbler into their Sun Grid Engine cluster, they are able to have Sun Grid Engine reprovision machines on demand to meet the needs of pending jobs. If a pending job needs a machine profile that isn't currently available, Sun Grid Engine will pick one of the available machines and use Cobbler to reprovision it into the desired profile.
A similar effect can be achieved through virtual machines. Because Sun Grid Engine allows jobs' life cycles to be flexibly managed, a queue could be configured that starts all jobs in virtual machines. Aside from always having the right OS profile available, jobs started in virtual machines are easy to checkpoint and migrate.

With the 6.2 update 5 release of the Sun Grid Engine software, Sun Grid Engine can manage Apache Hadoop workloads. In order to do that effectively, the qmaster must be aware of data locality in the Hadoop HDFS. The same principle can the applied to other data repository types such that the Sun Grid Engine cluster can direct jobs (or even data disguised as a job) to the machine that is closest (in network terms) to the appropriate repository.

One of the strong points of the Sun Grid Engine software is the flexible resource model. In a typical cluster, jobs are scheduled against things like CPU availability, memory availability, system load, license availability, etc. Because the Sun Grid Engine resource model is so flexible, however, any number of custom scheduling and resource management schemes are possible. For example, network bandwidth could be modeled as a resource. When a job requests a given bandwidth, it would only be scheduled on machines that can provide that bandwidth. The cluster could even be configured such that if a job lands on a resource that provides higher bandwidth than the job requires, the bandwidth could be limited to the requested value (such as through the Solaris Resource Manager).

Further Reading

For more information about Sun Grid Engine, here are some useful links:

Beginner's Guide to Sun Grid Engine 6.2

Sun Grid Engine 6.2 presentation slides

Sun Grid Engine documentation wiki

Sun Grid Engine product page

Grid Engine open source project

Community site

Etc

Society

Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers : Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism : The Iron Law of Oligarchy : Libertarian Philosophy

Quotes

War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda : SE quotes : Language Design and Programming Quotes : Random IT-related quotes : Somerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose Bierce : Bernard Shaw : Mark Twain Quotes

Bulletin:

Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 : Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law

History:

Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds : Larry Wall : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOS : Programming Languages History : PL/1 : Simula 67 : C : History of GCC development : Scripting Languages : Perl history : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history

Classic books:

The Peter Principle : Parkinson Law : 1984 : The Mythical Man-Month : How to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor

The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D

Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

You can use PayPal to to buy a cup of coffee for authors of this site

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: February 10, 2021