Apache Oozie Reprocessing

In Oozie, job reprocessing is a key operation at the workflow, coordinator, and bundle levels. There are three situations in which a user needs to rerun the same job.

  • When a job has failed due to a transient error.
  • When the job completed but the input data was bad.
  • When there is a bug in the code.

The following are the levels on which reprocessing can be done.

  1. Workflow Reprocessing
  2. Coordinator Reprocessing
  3. Bundle Reprocessing

Let’s look at each level in detail.

1. Workflow Reprocessing

A workflow job that is in SUCCEEDED, FAILED, or KILLED state is eligible for reprocessing.

The following command will reprocess a workflow job.

$ oozie job -rerun 0000092-141219003455004-oozie-joe-W -config job.properties

The above command reruns the workflow job. Beyond that, a few more details determine what the command actually does.

  • The user must make sure the output directories are cleaned up before the workflow is rerun.
  • Two configuration properties control a workflow rerun: oozie.wf.rerun.skip.nodes and oozie.wf.rerun.failnodes. Only one of them should be set, either in the job.properties file or via the -D option on the command line.
  • oozie.wf.rerun.skip.nodes takes a comma-separated list of workflow nodes to skip during the rerun.
  • Setting oozie.wf.rerun.failnodes to true reruns the workflow from the failed nodes; setting it to false reruns the entire workflow. This property must not be combined with oozie.wf.rerun.skip.nodes.
  • The wf:run() EL function returns the run number of the workflow job: 0 on the first run, and the current run number on a rerun.
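The two rerun properties can be sketched on the command line as follows. This is a minimal dry-run sketch, not a definitive recipe: wf_rerun_cmd is a hypothetical helper that only prints the command, the node names ingest and cleanse are made up for illustration, and the workflow ID and job.properties file are the ones from the example above.

```shell
# Workflow ID taken from the example above.
WF_ID="0000092-141219003455004-oozie-joe-W"

# Hypothetical helper: prints the rerun command instead of running it.
# $1 is a single property=value pair passed through the -D option.
wf_rerun_cmd() {
  printf 'oozie job -rerun %s -config job.properties -D %s\n' "$WF_ID" "$1"
}

# Option A: rerun only from the nodes that failed.
wf_rerun_cmd "oozie.wf.rerun.failnodes=true"

# Option B: rerun everything except the listed nodes.
# "ingest" and "cleanse" are hypothetical action node names.
wf_rerun_cmd "oozie.wf.rerun.skip.nodes=ingest,cleanse"
```

Each call prints one command line; copy it into a shell where the Oozie client is configured to actually submit the rerun. Only one of the two properties is passed per command, in line with the rule above.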

2. Coordinator Reprocessing

A coordinator action can be reprocessed once it is in a completed state. The parent coordinator job must not be in a FAILED or KILLED state. Coordinator actions can be rerun by date or by action number, and the entire coordinator job can be rerun by supplying its start and end dates. During reprocessing, Oozie helps the retry attempt by cleaning up the output directories by default; to decide which old output to remove, it uses the output-events specification in the coordinator XML.

The following command shows how to rerun a set of coordinator actions by date. It removes the old output files and, because of the -refresh option, recalculates the data dependencies. The command reruns the actions with nominal times between 2020-07-20T05:00Z and 2020-07-25T20:00Z, plus the individual actions with nominal times 2020-07-28T01:00Z and 2020-07-30T22:00Z:

$ oozie job -rerun 0000673-120823182447665-oozie-hado-C -refresh -date 2020-07-20T05:00Z::2020-07-25T20:00Z,2020-07-28T01:00Z,2020-07-30T22:00Z
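Rerunning by action number instead of by date might look like the sketch below. It is a dry run under stated assumptions: coord_rerun_by_action is a hypothetical helper that only prints the command, the action numbers are illustrative, and the -action and -nocleanup flags are assumed to be available in your Oozie client (check the output of oozie help job to confirm).

```shell
# Coordinator ID taken from the example above.
COORD_ID="0000673-120823182447665-oozie-hado-C"

# Hypothetical helper: prints the rerun command instead of running it.
# $1: comma-separated action numbers and ranges, written without spaces
# $2: optional extra flag, e.g. -nocleanup to keep the old output
coord_rerun_by_action() {
  printf 'oozie job -rerun %s -refresh %s-action %s\n' \
    "$COORD_ID" "${2:+$2 }" "$1"
}

# Rerun action 1, actions 3 to 4, and actions 7 to 40 (illustrative numbers).
coord_rerun_by_action "1,3-4,7-40"

# Same actions, but skip the default output cleanup (assumed flag).
coord_rerun_by_action "1,3-4,7-40" "-nocleanup"
```

The list is written without spaces so that the shell passes it to the client as a single argument.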

3. Bundle Reprocessing

Bundle reprocessing reruns the coordinator actions that were run under the auspices of a particular bundle invocation. It lets you rerun specific coordinators, the actions for specific dates, or both. The options are -coordinator and -date.

$ oozie job -rerun 0000094-141219003455004-oozie-joe-B -coordinator test-coord

The Coordinators [test-coord] of bundle 0000094-141219003455004-oozie-joe-B are scheduled to rerun on date ranges.
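The -date option, alone or combined with -coordinator, can be sketched in the same way. This is a dry run under stated assumptions: bundle_rerun_cmd is a hypothetical helper that only prints the command, and the dates are reused from the coordinator example purely for illustration.

```shell
# Bundle ID taken from the example above.
BUNDLE_ID="0000094-141219003455004-oozie-joe-B"

# Hypothetical helper: prints the rerun command instead of running it.
# $1: coordinator name(s), comma-separated without spaces (may be empty)
# $2: date or date-range list, comma-separated without spaces (may be empty)
bundle_rerun_cmd() {
  cmd="oozie job -rerun $BUNDLE_ID"
  if [ -n "$1" ]; then
    cmd="$cmd -coordinator $1"
  fi
  if [ -n "$2" ]; then
    cmd="$cmd -date $2"
  fi
  printf '%s\n' "$cmd"
}

# Rerun the actions of every coordinator in the bundle for a date range.
bundle_rerun_cmd "" "2020-07-20T05:00Z::2020-07-25T20:00Z"

# Rerun a single date for one coordinator only.
bundle_rerun_cmd "test-coord" "2020-07-28T01:00Z"
```

Each call prints one command line that can be pasted into a shell with a configured Oozie client.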