Apache Oozie Control Flow Nodes

What are the Apache Oozie Control Flow Nodes?

Control flow nodes define where a workflow begins and ends (the start, end, and kill control nodes) and control the workflow execution path (the decision, fork, and join nodes).

Apache Oozie provides the following control flow nodes.

  1. Start control node
  2. End control node
  3. Kill control node
  4. Decision control node
  5. Fork and Join control node

Let us see each control flow node in detail.


1. Start Control Node

A workflow job starts at the start control node, which is the entry point of the job. Every workflow definition must have one start node; when the job is started, it automatically transitions to the node named in the start node's "to" attribute.

Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <start to="[NODE-NAME]"/>
    ...
</workflow-app>

Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <start to="HadoopJob"/>
    ...
</workflow-app>


2. End Control Node

The end control node is used to indicate that the workflow job has been completed successfully.

Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <end name="[NODE-NAME]"/>
    ...
</workflow-app>

Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <end name="end"/>
</workflow-app>


3. Kill Control Node

The kill control node is used to kill a workflow job. If one or more actions started by the workflow job are still executing when the kill node is reached, those actions are killed.

Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <kill name="[NODE-NAME]">
        <message>[MESSAGE-TO-LOG]</message>
    </kill>
    ...
</workflow-app>

Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <kill name="killBecauseNoInput">
        <message>Input unavailable</message>
    </kill>
    ...
</workflow-app>
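
To show how the start, end, and kill nodes fit together, here is a minimal sketch of a complete workflow with a single action. The workflow name "minimal-wf", the kill node name "fail", the file name job.xml, and the host:port values are placeholders chosen for illustration, not values from a real cluster.

```xml
<workflow-app name="minimal-wf" xmlns="uri:oozie:workflow:0.1">
    <!-- Entry point: the job transitions straight to the HadoopJob action -->
    <start to="HadoopJob"/>

    <action name="HadoopJob">
        <map-reduce>
            <!-- Placeholder endpoints and job file; replace with real values -->
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>job.xml</job-xml>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <!-- Reached only on error: the job finishes in the KILLED state -->
    <kill name="fail">
        <message>HadoopJob failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>

    <!-- Reached on success: the job finishes in the SUCCEEDED state -->
    <end name="end"/>
</workflow-app>
```

The action's ok transition leads to the end node and its error transition leads to the kill node, so the final job status reflects whether the action succeeded.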


4. Decision Control Node

A decision node lets a workflow choose which execution path to follow. It consists of a list of predicate-transition pairs plus a default transition. Predicates are evaluated in order of appearance until one of them evaluates to true, and the corresponding transition is taken. If no predicate evaluates to true, the default transition is taken.

Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <decision name="[NODE-NAME]">
        <switch>
            <case to="[NODE_NAME]">[PREDICATE]</case>
            ...
            <case to="[NODE_NAME]">[PREDICATE]</case>
            <default to="[NODE_NAME]"/>
        </switch>
    </decision>
    ...
</workflow-app>

Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <decision name="mydecision">
        <switch>
            <case to="reconsolidatejob">
                ${fs:fileSize(secondjobOutputDir) gt 10 * GB}
            </case>
            <case to="rexpandjob">
                ${fs:fileSize(secondjobOutputDir) lt 100 * MB}
            </case>
            <case to="recomputejob">
                ${ hadoop:counters('secondjob')[RECORDS][REDUCE_OUT] lt 1000000 }
            </case>
            <default to="end"/>
        </switch>
    </decision>
    ...
</workflow-app>
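
Because predicates are evaluated in order and the first true one wins, the order of the case elements matters when predicates overlap. The fragment below is a small sketch of that behavior. The node names (routeBySize, largejob, mediumjob, smalljob) and the inputFile job property are hypothetical and exist only for this illustration.

```xml
<decision name="routeBySize">
    <switch>
        <!-- Checked first: a 20 GB file matches here and evaluation stops -->
        <case to="largejob">${fs:fileSize(inputFile) gt 10 * GB}</case>
        <!-- Checked second: only reached when the file is 10 GB or smaller -->
        <case to="mediumjob">${fs:fileSize(inputFile) gt 1 * GB}</case>
        <!-- Taken when neither predicate is true -->
        <default to="smalljob"/>
    </switch>
</decision>
```

A 20 GB file satisfies both predicates but is routed to largejob. If the two case elements were swapped, the "gt 1 * GB" predicate would match first and largejob would never be reached.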


5. Fork and Join Control Node

A fork node splits one path of execution into multiple concurrent paths of execution. A join node waits until every concurrent execution path of the preceding fork node arrives at it. Fork and join nodes must be used in pairs.

Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <fork name="[FORK-NODE-NAME]">
        <path start="[NODE-NAME]"/>
        ...
        <path start="[NODE-NAME]"/>
    </fork>
    ...
    <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]"/>
    ...
</workflow-app>

Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <fork name="forking">
        <path start="firstparalleljob"/>
        <path start="secondparalleljob"/>
    </fork>
    <action name="firstparalleljob">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>job1.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <action name="secondparalleljob">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>job2.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <join name="joining" to="nextaction"/>
    ...
</workflow-app>
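
The concurrent paths of a fork do not have to be the same length. The sketch below, offered as an illustration rather than a recommended layout, gives one path a single action and the other path two actions in sequence; the join node still waits for both paths before moving on. All node names, file names, and host:port values here are hypothetical placeholders.

```xml
<workflow-app name="fork-join-wf" xmlns="uri:oozie:workflow:0.1">
    <start to="forking"/>

    <fork name="forking">
        <path start="shortjob"/>
        <path start="longjobStep1"/>
    </fork>

    <!-- Path 1: a single action that goes straight to the join -->
    <action name="shortjob">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>short.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="fail"/>
    </action>

    <!-- Path 2: two actions in sequence; only the last one goes to the join -->
    <action name="longjobStep1">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>long1.xml</job-xml>
        </map-reduce>
        <ok to="longjobStep2"/>
        <error to="fail"/>
    </action>
    <action name="longjobStep2">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>long2.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="fail"/>
    </action>

    <!-- Continues to the end node only after both paths have arrived -->
    <join name="joining" to="end"/>

    <kill name="fail">
        <message>A forked action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

Every path that leaves the fork eventually transitions to the same join node, which keeps the fork and join paired as described above.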