Apache Oozie Managing Libraries

Apache Oozie provides facilities for managing both system JARs and user-defined JARs.

Oozie manages the following types of JARs.

  1. System JARs
  2. Hadoop JARs
  3. Action JARs
  4. User JARs

Let us see each type of JAR in detail.

1. System JARs

These JARs are produced during an Oozie build and packaged into the Oozie web application archive (oozie.war). They are used to run the Oozie services.

2. Hadoop JARs

Oozie uses these JARs to communicate with Hadoop services. They are provided by Hadoop, and Oozie adds them to the web application archive at packaging time.

3. Action JARs

These JARs are used to execute built-in Oozie actions.

4. User JARs

These JARs are created by end users to carry their application logic. For example, mapper and reducer classes are required to run a MapReduce action; similarly, Pig or Hive UDF code is required for the corresponding Pig or Hive action. The user bundles this code into a JAR and deploys it under the “/lib” directory of the workflow application path.
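The deployment convention for user JARs can be illustrated with a small sketch. It only filters a list of paths and does not contact a cluster; the application path, the file names, and the helper user_jars are hypothetical, and only the “/lib” convention comes from the text above.

```python
import posixpath


def user_jars(app_path, files):
    """Return the files that sit directly under <app_path>/lib and end in .jar.

    `files` is a plain list of absolute HDFS-style paths (illustrative only).
    """
    lib_dir = posixpath.join(app_path, "lib")
    return sorted(
        f for f in files
        if posixpath.dirname(f) == lib_dir and f.endswith(".jar")
    )


# Hypothetical workflow application layout.
app_path = "/user/alice/apps/wordcount"
files = [
    "/user/alice/apps/wordcount/workflow.xml",
    "/user/alice/apps/wordcount/lib/wordcount-mr.jar",
    "/user/alice/apps/wordcount/lib/my-udfs.jar",
    "/user/alice/apps/wordcount/scripts/clean.pig",
]

print(user_jars(app_path, files))
```

Only the two files under the lib directory are returned; the workflow definition and the script are ignored.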

Apache Oozie JAR Design Challenges

Designing a flexible, built-in framework for JAR management in a system as complex as Oozie is difficult.

The following are some of the reasons.

1. Multiple Action Types

Apache Oozie manages different types of built-in and user-defined actions. Each action type needs its own set of JARs, and under certain conditions these sets conflict with each other. For example, Pig and Hive each have their own JARs, and in some cases they depend on the same JARs. Oozie should therefore include only the JARs required for a given action and exclude the rest.
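A minimal sketch of this per-action selection is shown below. The JAR names in ACTION_JARS and both helper functions are made up for illustration and are not taken from an Oozie installation; the point is only that each action sees its own set, and that overlaps between sets can be detected.

```python
# Hypothetical JAR sets per action type; real names and versions will differ.
ACTION_JARS = {
    "pig": {"pig.jar", "antlr-runtime.jar", "jline.jar"},
    "hive": {"hive-exec.jar", "antlr-runtime.jar", "libfb303.jar"},
}


def jars_for_action(action_type):
    """Return only the JARs registered for one action type."""
    try:
        return sorted(ACTION_JARS[action_type])
    except KeyError:
        raise ValueError(f"unknown action type: {action_type}") from None


def shared_jars(first, second):
    """JAR names registered for both action types (possible conflict points)."""
    return sorted(ACTION_JARS[first] & ACTION_JARS[second])


print(jars_for_action("pig"))
print(shared_jars("pig", "hive"))
```

A Pig action never sees hive-exec.jar in this model, while the overlap check points at the one JAR name that both action types share.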

2. Multiple Versions

An action type should not be tied to a single version of its tool; one action type should be able to support multiple versions. Oozie should provide a framework that supports multiple versions of each tool.
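One way to picture this requirement is a registry keyed by tool and version, with a default that a workflow may override. The sketch below assumes such a registry; the directory paths, the version numbers, and the function lib_dir are hypothetical and do not describe how Oozie itself stores libraries.

```python
# Hypothetical registry: tool name -> version -> directory with that version's JARs.
VERSIONED_LIBS = {
    "pig": {"0.10": "/libs/pig_0.10", "0.11": "/libs/pig_0.11"},
}

# Hypothetical default version used when a workflow does not ask for one.
DEFAULT_VERSION = {"pig": "0.11"}


def lib_dir(tool, version=None):
    """Return the library directory for a tool, honoring an optional version."""
    chosen = version or DEFAULT_VERSION[tool]
    return VERSIONED_LIBS[tool][chosen]


print(lib_dir("pig"))          # falls back to the default version
print(lib_dir("pig", "0.10"))  # explicit override for one workflow
```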

3. Different Hadoop Versions

Hadoop plays an important role in JAR management because most actions are directly tied to the Hadoop system. The issue with Hadoop versions is that a JAR built for Hadoop 1.x will generally not run on a Hadoop 2.x cluster. The Oozie framework should address this kind of variability.

4. Unified JAR Upgrade

Suppose Oozie supports Pig 0.11 and an important bug fix is released for Pig 0.11, so that Oozie needs to replace a JAR or add a new one. Replacing the JAR in place creates problems for running jobs because of the way the Hadoop distributed cache works. Oozie should facilitate an easy and error-free JAR upgrade.
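One common way to avoid replacing files in place is to publish each upgrade into a fresh directory and leave earlier directories untouched. The toy model below illustrates that idea only; the class LibStore, the directory naming, and the JAR names are assumptions made for this sketch, not a description of Oozie internals.

```python
class LibStore:
    """Toy model of a shared library that is upgraded by adding directories.

    Hypothetical design: each upgrade writes a new directory and moves a
    'current' pointer; nothing that a running job references is overwritten.
    """

    def __init__(self):
        self._dirs = {}       # directory name -> tuple of JAR names
        self._current = None  # directory that new jobs should use
        self._counter = 0

    def publish(self, jars):
        """Store a new set of JARs in a fresh directory and make it current."""
        self._counter += 1
        name = f"lib_{self._counter:04d}"
        self._dirs[name] = tuple(jars)
        self._current = name
        return name

    def current(self):
        """Directory that newly submitted jobs should use."""
        return self._current

    def jars(self, name):
        """JARs stored in a given directory."""
        return self._dirs[name]


store = LibStore()
old = store.publish(["pig-0.11.0.jar"])
running_job_dir = store.current()        # a job pins its directory at submission
new = store.publish(["pig-0.11.1.jar"])  # upgrade: new directory, old one untouched
```

In this model the running job keeps reading from the directory it pinned, while jobs submitted after the upgrade pick up the new one.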

Apache Oozie JAR Precedence in Classpath

There are three ways to include JARs in an Oozie workflow.

They are described below in order of precedence, from highest to lowest.

Application lib Directory

A JAR placed in the workflow application “/lib” directory is given the highest priority in the classpath.

User-level shared Library

The user-level shared library has the second-highest priority in the classpath. It is defined through the oozie.libpath property.

System-level shared Library

The action JARs included in the system-defined shared library have the lowest priority.
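The three precedence levels can be sketched as a simple merge. The function build_classpath and all paths below are hypothetical, and dropping lower-priority copies of a JAR with the same file name is a simplification chosen for this sketch rather than documented Oozie behavior.

```python
import posixpath


def build_classpath(app_lib, user_libpath, system_sharelib):
    """Merge three JAR lists in precedence order, highest first.

    When the same file name appears more than once, the copy from the
    higher-priority source is kept and later copies are dropped.
    """
    seen = set()
    classpath = []
    for source in (app_lib, user_libpath, system_sharelib):
        for jar in source:
            name = posixpath.basename(jar)
            if name not in seen:
                seen.add(name)
                classpath.append(jar)
    return classpath


# Hypothetical JAR locations for each of the three levels.
app_lib = ["/user/alice/apps/wordcount/lib/json.jar"]
user_libpath = ["/user/alice/share/json.jar", "/user/alice/share/udfs.jar"]
system_sharelib = ["/share/lib/pig/pig.jar", "/share/lib/pig/json.jar"]

for entry in build_classpath(app_lib, user_libpath, system_sharelib):
    print(entry)
```

Here json.jar exists at all three levels, and only the copy from the application “/lib” directory survives; udfs.jar and pig.jar follow in the order of their levels.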