What is Apache Oozie Architecture?

Apache Oozie is a workflow engine used to run workflow jobs such as Hadoop MapReduce and Pig. It is a Java web application that runs in a Java servlet container. With Oozie, multiple jobs can be chained sequentially into one logical unit of work. A major advantage of the Oozie framework is that it is fully integrated with the Apache Hadoop stack and supports Hadoop jobs for Apache MapReduce, Pig, Hive, and Sqoop.
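To make the idea of chaining jobs into one logical unit of work concrete, here is a toy Python model. It is only a sketch and not Oozie code: real Oozie workflows are defined in XML, and the `Action` type, the `run_workflow` function, and the step names are hypothetical. The only thing borrowed from Oozie is the idea that each action names an "ok" transition and an "error" transition.

```python
from typing import Callable, Dict, List, NamedTuple


class Action(NamedTuple):
    """One step of a toy workflow (hypothetical type, not an Oozie class)."""
    run: Callable[[], bool]  # returns True on success, False on failure
    ok: str                  # node to visit after success
    error: str               # node to visit after failure


def run_workflow(actions: Dict[str, Action], start: str) -> List[str]:
    """Follow ok/error transitions from `start` until 'end' or 'kill'."""
    visited = []
    node = start
    while node not in ("end", "kill"):
        visited.append(node)
        action = actions[node]
        node = action.ok if action.run() else action.error
    visited.append(node)  # record the terminal node as well
    return visited


# Three placeholder jobs chained one after another (assumed names).
workflow = {
    "mapreduce-step": Action(run=lambda: True, ok="pig-step", error="kill"),
    "pig-step": Action(run=lambda: True, ok="hive-step", error="kill"),
    "hive-step": Action(run=lambda: True, ok="end", error="kill"),
}

path = run_workflow(workflow, "mapreduce-step")
print(path)
```

In this model, a failure in any step sends the whole workflow to "kill", so the three jobs succeed or fail as a single unit.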

The following figure shows the architecture of Apache Oozie.

[Figure: Apache Oozie architecture]


The Apache Oozie server is a Java web application hosted on Apache Tomcat, an open-source implementation of Java servlet technology.

Let us look at the components of the Apache Oozie architecture.


1. Oozie Client

An Oozie client interacts with the Oozie server in one of three ways: the Oozie command-line tool, the Oozie Java client API, or the Oozie HTTP REST API. The command-line tool and the Java API ultimately use the HTTP REST API to communicate with the Oozie server.
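Since every client ends up speaking HTTP to the server, a request can be sketched with nothing more than a URL. The Python snippet below builds a job-information URL in the style of the Oozie web services API. The base URL (`http://localhost:11000/oozie`), the job ID, and the helper name `job_info_url` are assumptions for illustration; check the endpoint path and API version against the documentation for your Oozie release.

```python
from urllib.parse import quote, urlencode


def job_info_url(base_url: str, job_id: str) -> str:
    """Build a REST URL that asks the server for information about one job.

    Hypothetical helper: the 'v1/job/<id>?show=info' layout is assumed
    from the Oozie web services API and may differ between versions.
    """
    query = urlencode({"show": "info"})
    return f"{base_url.rstrip('/')}/v1/job/{quote(job_id, safe='')}?{query}"


# Assumed server address and a made-up workflow job ID.
url = job_info_url(
    "http://localhost:11000/oozie",
    "0000001-200101000000000-oozie-oozi-W",
)
print(url)
```

Against a running server, the URL could be fetched with `urllib.request.urlopen(url)`. The snippet stops at building the URL so that it runs without an Oozie installation.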


2. Oozie Server

The Apache Oozie server is a Java web application that runs in a Java servlet container. By default, Oozie uses Apache Tomcat, an open-source implementation of Java servlet technology. The Oozie server does not store any user or job information in memory. Oozie maintains all of this information, such as the state of running and completed jobs, in the SQL database. When a user requests an operation on a job, the Oozie server fetches the corresponding job state from the SQL database, performs the requested operation, and updates the SQL database with the new state of the job.
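The fetch, operate, and update cycle described above can be sketched in Python with an in-memory SQLite database standing in for the Oozie database. This is an illustration of the stateless pattern only: the `jobs` table, the `handle_request` function, and the simplified transition map are assumptions, not Oozie's actual schema or state machine.

```python
import sqlite3

# Simplified, assumed transitions: (current status, operation) -> new status.
TRANSITIONS = {
    ("PREP", "start"): "RUNNING",
    ("RUNNING", "suspend"): "SUSPENDED",
    ("SUSPENDED", "resume"): "RUNNING",
    ("RUNNING", "kill"): "KILLED",
}


def create_store() -> sqlite3.Connection:
    """Create a stand-in job store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    return conn


def handle_request(conn: sqlite3.Connection, job_id: str, operation: str) -> str:
    """Serve one request without keeping any job state in memory."""
    # 1. Fetch the current job state from the database.
    row = conn.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()
    if row is None:
        raise KeyError(f"unknown job: {job_id}")
    # 2. Perform the requested operation (here, just a state transition).
    new_status = TRANSITIONS.get((row[0], operation))
    if new_status is None:
        raise ValueError(f"cannot {operation} a job in state {row[0]}")
    # 3. Write the new state back to the database.
    conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (new_status, job_id))
    conn.commit()
    return new_status


conn = create_store()
conn.execute("INSERT INTO jobs VALUES (?, ?)", ("job-1", "PREP"))
first = handle_request(conn, "job-1", "start")
second = handle_request(conn, "job-1", "suspend")
print(first, second)
```

Because each call reads the state from the database and writes it back, nothing about a job has to survive in the server process between requests.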


3. Oozie Database

The Apache Oozie database stores all stateful information, such as workflow definitions and running and completed jobs. While processing a user request, Oozie fetches the corresponding job state from the SQL database, performs the requested operation, and updates the SQL database with the new state of the job. Oozie supports databases such as Derby, MySQL, Oracle, and PostgreSQL.