The Apache Sqoop Metastore tool stores Sqoop job information on a central machine so that multiple users can create Sqoop jobs and run them from a shared location. Its purpose is to configure Sqoop to host a shared Metastore repository. To connect to the Metastore, a client either sets the connection properties in the sqoop-site.xml configuration file or passes the --meta-connect argument on the command line.

Apache Sqoop Metastore Syntax

The following is the syntax of Apache Sqoop Metastore.

sqoop metastore (generic-arguments) (metastore-arguments)

The Process to Set Up Apache Sqoop Metastore

At a high level, we can follow the process below to set up the Sqoop Metastore.

1. Select the Sqoop Metastore Server

The first step in setting up the Sqoop Metastore is to select the master server, which should be a machine that can handle heavy loads.

2. Set up Sqoop Metastore Server

Create a user that will run the Metastore, and create a folder in which that user will store the database files. The recommendation is to run the Metastore as the sqoop user. Next, configure the Metastore in the sqoop-site.xml file: sqoop.metastore.server.location sets where the Metastore keeps its files on the local disk, and sqoop.metastore.server.port sets the TCP/IP port on which the Metastore listens, which defaults to 16000.
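As a minimal sketch of the server-side settings described above, the following script prints the two properties so they can be pasted inside the <configuration> element of sqoop-site.xml. The directory /var/lib/sqoop/metastore and the function name print_server_properties are assumptions chosen for illustration; substitute the folder you created for the Metastore user.

```shell
#!/bin/sh
# Assumed values for illustration only; adjust them to your environment.
METASTORE_DIR="${METASTORE_DIR:-/var/lib/sqoop/metastore}"  # hypothetical folder
METASTORE_PORT="${METASTORE_PORT:-16000}"                   # default metastore port

# Print the server-side properties for the <configuration> element
# of sqoop-site.xml on the machine that runs the Metastore.
print_server_properties() {
  cat <<EOF
<property>
  <name>sqoop.metastore.server.location</name>
  <value>${METASTORE_DIR}/shared.db</value>
</property>
<property>
  <name>sqoop.metastore.server.port</name>
  <value>${METASTORE_PORT}</value>
</property>
EOF
}

print_server_properties
```

The script only prints text; it does not modify sqoop-site.xml, so the output can be reviewed before it is merged into the real file.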

We can set client properties as below in the sqoop-site.xml file.

  • sqoop.metastore.client.autoconnect.url
  • sqoop.metastore.client.autoconnect.username
  • sqoop.metastore.client.autoconnect.password

The auto-connect URL is a connection string for HSQLDB in the format jdbc:hsqldb:hsql://<server-name>:<port>/sqoop, for example jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop.
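To illustrate how the auto-connect URL is assembled, here is a small sketch that builds the connection string from a host and port and prints the matching client property for sqoop-site.xml. The host metaserver.example.com and the helper names metastore_url and print_client_properties are hypothetical.

```shell
#!/bin/sh
# Build the HSQLDB connection string for a shared Metastore.
# $1 = server host, $2 = port (falls back to the default 16000).
metastore_url() {
  host="$1"
  port="${2:-16000}"
  printf 'jdbc:hsqldb:hsql://%s:%s/sqoop\n' "$host" "$port"
}

# Print the client-side property for the <configuration> element
# of sqoop-site.xml on a client node.
print_client_properties() {
  url="$(metastore_url "$1" "$2")"
  cat <<EOF
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>${url}</value>
</property>
EOF
}

# Hypothetical host name used for illustration.
print_client_properties "metaserver.example.com" 16000
```

The same string produced by metastore_url can be passed to --meta-connect instead of being stored in the configuration file.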


3. Update Service Configuration

Now log in to another node in the cluster and update the properties for client access. Do not configure the server properties there, such as sqoop.metastore.server.location and sqoop.metastore.server.port; these belong only on the machine that runs the Metastore.
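Once a client node can reach the Metastore, saved jobs can be listed and executed against it. The sketch below uses the --meta-connect argument with the sqoop job tool; the Metastore host and the job name nightly_import are hypothetical. By default the script runs in a dry-run mode (an addition for illustration, not a Sqoop feature) that prints each command instead of executing it; set DRY_RUN=0 on a machine where Sqoop is installed.

```shell
#!/bin/sh
# Hypothetical Metastore address; replace it with your server and port.
METASTORE_URL="${METASTORE_URL:-jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop}"

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List the jobs saved in the shared Metastore.
list_shared_jobs() {
  run sqoop job --meta-connect "$METASTORE_URL" --list
}

# Execute a saved job by name ($1).
exec_shared_job() {
  run sqoop job --meta-connect "$METASTORE_URL" --exec "$1"
}

list_shared_jobs
exec_shared_job "nightly_import"   # hypothetical job name
```

If sqoop.metastore.client.autoconnect.url is already set in the client's sqoop-site.xml, the --meta-connect argument can be left out.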

4. Start Sqoop Metastore

Now log in to the Sqoop Metastore server as the sqoop user (su sqoop) and change into the Metastore folder. Start the server process in the background, redirecting stdout and stderr to a file (nohup sqoop-metastore &>>shared.db.out &). To shut down the Metastore server, run (sqoop-metastore --shutdown) as the same user that is running the Metastore.
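The start and shutdown commands above can be wrapped in a small helper script, sketched here. The function names and the dry-run switch are assumptions added for illustration; the sqoop-metastore commands themselves are the ones described in this step. The redirection is written as >> file 2>&1, which behaves like &>> but also works in shells other than bash.

```shell
#!/bin/sh
LOG_FILE="${LOG_FILE:-shared.db.out}"   # log file name taken from the step above

# With DRY_RUN=1 (the default here) the command is printed, not executed.
start_metastore() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "nohup sqoop-metastore >> ${LOG_FILE} 2>&1 &"
  else
    # Run in the background and append stdout and stderr to the log file.
    nohup sqoop-metastore >> "${LOG_FILE}" 2>&1 &
    echo "Metastore started with PID $!"
  fi
}

stop_metastore() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "sqoop-metastore --shutdown"
  else
    # Must be run by the same user that started the Metastore.
    sqoop-metastore --shutdown
  fi
}

start_metastore
stop_metastore
```

Run the script from the Metastore folder as the sqoop user with DRY_RUN=0 to actually start or stop the server.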