Apache Sqoop Connectors And Drivers

Apache Sqoop Driver

Apache Sqoop uses a JDBC driver to connect to databases and perform the required operations. It also provides specialized code paths for databases that are not fully SQL compliant. Sqoop is compatible with a large number of databases, but to connect Sqoop to a given database we need to download that database's JDBC driver and install it separately.

By using the common JDBC API, Sqoop can connect to many different databases. The JDBC drivers for the different databases are installed on the client system under the $SQOOP_HOME/lib path. Sqoop picks up the JARs present under $SQOOP_HOME/lib and loads their classes when running its MapReduce jobs. Sqoop inspects the protocol scheme of the connect string to select vendor-specific logic; if the database is recognized, Sqoop handles it automatically, otherwise we may need to specify the driver class to load via --driver. Sqoop also provides the --direct parameter to support non-JDBC-based access through a database's native tools.
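
For illustration, here is a minimal Java sketch of the JDBC mechanism Sqoop relies on: the driver class is loaded explicitly (what --driver controls) and the connect string's scheme selects the vendor logic (what --connect supplies). The class name com.mysql.jdbc.Driver, the host, the database, and the credentials are example values, and the MySQL driver JAR is assumed to be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class JdbcConnectSketch {
        public static void main(String[] args) throws Exception {
            // Load the vendor driver class explicitly, analogous to Sqoop's --driver option.
            Class.forName("com.mysql.jdbc.Driver");

            // The scheme (jdbc:mysql://) identifies the vendor, as Sqoop's --connect value does.
            String connectString = "jdbc:mysql://localhost:3306/exampledb";

            try (Connection conn = DriverManager.getConnection(connectString, "user", "password")) {
                System.out.println("Connected to: " + conn.getMetaData().getDatabaseProductName());
            }
        }
    }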

Apache Sqoop includes vendor-specific support for the databases listed below.

Database     Version    Support (--direct)?    Connect String
HSQLDB       1.8.0+     No                     jdbc:hsqldb:*//
MySQL        5.0+       Yes                    jdbc:mysql://
Oracle       10.2.0+    No                     jdbc:oracle:*//
PostgreSQL   8.3+       Yes (import only)      jdbc:postgresql://
CUBRID       9.2+       No                     jdbc:cubrid:*


Apache Sqoop Connectors

Apache Sqoop supports a connector-based architecture through which external systems can connect to Apache Sqoop. Using connectors, Sqoop can connect to data sources that have no JDBC support. Connectors and plugins can be added during the installation of Apache Sqoop; for installation, we copy the connector or driver JAR under the $SQOOP_HOME/lib path of the Sqoop installation.

Let us see how connectors partition the data, extract data, load data, and clean up through the destroyer phase.


1. Partitioner

In this phase, the partitioner generates the conditions that the extractor will use to split the data. Each partition covers a range of approximately:

(upper boundary - lower boundary) / (max partitions)

If the user does not specify a partition column, the table's primary key is used to partition the data.
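
As a rough illustration (not the actual connector code), the Java sketch below generates range conditions for a numeric partition column using the formula above; the column name and boundary values are made up for the example.

    import java.util.ArrayList;
    import java.util.List;

    public class PartitionerSketch {
        // Split [lower, upper] on a numeric column into at most maxPartitions ranges,
        // with a stride of roughly (upper boundary - lower boundary) / (max partitions).
        static List<String> partitionConditions(String column, long lower, long upper, int maxPartitions) {
            List<String> conditions = new ArrayList<>();
            long stride = Math.max(1, (upper - lower + maxPartitions - 1) / maxPartitions);
            for (long start = lower; start <= upper; start += stride) {
                long end = Math.min(start + stride, upper + 1);
                conditions.add(column + " >= " + start + " AND " + column + " < " + end);
            }
            return conditions;
        }

        public static void main(String[] args) {
            // Example: split ids 1..100 into at most 4 partitions.
            partitionConditions("id", 1, 100, 4).forEach(System.out::println);
        }
    }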

The following data types are supported for the partition column.

  1. TINYINT
  2. SMALLINT
  3. INTEGER
  4. BIGINT
  5. REAL
  6. FLOAT
  7. DOUBLE
  8. NUMERIC
  9. DECIMAL
  10. BIT
  11. BOOLEAN
  12. DATE
  13. TIME
  14. TIMESTAMP
  15. CHAR
  16. VARCHAR
  17. LONGVARCHAR

2. Extractor

In this phase, the JDBC data source is queried using SQL that is generated from your configuration (a sketch of this case analysis follows the list).

  • If the user provides only a table name, the generated query is "SELECT * FROM tab_name".
  • If the user provides a table name and a column name, the generated query is "SELECT col_name FROM tab_name".
  • If the user provides a SQL query, that query is executed as-is.
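
A simplified sketch of this case analysis is shown below; the table, column, and query strings are illustrative, not taken from Sqoop's source.

    public class ExtractorQuerySketch {
        // Build the extraction query from the configuration, mirroring the three cases above.
        static String buildQuery(String tableName, String columns, String userSql) {
            if (userSql != null && !userSql.isEmpty()) {
                return userSql;                                    // user-supplied SQL is run as-is
            }
            if (columns != null && !columns.isEmpty()) {
                return "SELECT " + columns + " FROM " + tableName; // table name plus column list
            }
            return "SELECT * FROM " + tableName;                   // table name only
        }

        public static void main(String[] args) {
            System.out.println(buildQuery("employees", null, null));
            System.out.println(buildQuery("employees", "id, name", null));
            System.out.println(buildQuery(null, null, "SELECT id, name FROM employees WHERE dept = 'HR'"));
        }
    }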

3. Loader

In this phase, records are written to the JDBC data source using SQL. This SQL will vary based on your configuration (a sketch follows the list).

  • In case the user has provided only a table name, the SQL is generated as "INSERT INTO tab_name VALUES (?, ?, ...)".
  • In case the user has provided a table name and column names, the SQL is generated as "INSERT INTO tab_name (col_name1, col_name2, ...) VALUES (?, ?, ...)".
  • In case the user has provided a SQL query, that SQL query is used.
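
The sketch below shows the kind of parameterized INSERT the loader prepares and fills for each incoming record; the connect string, table, and column names are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class LoaderSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details; the point is the "INSERT ... VALUES (?, ?)" pattern.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/exampledb", "user", "password");
                 PreparedStatement stmt = conn.prepareStatement(
                    "INSERT INTO employees (id, name) VALUES (?, ?)")) {

                // Each incoming record fills the placeholders once.
                stmt.setInt(1, 101);
                stmt.setString(2, "Alice");
                stmt.addBatch();

                stmt.setInt(1, 102);
                stmt.setString(2, "Bob");
                stmt.addBatch();

                stmt.executeBatch();   // records are typically written in batches
            }
        }
    }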

4. Destroyers

In this phase, the two operations below are performed (a sketch follows the list).

  • Copy the contents of the staging table to the target table.
  • Once the copy operation has completed, empty the staging table.
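
A minimal sketch of these two operations, using hypothetical table names (employees as the target and employees_staging as the staging table), is shown below.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DestroyerSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/exampledb", "user", "password");
                 Statement stmt = conn.createStatement()) {

                // 1. Copy the staged rows into the target table.
                stmt.executeUpdate("INSERT INTO employees SELECT * FROM employees_staging");

                // 2. Empty the staging table once the copy has completed.
                stmt.executeUpdate("DELETE FROM employees_staging");
            }
        }
    }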