Apache Sqoop Import-All-Tables

The Apache Sqoop import-all-tables tool is used to import all the tables from an RDBMS into Hadoop HDFS. For each table, a separate directory named after the table is created in HDFS, and that table's data is stored inside it.
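
For example, if the source database contains the tables employee, dept, and invoice (the tables used in the example below), the import creates one directory per table under the user's HDFS home directory, roughly as follows. This is only an illustrative layout, assuming the default target location and a single mapper producing one output file per table:

/user/cloudduggu/employee/part-m-00000
/user/cloudduggu/dept/part-m-00000
/user/cloudduggu/invoice/part-m-00000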


Import-All-Tables Syntax

The following syntax is used to import all tables into Hadoop HDFS.

sqoop import-all-tables (generic-args) (import-args)
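
Here the generic arguments are standard Hadoop options (for example -D property=value) and must be given before the tool-specific import arguments. A minimal sketch follows, in which the -D property value and the --password-file path are only illustrative:

sqoop import-all-tables \
    -D mapreduce.job.queuename=default \
    --connect jdbc:mysql://localhost/userdata \
    --username root \
    --password-file /user/cloudduggu/sqoop.password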


Import-All-Tables Example

Let us see an example that imports all the tables shown below from the MySQL database into Hadoop HDFS.

Command:

cloudduggu@ubuntu:~$ sqoop import-all-tables --connect jdbc:mysql://localhost/userdata?serverTimezone=UTC --username root \
 --password cloudduggu --m 1
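
If certain tables should be skipped, or if the per-table directories should be created under a common parent directory rather than the user's HDFS home directory, the import-all-tables tool also accepts the --exclude-tables and --warehouse-dir options. A variation of the command above is sketched below; the warehouse directory path and the excluded table are only illustrative:

cloudduggu@ubuntu:~$ sqoop import-all-tables --connect jdbc:mysql://localhost/userdata?serverTimezone=UTC --username root \
 --password cloudduggu --warehouse-dir /user/cloudduggu/sqoop_warehouse --exclude-tables invoice --m 1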


[Screenshot: table list]

Output:

We will see the following output after job completion.

2020-07-20 05:51:23,374 INFO mapreduce.Job: Running job: job_1595247619306_0001
2020-07-20 05:52:10,699 INFO mapreduce.Job: Job job_1595247619306_0001 running in uber mode : false
2020-07-20 05:52:10,702 INFO mapreduce.Job: map 0% reduce 0%
2020-07-20 05:52:36,626 INFO mapreduce.Job: map 100% reduce 0%
2020-07-20 05:52:38,869 INFO mapreduce.Job: Job job_1595247619306_0001 completed successfully
2020-07-20 05:52:39,570 INFO mapreduce.Job: Counters: 32
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=225037
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=63
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=43290
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=21645
        Total vcore-milliseconds taken by all map tasks=21645
        Total megabyte-milliseconds taken by all map tasks=44328960
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=342
        CPU time spent (ms)=5700
        Physical memory (bytes) snapshot=146235392
        Virtual memory (bytes) snapshot=3410919424
        Total committed heap usage (bytes)=32571392
        Peak Map Physical memory (bytes)=146235392
        Peak Map Virtual memory (bytes)=3410919424
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=63
2020-07-20 05:52:39,601 INFO mapreduce.ImportJobBase: Transferred 63 bytes in 94.2109 seconds (0.6687 bytes/sec)
2020-07-20 05:52:39,628 INFO mapreduce.ImportJobBase: Retrieved 3 records.
2020-07-20 05:52:39,629 INFO tool.CodeGenTool: Beginning code generation
2020-07-20 05:52:39,836 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
2020-07-20 05:52:39,896 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/cloudduggu/hadoop
Note: /tmp/sqoop-cloudduggu/compile/fc94ffa41264c7eeb7b2c6e4ad14db35/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-07-20 05:52:46,273 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudduggu/compile/fc94ffa41264c7eeb7b2c6e4ad14db35/employee.jar
2020-07-20 05:52:46,309 INFO mapreduce.ImportJobBase: Beginning import of employee
2020-07-20 05:52:46,456 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2020-07-20 05:52:46,551 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/cloudduggu/.staging/job_1595247619306_0002
2020-07-20 05:52:48,276 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:847)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:843)
2020-07-20 05:52:49,028 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:847)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:843)
2020-07-20 05:52:50,329 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:847)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:843)
2020-07-20 05:52:51,871 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:52:51,964 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:52:53,928 INFO db.DBInputFormat: Using read commited transaction isolation
2020-07-20 05:52:54,509 INFO mapreduce.JobSubmitter: number of splits:1
2020-07-20 05:52:54,642 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1595247619306_0002
2020-07-20 05:52:54,644 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-07-20 05:52:54,906 INFO impl.YarnClientImpl: Submitted application application_1595247619306_0002
2020-07-20 05:52:54,950 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1595247619306_0002/
2020-07-20 05:52:54,954 INFO mapreduce.Job: Running job: job_1595247619306_0002
2020-07-20 05:53:28,086 INFO mapreduce.Job: Job job_1595247619306_0002 running in uber mode : false
2020-07-20 05:53:28,087 INFO mapreduce.Job: map 0% reduce 0%
2020-07-20 05:53:51,098 INFO mapreduce.Job: map 100% reduce 0%
2020-07-20 05:53:53,169 INFO mapreduce.Job: Job job_1595247619306_0002 completed successfully
2020-07-20 05:53:53,298 INFO mapreduce.Job: Counters: 32
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=225074
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=462
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=39762
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=19881
        Total vcore-milliseconds taken by all map tasks=19881
        Total megabyte-milliseconds taken by all map tasks=40716288
    Map-Reduce Framework
        Map input records=12
        Map output records=12
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=281
        CPU time spent (ms)=5210
        Physical memory (bytes) snapshot=153419776
        Virtual memory (bytes) snapshot=3410935808
        Total committed heap usage (bytes)=32571392
        Peak Map Physical memory (bytes)=153419776
        Peak Map Virtual memory (bytes)=3410935808
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=462
2020-07-20 05:53:53,342 INFO mapreduce.ImportJobBase: Transferred 462 bytes in 66.9515 seconds (6.9005 bytes/sec)
2020-07-20 05:53:53,367 INFO mapreduce.ImportJobBase: Retrieved 12 records.
2020-07-20 05:53:53,374 INFO tool.CodeGenTool: Beginning code generation
2020-07-20 05:53:53,502 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `invoice` AS t LIMIT 1
2020-07-20 05:53:53,533 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/cloudduggu/hadoop
Note: /tmp/sqoop-cloudduggu/compile/fc94ffa41264c7eeb7b2c6e4ad14db35/invoice.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-07-20 05:53:55,687 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudduggu/compile/fc94ffa41264c7eeb7b2c6e4ad14db35/invoice.jar
2020-07-20 05:53:55,727 INFO mapreduce.ImportJobBase: Beginning import of invoice
2020-07-20 05:53:55,842 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2020-07-20 05:53:55,918 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/cloudduggu/.staging/job_1595247619306_0003
2020-07-20 05:53:56,031 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:53:56,112 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:53:57,446 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:847)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:843)
2020-07-20 05:53:59,704 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:53:59,811 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:847)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:843)
2020-07-20 05:54:00,598 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:54:01,695 INFO db.DBInputFormat: Using read commited transaction isolation
2020-07-20 05:54:01,867 INFO mapreduce.JobSubmitter: number of splits:1
2020-07-20 05:54:01,971 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1595247619306_0003
2020-07-20 05:54:01,973 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-07-20 05:54:02,419 INFO impl.YarnClientImpl: Submitted application application_1595247619306_0003
2020-07-20 05:54:02,441 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1595247619306_0003/
2020-07-20 05:54:02,444 INFO mapreduce.Job: Running job: job_1595247619306_0003
2020-07-20 05:54:31,992 INFO mapreduce.Job: Job job_1595247619306_0003 running in uber mode : false
2020-07-20 05:54:31,994 INFO mapreduce.Job: map 0% reduce 0%
2020-07-20 05:54:57,133 INFO mapreduce.Job: map 100% reduce 0%
2020-07-20 05:54:59,241 INFO mapreduce.Job: Job job_1595247619306_0003 completed successfully
2020-07-20 05:54:59,385 INFO mapreduce.Job: Counters: 32
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=225049
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=63
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=42812
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=21406
        Total vcore-milliseconds taken by all map tasks=21406
        Total megabyte-milliseconds taken by all map tasks=43839488
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=364
        CPU time spent (ms)=5790
        Physical memory (bytes) snapshot=145625088
        Virtual memory (bytes) snapshot=3408330752
        Total committed heap usage (bytes)=32571392
        Peak Map Physical memory (bytes)=145625088
        Peak Map Virtual memory (bytes)=3408330752
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=63
2020-07-20 05:54:59,418 INFO mapreduce.ImportJobBase: Transferred 63 bytes in 63.6403 seconds (0.9899 bytes/sec)
2020-07-20 05:54:59,433 INFO mapreduce.ImportJobBase: Retrieved 3 records.

We can verify the imported data in Hadoop HDFS using the commands below. Three directories, one per table, are created in HDFS, and each table's data is present under its directory in a file named “part-m-00000”.

cloudduggu@ubuntu:~$ hadoop fs -ls /user/cloudduggu/
cloudduggu@ubuntu:~$ hadoop fs -ls /user/cloudduggu/employee/
cloudduggu@ubuntu:~$ hadoop fs -ls /user/cloudduggu/dept/
cloudduggu@ubuntu:~$ hadoop fs -ls /user/cloudduggu/invoice/
cloudduggu@ubuntu:~$ hadoop fs -cat /user/cloudduggu/invoice/part-m-00000
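
To cross-check that the number of rows stored in HDFS matches what Sqoop reported (for example, “Retrieved 12 records.” for the employee table in the output above), the imported file can be piped through a line count. This assumes the default text output format, in which each record is written as one line:

cloudduggu@ubuntu:~$ hadoop fs -cat /user/cloudduggu/employee/part-m-00000 | wc -l

This should print 12 for the employee table, matching the Sqoop log.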

[Screenshot: imported table data]