Apache Sqoop Import-All-Tables

The Apache Sqoop import-all-tables tool is used to import all the tables from an RDBMS into Hadoop HDFS. For each table, a separate directory named after the table is created in HDFS, and that table's data is stored inside it.
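
For example, if the source database contains the tables employee, dept, and invoice (the tables used in the example below), the import creates one directory per table under the user's HDFS home directory, roughly as follows. This is only an illustrative layout, assuming the default target location and a single mapper producing one output file per table:

/user/cloudduggu/employee/part-m-00000
/user/cloudduggu/dept/part-m-00000
/user/cloudduggu/invoice/part-m-00000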


Import-All-Tables Syntax

The following syntax is used to import all tables into Hadoop HDFS.

sqoop import-all-tables (generic-args) (import-args)
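
Here the generic arguments are standard Hadoop options (for example -D property=value) and must be given before the tool-specific import arguments. A minimal sketch follows, in which the -D property value and the --password-file path are only illustrative:

sqoop import-all-tables \
    -D mapreduce.job.queuename=default \
    --connect jdbc:mysql://localhost/userdata \
    --username root \
    --password-file /user/cloudduggu/sqoop.password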


Import-All-Tables Example

Let us see an example that imports all the tables shown below from the MySQL database into Hadoop HDFS.

Command:

cloudduggu@ubuntu:~$ sqoop import-all-tables --connect jdbc:mysql://localhost/userdata?serverTimezone=UTC --username root \
 --password cloudduggu --m 1
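
If certain tables should be skipped, or if the per-table directories should be created under a common parent directory rather than the user's HDFS home directory, the import-all-tables tool also accepts the --exclude-tables and --warehouse-dir options. A variation of the command above is sketched below; the warehouse directory path and the excluded table are only illustrative:

cloudduggu@ubuntu:~$ sqoop import-all-tables --connect jdbc:mysql://localhost/userdata?serverTimezone=UTC --username root \
 --password cloudduggu --warehouse-dir /user/cloudduggu/sqoop_warehouse --exclude-tables invoice --m 1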


[Screenshot: table list]

Output:

We will see the following output after job completion.

2020-07-20 05:51:23,374 INFO mapreduce.Job: Running job: job_1595247619306_0001
2020-07-20 05:52:10,699 INFO mapreduce.Job: Job job_1595247619306_0001 running in uber mode : false
2020-07-20 05:52:10,702 INFO mapreduce.Job: map 0% reduce 0%
2020-07-20 05:52:36,626 INFO mapreduce.Job: map 100% reduce 0%
2020-07-20 05:52:38,869 INFO mapreduce.Job: Job job_1595247619306_0001 completed successfully
2020-07-20 05:52:39,570 INFO mapreduce.Job: Counters: 32
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=225037
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=63
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=43290
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=21645
        Total vcore-milliseconds taken by all map tasks=21645
        Total megabyte-milliseconds taken by all map tasks=44328960
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=342
        CPU time spent (ms)=5700
        Physical memory (bytes) snapshot=146235392
        Virtual memory (bytes) snapshot=3410919424
        Total committed heap usage (bytes)=32571392
        Peak Map Physical memory (bytes)=146235392
        Peak Map Virtual memory (bytes)=3410919424
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=63
2020-07-20 05:52:39,601 INFO mapreduce.ImportJobBase: Transferred 63 bytes in 94.2109 seconds (0.6687 bytes/sec)
2020-07-20 05:52:39,628 INFO mapreduce.ImportJobBase: Retrieved 3 records.
2020-07-20 05:52:39,629 INFO tool.CodeGenTool: Beginning code generation
2020-07-20 05:52:39,836 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
2020-07-20 05:52:39,896 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/cloudduggu/hadoop
Note: /tmp/sqoop-cloudduggu/compile/fc94ffa41264c7eeb7b2c6e4ad14db35/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-07-20 05:52:46,273 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudduggu/compile/fc94ffa41264c7eeb7b2c6e4ad14db35/employee.jar
2020-07-20 05:52:46,309 INFO mapreduce.ImportJobBase: Beginning import of employee
2020-07-20 05:52:46,456 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2020-07-20 05:52:46,551 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/cloudduggu/.staging/job_1595247619306_0002
2020-07-20 05:52:48,276 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:847)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:843)
2020-07-20 05:52:49,028 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:847)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:843)
2020-07-20 05:52:50,329 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:847)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:843)
2020-07-20 05:52:51,871 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:52:51,964 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:52:53,928 INFO db.DBInputFormat: Using read commited transaction isolation
2020-07-20 05:52:54,509 INFO mapreduce.JobSubmitter: number of splits:1
2020-07-20 05:52:54,642 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1595247619306_0002
2020-07-20 05:52:54,644 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-07-20 05:52:54,906 INFO impl.YarnClientImpl: Submitted application application_1595247619306_0002
2020-07-20 05:52:54,950 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1595247619306_0002/
2020-07-20 05:52:54,954 INFO mapreduce.Job: Running job: job_1595247619306_0002
2020-07-20 05:53:28,086 INFO mapreduce.Job: Job job_1595247619306_0002 running in uber mode : false
2020-07-20 05:53:28,087 INFO mapreduce.Job: map 0% reduce 0%
2020-07-20 05:53:51,098 INFO mapreduce.Job: map 100% reduce 0%
2020-07-20 05:53:53,169 INFO mapreduce.Job: Job job_1595247619306_0002 completed successfully
2020-07-20 05:53:53,298 INFO mapreduce.Job: Counters: 32
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=225074
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=462
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=39762
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=19881
        Total vcore-milliseconds taken by all map tasks=19881
        Total megabyte-milliseconds taken by all map tasks=40716288
    Map-Reduce Framework
        Map input records=12
        Map output records=12
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=281
        CPU time spent (ms)=5210
        Physical memory (bytes) snapshot=153419776
        Virtual memory (bytes) snapshot=3410935808
        Total committed heap usage (bytes)=32571392
        Peak Map Physical memory (bytes)=153419776
        Peak Map Virtual memory (bytes)=3410935808
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=462
2020-07-20 05:53:53,342 INFO mapreduce.ImportJobBase: Transferred 462 bytes in 66.9515 seconds (6.9005 bytes/sec)
2020-07-20 05:53:53,367 INFO mapreduce.ImportJobBase: Retrieved 12 records.
2020-07-20 05:53:53,374 INFO tool.CodeGenTool: Beginning code generation
2020-07-20 05:53:53,502 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `invoice` AS t LIMIT 1
2020-07-20 05:53:53,533 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/cloudduggu/hadoop
Note: /tmp/sqoop-cloudduggu/compile/fc94ffa41264c7eeb7b2c6e4ad14db35/invoice.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-07-20 05:53:55,687 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudduggu/compile/fc94ffa41264c7eeb7b2c6e4ad14db35/invoice.jar
2020-07-20 05:53:55,727 INFO mapreduce.ImportJobBase: Beginning import of invoice
2020-07-20 05:53:55,842 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2020-07-20 05:53:55,918 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/cloudduggu/.staging/job_1595247619306_0003
2020-07-20 05:53:56,031 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:53:56,112 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:53:57,446 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:847)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:843)
2020-07-20 05:53:59,704 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:53:59,811 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:847)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:843)
2020-07-20 05:54:00,598 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2020-07-20 05:54:01,695 INFO db.DBInputFormat: Using read commited transaction isolation
2020-07-20 05:54:01,867 INFO mapreduce.JobSubmitter: number of splits:1
2020-07-20 05:54:01,971 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1595247619306_0003
2020-07-20 05:54:01,973 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-07-20 05:54:02,419 INFO impl.YarnClientImpl: Submitted application application_1595247619306_0003
2020-07-20 05:54:02,441 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1595247619306_0003/
2020-07-20 05:54:02,444 INFO mapreduce.Job: Running job: job_1595247619306_0003
2020-07-20 05:54:31,992 INFO mapreduce.Job: Job job_1595247619306_0003 running in uber mode : false
2020-07-20 05:54:31,994 INFO mapreduce.Job: map 0% reduce 0%
2020-07-20 05:54:57,133 INFO mapreduce.Job: map 100% reduce 0%
2020-07-20 05:54:59,241 INFO mapreduce.Job: Job job_1595247619306_0003 completed successfully
2020-07-20 05:54:59,385 INFO mapreduce.Job: Counters: 32
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=225049
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=63
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=42812
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=21406
        Total vcore-milliseconds taken by all map tasks=21406
        Total megabyte-milliseconds taken by all map tasks=43839488
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=364
        CPU time spent (ms)=5790
        Physical memory (bytes) snapshot=145625088
        Virtual memory (bytes) snapshot=3408330752
        Total committed heap usage (bytes)=32571392
        Peak Map Physical memory (bytes)=145625088
        Peak Map Virtual memory (bytes)=3408330752
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=63
2020-07-20 05:54:59,418 INFO mapreduce.ImportJobBase: Transferred 63 bytes in 63.6403 seconds (0.9899 bytes/sec)
2020-07-20 05:54:59,433 INFO mapreduce.ImportJobBase: Retrieved 3 records.

We can verify the imported data in Hadoop HDFS using the commands below. Three directories, one per table, are created in HDFS, and each table's data is present under its directory in a file named “part-m-00000”.

cloudduggu@ubuntu:~$ hadoop fs -ls /user/cloudduggu/
cloudduggu@ubuntu:~$ hadoop fs -ls /user/cloudduggu/employee/
cloudduggu@ubuntu:~$ hadoop fs -ls /user/cloudduggu/dept/
cloudduggu@ubuntu:~$ hadoop fs -ls /user/cloudduggu/invoice/
cloudduggu@ubuntu:~$ hadoop fs -cat /user/cloudduggu/invoice/part-m-00000
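
To cross-check that the number of rows stored in HDFS matches what Sqoop reported (for example, “Retrieved 12 records.” for the employee table in the output above), the imported file can be piped through a line count. This assumes the default text output format, in which each record is written as one line:

cloudduggu@ubuntu:~$ hadoop fs -cat /user/cloudduggu/employee/part-m-00000 | wc -l

This should print 12 for the employee table, matching the Sqoop log.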

[Screenshot: imported table data]