/  Sqoop Tutorial

Sqoop Tutorial:

Apahe Sqoop is a tool used to extract data from structured data sources into Hadoop. It can be used to transfer huge data between Hadoop and other relational database systems. We can import the data from Relational databases such as MySql, Oracle into Hadoop or Hbase for processing purpose and after completing processing we can again export the data back into RDBMS. The processing can be done with Map-Reduce programs or hive.

This whole process is automated by Sqoop, where the schema is described in the database only for the data which is being imported. While we are using Hadoop for this purpose, we get the features of fault tolerance and parallel execution in Sqoop as default.

We use mainly Map-reduce framework in sqoop so when we import the data automatically a map-reduce job starts taking place in background.

 

So, let us first start how to install sqoop and its other features.

 

How to install Sqoop?

We can directly install sqoop from apache official website. Similar to pig or hive we can untar the sqoop file, afterwards we get sqoop home directory like /home/username/sqoop-a.b.c/. We can go to that directory and give /bin/sqoop for sqoop command line interface to start. When we type above command we get sqoop shell. Learn now..

 

How to import tables in sqoop?

Sqoop uses import for importing table from RDBMS to HDFS. Here each table which is imported is represented as different record in HDFS. First of all to use this feature, we need to create a database in Mysql. In Linux environment we can use sudo yum install mysql mysql-server to install Mysql. Next let us create a sample database in it. Learn more..

 

Understanding Internals of sqoop import:

Let us first see how the data is imported into Hadoop. We import structured data from any database into Hadoop, for this Sqoop will read the table row by row into HDFS.As we have seen the import process is done parallel so the output files which are generated are multiple files. These files can be text files, sequence files, avro files, etc. Learn more..

 

Few Import examples in Sqoop:

A basic import of a table named EMPLOYEES in the test database, A basic import requiring a login, Selecting specific columns from the EMPLOYEES table, Controlling the import parallelism (using 8 parallel tasks). Learn more..

 

What are the Other features of Import in sqoop?

We can also import a subset of columns form the table, import particular rows by using ‘where’ clause. Instead of using column, rows we can also give a query for importing purpose. Example,–columns “name,employee_id,jobtitle”. Learn more..

 

What are the Incremental Imports in Sqoop?

We have an option called Incremental imports which can be used to import only the rows which have been changed since previous Import. Sqoop provides 2 types of import-append and lastmodified. Learn more..

 

How to Import Data into Hive?

Until now we have used Import tool for importing data into HDFS. Similarly, if we have hive metastore associated with HDFS cluster, then we can import data into hive by creating table in hive. We can use ‘—hive-import’ for importing data into hive.If the hive table already exists we can use’—hive-overwrite’ option for overwriting the table in hive. Learn more..

 

Understanding Export in Sqoop:

The Export tool is used for exporting the data back from HDFS to any remote database of RDBMS. Like we can do all the analysis using Hive and export the generated data back to any external database. Here the target table must already exist in the remote database. Learn more..

 

Few Export examples in Sqoop?

This example takes the files in /results/bar_data and injects their contents in to the bar table in the foo database on db.example.com. The target table must already exist in the database. Sqoop performs a set of INSERT INTO operations, without regard for existing content. Learn more..

 

What are the Important Sqoop Commands?

Validation is used for validating the data during import or export by comparing the number of row count on both source and destination. Some of the examples of this are,A basic import of a table named EMPLOYEES in the test database that uses validation to validate the row counts. Learn more..

 

What are the Important Sqoop Commands?

port by comparing the number of row count on both source and destination. Some of the examples of this are,A basic import of a table named EMPLOYEES in the test database that uses validation to validate the row counts. Learn more..