/    /  Tables

TABLE:

Now let us see how to create a table in hive. We have seen that the data is stored in HDFS and its metadata is stored in RDBMS. There are mainly 2 types of table i.e. Managed table and External tables.

When we create a table in hive, the default table which is created is managed table which means its Files, metastore are all managed by hive internal process and stored in the warehouse directory. Then if the managed table is dropped then all its data is lost.

In External table the files, metastore are all managed by process outside of hive. They can access data which is shared on other HDFS locations also. So we can use external tables by referring the path of the data stored. So when a table is dropped just its structure is lost in hive but its data will be available on remote location.

Let us take an example,

CREATE TABLE managed_table (name STRING);

LOAD DATA INPATH '/user/input.txt' INTO table managed_table;

It will move the file inputa.txt into Hive’s warehouse directory for the managed_table table, which hdfs://user/hive/warehouse/managed_table

So If the table is later dropped, using:

DROP TABLE managed_table;

the table, including its metadata and its data, is deleted.

The location of the external data is specified at table creation time:

CREATE EXTERNAL TABLE external_table (dummy STRING)

LOCATION '/user/external_table';

LOAD DATA INPATH '/user/input.txt' INTO TABLE external_table;

With the EXTERNAL keyword, Hive knows that it is not managing the data, so it doesn’t move it to its warehouse directory. When you drop an external table, Hive will leave the data untouched and only delete the metadata.

The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. This comes in handy if you already have data generated. When dropping an EXTERNAL table, data in the table is NOT deleted from the file system.

An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property.

Let us see external table in practical,

Scenario 1

Here we are creating an external table and loading the data into it.

hive>create external table ext_student (name STRING, marks INT)

> row format delimited fields terminated by ','

> location '/home/hdadmin';

hive> load data inpath '/data1/student.txt' into table ext_student;

Loading data to table default.ext_student

Table default.ext_student stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0]
OK
Time taken: 1.625 seconds

hive> select * from ext_student;

OK
siva     3
sai       6
rahul  9
siva     8
Time taken: 0.44 seconds, Fetched: 5 row(s)

Let us see the location where the external table is created.

hdadmin@ubuntu:~/hive-0.13.1-cdh5.3.2$ hdfs dfs -ls /home/hdadmin

Found 1 items

-rwxr-xr-x 1 hdadmin supergroup 34 2017-12-10 00:03 /home/hdadmin/student.txt

Now let us delete ‘student.txt’ in same location where we created it.

hdadmin@ubuntu:~/hive-0.13.1-cdh5.3.2$ hdfs dfs -rm /home/hdadmin/student.txt

17/12/11 00:15:15 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted /home/hdadmin/student.txt Now let us try to access the data in ext_student table.

hive> select * from ext_student;

OK

Time taken: 1.47 seconds

Here we can see that we did not got any error while accessing the table. The table is empty as we have deleted the student.txt file

Now let us try to again load the data into it.

student.txt' into table ext_student;

FAILED: SemanticException Line 1:17 Invalid path ''/data1/student.txt'': No files matching path hdfs://localhost:9000/data1/student.txt

Now we are getting the error as we don’t have ‘student.txt’ file.

Now try to again put ‘student.txt’ file in /data1.

hdadmin@ubuntu:~$ hdfs dfs -put student.txt /data1

hdadmin@ubuntu:~$ hdfs dfs -ls /data1

Found 1 items

-rw-r--r-- 1 hdadmin supergroup 34 2017-12-11 00:18 /data1/student.txt

Now load the data,

hive> load data inpath '/data1/student.txt' into table ext_student;

Loading data to table default.ext_student

Table default.ext_student stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0]

OK

Time taken: 1.723 seconds

As we have loaded the data, ‘student.txt’ file is automatically created in the path given by ‘Location’ parameter.

hdadmin@ubuntu:~/hive-0.13.1-cdh5.3.2$ hdfs dfs -ls /home/hdadmin

Found 1 items

-rwxr-xr-x 1 hdadmin supergroup 34 2017-12-11 00:18 /home/hdadmin/student.txt

hive> select * from ext_student;

OK

siva     3

sai       6

rahul  9

siva     8

Time taken: 1.47 seconds, Fetched: 5 row(s)

Scenario-2:

Now let us drop the external table and see what happens,

hive> show tables;

OK

ext_student

Time taken: 0.225 seconds, Fetched: 2 row(s)

hive> drop table ext_student;

OK

Time taken: 0.779 seconds

hive> show tables;

OK

Time taken: 0.089 seconds, Fetched: 1 row(s)

We have dropped the table but ‘student.txt’ file is present in the path given by “Location” parameter.

hdadmin@ubuntu:~/hive-0.13.1-cdh5.3.2$ hdfs dfs -ls /home/hdadmin

Found 1 items

-rwxr-xr-x 1 hdadmin supergroup 34 2017-12-11 00:18 /home/hdadmin/student.txt

/home/hdadmin/student.txt