Partition means dividing the table based on a column. They are defined at table creation time using ‘Partitioned by’ clause. It enables us to access data easily based on column. When we partition a table, a new directory is created based on number of columns. Suppose, if we partition a table by date, the records of same date will be stored in one partition. So if we want to retrieve any data we can do this easily by seeing the date.
We can partition the data by date and in that we can sub-partition by state also. Then state partition sub-directory will be created in date partition directory. With partition we can access data in less time. Here it is physical partition not a logical partition. It is very important optimization technique. Partition can be created as,
CREATE TABLE employee (id BIGINT, name STRING) PARTITIONED BY (date1 STRING, country STRING); ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
Here we are storing the table information into ‘TextFile’ and using the separator as
When we load data into a partitioned table, the partition values are specified explicitly:
LOAD DATA LOCAL INPATH 'input/hive/partitions/input1' INTO TABLE employee PARTITION (date1='2017-11-11', country='India');
Then we will have partition sub-directory created as,
/user/hive/warehouse/employee ├── date1=2017-11-11/ │ ├── country=India/ │ │ ├── input1-1 │ │ └── input1-2 │ └── country=US/ │ └── input1-3 └── date1=2017-11-12/ ├── country=India/ │ └── input1-4
The employee table has two date partitions (2001-11-11 and 2001-11-12) corresponding to subdirectories called date1=2001-11-11 and date1=2001-11-12 and two country sub-partitions India and US.
We can use the below command to see the partitions of the table.
hive> SHOW PARTITIONS employee;
To get the required information from the partition we should use the ‘Where’ command compulsory.
SELECT * FROM employee WHERE country='India';