HDFS – Commands

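First, to execute HDFS commands, we need to start the HDFS and YARN services. To do that, we run the Hadoop start-up scripts (in a standard distribution these are typically start-dfs.sh and start-yarn.sh under sbin). This starts all the daemons, such as the NameNode and DataNode, as shown below. We can then check which daemons are running with the ‘jps’ command.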

Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/hadoop-i2tutorial-namenode-ubuntu.out
localhost: starting datanode, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/hadoop-i2tutorial-datanode-ubuntu.out
Starting secondary namenodes []
starting secondarynamenode, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/hadoop-i2tutorial-secondarynamenode-ubuntu.out

starting yarn daemons
starting resourcemanager, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/yarn-i2tutorial-resourcemanager-ubuntu.out
localhost: starting nodemanager, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/yarn-i2tutorial-nodemanager-ubuntu.out

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ jps
2899 DataNode
3481 NodeManager
3819 Jps
2780 NameNode
3054 SecondaryNameNode
3358 ResourceManager
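
Checking the ‘jps’ listing by eye gets tedious when it is done often. The following is a minimal sketch, assuming the five daemons listed above are the ones expected on this node; the function name ‘missing_daemons’ is hypothetical and not part of Hadoop.

```shell
# Print the name of every expected daemon that is absent from jps output.
# Reads jps-style lines ("<pid> <ClassName>") on standard input.
missing_daemons() {
  # Keep only the second column (the class name) of each line.
  running=$(awk '{print $2}')
  for name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x matches whole lines only, so "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$running" | grep -qx "$name"; then
      echo "$name"
    fi
  done
}
```

On a live node it would be used as ‘jps | missing_daemons’; empty output means every expected daemon was found.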

Running the ‘hdfs’ command with no arguments lists all of its subcommands.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs
Usage: hdfs [--config confdir] COMMAND
 where COMMAND is one of:
 dfs run a filesystem command on the file systems supported in Hadoop.
 namenode -format format the DFS filesystem
 secondarynamenode run the DFS secondary namenode
 namenode run the DFS namenode
 journalnode run the DFS journalnode
 zkfc run the ZK Failover Controller daemon
 datanode run a DFS datanode
 dfsadmin run a DFS admin client
 haadmin run a DFS HA admin client
 fsck run a DFS filesystem checking utility
 balancer run a cluster balancing utility
 jmxget get JMX exported values from NameNode or DataNode.
 oiv apply the offline fsimage viewer to an fsimage
 oiv_legacy apply the offline fsimage viewer to an legacy fsimage
 oev apply the offline edits viewer to an edits file
 fetchdt fetch a delegation token from the NameNode
 getconf get config values from configuration
 groups get the groups which users belong to
 snapshotDiff diff two snapshots of a directory or diff the
 current directory contents with a snapshot
 lsSnapshottableDir list all snapshottable dirs owned by the current user
 Use -help to see options
 portmap run a portmap service
 nfs3 run an NFS version 3 gateway
 cacheadmin configure the HDFS cache
 crypto configure HDFS encryption zones
 version print the version

Most commands print help when invoked w/o parameters.

The ‘dfs’ subcommand is the one used for file system operations. Running it without arguments lists the many operations it supports, as given below.

Most of these commands resemble their Linux counterparts.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs
Usage: hadoop fs [generic options]
 [-appendToFile <localsrc> ... <dst>]
 [-cat [-ignoreCrc] <src> ...]
 [-checksum <src> ...]
 [-chgrp [-R] GROUP PATH...]
 [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
 [-chown [-R] [OWNER][:[GROUP]] PATH...]
 [-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
 [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
 [-count [-q] <path> ...]
 [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
 [-createSnapshot <snapshotDir> [<snapshotName>]]
 [-deleteSnapshot <snapshotDir> <snapshotName>]
 [-df [-h] [<path> ...]]
 [-du [-s] [-h] <path> ...]
 [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
 [-getfacl [-R] <path>]
 [-getfattr [-R] {-n name | -d} [-e en] <path>]
 [-getmerge [-nl] <src> <localdst>]
 [-help [cmd ...]]
 [-ls [-d] [-h] [-R] [<path> ...]]
 [-mkdir [-p] <path> ...]
 [-moveFromLocal <localsrc> ... <dst>]
 [-moveToLocal <src> <localdst>]
 [-mv <src> ... <dst>]
 [-put [-f] [-p] <localsrc> ... <dst>]
 [-renameSnapshot <snapshotDir> <oldName> <newName>]
 [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
 [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
 [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
 [-setfattr {-n name [-v value] | -x name} <path>]
 [-setrep [-R] [-w] <rep> <path> ...]
 [-stat [format] <path> ...]
 [-tail [-f] <file>]
 [-test -[defsz] <path>]
 [-text [-ignoreCrc] <src> ...]
 [-touchz <path> ...]
 [-usage [cmd ...]]

Generic options supported are

-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.

Below is the general command line syntax.

bin/hadoop command [genericOptions] [commandOptions]
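
To illustrate the ordering in this syntax, here is a small sketch that passes the generic option ‘-D’ before the command option ‘-put’. It assumes the standard HDFS property ‘dfs.replication’; the function name ‘put_with_replication’ is hypothetical.

```shell
# Upload a local file with a per-command replication factor.
# Arguments: <replication> <local source> <HDFS destination>
put_with_replication() {
  # Generic options (-D property=value) come before the command options (-put ...).
  hdfs dfs -D dfs.replication="$1" -put "$2" "$3"
}
```

For example, ‘put_with_replication 1 nyse-data.txt /data’ would request a single replica for that upload without changing the cluster-wide configuration.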

We can list the files in a directory with the command below.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -ls /

Found 4 items
drwxrwxrwx - i2tutorial supergroup 0 2017-10-26 06:12 /data
drwxr-xr-x - i2tutorial supergroup 0 2017-10-29 01:41 /input
drwxrwx--- - i2tutorial supergroup 0 2017-10-29 01:47 /tmp
drwxr-xr-x - i2tutorial supergroup 0 2017-10-28 11:01 /user
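
Each entry in this listing shows the permissions, the replication factor (‘-’ for a directory), the owner, the group, the size in bytes, the modification date and time, and the path. Because the layout is column based, it can be filtered with standard tools. The sketch below uses a hypothetical helper named ‘ls_dirs’ and assumes that paths contain no spaces.

```shell
# Print the path of every directory entry in `hdfs dfs -ls` output read from stdin.
ls_dirs() {
  # Directory entries have a permission string that starts with "d";
  # the path is the last field (this breaks on paths containing spaces).
  awk '$1 ~ /^d/ { print $NF }'
}
```

It would be used as ‘hdfs dfs -ls / | ls_dirs’. The "Found N items" header is skipped because its first field does not start with "d".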

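‘copyToLocal’ copies files from HDFS to the local file system, and ‘copyFromLocal’ copies files from the local file system into HDFS.

Here we first copy nyse-data.txt from the HDFS ‘input’ directory into the current local directory, and then copy it from the local file system into the HDFS ‘data’ directory.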

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -copyToLocal /input/nyse-data.txt

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ ls
bin etc include logs share
bin-mapreduce1 examples lib nyse-data.txt src
cloudera examples-mapreduce1 libexec sbin

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -copyFromLocal nyse-data.txt /data
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -ls /data
Found 2 items
drwxr-xr-x - i2tutorial supergroup 0 2017-10-26 06:12 /data/hadoop
-rw-r--r-- 1 i2tutorial supergroup 1540 2017-11-28 21:57 /data/nyse-data.txt
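
Without the ‘-f’ option, ‘copyFromLocal’ reports an error when the destination file already exists. A minimal sketch of checking first with ‘-test -e’, which appears in the usage list above, might look as follows; the function name ‘upload_if_absent’ is hypothetical.

```shell
# Copy a local file into an HDFS directory unless a file of that name is already there.
# Arguments: <local source> <HDFS destination directory>
upload_if_absent() {
  src=$1
  dest_dir=$2
  target="$dest_dir/$(basename "$src")"
  # `hdfs dfs -test -e` exits with status 0 when the path exists.
  if hdfs dfs -test -e "$target"; then
    echo "skip: $target already exists"
  else
    hdfs dfs -copyFromLocal "$src" "$dest_dir" && echo "uploaded: $target"
  fi
}
```

For example, ‘upload_if_absent nyse-data.txt /data’ would leave the existing copy in /data untouched.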

Here we upload the entire local examples directory to the HDFS root directory with ‘put’.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -put examples /
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -ls /
Found 5 items
drwxrwxrwx - i2tutorial supergroup 0 2017-11-28 21:57 /data
drwxr-xr-x - i2tutorial supergroup 0 2017-11-28 22:00 /examples
drwxr-xr-x - i2tutorial supergroup 0 2017-10-29 01:41 /input
drwxrwx--- - i2tutorial supergroup 0 2017-10-29 01:47 /tmp
drwxr-xr-x - i2tutorial supergroup 0 2017-10-28 11:01 /user

We can also change the replication factor of a file with ‘setrep’. Here we change the replication factor of nyse-data.txt to 2. The ‘-w’ option makes the command wait until the replication has completed.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -setrep -R -w 2 /data/nyse-data.txt
Replication 2 set: /data/nyse-data.txt
Waiting for /data/nyse-data.txt ......................
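
With only one DataNode running, as in the ‘jps’ listing earlier, a second replica likely has nowhere to be placed, so the ‘-w’ wait may not finish. One way to confirm that the new replication factor was at least recorded is ‘-stat’ with the ‘%r’ format, which prints the replication factor stored for the file. The helper name ‘has_replication’ below is hypothetical, and this is only a sketch.

```shell
# Succeed when the replication factor recorded for a file equals the expected value.
# Arguments: <HDFS path> <expected replication>
has_replication() {
  path=$1
  expected=$2
  # `-stat %r` prints the replication factor recorded for the file;
  # return 2 if the hdfs command itself fails.
  actual=$(hdfs dfs -stat %r "$path") || return 2
  [ "$actual" = "$expected" ]
}
```

For example, ‘has_replication /data/nyse-data.txt 2 && echo ok’. Note that this checks the requested factor, not how many replicas currently exist on DataNodes.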
