HDFS – Commands:

First, to execute HDFS commands, we need to start the HDFS and YARN services. To do that we use start-dfs.sh and start-yarn.sh. Then all the daemons, such as the DataNode, NameNode, etc., are started as shown below. We can check all the running daemons using the jps command.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/hadoop-i2tutorial-namenode-ubuntu.out
localhost: starting datanode, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/hadoop-i2tutorial-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/hadoop-i2tutorial-secondarynamenode-ubuntu.out

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/yarn-i2tutorial-resourcemanager-ubuntu.out
localhost: starting nodemanager, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/yarn-i2tutorial-nodemanager-ubuntu.out

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ jps
2899 DataNode
3481 NodeManager
3819 Jps
2780 NameNode
3054 SecondaryNameNode
3358 ResourceManager
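
When we are done, the matching stop scripts can be used to shut these daemons down again (shown here without their output):

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ stop-yarn.sh
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ stop-dfs.sh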


We can run the hdfs command on its own to see all of its sub-commands and options in Hadoop.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs
Usage: hdfs [--config confdir] COMMAND
 where COMMAND is one of:
 dfs run a filesystem command on the file systems supported in Hadoop.
 namenode -format format the DFS filesystem
 secondarynamenode run the DFS secondary namenode
 namenode run the DFS namenode
 journalnode run the DFS journalnode
 zkfc run the ZK Failover Controller daemon
 datanode run a DFS datanode
 dfsadmin run a DFS admin client
 haadmin run a DFS HA admin client
 fsck run a DFS filesystem checking utility
 balancer run a cluster balancing utility
 jmxget get JMX exported values from NameNode or DataNode.
 oiv apply the offline fsimage viewer to an fsimage
 oiv_legacy apply the offline fsimage viewer to an legacy fsimage
 oev apply the offline edits viewer to an edits file
 fetchdt fetch a delegation token from the NameNode
 getconf get config values from configuration
 groups get the groups which users belong to
 snapshotDiff diff two snapshots of a directory or diff the
 current directory contents with a snapshot
 lsSnapshottableDir list all snapshottable dirs owned by the current user
 Use -help to see options
 portmap run a portmap service
 nfs3 run an NFS version 3 gateway
 cacheadmin configure the HDFS cache
 crypto configure HDFS encryption zones
 version print the version

Most commands print help when invoked w/o parameters.
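
For example, hdfs version prints the installed Hadoop version, and hdfs dfs -help followed by a command name prints detailed help for that single command (illustrative invocations, output omitted):

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs version
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -help ls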

The dfs sub-command is used particularly for working with files in HDFS. We can perform many file system operations with it, as given below.

Most of these commands look like Linux commands.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs
Usage: hadoop fs [generic options]
 [-appendToFile <localsrc> ... <dst>]
 [-cat [-ignoreCrc] <src> ...]
 [-checksum <src> ...]
 [-chgrp [-R] GROUP PATH...]
 [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
 [-chown [-R] [OWNER][:[GROUP]] PATH...]
 [-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
 [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
 [-count [-q] <path> ...]
 [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
 [-createSnapshot <snapshotDir> [<snapshotName>]]
 [-deleteSnapshot <snapshotDir> <snapshotName>]
 [-df [-h] [<path> ...]]
 [-du [-s] [-h] <path> ...]
 [-expunge]
 [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
 [-getfacl [-R] <path>]
 [-getfattr [-R] {-n name | -d} [-e en] <path>]
 [-getmerge [-nl] <src> <localdst>]
 [-help [cmd ...]]
 [-ls [-d] [-h] [-R] [<path> ...]]
 [-mkdir [-p] <path> ...]
 [-moveFromLocal <localsrc> ... <dst>]
 [-moveToLocal <src> <localdst>]
 [-mv <src> ... <dst>]
 [-put [-f] [-p] <localsrc> ... <dst>]
 [-renameSnapshot <snapshotDir> <oldName> <newName>]
 [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
 [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
 [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
 [-setfattr {-n name [-v value] | -x name} <path>]
 [-setrep [-R] [-w] <rep> <path> ...]
 [-stat [format] <path> ...]
 [-tail [-f] <file>]
 [-test -[defsz] <path>]
 [-text [-ignoreCrc] <src> ...]
 [-touchz <path> ...]
 [-usage [cmd ...]]

Generic options supported are

-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.

Below is the general command line syntax.

bin/hadoop command [genericOptions] [commandOptions]
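
As an illustration of the generic options, -D overrides a configuration property for a single command; localfile.txt below is just a hypothetical local file.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -D dfs.replication=2 -put localfile.txt /data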

We can see the list of files in a directory by using the command below.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -ls /

Found 4 items
drwxrwxrwx - i2tutorial supergroup 0 2017-10-26 06:12 /data
drwxr-xr-x - i2tutorial supergroup 0 2017-10-29 01:41 /input
drwxrwx--- - i2tutorial supergroup 0 2017-10-29 01:47 /tmp
drwxr-xr-x - i2tutorial supergroup 0 2017-10-28 11:01 /user
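
Other listing-style commands work in the same way; for instance, -df -h shows the overall free space in HDFS and -du -s -h summarises the size of a given path (illustrative invocations, output omitted):

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -df -h /
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -du -s -h /input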

‘copyToLocal’ is used for copying files from HDFS to the local file system, while ‘copyFromLocal’ copies files from the local file system into HDFS.

Here we first copy nyse-data.txt from the HDFS /input directory to the local working directory, and then copy it from there into the HDFS ‘data’ directory.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -copyToLocal /input/nyse-data.txt

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ ls
bin etc include logs share
bin-mapreduce1 examples lib nyse-data.txt src
cloudera examples-mapreduce1 libexec sbin

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -copyFromLocal nyse-data.txt /data
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -ls /data
Found 2 items
drwxr-xr-x - i2tutorial supergroup 0 2017-10-26 06:12 /data/hadoop
-rw-r--r-- 1 i2tutorial supergroup 1540 2017-11-28 21:57 /data/nyse-data.txt
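
To verify the copy, we can print the file with -cat, or just its last kilobyte with -tail (commands shown without their output):

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -cat /data/nyse-data.txt
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -tail /data/nyse-data.txt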

Here we are putting the complete examples folder into HDFS.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -put examples /
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -ls /
Found 5 items
drwxrwxrwx - i2tutorial supergroup 0 2017-11-28 21:57 /data
drwxr-xr-x - i2tutorial supergroup 0 2017-11-28 22:00 /examples
drwxr-xr-x - i2tutorial supergroup 0 2017-10-29 01:41 /input
drwxrwx--- - i2tutorial supergroup 0 2017-10-29 01:47 /tmp
drwxr-xr-x - i2tutorial supergroup 0 2017-10-28 11:01 /user
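
The reverse operation is -get (equivalent to copyToLocal), which downloads a file or a whole directory from HDFS back to the local file system; the local destination /tmp/examples-copy below is only an example:

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -get /examples /tmp/examples-copy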

We can also change the replication factor of a file by using ‘setrep’. Here we are changing the replication factor of the nyse-data.txt file to 2.

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -setrep -R -w 2 /data/nyse-data.txt
Replication 2 set: /data/nyse-data.txt
Waiting for /data/nyse-data.txt ......................
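
The new replication factor can then be confirmed with -stat and the %r format specifier, or by looking at the second column of an -ls listing (output omitted here):

i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -stat %r /data/nyse-data.txt
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -ls /data/nyse-data.txt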