HDFS – Commands:
First, to execute HDFS commands, we need to start the HDFS and YARN services. To do that, we use start-dfs.sh and start-yarn.sh. Then all the daemons, such as the NameNode, DataNode, etc., are started as shown below. We can check all the running services using the ‘jps’ command.
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/hadoop-i2tutorial-namenode-ubuntu.out
localhost: starting datanode, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/hadoop-i2tutorial-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/hadoop-i2tutorial-secondarynamenode-ubuntu.out
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/yarn-i2tutorial-resourcemanager-ubuntu.out
localhost: starting nodemanager, logging to /home/i2tutorial/hadoop-2.5.0-cdh5.3.2/logs/yarn-i2tutorial-nodemanager-ubuntu.out
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ jps
2899 DataNode
3481 NodeManager
3819 Jps
2780 NameNode
3054 SecondaryNameNode
3358 ResourceManager
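When we are done, the matching stop scripts shut these daemons down again. A minimal sketch (both scripts ship alongside start-dfs.sh and start-yarn.sh in the Hadoop sbin directory):

# Stop YARN first, then HDFS (the reverse of the start order)
stop-yarn.sh
stop-dfs.sh
# 'jps' should now list only the Jps process itself
jps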
We can run the ‘hdfs’ command by itself and see all the subcommands it supports in Hadoop.
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  version              print the version

Most commands print help when invoked w/o parameters.
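For example, a few of these subcommands give a quick health check of a running cluster. A brief sketch using commands from the list above:

# Print the Hadoop version in use
hdfs version
# Summarize cluster capacity and the state of each DataNode
hdfs dfsadmin -report
# Check the health of all files under the root directory
hdfs fsck /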
The ‘hdfs dfs’ command is used particularly for working with the distributed file system. We can perform many operations on it, as given below.
Most of these commands look like their Linux counterparts.
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs
Usage: hadoop fs [generic options]
    [-appendToFile <localsrc> ... <dst>]
    [-cat [-ignoreCrc] <src> ...]
    [-checksum <src> ...]
    [-chgrp [-R] GROUP PATH...]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
    [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-count [-q] <path> ...]
    [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
    [-createSnapshot <snapshotDir> [<snapshotName>]]
    [-deleteSnapshot <snapshotDir> <snapshotName>]
    [-df [-h] [<path> ...]]
    [-du [-s] [-h] <path> ...]
    [-expunge]
    [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-getfacl [-R] <path>]
    [-getfattr [-R] {-n name | -d} [-e en] <path>]
    [-getmerge [-nl] <src> <localdst>]
    [-help [cmd ...]]
    [-ls [-d] [-h] [-R] [<path> ...]]
    [-mkdir [-p] <path> ...]
    [-moveFromLocal <localsrc> ... <dst>]
    [-moveToLocal <src> <localdst>]
    [-mv <src> ... <dst>]
    [-put [-f] [-p] <localsrc> ... <dst>]
    [-renameSnapshot <snapshotDir> <oldName> <newName>]
    [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
    [-setfattr {-n name [-v value] | -x name} <path>]
    [-setrep [-R] [-w] <rep> <path> ...]
    [-stat [format] <path> ...]
    [-tail [-f] <file>]
    [-test -[defsz] <path>]
    [-text [-ignoreCrc] <src> ...]
    [-touchz <path> ...]
    [-usage [cmd ...]]
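Each of these operations documents itself. For instance, to look up a single command:

# Show the full help text for one command
hdfs dfs -help put
# Show just its usage line
hdfs dfs -usage ls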
The generic options supported are:
-conf <configuration file>                      specify an application configuration file
-D <property=value>                             use value for given property
-fs <local|namenode:port>                       specify a namenode
-jt <local|jobtracker:port>                     specify a job tracker
-files <comma separated list of files>          specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>         specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
Below is the general command line syntax.
bin/hadoop command [genericOptions] [commandOptions]
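For example, the -D generic option can override a configuration property for a single command. A short sketch, assuming a local file named sample.txt (a hypothetical name) and the default CDH NameNode port 8020:

# Write sample.txt with replication factor 1 for this command only,
# overriding the dfs.replication value from hdfs-site.xml
hdfs dfs -D dfs.replication=1 -put sample.txt /data
# Point a single command at an explicit namenode
hdfs dfs -fs hdfs://localhost:8020 -ls /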
We can see the list of files in a directory by using the command below.
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -ls /
Found 4 items
drwxrwxrwx   - i2tutorial supergroup          0 2017-10-26 06:12 /data
drwxr-xr-x   - i2tutorial supergroup          0 2017-10-29 01:41 /input
drwxrwx---   - i2tutorial supergroup          0 2017-10-29 01:47 /tmp
drwxr-xr-x   - i2tutorial supergroup          0 2017-10-28 11:01 /user
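‘-ls’ also accepts the flags shown in the usage listing above, for example:

# List recursively, like 'ls -R' in Linux
hdfs dfs -ls -R /
# Print file sizes in human-readable units
hdfs dfs -ls -h /data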
‘copyToLocal’ is used for copying files from HDFS to the local file system, while ‘copyFromLocal’ does the reverse and copies files from the local file system into HDFS.
Here we first copy nyse-data.txt from the HDFS ‘/input’ directory to the local file system, and then copy it into the HDFS ‘/data’ directory.
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -copyToLocal /input/nyse-data.txt
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ ls
bin             etc                  include  logs           share
bin-mapreduce1  examples             lib      nyse-data.txt  src
cloudera        examples-mapreduce1  libexec  sbin
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -copyFromLocal nyse-data.txt /data
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -ls /data
Found 2 items
drwxr-xr-x   - i2tutorial supergroup          0 2017-10-26 06:12 /data/hadoop
-rw-r--r--   1 i2tutorial supergroup       1540 2017-11-28 21:57 /data/nyse-data.txt
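‘get’ and ‘put’ behave the same way as ‘copyToLocal’ and ‘copyFromLocal’ and are the more common shorthand:

# Same effect as -copyToLocal: pull a file out of HDFS into the current directory
hdfs dfs -get /input/nyse-data.txt .
# Same effect as -copyFromLocal; -f overwrites the destination if it already exists
hdfs dfs -put -f nyse-data.txt /data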
Here we are putting the complete examples folder into HDFS using ‘put’.
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -put examples /
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -ls /
Found 5 items
drwxrwxrwx   - i2tutorial supergroup          0 2017-11-28 21:57 /data
drwxr-xr-x   - i2tutorial supergroup          0 2017-11-28 22:00 /examples
drwxr-xr-x   - i2tutorial supergroup          0 2017-10-29 01:41 /input
drwxrwx---   - i2tutorial supergroup          0 2017-10-29 01:47 /tmp
drwxr-xr-x   - i2tutorial supergroup          0 2017-10-28 11:01 /user
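To confirm how much space the uploaded folder occupies, ‘du’ and ‘count’ work on HDFS paths as well:

# Total size of the uploaded folder in human-readable units
hdfs dfs -du -s -h /examples
# Count of directories, files, and bytes under it
hdfs dfs -count /examples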
We can also change the replication factor of a file by using ‘setrep’. Here we are setting the replication factor of the nyse-data.txt file to 2.
i2tutorial@ubuntu:~/hadoop-2.5.0-cdh5.3.2$ hdfs dfs -setrep -R -w 2 /data/nyse-data.txt
Replication 2 set: /data/nyse-data.txt
Waiting for /data/nyse-data.txt ......................
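The -w flag makes ‘setrep’ wait until every block actually reaches the new replication factor. On a single-node cluster with only one DataNode that wait cannot complete (hence the trailing dots), so it is safe to interrupt it; the new target is still recorded. We can confirm it afterwards:

# The second column of the listing shows the replication factor
hdfs dfs -ls /data/nyse-data.txt
# Or print it directly with the %r stat format
hdfs dfs -stat %r /data/nyse-data.txt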