How do I check an HDFS directory's size?


I know du -sh from common Linux filesystems, but how do I do the same with HDFS?

Prior to 0.20.203, and officially deprecated in 2.6.0:

hadoop fs -dus [directory]

Since 0.20.203 and 1.0.4, and still compatible through 2.6.0:

hdfs dfs -du [-s] [-h] URI [URI …]

You can also run hadoop fs -help for more info and specifics.

To get the size of a specific HDFS directory, run hdfs dfs -du -h "/path/to/specific/hdfs/directory". You can also use hadoop fs -ls, which lists the files in a directory with all their details; in its output, the 5th column shows each file's size in bytes.
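Since the 5th column of hadoop fs -ls output is the size in bytes, the sizes of a directory's direct files can be summed with awk. This is a minimal sketch: the function name sum_ls_sizes is hypothetical, it assumes the usual -ls layout (permissions, replication, owner, group, size, date, time, path), and it does not descend into subdirectories, so it is not a replacement for du -s.

```shell
# Hypothetical helper: sum the sizes (5th column) of the regular files
# listed by `hadoop fs -ls`. Only direct children are counted.
sum_ls_sizes() {
  # Regular-file lines start with "-" in the permissions column; the
  # "Found N items" header and directory lines (starting with "d") are skipped.
  awk '$1 ~ /^-/ { total += $5 } END { print total + 0 }'
}
```

Usage: hadoop fs -ls /path/to/dir | sum_ls_sizes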

hadoop fs -du -s -h /path/to/dir displays a directory's size in readable form.
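If you check directory sizes often, the command can be wrapped in a small shell function. This is a sketch under assumptions: the name hdfs_dir_size is hypothetical, and falling back to hadoop fs when the hdfs launcher is not on the PATH is a design choice of this example, not something Hadoop requires.

```shell
# Hypothetical wrapper: print the human-readable total size of an HDFS
# directory, preferring the `hdfs` launcher and falling back to `hadoop fs`
# when `hdfs` cannot be found.
hdfs_dir_size() {
  if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -du -s -h "$1"
  else
    hadoop fs -du -s -h "$1"
  fi
}
```

Usage: hdfs_dir_size /path/to/dir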

In the output of hdfs dfs -du -h "/path/to/specific/hdfs/directory", the first column shows the actual (raw) size of the files that users have placed in the various HDFS directories, and the second column shows the space those files actually consume in HDFS, including all replicas.
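Because the first column is the raw size and the second is the space consumed including replicas, dividing one by the other gives the effective replication factor. A minimal awk sketch, assuming a Hadoop version whose du prints three columns (size, consumed space, path) and output taken without -h, since -h adds unit suffixes that break the arithmetic; replication_ratio is a hypothetical name.

```shell
# Hypothetical helper: for each line of three-column `hdfs dfs -du` output
# (raw size, space consumed with all replicas, path), print consumed/raw,
# i.e. the effective replication factor, followed by the path.
replication_ratio() {
  awk 'NF >= 3 && $1 > 0 { printf "%.2f\t%s\n", $2 / $1, $NF }'
}
```

Usage: hdfs dfs -du -s /path/to/dir | replication_ratio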

To check the state of a file or directory in HDFS, use hdfs dfs -stat <path>. In the format string, %F is the file type, %b the size in bytes, %n the name, and %o the block size. Example:

$ hdfs dfs -stat '%F %b %n %o' /fibrevillage/fstab
regular file 4359

To merge files in HDFS, use hdfs dfs -getmerge <src> <localdst>. It takes a source directory or files as input and concatenates them into a single destination file on the local file system. It can concatenate files in the same directory or from multiple directories, as long as you specify their locations.

With this you will get the size in GB:

hdfs dfs -du PATHTODIRECTORY | awk '/^[0-9]+/ { print int($1/(1024^3)) " [GB]\t" $NF }'
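The one-liner above truncates with int(), so anything under 1 GB prints as 0. A variant that keeps two decimals is sketched below. du_to_gib is a hypothetical name; ^ is used because it is the POSIX awk exponent operator, and $NF picks the path so the same code works whether du prints two or three columns (assuming paths contain no spaces).

```shell
# Convert the byte counts in the first column of `hdfs dfs -du` output to
# GiB (1024^3 bytes), keeping two decimals instead of truncating.
# $NF is the last field, i.e. the path, in both the two- and
# three-column du formats (paths with spaces are not handled).
du_to_gib() {
  awk '/^[0-9]+/ { printf "%.2f [GiB]\t%s\n", $1 / (1024 ^ 3), $NF }'
}
```

Usage: hdfs dfs -du PATHTODIRECTORY | du_to_gib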

A Hadoop command can also check whether a directory exists: if the file schema.xml is present in the given HDFS path, it will return the code as … There is no cd (change directory) command in the HDFS file system. You can only list directories and use their paths to reach the next one, so you have to navigate manually by giving the complete path to the ls command.
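Because HDFS has no cd, one workaround is to keep a "current directory" in a shell variable and expand relative names against it. HDFS itself resolves relative paths against the user's home directory, /user/<name>. This is a hypothetical sketch: HDFS_CWD and hdfs_path are names invented here, and defaulting to /user/$USER assumes your local user name matches your HDFS user, which may not hold on every cluster.

```shell
# Hypothetical helper: emulate a working directory for HDFS commands.
# HDFS_CWD is an assumption of this sketch; if unset, relative names are
# resolved against /user/$USER, mirroring the default HDFS home directory.
hdfs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;                              # already absolute
    *)  printf '%s/%s\n' "${HDFS_CWD:-/user/$USER}" "$1" ;; # relative name
  esac
}
```

Usage: HDFS_CWD=/data/logs; hdfs dfs -ls "$(hdfs_path 2020)"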

When trying to calculate the total of a particular group of files within a directory, the -s option does not help (in Hadoop 2.7.1): it summarizes each matched path separately instead of producing a single total. For example:

Directory structure:

some_dir
├abc.txt    
├count1.txt 
├count2.txt 
└def.txt    

Assume each file is 1 KB in size. You can summarize the entire directory with:

hdfs dfs -du -s some_dir
4096 some_dir

However, if I want the sum of all files whose names contain "count", the command falls short.

hdfs dfs -du -s some_dir/count*
1024 some_dir/count1.txt
1024 some_dir/count2.txt

To get around this I usually pass the output through awk.

hdfs dfs -du some_dir/count* | awk '{ total+=$1 } END { print total }'
2048 
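The same awk summation can also print the total in a readable unit, roughly like -h. A minimal sketch; sum_du_human is a hypothetical name and the units are powers of 1024. Quoting the glob (for example 'some_dir/count*') makes HDFS expand it rather than the local shell, which would otherwise expand it against any matching local files.

```shell
# Sum the first column (bytes) of `hdfs dfs -du` output and print the
# total scaled to B, K, M, G, T or P (powers of 1024), one decimal place.
sum_du_human() {
  awk '{ total += $1 }
       END {
         split("B K M G T P", unit, " ")
         i = 1
         while (total >= 1024 && i < 6) { total /= 1024; i++ }
         printf "%.1f %s\n", total, unit[i]
       }'
}
```

Usage: hdfs dfs -du 'some_dir/count*' | sum_du_human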

You can check whether a given HDFS path exists, and whether that path is a directory or a file, with the test command (hdfs dfs -test). From the NameNode web UI, you can also check whether all NameNodes and DataNodes are up and running, along with snapshot status, cluster startup status, etc. If you are on a highly available HDFS cluster, go to the Standby NameNode web UI to see if all DataNodes are up and running.
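hdfs dfs -test returns exit status 0 when its condition holds (-d: the path is a directory; -e: the path exists), so it can drive shell conditionals directly. A minimal sketch with a hypothetical function name, hdfs_path_kind:

```shell
# Hypothetical helper: classify an HDFS path using the exit status of
# `hdfs dfs -test` (0 means the tested condition is true).
hdfs_path_kind() {
  if hdfs dfs -test -d "$1"; then
    echo directory
  elif hdfs dfs -test -e "$1"; then
    echo file        # exists but is not a directory
  else
    echo missing
  fi
}
```

Usage: [ "$(hdfs_path_kind /path/to/dir)" = directory ] && hdfs dfs -du -s -h /path/to/dir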

The mkdir command takes path URIs as arguments and creates one or more directories. Because these files live in HDFS, you will not find them on the local filesystem: they are distributed across your cluster nodes as blocks (for real files) and as metadata entries (for files and directories) in the NameNode.

Snapshots must be allowed on a directory before they can be taken: hdfs dfsadmin -allowSnapshot takes the path of the snapshottable directory (the corresponding Java API is void allowSnapshot(Path path) in HdfsAdmin). Checking for a file works much like checking a Unix directory with Unix commands: just type hadoop fs -ls /Directorypath/filename.extn, and it will list the matching files present in the directory.

For example, hdfs dfs -put test-file /user/clsadmin/test-dir copies a local file into an HDFS directory. An alternative way to look at the directory structure, contents, owners, and size is to navigate to … The hadoop fs -ls command lets you view the files and directories in your HDFS filesystem, much as ls does on Linux, OS X, and other *nix systems. A user's default home directory in HDFS is /user/userName.

Comments
  • -du -s (-dus is deprecated)
  • For newer versions of HDFS, hdfs dfs -du -s -h /path/to/dir is more appropriate.
  • hdfs dfs -du PATHTODIRECTORY | awk '/^[0-9]+/ { print int($1/(1024**3) " [GB]\t" $2 }' - Please update your command. There are two closing brackets after 1024**3. There should be only one.
  • I got an error with "hdfs"; what worked for me was hadoop fs -du -h /user (I didn't need to use sudo).
  • sudo is not needed and should be used sparingly.
  • duplicate answer