small files and hadoop’s hdfs (bonus: an inode formula)

This topic is fairly detailed and better described via sources such as this HDFS Scalability whitepaper, but basically it comes down to the NN needing to keep track of “objects” in memory.  These objects are directories, files and blocks.  For example, if you had a tree that only had 3 top-level directories *and* each of these had 3 files *and* each of these files took up 3 blocks, then your NN would need to keep track of 39 objects (3 directories + 9 files + 27 blocks).

[hdfs@sandbox ~]$ hdfs dfs -count /
         248          510          560952712 /
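The object math above is simple enough to sketch; here is a minimal illustration using the hypothetical 3×3×3 tree (not real cluster data):

```python
# NameNode memory "objects" = directories + files + blocks.
# Hypothetical tree: 3 top-level dirs, each holding 3 files, each file of 3 blocks.
dirs = 3
files = dirs * 3      # 9 files
blocks = files * 3    # 27 blocks
objects = dirs + files + blocks
print(objects)        # 39 objects the NameNode must track
```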

1 GB of heap size for every 1 million files

Not to prescribe a setting for your cluster, but at a minimum, even for early POC efforts, increase your NN heap size to at least 4096 MB.  If using Ambari, you can find this setting in the HDFS service’s configuration.

So, if we are actively monitoring our NN heap size and keeping it in line with the number of objects the NN is managing, we can more accurately fine-tune our expectations for each cluster.
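To make that monitoring concrete, here is a rough sketch assuming the rule-of-thumb ratio of roughly 1 GB of heap per million objects and the 4096 MB floor suggested above (the function name and defaults are illustrative, not an official formula):

```python
def recommended_nn_heap_mb(nn_objects, mb_per_million=1024, floor_mb=4096):
    """Rough NameNode heap suggestion: ~1 GB per 1M objects, never below 4 GB."""
    estimate = int(nn_objects / 1_000_000 * mb_per_million)
    return max(estimate, floor_mb)

print(recommended_nn_heap_mb(250_000))     # small cluster -> floor of 4096 MB
print(recommended_nn_heap_mb(20_000_000))  # 20M objects   -> 20480 MB
```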

On the flip side, it seems easy enough to manage the amount of disk space we have on HDFS via all the inherent reporting abilities of HDFS (and tools like Ambari), not to mention some very simple math.  I recently got asked how inodes themselves can prevent the cluster from accepting new files.  As a refresher, inodes keep track of all the files on a given (physical, not HDFS) file system.  Here is an example of a partition that shows 99% of its space as free, yet has exhausted all of its inodes.

Filesystem    1K-blocks     Used  Available Use% Mounted on
/dev/sdf1    3906486416 10586920 3700550300   1% /data04

Filesystem          Inodes  IUsed    IFree IUse% Mounted on
/dev/sdf1           953856 953856        0  100% /data04

Of course, things are often more complicated when we dive deeper.  First, each file will be broken up into appropriately sized blocks, and then each of these blocks will have three copies (assuming the default replication factor of 3).  So, our example file above with 3 blocks will need 9 separate physical block files stored by the DNs.  As you peel back the onion, you’ll see there really are two files for each block stored: the block itself and a second file that contains metadata & checksum values.  In fact, it gets a tiny bit more complicated than that, as the DNs need a directory structure so they don’t overrun a flat directory.  Chapter 10 of Hadoop: The Definitive Guide (3rd Edition) has a good write-up on this, which is further visualized by the abbreviated directory listing from a DN’s block data below.

[hdfs@sandbox finalized]$ pwd
/hadoop/hdfs/data/current/BP-1200952396-10.0.2.15-1398089695400/current/finalized
[hdfs@sandbox finalized]$ ls -al
total 49940
drwxr-xr-x 66 hdfs hadoop    12288 Apr 21 07:18 .
drwxr-xr-x  4 hdfs hadoop     4096 Jul  3 12:33 ..
-rw-r--r--  1 hdfs hadoop        7 Apr 21 07:16 blk_1073741825
-rw-r--r--  1 hdfs hadoop       11 Apr 21 07:16 blk_1073741825_1001.meta
-rw-r--r--  1 hdfs hadoop       42 Apr 21 07:16 blk_1073741826
-rw-r--r--  1 hdfs hadoop       11 Apr 21 07:16 blk_1073741826_1002.meta
-rw-r--r--  1 hdfs hadoop   392124 Apr 21 07:18 blk_1073741887
-rw-r--r--  1 hdfs hadoop     3071 Apr 21 07:18 blk_1073741887_1063.meta
-rw-r--r--  1 hdfs hadoop  1363159 Apr 21 07:18 blk_1073741888
-rw-r--r--  1 hdfs hadoop    10659 Apr 21 07:18 blk_1073741888_1064.meta
drwxr-xr-x  2 hdfs hadoop     4096 Apr 21 07:22 subdir0
drwxr-xr-x  2 hdfs hadoop     4096 Jun  3 08:45 subdir1
drwxr-xr-x  2 hdfs hadoop     4096 Apr 21 07:21 subdir63
drwxr-xr-x  2 hdfs hadoop     4096 Apr 21 07:18 subdir9
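Putting those pieces together, the physical file count for the hypothetical 3-block file described above can be sketched as:

```python
# For one HDFS file: blocks x replicas block files, each paired with a .meta file.
blocks = 3
repl_factor = 3
block_files = blocks * repl_factor  # 9 physical block files spread across the DNs
total_files = block_files * 2       # plus one checksum/metadata file per block file
print(total_files)                  # 18 physical files on DN disks
```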

To start the math, let’s calculate the NbrOfAvailBlocksPerCluster, which is simply the inode limit per disk TIMES the number of disks in a DN TIMES the number of DNs in the cluster, then DIVIDED by 6.3 — the ReplFactor of 3 multiplied by roughly 2.1 inodes consumed per block replica (the block file, its metadata/checksum file, and a bit of overhead such as the subdirectories themselves).  For example, consider a cluster that has DNs with 10 disks each, whose JBOD file system partitions can support 1 million inodes.

So, let’s see if we can write that down in an algebraic formula.  Remember, we’re just thinking about inodes here — the disk sizing calculation is MUCH easier.

NbrOfFilesInodeSettingCanSupport = ( ( InodeSettingPerDisk * NbrDisksPerDN * NbrDNs ) / ( ReplFactor * 2.1 ) )  /  AvgNbrBlocksPerFile
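As a sketch, that formula in Python, plugging in the 10-disk, 1-million-inode example (the 20 DN count and the 1.5 average blocks per file are hypothetical values chosen purely for illustration):

```python
def files_inode_setting_can_support(inodes_per_disk, disks_per_dn, nbr_dns,
                                    repl_factor=3, avg_blocks_per_file=1.5):
    """NbrOfFilesInodeSettingCanSupport, per the formula above."""
    cluster_inodes = inodes_per_disk * disks_per_dn * nbr_dns
    # 2.1 ~= inodes consumed per block replica (block file + .meta + overhead)
    avail_blocks = cluster_inodes / (repl_factor * 2.1)
    return int(avail_blocks / avg_blocks_per_file)

# 1M inodes/disk, 10 disks/DN, 20 DNs -> roughly 21 million files
print(files_inode_setting_can_support(1_000_000, 10, 20))
```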

Published by lestermartin

Developer advocate, trainer, blogger, and data engineer focused on data lake & streaming frameworks including Trino, Hive, Spark, Flink, Kafka and NiFi.
