Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

Sanat Dash
Jan 13, 2022

📌 What is LVM? 🤔

You can think of LVM as “dynamic partitions”, meaning that you can create/resize/delete LVM “partitions” (they’re called “Logical Volumes” in LVM-speak) from the command line while your Linux system is running: no need to reboot the system to make the kernel aware of the newly-created or resized partitions.

LVM is a tool for logical volume management which includes allocating disks, striping, mirroring and resizing logical volumes.

With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes. LVM physical volumes can be placed on other block devices which might span two or more disks.

Other nice features that LVM “Logical Volumes” provide are:

  1. If you have more than one hard-disk, Logical Volumes can extend over more than one disk: i.e., they are not limited by the size of one single disk, rather by the total aggregate size.
  2. You can set up “striped” LVs, so that I/O can be distributed to all disks hosting the LV in parallel. (Similar to RAID-0, but a bit easier to set up.)

Before the hands-on part, let’s understand one more term.

What is Elasticity?

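Elasticity is defined as “the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible”. In this task, elasticity means growing or shrinking the storage the datanode contributes as demand changes, without repartitioning its disks.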

🎇 Now let’s see how to integrate LVM with Hadoop and provide elasticity to datanode storage.

For this task I used one namenode and one datanode, both running on my local system.

We added two physical hard disks to the datanode, one of 3GB and one of 4GB, but we wanted the datanode to share only 5GB of storage with the namenode. Later, a need came up to extend the amount of storage shared by the datanode. With LVM we handled all of this easily, including extending the partition.

Let’s go through the steps of the task.

Step 1: Add physical hard disks to the datanode. Here I added two disks:

  • one hard disk of 3GB
  • a second one of 4GB

Now list the available hard disks:

# fdisk -l

In the output we can see the two hard disks added in the previous step:

  1. /dev/sdb: 3GB
  2. /dev/sdc: 4GB

Step 2: Convert the two hard disks into physical volumes (PVs).

Command to convert the hard disks into physical volumes:

# pvcreate /dev/sdb /dev/sdc

We can then inspect the new physical volumes with:

# pvdisplay /dev/sdb /dev/sdc

Step 3: Create a Volume Group (VG)

Next, we create a volume group from those physical volumes. I created a volume group named mytaskvg from the two PVs.

Command to create a volume group (generic form, then the command used here):

# vgcreate <name_of_volume_group> <first_PV> <second_PV>
# vgcreate mytaskvg /dev/sdb /dev/sdc

We can view the created volume group with:

# vgdisplay <name_of_volume_group>
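Steps 2 and 3 can be wrapped in a small shell function. This is a minimal sketch under stated assumptions: the helper names `run` and `make_vg` and the `DRY_RUN` variable are hypothetical, and the script only prints the commands unless you set `DRY_RUN=0` and run it as root, because `pvcreate` writes LVM metadata onto each disk.

```shell
#!/bin/sh
# Sketch of Steps 2-3: physical volumes + volume group.
# Hypothetical helpers; commands are only printed unless DRY_RUN=0 (then run as root).
set -eu

# Execute the command when DRY_RUN=0; otherwise (the default) just show it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Usage: make_vg <vg_name> <disk> [<disk> ...]
make_vg() {
  vg_name=$1
  shift
  run pvcreate "$@"             # label each whole disk as an LVM physical volume
  run pvdisplay "$@"            # inspect the new PVs
  run vgcreate "$vg_name" "$@"  # pool the PVs into one volume group
  run vgdisplay "$vg_name"      # VG size should be roughly the sum of the disks
}
```

For this setup, `make_vg mytaskvg /dev/sdb /dev/sdc` prints the four commands; only `DRY_RUN=0 make_vg mytaskvg /dev/sdb /dev/sdc` would actually run them.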

Step 4: Create a logical volume (LV) in the volume group, sized to the storage you want to contribute to the namenode. At first I wanted to contribute only 5GB of the volume group.

Command to create the LV (generic form, then with the 5GB size and the VG used here):

# lvcreate --size <size><M/G/T> --name <LV_name> <VG_name>
# lvcreate --size 5G --name <LV_name> mytaskvg

To check that the LV was created:

# lvdisplay <VG_name>/<LV_name>
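Step 4 for this setup might look like the sketch below. `mytasklv` is a hypothetical LV name (the article keeps it as `<LV_name>`), and the hypothetical dry-run `run` helper is repeated so the snippet stands alone.

```shell
#!/bin/sh
# Sketch of Step 4: carve a fixed-size LV out of the VG.
set -eu

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"                     # DRY_RUN=0: really execute (needs root)
  else
    printf '+ %s\n' "$*"     # default: only show the command
  fi
}

# Usage: make_lv <size, e.g. 5G> <lv_name> <vg_name>
make_lv() {
  lv_size=$1 lv_name=$2 vg_name=$3
  run lvcreate --size "$lv_size" --name "$lv_name" "$vg_name"
  run lvdisplay "$vg_name/$lv_name"   # check that "LV Size" matches the request
}
```

Example: `make_lv 5G mytasklv mytaskvg`.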

Before we can store data on any storage device, we have to format it with a filesystem. So the 5GB LV we just created must be formatted first.

Step 5: Format the LV

Command to format the LV with ext4 (this erases anything already on it):

# mkfs.ext4 /dev/<VG_name>/<LV_name>

Now we have to mount the LV on a directory.

Step 6: Mount the LV on a directory.

First I created the mount point:

  • Created a directory named /datanode using the command: mkdir /datanode

Then I mounted the LV on /datanode:

# mount /dev/<VG_name>/<LV_name> /datanode
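Steps 5 and 6 together, as a sketch with the same hypothetical `run` helper and LV name. The final `df -h` is an extra check, not one of the original steps, that the mounted filesystem has roughly the expected size.

```shell
#!/bin/sh
# Sketch of Steps 5-6: put ext4 on the LV and mount it for the datanode.
set -eu

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"                     # DRY_RUN=0: really execute (needs root)
  else
    printf '+ %s\n' "$*"     # default: only show the command
  fi
}

# Usage: format_and_mount <vg_name> <lv_name> <mount_dir>
format_and_mount() {
  vg_name=$1 lv_name=$2 mount_dir=$3
  dev="/dev/$vg_name/$lv_name"
  run mkfs.ext4 "$dev"             # WARNING: erases whatever is on the LV
  run mkdir -p "$mount_dir"        # mount point, e.g. /datanode
  run mount "$dev" "$mount_dir"
  run df -h "$mount_dir"           # size is a bit under the LV size (ext4 metadata)
}
```

Example: `format_and_mount mytaskvg mytasklv /datanode`.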

Then, in the datanode configuration (hdfs-site.xml), I set /datanode, the directory on which the LV is mounted, as the datanode's storage directory.
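The datanode's storage directory is set by a `<property>` entry in `hdfs-site.xml`. The hypothetical function below just prints that entry for a given directory. The key is `dfs.data.dir` in Hadoop 1.x and `dfs.datanode.data.dir` in Hadoop 2.x and later, so it is a parameter here; the default is an assumption to adjust to your Hadoop version.

```shell
#!/bin/sh
# Print the hdfs-site.xml <property> that makes the datanode store blocks in <dir>.
# Paste the output inside <configuration> ... </configuration>.
set -eu

# Usage: datanode_dir_property <dir> [<key>]
#   key: dfs.data.dir (Hadoop 1.x) or dfs.datanode.data.dir (Hadoop 2.x+, default)
datanode_dir_property() {
  dir=$1
  key=${2:-dfs.datanode.data.dir}
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$key" "$dir"
}
```

Example: `datanode_dir_property /datanode dfs.data.dir` prints the entry pointing the datanode at the LV's mount point.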

Step 7: Start the datanode service and check how much storage it contributes.

To start the datanode service (Hadoop 1.x/2.x; Hadoop 3 uses hdfs --daemon start datanode):

# hadoop-daemon.sh start datanode

To check the contribution:

# hadoop dfsadmin -report
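To pull just the capacity out of the report, a small filter like the one below can help. It assumes the report prints a line of the form `Configured Capacity: <bytes> (<human-readable>)` with the cluster summary first; the function name is hypothetical. Expect a value slightly below the LV size, since ext4 keeps part of the space for its own metadata.

```shell
#!/bin/sh
# Read `hadoop dfsadmin -report` output on stdin and print the capacity from the
# first "Configured Capacity:" line, converted from bytes to GiB.
set -eu

configured_capacity_gib() {
  awk '/^Configured Capacity:/ { printf "%.2f\n", $3 / (1024 * 1024 * 1024); exit }'
}

# Example (on a node with the Hadoop client):
#   hadoop dfsadmin -report | configured_capacity_gib
```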

🤔 Suppose that in the future we need to increase the storage the datanode shares. Let’s see how to extend the LV.

Because an ext4 filesystem can be grown while it is mounted, we can extend the shared storage without unmounting the LV or stopping any service.

Step 8: Extend the LV, here by 1GB:

# lvextend --size +1G /dev/<VG_name>/<LV_name>

Step 9: Grow the filesystem into the newly added space. This does not format anything; the existing data is kept:

# resize2fs /dev/<VG_name>/<LV_name>

Step 10: Check the contributed storage again:

# hadoop dfsadmin -report

We can see that the storage was extended on the fly, without unmounting the LV or stopping the services.

After extending the LV, the datanode contributes approximately 6GB.
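Steps 8 and 9 as a sketch with the same hypothetical dry-run helper. LVM can also do both in one command with `lvextend --resizefs` (`-r`), which calls the filesystem resize tool for you; the two-step form below matches the walkthrough.

```shell
#!/bin/sh
# Sketch of Steps 8-9: grow the LV, then grow ext4 online (no unmount needed).
set -eu

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"                     # DRY_RUN=0: really execute (needs root)
  else
    printf '+ %s\n' "$*"     # default: only show the command
  fi
}

# Usage: extend_lv_online <+size, e.g. +1G> <lv_device>
extend_lv_online() {
  size_delta=$1 dev=$2
  run lvextend --size "$size_delta" "$dev"  # takes free extents from the VG
  run resize2fs "$dev"                      # no size given: grow to fill the LV
}
```

Example: `extend_lv_online +1G /dev/mytaskvg/mytasklv`, where `mytasklv` is the hypothetical LV name used above.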

Now let’s see how to reduce the LV and use the reduced LV with the datanode in the Hadoop cluster.

Freeing up LV space that is not in use helps us use storage efficiently, because the freed space can be given to another LV.

“When we reduce an LV, the freed space goes back to the VG.”

For example, suppose we want to give space from the VG to another LV, but almost all of the VG is already allocated to the existing LV. We can reduce the existing LV, which increases the VG's free space, and then allocate that space to the new LV.

Step 1: Unmount the partition. Unlike growing, shrinking an ext4 filesystem cannot be done while it is mounted, so I unmounted the previously mounted directory:

# umount /datanode

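Step 2: Check the filesystem for errors. resize2fs refuses to resize an unmounted filesystem that has not been freshly checked, so force a check with e2fsck:

# e2fsck -f /dev/<VG_name>/<LV_name>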

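Step 3: Shrink the filesystem. I wanted to keep 3GB of the 6GB, so I shrank the filesystem to 3GB. resize2fs resizes the filesystem in place rather than formatting it, but the data must fit within the new size:

# resize2fs /dev/<VG_name>/<LV_name> 3G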

Step 4: Before reducing the LV, the VG has approximately 1GB of free space (shown by vgdisplay). After reducing the LV, this free space will increase.

Step 5: Reduce the LV by 2GB, from 6GB to about 4GB. The new LV size is still larger than the 3GB filesystem, so no data is cut off:

# lvreduce --size -2G /dev/<VG_name>/<LV_name>

Step 6: After reducing the LV by 2GB, the free space in the VG became approximately 3GB (the earlier 1GB plus the 2GB released).

Step 7: Before the reduction the LV was approximately 6GB; after reducing it by 2GB it is approximately 4GB (shown by lvdisplay).

Step 8: Mount the partition again on the same directory and keep using that directory in the datanode configuration:

# mount /dev/<VG_name>/<LV_name> /datanode

The LV is now about 4GB, but the filesystem on it was shrunk to 3GB in step 3, so the datanode contributes approximately 3GB of storage.
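The reduction, in the order that keeps data safe: unmount, check, shrink the filesystem, shrink the LV (never below the filesystem size), then remount. This is a sketch with the same hypothetical `run` helper; the 3G and -2G values are the ones from this walkthrough, and `lvreduce` will still ask for confirmation when run for real.

```shell
#!/bin/sh
# Sketch of the shrink sequence: filesystem first, then the LV.
set -eu

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"                     # DRY_RUN=0: really execute (needs root)
  else
    printf '+ %s\n' "$*"     # default: only show the command
  fi
}

# Usage: shrink_lv <lv_device> <mount_dir> <new_fs_size> <lv_reduction>
#   new_fs_size (e.g. 3G) must hold all existing data, and the LV after the
#   reduction (e.g. 6G - 2G = 4G) must stay >= new_fs_size.
shrink_lv() {
  dev=$1 mount_dir=$2 fs_size=$3 lv_delta=$4
  run umount "$mount_dir"                  # ext4 cannot shrink while mounted
  run e2fsck -f "$dev"                     # resize2fs wants a freshly checked fs
  run resize2fs "$dev" "$fs_size"          # shrink the filesystem first...
  run lvreduce --size "$lv_delta" "$dev"   # ...then the LV; freed space returns to the VG
  run mount "$dev" "$mount_dir"
}
```

Example: `shrink_lv /dev/mytaskvg/mytasklv /datanode 3G -2G`. Reversing the resize2fs and lvreduce steps would cut off the end of a filesystem that is still larger than its LV.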

Conclusion —

* Here we saw the concept of LVM and how it lets us extend a partition without stopping services or unmounting it, and how we can reduce a partition without losing data.

* Extending an LV takes space from the VG; reducing an LV gives that space back to the VG.