How to find large files in a folder in Linux using the command line
Sometimes you need to find the large files that are taking up most of the disk space on a file system so that they can be cleaned up to free space. These files could be inside folders and sub-folders many levels deep, which makes it impractical to navigate to each one of them by hand.
There are some Linux commands that you can use to find large files within a folder or directory. The criteria differ depending on the situation: you might want to list files that are larger than a specified size, list all large files sorted by size, list only the top x files sorted by size, or list only the top 10 files above a minimum size.
While almost all of these commands and programs provide the desired information, some are much friendlier to use than others. Some do not provide much flexibility or a convenient output format. Two desirable features are the ability to sort and the ability to search recursively.
There are some command line options as well as some GUI based software. We will look at the command line options in this post.
Using ls
This is quick, easy and useful if you are just checking the current working directory for large files. The ls command has no recursive mode that works well for this purpose. It also prints out the sub-directories along with the files in the current folder.
ls -Sl
- -l: print out in long format
- -S: sort output by size, largest first
There is a recursive option (-R or --recursive) that can be used with ls, but its output is grouped by directory, so it cannot be sorted by size across the whole tree as we would like. As another option, you can pipe the output through grep to exclude the directories if you like.
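The grep filter mentioned above can be sketched as follows. In long format, the first character of each line is the file type, so keeping only the lines that start with "-" leaves the regular files and drops directories, symlinks and the "total" line. The function name list_files_by_size is made up for this example.

```shell
# Hypothetical helper: list only the regular files in a directory, largest first.
# $1 is the directory to inspect (defaults to the current directory).
list_files_by_size() {
    # -l: long format, -S: sort by size (largest first).
    # In long format a regular file's line starts with "-";
    # a directory's line starts with "d".
    ls -lS "${1:-.}" | grep '^-'
}
```

Calling list_files_by_size /var/log would then print the files directly inside /var/log, without its sub-directories.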
Using du
A better option for finding large files is the du (disk usage) command, which computes the size of each file and directory. It is a simple command that takes a folder name, or uses the current directory if one is not specified. To check the /var/log folder, use
du /var/log/
The above command lists all the folders and sub-folders within /var/log (folders only, no files) along with the combined size of all files inside each of them. Use the -h or --human-readable option to print the sizes in a human-readable format. The -a or --all option prints the files in addition to the folders.
du -ha /var/log
The output is still not sorted. In order to sort the output with the largest file on top, we will pipe the output through to the sort command.
du -a /var/log | sort -nr
We drop the -h option from the du command so that all sizes are printed in the same unit (1 KB blocks by default on Linux), which allows the sort command to compare them correctly. The -n option sorts numerically rather than lexically, and the -r option reverses the sort order. The default sort order, without the -r or --reverse option, is ascending, which would print the smallest files first.
If the folder contains a lot of files, and you are only interested in the top x number of files by size, then you can pipe this again through to the head command.
du -a /var/log | sort -nr | head -n 10
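This finds the largest entries in the folder and its sub-folders and prints them as a two-column list of size and path. Because the -a option lists directories as well as files, the first entries are usually directories, starting with /var/log itself.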
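The pipeline above drops -h only because a plain numeric sort cannot compare values such as 4.0K and 1.2M. As a sketch, assuming GNU sort (its -h option compares human-readable sizes), the readable output can be kept and still sorted. The function name largest_entries and its defaults are made up for this example.

```shell
# Hypothetical helper: show the N largest entries under a directory with
# human-readable sizes. Assumes GNU sort, whose -h option understands
# suffixes such as K, M and G.
# $1: directory (default: current directory), $2: number of entries (default: 10)
largest_entries() {
    # du -a: include files as well as directories, -h: human-readable sizes.
    # 2>/dev/null hides "Permission denied" messages from unreadable folders.
    # sort -h: compare human-readable sizes, -r: largest first.
    du -ah "${1:-.}" 2>/dev/null | sort -hr | head -n "${2:-10}"
}
```

For example, largest_entries /var/log 5 would print the five largest entries under /var/log, directories included.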
Using find
find is a very versatile command that can list files recursively inside a folder based on several useful criteria. Its filters help you narrow the search down to the large files you want. To find all files above a specific size (e.g. 10 KB) in a specific folder (e.g. /var/log), you can use the following command:
find /var/log -type f -size +10k
You will see that the output contains just the file names, without any additional information or sorting. To add extra information such as size and date, use the -exec option to run the ls command on each file:
find /var/log -type f -size +10k -exec ls -l {} \;
The output is still messy and a little hard to read, although all the information is there. You can pipe it through the awk command to print just the relevant fields, here the file name and the size in bytes. Note that this simple awk script assumes file names without spaces.
find /var/log -type f -size +10k -exec ls -l {} \; | awk '{ print $9 ": " $5 }'
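Parsing ls output with awk breaks on file names that contain spaces, because the name is then split across several fields. A possible alternative, assuming GNU find (its -printf action is not part of POSIX), is to let find print the size and path itself. The function name files_larger_than is made up for this example.

```shell
# Hypothetical helper: print "size-in-bytes<TAB>path" for every regular file
# larger than a threshold. Assumes GNU find, which provides -printf.
# $1: directory to search, $2: size threshold in find's -size syntax (e.g. 10k)
files_larger_than() {
    # %s is the file size in bytes, %p is the path as find reached it.
    find "$1" -type f -size +"$2" -printf '%s\t%p\n'
}
```

Since the size comes first on each line, the output of files_larger_than /var/log 10k can be piped straight into sort -nr to put the largest files on top.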
To display only the directories instead of the files, you can do something like the command below. This is equivalent to what the du command displays, as described in the earlier section.
find /var/log -type d -exec du -s {} \;
Often you do not care about a specific size but just want to display the top x (e.g. 10) largest files, whatever their size. To achieve that using the find command, you will need something like this:
find /var/log -type f -exec ls -s {} \; | sort -n -r | head -n 10
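The command above starts one ls process per file, which can be slow on large trees. A variation, sketched here with the "-exec ... {} +" form that passes many file names to each invocation, uses du -k so that every size is reported in 1024-byte units. The function name top_files and its defaults are made up for this example.

```shell
# Hypothetical helper: print the N largest regular files under a directory.
# $1: directory (default: current directory), $2: number of files (default: 10)
top_files() {
    # "-exec ... {} +" passes many file names to each du invocation
    # instead of starting one process per file.
    # du -k reports each file's disk usage in 1024-byte units.
    # 2>/dev/null hides "Permission denied" messages.
    find "${1:-.}" -type f -exec du -k {} + 2>/dev/null | sort -nr | head -n "${2:-10}"
}
```

For example, top_files /var/log 10 would list the ten largest files under /var/log, without any directory totals.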
This is very similar to the du command earlier, but it prints only the files and not any directories. The find command has the additional advantage of many more options that let you further restrict the files it searches for and displays. You can filter by last access time, last modified time or file name, among others.
For example, to see only the 10 largest log files in the /var/log folder that have not been modified in the last 30 days, sorted by size, while excluding the mysql/ folder:
find /var/log -type f -name "*.log" -mtime +30 -not -wholename "*/mysql/*" -exec ls -s {} \; | sort -nr | head -n 10
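If you run this kind of search often, the same filters can be wrapped in a small function. The sketch below is one possible way to do it, not the only one: the name old_logs_by_size and its argument order are made up, -path is used as the POSIX spelling of GNU's -wholename, and du -k replaces ls -s so that sizes are always in 1024-byte units.

```shell
# Hypothetical helper: largest N "*.log" files that were not modified in the
# last DAYS days, skipping anything under a directory called EXCLUDE.
# $1: directory, $2: days, $3: directory name to exclude, $4: number of files
old_logs_by_size() {
    # -mtime +N: modified more than N days ago (find counts whole 24-hour periods).
    # ! -path "*/NAME/*": skip files that live anywhere below a NAME directory.
    find "$1" -type f -name '*.log' -mtime +"$2" ! -path "*/$3/*" \
        -exec du -k {} + | sort -nr | head -n "$4"
}
```

With this in place, old_logs_by_size /var/log 30 mysql 10 applies the same filters as the command above.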
Thus, the command line utilities help you find large files easily and exactly the way you want. If you are not into the command line and prefer a GUI-based option, there are also some good programs that do the same job. They are quite user friendly if you have access to a graphical desktop (X) on the machine.