How to Merge Multiple Files into One Single File in Linux

You will often have multiple files that need to be merged into one single file. It could be that you previously split a single file into multiple files and just want to merge them back, or that you have several log files that you want merged into one. Whatever the reason, it is very easy to merge multiple text files into a single file in Linux.

The command in Linux to concatenate or merge multiple files into one file is called cat. By default, the cat command concatenates the files it is given and prints the result to the standard output. You can redirect the standard output to a file using the '>' operator to save the result on disk.
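As a small illustration of the redirect, here is a sketch that uses throwaway files in a temporary directory. The names a.txt, b.txt and out.txt are made up for this example. It also shows the difference between '>' (replace the target file) and '>>' (add to the end of it).

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alpha\n' > a.txt
printf 'beta\n'  > b.txt

# Without a redirect, cat prints both files to the terminal.
cat a.txt b.txt

# '>' creates out.txt, or overwrites it if it already exists.
cat a.txt b.txt > out.txt

# '>>' appends to the existing file instead of replacing it.
cat a.txt >> out.txt
```

After these commands out.txt holds the contents of a.txt, then b.txt, then a.txt again.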

Another useful utility to merge files is called join, which can join the lines of two files based on a common field. It can, however, work on only two files at a time and expects both files to be sorted on the join field, and I have found it to be quite cumbersome to use. We will cover mostly the cat command in this post.
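For completeness, here is a minimal sketch of what join does. The files names.txt and ages.txt and their contents are hypothetical; the first field of each line is assumed to be a shared id, and both files are already sorted on it.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Both files are sorted on the first field (an id), which join requires.
printf '1 alice\n2 bob\n3 carol\n' > names.txt
printf '1 30\n3 25\n' > ages.txt

# Join on the first field of each file. Ids that appear in only one
# of the two files (id 2 here) are left out of the result.
join names.txt ages.txt > joined.txt
cat joined.txt
```

Each output line carries the shared id followed by the remaining fields from the first file and then from the second.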

Merge Multiple Files into One in Order

The cat command takes a list of file names as its arguments. The order in which the file names are specified on the command line dictates the order in which the files are merged. So, if you have several files named file1.txt, file2.txt, file3.txt and so on, you can merge them in that order like this:

bash$ cat file1.txt file2.txt file3.txt file4.txt > ./mergedfile.txt

The above command writes the contents of file1.txt first, followed by the contents of file2.txt, then file3.txt and so on. The entire merged output is saved with the name mergedfile.txt in the current working directory. The original files themselves are not modified.
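To see that the argument order alone decides the result, here is a short sketch with three one-line files created just for the demonstration. The files are deliberately listed out of numeric order.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one\n'   > file1.txt
printf 'two\n'   > file2.txt
printf 'three\n' > file3.txt

# Arguments are read left to right, so this order is deliberate.
cat file3.txt file1.txt file2.txt > mergedfile.txt

cat mergedfile.txt   # prints three, one, two (one per line)
cat file1.txt        # still just "one": the inputs are not modified
```

One thing to watch for: cat does not add anything between files, so if a file does not end with a newline, its last line runs straight into the first line of the next file.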

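Often you will have such a large number of files that typing in all the file names is impractical. The shell expands wildcard (glob) patterns such as * and ? into the list of matching file names before cat runs, which means you can use them to reduce the number of arguments you type. Strictly speaking these are shell patterns, not regular expressions, and cat itself never sees them.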

bash$ cat file*.txt my*.txt > mergedfile.txt

This will merge all the files in the current directory whose names start with file and end in .txt, followed by the files that start with my and end in .txt. Within each pattern the shell sorts the matching names alphabetically (by the current locale's collation rules), so file10.txt comes before file2.txt. You have to be careful with wildcards if you want to preserve the order of files: if the pattern is not quite right, the files will be merged in a different order than you intended.
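The ordering pitfall is easy to reproduce. The sketch below creates three numbered files, shows how the shell expands the pattern, and then merges them in numeric order instead. It assumes a sort that supports the -V (version sort) option, as GNU sort does, and it assumes the file names contain no whitespace. The output name merged.out is arbitrary.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for n in 1 2 10; do
    printf 'contents of file%s\n' "$n" > "file$n.txt"
done

# The shell expands the pattern before cat runs; echo shows the expansion.
# The names come back in collation order, so file10.txt sorts ahead of file2.txt.
echo file*.txt

# Assumption: sort supports -V, which orders embedded numbers numerically,
# giving file1.txt, file2.txt, file10.txt.
# Note: this simple pipeline breaks on file names containing whitespace.
printf '%s\n' file*.txt | sort -V | xargs cat > merged.out
cat merged.out
```

Running echo with the pattern first is a cheap way to preview exactly what cat will receive.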

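A quick and easy way to make sure the files get merged in the exact order you want is to list them first with a program such as ls or find and then pipe that list to the cat command. Note that find prints files in the order it encounters them in the directory, which is not necessarily alphabetical, so pipe its output through sort. First run the find command with the file name patterns and verify the file order…

bash$ find . -name "file*.txt" -o -name "my*.txt" | sort

This will print the files in sorted order so that you can verify the list is correct, or modify the command until it matches what you want. Keep in mind that find also descends into subdirectories. You can then pipe that output to cat by way of xargs, which turns the list of names into arguments for cat. This simple form assumes the file names contain no spaces or newlines.

bash$ find . -name "file*.txt" -o -name "my*.txt" | sort | xargs cat > ./mergedfile.txt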
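If the file names may contain spaces, a more defensive variant of the same pipeline is sketched below. It assumes find, sort and xargs support the NUL-separator options (-print0, -z and -0) and that find supports -maxdepth, as the GNU versions do. The file names and the output name merged.out are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'B\n' > 'file b.txt'     # names contain a space on purpose
printf 'A\n' > 'file a.txt'
printf 'M\n' > 'my notes.txt'

# -print0 / sort -z / xargs -0 pass names separated by NUL bytes, so spaces
# in names are safe. The parentheses group the two -name tests so that
# -print0 applies to both. -maxdepth 1 keeps find out of subdirectories.
find . -maxdepth 1 \( -name 'file*.txt' -o -name 'my*.txt' \) -print0 |
    sort -z |
    xargs -0 cat > merged.out
cat merged.out
```

Sorting explicitly means the merge order no longer depends on the order in which find happens to read the directory.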

When you merge multiple files into one using patterns to match them, especially when the command is piped and the output file is not very obvious, make sure that the pattern does not match the name of the merged file. If it does match, GNU cat usually catches the problem and exits with the message "input file is output file". But it helps to be careful to start with.
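One way to sidestep the problem, sketched here under the assumption that mktemp is available, is to build the result in a temporary file outside the directory being matched and move it into place afterwards. The file names are examples only.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'first\n'  > a.txt
printf 'second\n' > b.txt

# Risky: 'cat *.txt > merged.txt'. Once merged.txt exists (for example on
# a second run), the pattern matches it too, so the output file becomes
# one of its own inputs.

# Safer: build the result in a temporary file that the pattern cannot
# match, then move it into place once cat has finished.
tmpfile=$(mktemp)
cat ./*.txt > "$tmpfile"
mv "$tmpfile" merged.txt
cat merged.txt
```

Choosing an output name or directory that the pattern can never match, such as a different extension, achieves the same thing with less ceremony.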


Merge Two Files at Arbitrary Location

Sometimes you might want to merge two files, but at a particular location within the content of one of them. This is more like inserting the contents of one file into another at a particular position.

If the files are small and manageable, then an editor such as vi is a great tool for this. Otherwise, the option is to split the file first and then merge the resulting files in order. The easiest way to split the file is by line number, exactly where you want to insert the other file.

bash$ split -l 1234 file1.txt

You can split the file into any number of output files depending on your requirement. The above example will split file1.txt into chunks of 1234 lines each. It is quite possible that you will end up with more than two files, named xaa, xab, xac and so on. You can merge all of them back using the same cat command as mentioned earlier.

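bash$ cat xaa file2.txt xa[b-z] > mergedfile.txt

The above command will merge the files in order, with the contents of file2.txt in between the contents of xaa and xab, and save the result as mergedfile.txt. The pattern xa[b-z] matches only the chunk files that actually exist, whereas a brace expansion such as xa{b..z} would list all 25 names whether or not the files exist and make cat complain about the missing ones. If split produced more than 26 chunks (xba onwards), extend the pattern accordingly.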
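Here is the whole procedure on a five-line file, small enough to check by eye. The sketch inserts file2.txt after line 2 of file1.txt, first with split as described above and then with head and tail, an alternative that is not part of the procedure above and avoids the intermediate chunk files. File contents and output names are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'line1\nline2\nline3\nline4\nline5\n' > file1.txt
printf 'INSERTED\n' > file2.txt

# Approach from the text: 2-line chunks give xaa, xab and xac here.
split -l 2 file1.txt
cat xaa file2.txt xa[b-z] > merged_split.txt

# Alternative without chunk files: head takes the first 2 lines,
# tail -n +3 takes everything from line 3 onwards.
{ head -n 2 file1.txt; cat file2.txt; tail -n +3 file1.txt; } > merged_headtail.txt

cat merged_split.txt
```

Both output files end up identical, with INSERTED sitting between line2 and line3.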

Another use case is when you need to merge only specific parts of certain files depending on some condition. This is especially useful for me when I have to analyze several large log files, but am only interested in certain messages or lines. So, I will need to extract the important log messages based on some criteria from several log files and save them in a different file while also maintaining or preserving the order of the messages.

Though you can do this using cat and grep commands, you can do it with just the grep command as well.

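bash$ grep -h -F "[Error]" logfile*.log > onlyerrors.log

The above will extract all the lines that contain the text [Error] and save them to another file. The -F option makes grep treat the pattern as a literal string; without it, the square brackets form a bracket expression that matches any line containing one of the characters E, r or o. The -h option leaves the file name off each output line. You will have to make sure that the wildcard pattern lists the log files in the right order, as mentioned earlier in the post.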
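A quick way to check the behaviour is to try it on two tiny log files. The log contents below are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '[Info] started\n[Error] disk full\n' > logfile1.log
printf '[Error] timeout\n[Info] stopped\n'   > logfile2.log

# -F treats the pattern as a literal string, so the brackets match themselves.
# -h leaves the file name off each output line.
grep -h -F '[Error]' logfile*.log > onlyerrors.log
cat onlyerrors.log
```

Only the two [Error] lines are kept, in the order the log files were listed on the command line.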