how to count the number of delimiters in a text file in linux
It is easy enough to count lines in a text file, and just as easy to count characters within it. Counting delimiters should be just as simple, because delimiters are usually nothing more than characters: counting those characters gives you the count of delimiters in the file.
For example, counting lines is nothing but counting the end of line (EOL, or newline) characters in the file.
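If you want to convince yourself of that equivalence, compare the standard line count with a tr based count of newline characters; assuming filename.txt is the file in question, both commands should print the same number:
bash$ wc -l < filename.txt
bash$ tr -cd '\n' < filename.txt | wc -c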
Usually, though, the delimiter you want to count is different from the common ones such as the end of line (EOL) character and white space, and you might also need to treat multiple characters as delimiters. The general idea behind each of the techniques below is to delete every character that is not to be counted, and then use the wc command to count what is left.
count the total number of delimiters in the file
Let's start by considering the comma (,) character to be the only delimiter in the file. That means the newline is not a delimiter and we will treat the entire text file as one long string. You should be able to substitute almost any character for the comma (,) in the examples below.
using the tr command
The tr or translate command can be used to extract just the characters that you want to count, which are then counted with the wc command. The -c command line option of wc counts bytes, which is the same as the character count for single-byte characters such as the comma.
bash$ tr -cd , < filename.txt | wc -c
- -c : complement the set of characters given on the command line
- -d : delete the characters in the given set
Used together (-cd), tr deletes every character that is not in the set, leaving only the commas behind.
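As a quick sanity check, here is the same command run against a small made-up sample file (the file name and contents below are just for illustration):
bash$ printf 'a,b,c\nd,e\n' > sample.txt
bash$ tr -cd , < sample.txt | wc -c
The first line has two commas and the second has one, so this should print 3.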
using the grep or fgrep command
The fgrep command can be used to match and print just the characters you want. Since fgrep with the -o option prints each matched character on a separate line, we use the wc command to count the lines of its output, which gives the total number of delimiter characters in the file.
bash$ fgrep -o , filename.txt | wc -l
- -o : prints only the matched parts of a matching line, each on a separate line
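Note that fgrep is simply the legacy name for grep -F (fixed string matching), so the same count can be written with plain grep, which is what modern systems generally prefer:
bash$ grep -o -F , filename.txt | wc -l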
count the number of delimiters per line
The previous section counted the total number of delimiters in the file. What if we want to count the delimiters separately for every line in the file?
using the awk command
bash$ awk -F "," '{print NF-1}' filename.txt
The awk command works on a line by line basis and can be used to count the delimiters in each line of the file. A line with N delimiters is split into N+1 fields, so printing one less than the number of fields gives the count of delimiters in that line.
- -F : sets the field separator (the delimiter)
- NF : the number of fields in the current line
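If you also want a file-wide total from the same awk command, you can accumulate the per-line counts in an END block. The NF ? NF - 1 : 0 guard simply keeps completely blank lines (which have zero fields) from contributing -1 to the total:
bash$ awk -F "," '{total += (NF ? NF - 1 : 0)} END {print total}' filename.txt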
using the tr command
You can use the tr command here as well, but you will need to handle it differently than in the previous section. That is because the lines in the file are not preserved after the tr processing.
bash$ tr -cd ",\n" < filename.txt | awk '{print length}'
We need to keep both the delimiter character and the newline character in the set passed to tr -cd, so that everything else is deleted but the lines in the file are preserved. We then use the awk command to print out the length of each remaining line, which is the number of delimiters on that line.
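If you want to see which count belongs to which line, you can have awk prefix each count with the line number (NR is awk's built-in record, or line, counter):
bash$ tr -cd ",\n" < filename.txt | awk '{print NR": "length}'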
count the total number of multiple delimiters
Let's say we want to consider multiple characters as delimiters. So instead of just the comma (,), we will treat the comma (,), the period (.) and the colon (:) as delimiters in this use case. You should be able to easily substitute any other characters or add more characters to the list as your requirements dictate.
bash$ tr -cd ",.:" < filename.txt | awk '{print length}'
So, the above command will count the total number of delimiters in the whole file. Compared to the earlier single-delimiter example, all we have done is list all three characters in the set given to tr; the count is printed with awk's length here, which for the single remaining line is equivalent to wc -c. To count each line separately, we just need to add the newline character (\n) to the set as well.
bash$ tr -cd ",.:\n" < filename.txt | awk '{print length}'
Once you have figured out how to count characters in a string and in a file, almost every other use case becomes a variation of the same idea. You can easily adjust the character sets and command line arguments to count pretty much anything that you want.