Many modern graphical text editors can count the characters, words and lines of the file being edited. But sometimes you want to see these statistics from the command line, which is also useful when you want to compare them across several files.
The most common command used in Linux for this purpose is wc. It prints the byte, character, word and line counts of a text file or of the standard input. The command is simple, with just a few command line options.
To count the number of lines in a text file, where the lines are separated by the end-of-line (EOL) character, also called the newline character, use the wc command with the --lines or -l option. The newline character is the default line separator used in text files on Linux.
bash$ wc -l <filename>
The --lines (or -l) option prints the newline count, which equals the number of lines in the file as long as the last line ends with a newline. The <filename> in the above example refers to the path to the text file that you want to analyze.
Remember that blank (or empty) lines in the file count as lines as well. If you find a discrepancy between the count you expected and the count that is printed, this could be the reason.
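As a small sketch of the points above, the following creates two throwaway files (the file names and contents are made up for illustration) and shows that a blank line is counted, while a final line without a trailing newline is not. Reading the file through `<` keeps wc from printing the file name next to the count.

```shell
# Illustrative sketch: file names and contents are assumptions.
tmpdir=$(mktemp -d)

# Three newline characters: two text lines with one blank line in between.
printf 'first line\n\nthird line\n' > "$tmpdir/sample.txt"
with_newline=$(wc -l < "$tmpdir/sample.txt")

# Same text, but the last line has no trailing newline.
printf 'first line\n\nthird line' > "$tmpdir/no_eol.txt"
without_newline=$(wc -l < "$tmpdir/no_eol.txt")

echo "with trailing newline:    $with_newline"
echo "without trailing newline: $without_newline"

rm -r "$tmpdir"
```

The first count includes the blank line. The second is one lower because wc -l counts newline characters rather than visible lines.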
To count the words in the text file across all lines, use the --words (or -w) option of the wc command. Words are considered to be separated by whitespace, that is, spaces, tabs, line breaks and so on.
bash$ wc -w <filename>
By default, the wc command uses standard whitespace as the word delimiter. It has no option to change this, so if you wish to use another character as a delimiter, you will need to pipe the content of the text file through the tr command before sending it to wc.
For example, let's say you have a CSV (comma-separated values) file as input and you want to get a word count on that file. In a CSV file, the appropriate word delimiter is the comma. So, you will need to treat the comma (,) as a word delimiter in addition to whitespace. The example below shows how you can do this by using tr and wc.
bash$ cat <filename> | tr "," " " | wc -w
You can use any character as a custom delimiter using the above method. You just need to substitute the comma (,) in the above example with the desired character.
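To make the effect of the custom delimiter concrete, here is a minimal sketch that compares the default word count with the count after translating commas to spaces. The file names.csv and its contents are hypothetical.

```shell
# Illustrative sketch: names.csv and its contents are assumptions.
tmpdir=$(mktemp -d)
printf 'alice,bob,carol\ndave,erin\n' > "$tmpdir/names.csv"

# Default behaviour: only whitespace separates words,
# so each comma-joined row counts as a single word.
default_count=$(wc -w < "$tmpdir/names.csv")

# Turn every comma into a space first, so each field becomes a word.
field_count=$(tr ',' ' ' < "$tmpdir/names.csv" | wc -w)

echo "words with default delimiters: $default_count"
echo "words with comma as delimiter: $field_count"

rm -r "$tmpdir"
```

Here tr reads the file through input redirection, which has the same effect as piping it in with cat.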
You can again use the wc command to count the number of characters in a text file. The --chars (or -m) option prints the character count.
bash$ wc -m <filename>
You can also use the --bytes (or -c) option, which prints the byte count instead. For plain ASCII text the -m and -c options print the same number; they differ only when the file contains multibyte characters, such as accented or non-Latin characters encoded in UTF-8.
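The difference between the two options can be seen with a file that contains one multibyte character. This is a sketch under assumptions: the file name is made up, the accented letter is written as its two UTF-8 bytes in octal, and the result of -m depends on the active locale (in a non-UTF-8 locale such as C it may simply equal the byte count).

```shell
# Illustrative sketch: the file name and contents are assumptions.
tmpdir=$(mktemp -d)

# "h", then e-acute as two UTF-8 bytes (octal 303 251), then "llo" and a newline.
printf 'h\303\251llo\n' > "$tmpdir/accent.txt"

# Byte count: the accented letter contributes two bytes.
byte_count=$(wc -c < "$tmpdir/accent.txt")

# Character count: locale dependent, never larger than the byte count.
char_count=$(wc -m < "$tmpdir/accent.txt")

echo "bytes:      $byte_count"
echo "characters: $char_count"

rm -r "$tmpdir"
```

In a UTF-8 locale the character count should come out one lower than the byte count, because the accented letter is a single character stored in two bytes.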
The wc command counts the spaces or blanks in the file as characters. You can exclude them from the count by using the tr command to delete them from the text before counting. So, to count the characters without the spaces, you can use:
bash$ cat <filename> | tr -d '[:blank:]' | wc -m
This deletes all the horizontal whitespace (spaces and tabs) in the file and counts the remaining characters. Newline characters are not removed, so they are still included in the count.
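A short sketch of this technique on a made-up file follows. It also shows a variation that is not in the text above: deleting the `[:space:]` class instead, which removes newlines as well. The character class is quoted so the shell cannot expand it as a file name pattern.

```shell
# Illustrative sketch: the file name and contents are assumptions.
tmpdir=$(mktemp -d)

# "a", a space, "b", a tab, "c", then a newline: six characters in total.
printf 'a b\tc\n' > "$tmpdir/blanks.txt"

all_chars=$(wc -m < "$tmpdir/blanks.txt")

# Delete spaces and tabs only; the newline is still counted.
non_blank=$(tr -d '[:blank:]' < "$tmpdir/blanks.txt" | wc -m)

# Variation: delete all whitespace, including the newline.
no_whitespace=$(tr -d '[:space:]' < "$tmpdir/blanks.txt" | wc -m)

echo "all characters:     $all_chars"
echo "without blanks:     $non_blank"
echo "without whitespace: $no_whitespace"

rm -r "$tmpdir"
```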
If you would like to count the occurrences of a single character rather than all the characters in the text file, use fgrep (equivalent to grep -F) with the -o option to print each occurrence of the desired character on its own line, and then pipe the output to wc -l.
bash$ fgrep -o <char> <filename> | wc -l
Here, <char> denotes the character that you want to count. For example, to count how many times the character t occurs in the text file sample.txt, you can use the command below.
bash$ fgrep -o t sample.txt | wc -l
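The sketch below runs the same idea on a made-up sample.txt. It uses grep -F, the equivalent spelling of fgrep, since recent GNU grep releases print an obsolescence warning for fgrep. It also contrasts the result with grep -c, which counts matching lines rather than individual occurrences.

```shell
# Illustrative sketch: the contents of sample.txt are assumptions.
tmpdir=$(mktemp -d)
printf 'tattle\ntest\n' > "$tmpdir/sample.txt"

# -o prints every match on its own line, so wc -l counts occurrences.
occurrences=$(grep -F -o 't' "$tmpdir/sample.txt" | wc -l)

# -c counts the lines that contain at least one match instead.
matching_lines=$(grep -F -c 't' "$tmpdir/sample.txt")

echo "occurrences of t:   $occurrences"
echo "lines containing t: $matching_lines"

rm -r "$tmpdir"
```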
Using commands such as tr, fgrep, grep, cut and awk, you can filter pretty much any content down to just the text that you want to analyze, and then run wc on the result to count characters, words and lines, either separately or together.
As with most Linux commands that operate on files, you can pass multiple files as arguments, and wc will print the counts for each file individually followed by a total for all files combined. You can also use shell wildcard patterns (such as *.txt) to match just the files you need. This makes wc a good tool for comparing several text files.
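As a minimal sketch of counting across several files, the following creates three hypothetical files in a temporary directory and runs wc -l on the two that match a wildcard pattern. The file names and the use of awk to pull out the total are illustrative choices, not the only way to do this.

```shell
# Illustrative sketch: all file names and contents are assumptions.
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/a.txt"
printf 'three\n' > "$tmpdir/b.txt"
printf 'not counted\n' > "$tmpdir/notes.log"

# The shell expands *.txt to a.txt and b.txt before wc runs,
# so notes.log is left out of the report.
report=$(cd "$tmpdir" && wc -l *.txt)
echo "$report"

# The last line of the report holds the combined total.
total=$(echo "$report" | tail -n 1 | awk '{print $1}')
echo "combined line count: $total"

rm -r "$tmpdir"
```

The report has one line per matched file plus a final line labelled total.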