An important consideration in Search Engine Optimization (SEO) is whether your webpages and content are being crawled and indexed by search engines. Unless a page is indexed, it cannot appear in the results for any search query, no matter what else you do.
Traffic to your website falls into several categories: direct, paid, referral and organic. Organic traffic is the traffic you get from search engines: a user searches with a query, your webpage is listed in the search results, and the user clicks through to it. Such traffic has the potential to be quite high compared to the other types and is essentially “free” in terms of money and effort.
Why is indexing important?
Organic traffic is not really “free”. You still have to put a lot of effort into producing good content and ranking high in the search result pages before you get any appreciable amount of traffic. But the potential to generate high volumes of traffic once you do rank high is the best part of organic traffic. For any of that to happen, your website content first has to be indexed by the search engines.
For many websites, organic search traffic can account for as much as 70% to 80% of overall traffic, and it makes up more than half of all internet traffic. So, if you find that your site is not generating enough organic traffic, it might be time to start investigating why.
First of all, your website should get crawled and indexed automatically if it is publicly available on the internet. It might take some time depending on factors such as popularity, backlinks and site content, but it will happen organically as long as you have not misconfigured something.
Is it really online and public?
The first thing to verify is that your website is really public and accessible. This might seem trivial, but you would be surprised by how many novice users do not fully test their website. You should test your website from different machines in different locations.
Use your home computer, your work computer, an internet cafe and a friend’s computer to see if the website displays correctly. You can also use different smartphones and different networks to do the same.
This should allow you to track down any DNS and connectivity issues right out of the gate.
Do search engines see it?
The easiest check is to search for your site in the search engines using very specific keywords or the domain name. Pick a couple of long phrases that are unique to your web content, search for them, and see if your pages show up in the results. You can also use the domain name of your website as the search query.
If your website is really new, say only a couple of days old, then give it some more time to be indexed. It is not unusual for a new website to take several days to show up in search engines.
You can also use some search operators in your query to narrow down the results to the specific website. The search operator site: is a good example that will display only pages from the specific web domain.
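As a small illustration of the site: operator, the Python sketch below builds such a query and a search URL for it. The domain example.com, the sample phrase and the helper names site_query and search_url are placeholders made up for this example. The base URL is Google's public search page; other search engines use their own.

```python
from urllib.parse import quote_plus


def site_query(domain: str, phrase: str = "") -> str:
    """Build a query limited to one domain with the site: operator."""
    query = f"site:{domain}"
    if phrase:
        # Quote the phrase so the engine looks for the exact wording.
        query += f' "{phrase}"'
    return query


def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """URL-encode the query and append it to the search page address."""
    return f"{base}?q={quote_plus(query)}"


# example.com is a placeholder; substitute your own domain and phrase.
print(site_query("example.com", "a unique sentence from my page"))
print(search_url(site_query("example.com")))
```

If the search for the bare site: query returns no pages at all, that is a strong hint that nothing from the domain has been indexed yet.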
Add web analytics to your site
There are several web analytics services, including Google Analytics, that allow you to monitor web traffic to your site. Integrating with such a service lets you monitor not only organic traffic but the other traffic channels as well, so you can better understand what drives users to your website.
Issues with indexing
It is still possible to have issues with indexing for a variety of reasons. Once you have verified with some certainty that your website is not getting indexed, you can start to investigate what the exact reason might be. Here are some of the most common reasons:
The robots.txt file gives search engine crawlers instructions about which parts of the site they should not crawl. It is quite possible that you have accidentally left some instructions in there that block crawlers from your website. They could be left over from the development days, or it could just be a typo.
If the robots.txt file is correct, make sure that the correct file is actually the one published. Open the robots.txt file over the internet in a browser and verify that it matches the file you intended to publish. Alternatively, you can use Google Webmaster Tools (now called Google Search Console) to validate your robots file.
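One way to check what a given robots.txt actually blocks is Python's standard urllib.robotparser module. The sketch below is only an illustration: the helper name blocked_paths and the sample paths are made up for this example, and the sample file imitates a typical leftover from development that disallows everything.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths, user_agent: str = "*"):
    """Return the paths that the given robots.txt text disallows."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # access is needed once you have the text of the file.
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Hypothetical leftover from a development site: it blocks every crawler.
leftover = """User-agent: *
Disallow: /
"""

# Hypothetical intended file: only the admin area is off limits.
intended = """User-agent: *
Disallow: /admin/
"""

pages = ["/", "/blog/post-1", "/admin/login"]
print(blocked_paths(leftover, pages))
print(blocked_paths(intended, pages))
```

To test the published file rather than a local copy, RobotFileParser also offers set_url() and read(), which download the robots.txt from your live site.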
Another file that aids search engines is the sitemap file. It is recommended that you create an accurate sitemap to help search engines index all of your webpages correctly. The absence of a sitemap is usually not the reason that you are not getting indexed. However, having one can increase the chances of getting indexed faster and more accurately.
It is quite possible that search engine crawlers are having trouble parsing your web content. The cause could be any kind of error, from problems accessing the page to incorrect HTML syntax or tags. You should check for crawler errors in your webmaster tools to see what they are and fix them.
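For a small site, a sitemap can be generated with a few lines of code. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module and the namespace defined by the sitemaps.org protocol. The function name build_sitemap and the example.com URLs are placeholders, and optional fields such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a minimal sitemap document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    # The declaration says UTF-8, so save the file with that encoding.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs; list the real pages of your own site instead.
print(build_sitemap(["https://example.com/", "https://example.com/about"]))
```

The resulting text is usually saved as sitemap.xml in the root web folder and then submitted through the search engine's webmaster console.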
Such errors will be unique to your website. It is hard to speculate and enumerate all possible scenarios.
Another file that can restrict access to your content is the .htaccess file, which Apache web servers read from your root web folder. This file is often used to restrict access to sections of the website or to redirect URLs. You should check this file to make sure that everything in it is correct and works as you intended.
Each page (or URL) on your website can carry HTML meta tags. You should check your webpages to make sure that you are not accidentally setting the noindex meta tag on them. These tags are also known as robots meta tags.
When you view the source of the web page, look in the head section for something that resembles the example below. The noindex value instructs search engines not to index the current page, while nofollow instructs them not to follow the links on the page.
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
Unless it is strictly intentional, you should remove such tags so that the robots or crawlers can crawl and index the pages.
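Checking many pages by hand gets tedious, so the lookup can be scripted. The sketch below uses Python's standard html.parser module to collect the directives of any robots meta tag in a page's source. The names RobotsMetaFinder and robots_directives are made up for this example, and as a simplification it only looks at tags named robots, not at tags addressed to one specific crawler.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set:
    """Return the robots meta directives found in the page source."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
found = robots_directives(page)
print(sorted(found))
print("blocks indexing:", "noindex" in found)
```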
Just as with the meta tags, you should also check the HTTP headers of your pages. This is another place where you could have configured the noindex option, which search bots read. The exact header to look for is called X-Robots-Tag. Examples of such headers are:
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
You can inspect the HTTP response headers of a page with your browser's developer tools or with a command-line HTTP client.
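The header check can also be scripted. The sketch below reads the X-Robots-Tag headers of a URL with Python's standard urllib.request module and interprets values shaped like the examples above. The function names are made up for this example, and the parsing is a simplification: it treats a leading name followed by a colon as a crawler name, apart from a short list of directives that carry their own value.

```python
from urllib.request import Request, urlopen

# Directives that carry a value after a colon, so they are not bot names.
VALUED_DIRECTIVES = {
    "unavailable_after", "max-snippet", "max-image-preview", "max-video-preview",
}


def parse_x_robots_tag(value: str):
    """Split one header value into (crawler name, list of directives)."""
    agent = "*"  # "*" means the header applies to every crawler
    head, sep, rest = value.partition(":")
    name = head.strip().lower()
    if sep and "," not in head and name not in VALUED_DIRECTIVES:
        agent, value = name, rest
    directives = [d.strip().lower() for d in value.split(",") if d.strip()]
    return agent, directives


def blocks_indexing(header_values, bot: str) -> bool:
    """True if any header tells the given crawler not to index the page."""
    for value in header_values:
        agent, directives = parse_x_robots_tag(value)
        # "none" is shorthand for "noindex, nofollow".
        if agent in ("*", bot.lower()) and {"noindex", "none"} & set(directives):
            return True
    return False


def fetch_x_robots_tags(url: str, timeout: float = 10.0):
    """Send a HEAD request and return every X-Robots-Tag header value."""
    with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
        return response.headers.get_all("X-Robots-Tag") or []


headers = ["googlebot: nofollow", "otherbot: noindex, nofollow"]
print(blocks_indexing(headers, "googlebot"))
print(blocks_indexing(headers, "otherbot"))
```

On a live site, the list passed to blocks_indexing would come from fetch_x_robots_tags called with the address of one of your own pages.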
Excluded URL parameters
Many search engines allow you to exclude URL parameters (sometimes specific parameters) from their search indexes. If you configure this incorrectly, there is a possibility that one or more of your web pages that differ only in these URL parameters are not getting crawled at all.
Log in to the search console for the appropriate search engine and fix (or remove) the URL parameter settings. Most of the time it is best to leave it to the crawlers to figure this out by themselves.
Unusual URLs and Ajax related issues
Many modern day web applications are single page applications where URLs do not change (or at least not often). There are also many technologies that load web URLs and content dynamically without manipulating or modifying the web URLs correctly.
It is quite possible that the search engine crawlers are not smart enough to crawl such websites effectively and efficiently without some guidance and explicit instruction.
Many search engines give you the ability to block certain URLs from being crawled. This is usually configured in a search engine console such as Google Webmaster Tools, and it is in addition to the other methods such as the robots file, meta tags and HTTP headers.
If you have any such configuration, double check it to make sure that you are not accidentally blocking any URLs.
Wrong content syntax
Both web browsers and crawlers are very flexible and lenient when it comes to HTML syntax. They make every effort to make sense of the content. However, it is still a good idea to make sure that your web pages follow good practices and are syntactically correct.
You should use a web validation tool to ensure that your web pages are coded correctly.
Canonicalization and duplicate content
It is not uncommon to have the same or similar content accessible through several different URLs. In most cases, such content within the same web domain should not cause issues. However, it is your responsibility to make sure that search engines do not interpret it as duplicate content and penalize your website.
You should make sure that the different sub-domains of your website resolve to a single domain. Also use canonicalization on your web pages, that is, a link tag with rel="canonical" in the head section of each page, to ensure that the web content is indexed under the correct URL.
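To see which canonical URL a page actually declares, its source can be scanned for the link tag whose rel attribute is canonical. The following is a minimal sketch using Python's standard html.parser module. The names CanonicalFinder and canonical_urls and the example.com address are placeholders for this example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated values.
        if "canonical" in attr.get("rel", "").lower().split() and attr.get("href"):
            self.canonical_urls.append(attr["href"])


def canonical_urls(html: str) -> list:
    """Return the canonical URLs declared in the page source."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical_urls


# Placeholder page source; feed in the HTML of your own pages instead.
page = '<head><link rel="canonical" href="https://example.com/post"></head>'
print(canonical_urls(page))
```

A page that is reachable under several URLs should report the same single canonical URL under each of them. An empty list, or more than one entry, is worth a closer look.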
Page load times and host issues
Is it possible that your web content takes so long to load that the crawlers time out or abort indexing? You should check how long it takes a webpage to load and ensure that the time is reasonable. If you have integrated with an analytics service, this is very easy to check. The webmaster tools also provide some information on how long it takes your pages to load.
It is quite possible that your website or web domain has been penalized by the search engine. This could happen for a variety of reasons. Again, the search engine consoles are the best places to check for this. For Google, check the Webmaster Tools service, which displays the reasons why pages are not getting indexed and whether (and why) you are being penalized.
These are some common issues that can prevent your web content from getting indexed correctly. You can use the above as a checklist to make sure that the problem is not a configuration issue at your end.
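For a quick measurement without an analytics service, the download of a page can be timed with a short script. The sketch below uses Python's standard time and urllib.request modules. The function names and the 10 second timeout are arbitrary choices for this example. Note that it only measures how long the HTML itself takes to arrive, which is closer to what a crawler experiences than to the full rendering time in a browser.

```python
import time
from urllib.request import urlopen


def fetch_body(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw response body of a page."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def time_fetch(url: str, fetch=fetch_body):
    """Return (seconds taken, bytes received) for one download of url."""
    start = time.perf_counter()
    body = fetch(url)
    elapsed = time.perf_counter() - start
    return elapsed, len(body)


def report(url: str) -> str:
    """Format one measurement as a line of text."""
    elapsed, size = time_fetch(url)
    return f"{url}: {size} bytes in {elapsed:.2f} s"


# Usage (needs network access; replace the address with your own page):
# print(report("https://example.com/"))
```

Running the measurement a few times, and from more than one network, gives a more reliable picture than a single reading.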