Multilingual content and web crawlers
When an international website uses locale-adaptive pages, the goal is to detect the most appropriate language for the human visitor and redirect them to it. It all starts with the best of intentions, but there can be unforeseen negative technical SEO consequences, as you’ll soon realize!
To geo-target human users, locale-adaptive pages rely on two main methods (a quick sketch of both follows this list):
- They analyze the visitor’s IP address: for instance, people based in Germany will typically connect from a German IP address. It seems logical to serve German content to people who are based in Germany, right?
- They analyze the default or preferred language the human user has set in their web browser.
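To make the mechanics concrete, here is a minimal Python sketch of both detection methods. The GeoIP table, the supported-language set and the pick_locale helper are hypothetical placeholders, just enough to show how a locale-adaptive site might decide where to send a visitor (a real site would query an actual GeoIP database):

```python
# Minimal sketch of the two locale-adaptive detection methods described above.
# The country table stands in for a real GeoIP lookup; SUPPORTED, COUNTRY_BY_IP
# and pick_locale are hypothetical names, not part of any real framework.

SUPPORTED = {"en", "fr", "de"}           # languages the site actually serves
COUNTRY_BY_IP = {"81.0.0.1": "DE"}       # stand-in for a GeoIP database
COUNTRY_TO_LANG = {"DE": "de", "FR": "fr", "US": "en"}

def pick_locale(ip, accept_language=None):
    # Method 2: honour the browser's preferred languages first, in order.
    if accept_language:
        for part in accept_language.split(","):
            lang = part.split(";")[0].strip().lower()[:2]
            if lang in SUPPORTED:
                return lang
    # Method 1: fall back to the country inferred from the visitor's IP address.
    country = COUNTRY_BY_IP.get(ip, "US")
    return COUNTRY_TO_LANG.get(country, "en")

# A German IP with no browser preference lands on /de/,
# while a browser preferring French lands on /fr/ regardless of IP.
print(pick_locale("81.0.0.1"))                               # -> de
print(pick_locale("81.0.0.1", "fr-FR,fr;q=0.9,en;q=0.8"))    # -> fr
```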
Let me illustrate the problem by giving you a fresh, real-world example 🙂
A few days ago, one of my followers working on an international SEO project tagged me.
The context is important to understand the different user experiences we each went through. My follower is based in a different country, and although we both speak English, we do not speak, read and write in the same native language.
Their company website is available in multiple languages, and when I visited it, I was automatically redirected to gTLD/language-subdirectory/*
In this specific case, I was redirected to domain[.]com/fr/ (their French language section).
I told my follower that their website was automatically redirecting me to their French version.
He told me there was no redirection! I insisted. Initially, he could not understand why. He then used a VPN to simulate connecting from a French IP address.
Still, he was not automatically redirected! Why is that? One of the web browsers I was using had French set as its default language. My follower, on the other hand, was using English as his preferred browser language, so he always saw the default website language, which was English. No visible redirection for him, even with a French IP address!
Both the IP detection and the browser language detection can lead to serious technical SEO consequences if search engine crawlers are force-redirected to one specific language.
In 2015, Google came up with two ideas to solve this issue.
→ Geo-distributed crawling
Googlebot started to use IP addresses that appear to be coming from outside the USA, in addition to the many IP addresses that are from the USA.
→ Language-dependent crawling
Googlebot would start to crawl with an Accept-Language HTTP header in the request.
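Here is what such a request could look like in practice. This is only a minimal sketch: the URL is a placeholder and the header value is illustrative, not Googlebot’s actual configuration.

```python
# Illustration of a language-dependent crawl: one URL fetched with an explicit
# Accept-Language request header. URL and header value are placeholders, not
# Googlebot's actual configuration.
import urllib.request

req = urllib.request.Request(
    "https://example.com/",
    headers={
        "User-Agent": "my-seo-test-crawler/1.0",
        "Accept-Language": "fr-FR,fr;q=0.9",  # ask for the French variant
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.geturl())                       # final URL after any locale redirect
    print(resp.headers.get("Content-Language"))
```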
10 years later, the conclusion is that by default, Googlebot still requests a lot of pages WITHOUT setting an Accept-Language HTTP request header and still uses a lot of IP addresses that appear to be located in the USA.
If you return different content based on the perceived country or preferred language of the visitor, Googlebot might not crawl your different languages!
→ If you force the redirection of Google’s crawler to your English content because your website detects a US-based IP address and no Accept-Language, Googlebot might end up NOT crawling (and therefore NOT indexing) your multilingual content!
Rather than trying to monitor Googlebot’s geo-distributed crawls with reverse DNS lookups, simply be aware that Google has always had a hard time understanding websites that use locale-adaptive techniques.
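If you want to check how your own site behaves, a simple test is to fetch the same URL with and without an Accept-Language header and compare what comes back. This is only a rough audit sketch: example.com is a placeholder, and comparing a body hash is a crude way to spot content differences.

```python
# Quick self-check for locale-adaptive behaviour: fetch the same URL with and
# without an Accept-Language header, then compare the final URL and a hash of
# the response body. example.com is a placeholder for the site you audit.
import hashlib
import urllib.request

URL = "https://example.com/"

def fetch(accept_language=None):
    headers = {"User-Agent": "locale-audit/1.0"}
    if accept_language:
        headers["Accept-Language"] = accept_language
    req = urllib.request.Request(URL, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.geturl(), hashlib.sha256(resp.read()).hexdigest()

default = fetch()                    # roughly what a header-less crawl sees
french = fetch("fr-FR,fr;q=0.9")     # roughly what a French browser sees

print("no header :", default)
print("fr header :", french)
if default != french:
    print("This site adapts or redirects based on Accept-Language.")
```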
For international SEO projects, I suggest:
1. Clearly structuring your multilingual content using the branding strategies I laid out in this guide:
2. Using the hreflang attribute (I’m writing a free guide); a minimal sketch of hreflang annotations follows below.
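To give you an idea of what hreflang annotations look like while the full guide is in progress, here is a minimal Python sketch that generates the `<link rel="alternate" hreflang="...">` tags for one page. The domain, the locale list and the subdirectory pattern are hypothetical; the markup itself is the standard form search engines expect.

```python
# Minimal sketch: generate hreflang alternate links for one page across locales.
# Domain, locales, and URL pattern are made up for illustration.
BASE = "https://domain.example"
LOCALES = ["en", "fr", "de"]

def hreflang_links(path):
    links = [
        f'<link rel="alternate" hreflang="{lang}" href="{BASE}/{lang}{path}" />'
        for lang in LOCALES
    ]
    # x-default tells search engines which version to serve when no locale matches.
    links.append(
        f'<link rel="alternate" hreflang="x-default" href="{BASE}/en{path}" />'
    )
    return "\n".join(links)

print(hreflang_links("/pricing/"))
```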