Clearly, a URL with a trailing slash and one without define two different resources, except at the root level, because an HTTP request can’t address the root without a slash.
The HTTP request for the root of a website looks like this:
GET / HTTP/1.1
It’s simply not technically possible to request a version of the root without a slash, even though the URL syntax allows you to write it.
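To make this concrete, here is a minimal Python sketch of how a client could derive the path it puts on the request line. The helper name `request_target` is my own, and it only handles the simple case of a path plus optional query string; it is an illustration, not how any particular browser is implemented.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query) a client would put on the request line."""
    parts = urlsplit(url)
    # An empty path cannot appear on the request line, so it becomes "/".
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


# The two root forms collapse to the same request...
print(request_target("https://example.com"))
print(request_target("https://example.com/"))
# ...but below the root, the slash is sent as written.
print(request_target("https://example.com/fish"))
print(request_target("https://example.com/fish/"))
```

The first two calls yield the same target, while the last two differ, which is why the server can only tell the variants apart below the root.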
About a website hierarchy Google says:
When referring to the homepage, a trailing slash after the hostname is optional since it leads to the same content (https://example.com/ is the same as https://example.com). For the path and filename, a trailing slash would be seen as a different URL (signaling either a file or a directory), for example, https://example.com/fish is not the same as
Notice that the author says “signaling either a file or a directory”, which is another hint of how Google thinks about slashes.
Because file.html references a file, it generally should not have a slash at the end (on a hard drive, you don’t put slashes at the end of filenames). However, on the web you see it all, and both forms appear here and there. I still think no slash is more common when you have an extension such as the
I’ve seen CMSs that place files attached to a page in a virtual sub-folder under that very page. For example, if you had a recipe.pdf file, it could be:
Personally, I think it is weird to have a sub-directory under a .html file because you can’t replicate that in a file system. If you later wanted to create a static version of your website, you’d be in trouble with such virtual folders.
A Google Example
Google had some examples about this problem (I’ll add the link if I find that page). Their examples removed the extension, as in:
(Note: Google says that they use the extension as a hint of the page’s contents, so file.html tells them that the content is most certainly HTML.)
So no slash at the end of what represents files, but when no extension is present, we get a folder and it makes sense to have a slash. So a complete list of the files could look like this:
Note: In the past, Google said that they would test parent folders automatically, even if the parent page is not defined in the sitemap.xml or in a link. So GoogleBot would check out all of the following:
# Found document:
# Also check parents:
However, I don’t recall ever seeing such hits in my logs, and I never really looked into proving that statement either. Still, it means a functional parent is a good idea. As a user, I do that once in a while: when I see I’m in a certain sub-folder and think I should be able to find things of interest one directory up, I manually try to go there by changing the URL. Websites that break when you do that are definitely annoying.
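If you want to check that your own parents are functional, you can list them from any document URL. The following Python sketch does that; the function name `parent_urls` is hypothetical, and writing each parent with a trailing slash is my assumption (following the folder convention above), not something Google documents.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url: str) -> list[str]:
    """List every parent folder of `url`, nearest first, down to the root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing segment at a time until only the root is left.
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"  # assumption: folders are written with a slash
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


for parent in parent_urls("http://www.example.com/path/file.html"):
    print(parent)
```

Feeding each returned URL to a link checker (or simply opening it in a browser) shows quickly whether a visitor who trims the URL by hand lands on a working page.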
When you save a webpage from Firefox or another browser, it creates a sub-directory for all the files attached to the page (CSS, JS, images, etc.) and names it with the extension but replaces the period with an underscore. So something like this:
As for redirecting, it is not mandatory if your sitemap.xml and canonical are correct. In other words, if a search engine goes to:
but finds the following link meta tag:
<link rel="canonical" href="http://www.example.com/path/file.html"/>
then no 301 is required. Google & Co. will use the version without the slash.
The opposite is true too, of course. With the following, the slash version is saved and still no 301 is required:
<link rel="canonical" href="http://www.example.com/path/file.html/"/>
If you don’t have a proper canonical (or, as I’ve seen on some CMSs, if the code generating the canonical copies the URI as is, so you get two different pages in this case), a redirect is probably the easiest way to fix the problem.
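A rough way to test for that situation is to fetch both variants of a page and compare their canonical links. The Python sketch below works on HTML you have already downloaded; the names `find_canonical` and `redirect_advisable` are hypothetical, and the rule it applies (a redirect is advisable when either variant lacks a canonical or the two disagree) is only my reading of the paragraph above.

```python
from html.parser import HTMLParser


class _CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in `html`, or None if absent."""
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def redirect_advisable(slash_html: str, no_slash_html: str) -> bool:
    """True when the two variants do not agree on a single canonical URL."""
    with_slash = find_canonical(slash_html)
    without_slash = find_canonical(no_slash_html)
    return with_slash is None or without_slash is None or with_slash != without_slash


page = '<head><link rel="canonical" href="http://www.example.com/path/file.html"/></head>'
print(redirect_advisable(page, page))  # both variants agree on one canonical
```

When both variants name the same canonical, no 301 is needed; when the CMS echoes the requested URI into each page, the two canonicals differ and the function flags it.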
The sitemap.xml file must reference the page using the canonical URL of the page. So if the canonical uses the slash, the sitemap.xml URL will include the slash, and vice versa.