Google Search Console error: Indexed, though blocked by robots.txt

I used shared hosting in the past and everything was fine. But when I switched to a VPS (DigitalOcean), this error appears in Google Search Console: "I … | Read the rest of https://www.webhostingtalk.com/showthread.php?t=1786978&goto=newpost

robots.txt: Google Search Console and Sitemap settings for a website subfolder

You can have a separate XML sitemap configured for your folder; it will not confuse search engines.

Just make sure the folder sitemaps are correctly linked from the main sitemap.

When it comes to GSC or Analytics accounts, though, you can't set them up per folder.
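
As a sketch of what "correctly linked from the main sitemap" can look like, a sitemap index at the root can reference each folder's sitemap (the URLs here are illustrative):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/folder/sitemap.xml</loc>
  </sitemap>
</sitemapindex>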

robots.txt: Google Search Console settings for the subfolder site

I manage a website, https://www.example.com/, and I just launched a new site at https://www.example.com/talkabout (in a subfolder).
I have already set up a GSC account, sitemaps (HTML and XML), and robots.txt for the root domain, but I was wondering whether I also need to configure them for https://www.example.com/talkabout.

I think the robots.txt for the /talkabout site should be the same as the root domain's (https://www.example.com/robots.txt), but should I set up a separate GSC account and an XML sitemap, for example https://www.example.com/talkabout/sitemap.xml, and add it to the robots.txt file?

Thanks in advance
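
For reference, a minimal sketch of a root robots.txt that advertises both sitemaps (the /talkabout paths are taken from the question):

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/talkabout/sitemap.xml

Sitemap directives take absolute URLs, so a single robots.txt at the domain root can cover the subfolder site; robots.txt itself is only ever read from the root of a host.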

Can robots.txt be used to prevent bots from viewing lazily loaded content?

Let's say Googlebot is crawling https://example.com/page.

  • example.com has a robots.txt file that disallows /endpoint-for-lazy-loaded-content but allows /page
  • /page lazy-loads content from /endpoint-for-lazy-loaded-content (via fetch)

Does Googlebot see the lazily loaded content?
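
A minimal sketch of the setup described above (the element ID is illustrative). The robots.txt:

User-agent: *
Allow: /page
Disallow: /endpoint-for-lazy-loaded-content

And the client-side loader on /page, in TypeScript:

// Runs in the browser. Googlebot's renderer would have to issue this same
// fetch to see the content, and its robots.txt rules forbid that URL.
async function loadLazyContent(): Promise<void> {
  const res = await fetch("/endpoint-for-lazy-loaded-content");
  const html = await res.text();
  document.querySelector("#lazy-container")!.innerHTML = html;
}

loadLazyContent();

Googlebot applies robots.txt to every request its renderer makes, including fetch/XHR subresources, so a disallowed endpoint should not be fetched and the lazily loaded content should not be seen or indexed from that page.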

Google URL Inspection says my image URL is blocked by robots.txt. I don't even have one!

I just discovered that Google has not crawled the domain of our image system for a long time.
The reason is that all its URLs appear to be blocked by robots.txt, but I don't even have one.

Disclaimer: due to some configuration tests, I now have a generic robots.txt at the root of the website that allows everything. I didn't have one before.

We run an image resizing system on a subdomain of our website.
The behavior is very strange: Search Console claims the URLs are blocked by robots.txt, when in reality there wasn't one in the first place.

All the URLs on this subdomain give me this result when I test them live:

[Screenshot: URL unknown to Google]

[Screenshot: URL reported as blocked by robots.txt]

Trying to debug the problem, I created a robots.txt in the root:

[Screenshot: robots.txt testing as valid]

The robots.txt file is even visible in the search results:

[Screenshot: robots.txt indexed in the search results]

The response headers also seem fine:

HTTP/2 200
date: Sun, 27 Oct 2019 02:22:49 GMT
content-type: image/jpeg
set-cookie: __cfduid=d348a8xxxx; expires=Mon, 26-Oct-20 02:22:49 GMT; path=/; domain=.legiaodosherois.com.br; HttpOnly; Secure
access-control-allow-origin: *
cache-control: public, max-age=31536000
via: 1.1 vegur
cf-cache-status: HIT
age: 1233
expires: Mon, 26 Oct 2020 02:22:49 GMT
alt-svc: h3-23=":443"; ma=86400
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare
cf-ray: 52c134xxx-IAD

Here are some sample URLs to try:

https://kanto.legiaodosherois.com.br/w760-h398-gnw-cfill-q80/wp-content/uploads/2019/10/legiao_zg1YXWVbJwFkxT_ZQR534L90lnm8d2IsjPUGruhqAe.png.jpeg
https://kanto.legiaodosherois.com.br/w760-h398-gnw-cfill-q80/wp-content/uploads/2019/10/legiao_FPutcVi19O8wWo70IZEAkrY3HJfK562panvxblm4SL.png.jpeg
https://kanto.legiaodosherois.com.br/w760-h398-gnw-cfill-q80/wp-content/uploads/2019/09/legiao_gTnwjab0Cz4tp5X8NOmLiWSGEMH29Bq7ZdhVPlUcFu.png.jpeg

What should I do?
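
As a quick way to double-check, one can fetch the subdomain's robots.txt directly and inspect exactly what a crawler receives; a sketch in TypeScript (the hostname is taken from the question, and sending Googlebot's User-Agent string is an assumption, to test whether the CDN answers crawlers differently):

// Request robots.txt the way a crawler might and print the status,
// content type, and body actually returned.
async function checkRobots(host: string): Promise<void> {
  const res = await fetch(`https://${host}/robots.txt`, {
    headers: { "User-Agent": "Googlebot" },
  });
  console.log(res.status, res.headers.get("content-type"));
  console.log(await res.text());
}

checkRobots("kanto.legiaodosherois.com.br");

If the CDN (Cloudflare, per the response headers above) serves a challenge page or a 5xx for robots.txt, or answers crawler user agents differently than browsers, that can make Google treat crawling as blocked even though no robots.txt exists at the origin.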


Which is better: meta robots tags or robots.txt?

Which is better: meta robots tags or robots.txt?
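
For context, a minimal sketch of each mechanism (the paths are illustrative). robots.txt controls crawling:

User-agent: *
Disallow: /private/

A meta robots tag, placed in a page's HTML head, controls indexing of a page that crawlers are allowed to fetch:

<meta name="robots" content="noindex, nofollow">

The practical difference: a URL blocked only by robots.txt can still end up indexed via external links, since Google never fetches the page and so never sees a noindex; the meta tag, conversely, only works if the page is crawlable.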

Why does Google index our robots.txt file and display it in the search results?

For some reason, Google is indexing the robots.txt file for some of our sites and displaying it in the search results. See the screenshots below.

Our robots.txt file is not linked from anywhere on the site and contains only the following:

User-agent: *
Crawl-delay: 5

This only happens for some sites. Why does this happen and how do we stop it?

Screenshot 1: Google Search Console

Screenshot 2: Google search results
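
A commonly suggested remedy (an assumption about your setup; the header can be added in most web server or CDN configurations) is to serve robots.txt with an X-Robots-Tag: noindex HTTP header, which keeps the file readable by crawlers while asking Google not to index it:

HTTP/1.1 200 OK
Content-Type: text/plain
X-Robots-Tag: noindex

A meta tag can't be used here because robots.txt is plain text, and disallowing robots.txt inside robots.txt would stop crawlers from reading the rules at all.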


What is not allowed in the robots.txt file?

What is not allowed in the robots.txt file?
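
Since the question gives no details, here is a minimal robots.txt showing disallow rules for reference (a sketch; the paths are purely illustrative):

User-agent: *
Disallow: /admin/
Disallow: /tmp/

Anything not matched by a Disallow rule for the matching user agent is allowed by default.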