google search console – Can I combine sitemap and sitemap index together?

I want to create a sitemap for Google and Bing.

Referring to sitemap protocol: https://www.sitemaps.org/protocol.html#sitemapXMLExample

I want to combine sitemap file and sitemap-index file in one. What I mean is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd"
         xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap-index-1.xml</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap-index-2.xml</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
</sitemapindex>

<!-- Sitemap for individual URLs -->
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
         xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/page-1/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.example.com/page-2/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.example.com/page-3/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>

Technically it is possible to write this, and I will make sure to maintain the directory hierarchy. However, I am not sure whether the protocol allows it.
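Worth checking before submitting: an XML document may have only one root element, so a single file that contains both a `<sitemapindex>` and a `<urlset>` is not well-formed XML, and a standard parser will reject it. A minimal Python sketch (with placeholder URLs) to confirm:

```python
# Concatenating <sitemapindex> and <urlset> produces two root elements,
# which violates XML well-formedness ("junk after document element").
import xml.etree.ElementTree as ET

combined = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.example.com/sitemap-index-1.xml</loc></sitemap>
</sitemapindex>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/page-1/</loc></url>
</urlset>"""

try:
    ET.fromstring(combined)
    well_formed = True
except ET.ParseError as err:
    well_formed = False
    print("Parse error:", err)

print("well-formed:", well_formed)  # well-formed: False
```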

If I create a sitemap like this, will Google and Bing accept it?

When is it better to use multiple XML sitemaps vs one sitemap?

web crawlers – What is difference between robots.txt, sitemap, robots meta tag, robots header tag?

So I am trying to learn SEO, and I am honestly confused. I have the following questions:

  • Do I tell a bot not to visit a certain link through the X-Robots-Tag header, the robots meta tag, or robots.txt?

  • Is it OK to include all three (robots.txt, the robots meta tag, and the X-Robots-Tag header), or should I always provide only one?

  • Do I get penalized if I put the same information in the X-Robots-Tag header, the robots meta tag, and robots.txt?

  • Let’s say for /test1 my robots.txt says Disallow, my robots meta tag says follow,index, and my X-Robots-Tag says nofollow,index,noarchive. Do I get penalized because those values differ?

  • For that same /test1 example: which rule will the bot actually follow? What is the order of precedence here?

  • Let’s say my robots.txt has the rules Disallow: / and Allow: /link_one/link_two, and my X-Robots-Tag and robots meta tag for every page except /link_one/link_two say nofollow,noindex,noarchive. As I understand it, a bot will never get to /link_one/link_two, since I prevented crawling at the root level. Now, if I reference a sitemap.xml in robots.txt that lists /link_one/link_two, will that URL actually end up being crawled?

  • Will a bot crawl a directory listed in sitemap.(xml/txt) even though it is not reachable from the home page or any page linked from it?

  • Overall, I would appreciate some clarification on the difference between robots.txt, the X-Robots-Tag header, the robots meta tag, and sitemap.(xml/txt). To me they seem to do exactly the same thing.

  • I have already seen questions that answer a small subset of what I asked, but I want the whole big explanation.
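For the Disallow: / plus Allow: /link_one/link_two scenario above, you can at least observe how a parser resolves the two rules. One caveat: Google resolves conflicts by the most specific (longest) matching rule, whereas Python's stdlib urllib.robotparser applies the first matching line, so this sketch lists the Allow line first to mimic Google's outcome. The host is a placeholder:

```python
# Sketch: testing robots.txt rules with Python's stdlib parser.
# urllib.robotparser uses first-match semantics, so Allow is listed
# first; Google instead picks the longest (most specific) match.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Allow: /link_one/link_two",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

allowed_special = rp.can_fetch("*", "https://example.com/link_one/link_two")
allowed_other = rp.can_fetch("*", "https://example.com/anything_else")
print(allowed_special, allowed_other)  # True False
```

Under both interpretations the carved-out path stays crawlable while everything else is blocked; note, though, that robots.txt only controls crawling, not indexing.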

    Yandex not crawling compressed sitemap index

    I have submitted a sitemap index file (one that links to other sitemaps that contain the actual URLs search engines are instructed to crawl). It is GZip compressed.

    The Yandex sitemap validation tool reports that it is valid, with 202 links and no errors.

    However, in Yandex Webmaster it shows up with a small grey icon in the status column. When clicked, it says ‘Not indexed’.

    Yandex is not indexing the URLs provided in the file, all of which are new, even though it states it has consulted the sitemap.

    Any ideas what may be wrong?

    web crawlers – Ampersand (&) in actual URL and sitemap

    Will this difference between the ampersand in the actual URL and the escaped ampersand in the sitemap cause any issue?

    tl;dr No issue, because the URLs are the same.

    Since the & has to be escaped in a sitemap, I replaced & with &amp;

    Your sitemap is an XML document. As with any XML document, the data values must be stored XML-entity encoded. The & character is special (it denotes the start of an XML entity) and must therefore be encoded to negate its special meaning. This is just the way data is stored inside an XML document.

    When the XML document is read by an XML parser the data values are XML-entity decoded, back to the actual value. So, &amp; becomes & when the XML document is read.

    So, a URL of the form /page?foo=1&amp;bar=2 stored inside an XML document is identical to the URL /page?foo=1&bar=2 in your HTML5 document.
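That round-trip is easy to observe with any XML parser; a small Python sketch (namespace omitted for brevity):

```python
# The &amp; stored in the sitemap XML decodes back to a plain & on parse.
import xml.etree.ElementTree as ET

fragment = "<url><loc>http://www.example.com/page?foo=1&amp;bar=2</loc></url>"
loc = ET.fromstring(fragment).find("loc").text
print(loc)  # http://www.example.com/page?foo=1&bar=2
```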

    My actual page URLs contain just &

    In HTML5 that is perfectly OK, provided there is no ambiguity. In HTML 4.01 (and earlier), however, you would have needed to HTML-entity encode the & as &amp; in your HTML source for the markup to be valid. Browsers are very tolerant, though, so your HTML document would most probably still have “worked”.

    In HTML5 you only strictly need to HTML-entity encode the & if there is an ambiguity. Take the following contrived example. We want to pass the literal string “&dollar;” in the foo URL parameter.

    <!-- In an HTML document (WRONG) -->
    <a href="http://example.com/page?foo=&dollar;">link</a>
    

    The desired URL is http://example.com/page?foo=&dollar;, however, the above HTML anchor results in sending the user to http://example.com/page?foo=$ – which is not the intention. To create the desired result, the & must be HTML-entity encoded to negate its special meaning, resulting in the following (correct) HTML:

    <!-- In an HTML document (CORRECT) -->
    <a href="http://example.com/page?foo=&amp;dollar;">link</a>
    

    It is always safer to consistently HTML-entity encode the & in your HTML-document. If you are generating your content through a CMS, then this should be automatic.
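How an HTML parser sees the two href values from the contrived example can be checked with Python's html module, which implements HTML5 named-reference decoding (dollar; is a valid HTML5 entity):

```python
# Decoding the two href values from the &dollar; example above.
from html import unescape

wrong = unescape("http://example.com/page?foo=&dollar;")
correct = unescape("http://example.com/page?foo=&amp;dollar;")

print(wrong)    # http://example.com/page?foo=$
print(correct)  # http://example.com/page?foo=&dollar;
```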

    I am able to access the site after replacing & with &amp; in the URL.

    Presumably you mean “in the URL, in your HTML”? If you were to HTML-entity encode the & as &amp; in the browser’s address bar (for instance), i.e. outside of an HTML context, you would not get the expected result. For example, if you typed the following directly into the browser’s address bar:

    /page?foo=1&amp;bar=2
    

    Then you would get the two URL parameters (foo) => 1 and (amp;bar) => 2, which is clearly not the intention.
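The same split can be reproduced with a query-string parser; a sketch using Python's stdlib (current versions split the query on & only):

```python
# Typing the HTML-encoded form into an address bar yields the wrong
# parameter name: "amp;bar" instead of "bar".
from urllib.parse import parse_qs, urlsplit

encoded = parse_qs(urlsplit("/page?foo=1&amp;bar=2").query)
plain = parse_qs(urlsplit("/page?foo=1&bar=2").query)

print(encoded)  # {'foo': ['1'], 'amp;bar': ['2']}
print(plain)    # {'foo': ['1'], 'bar': ['2']}
```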

    Magento 2 sitemap url does not work after we add any rewrite rule in root htaccess file

    Our sitemap URL works fine at /sitemap/sitemap.xml, but as soon as we add any rewrite rule to the root .htaccess file it returns a 404 Not Found.

    RewriteRule ^abc/(.*)?$ /media/doc/files/abc/$1 [R=301,NC,L]

    Magento version 2.2.5

    seo – Optimize lastmod fields in sitemap index files for a large website that are expensive to compute

    I am trying to create sitemaps for a very large multilingual website; every single URL is duplicated for as many languages as there are. The more pressing issue, however, is that the content is incredibly dynamic, so the lastmod tag cannot be easily obtained.

    The sitemap is structured as follows; each index lists and describes every sitemap beneath it.

    /sitemaps/index.xml
    /sitemaps/[language]/index.xml
    /sitemaps/[language]/[section]/[collection-timestamp].xml
    

    If I create each collection file based on creation time, a point is added when it is created, but its lastmod cannot then be known other than by fetching the resource with a HEAD request and reading the response header.

    If I create each collection file based on modification time, a point is added whenever it is modified during the day, so there will be duplicate entries across collection files, each with a different lastmod, for any data that changes. It is impractical to modify already-stored collection files, since that would require intensive reads to update data in older files.
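One cheap way around this, sketched below under loud assumptions (the ./sitemaps/ layout, base URL, and file names are hypothetical, and the filesystem mtime of each sitemap file stands in for a computed lastmod), is to derive each <lastmod> in the index from the collection file itself rather than from the data inside it:

```python
# Sketch: build a sitemap index whose <lastmod> values come from the
# mtime of each sitemap file (an assumption, not the asker's setup).
import os
import tempfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

def build_sitemap_index(paths, base_url="https://www.example.com/sitemaps/"):
    entries = []
    for path in paths:
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        entries.append(
            "  <sitemap>\n"
            f"    <loc>{escape(base_url + os.path.basename(path))}</loc>\n"
            f"    <lastmod>{mtime.strftime('%Y-%m-%dT%H:%M:%S+00:00')}</lastmod>\n"
            "  </sitemap>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</sitemapindex>\n"
    )

# Demo on a throwaway file so the sketch runs as-is.
demo = tempfile.NamedTemporaryFile(suffix=".xml", delete=False)
demo.close()
index_xml = build_sitemap_index([demo.name])
print(index_xml)
os.unlink(demo.name)
```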

    Is it a good idea to submit sitemap?

    Hello everyone.
    I have a porn-sharing website, and I wanted to know whether I should submit a sitemap to Google, since it is the best way to generate organic traffic. The only issue is that Google has verified my account with my mobile number. Will that be a problem?

    seo – Disallowing a handler in robots.txt while adding its dynamic URLs to the XML sitemap

    I’m using ASP.NET Web Forms, and I have a page, let’s call it Subjects.aspx. I don’t want crawlers to crawl that page, but I do want them to crawl the dynamic URLs powered by it, for example /subjects/{id}/{title}, which routes to Subjects.aspx.

    I used a crawling tool and the page /Subjects.aspx was found. Is it okay to disallow that page in robots.txt like the following:

    User-agent: *
    Disallow: /subjects.aspx/
    

    while adding the dynamic URLs in sitemap?
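One detail to verify first: robots.txt rules are case-sensitive URL-path prefix matches, so a rule with a trailing slash such as Disallow: /subjects.aspx/ does not block /subjects.aspx itself, only URLs beneath that prefix. A sketch with Python's stdlib parser (placeholder host):

```python
# robots.txt matching is a case-sensitive path-prefix match, so the
# trailing slash in the rule leaves the bare page itself crawlable.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /subjects.aspx/"])

page_blocked = not rp.can_fetch("*", "https://example.com/subjects.aspx")
route_allowed = rp.can_fetch("*", "https://example.com/subjects/42/some-title")
print(page_blocked, route_allowed)  # False True
```

To block the handler itself while keeping the /subjects/{id}/{title} routes crawlable, drop the trailing slash (Disallow: /subjects.aspx); the routed URLs do not share that prefix, so they remain unaffected.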
