How to know if Google Sheets IMPORTDATA, IMPORTFEED, IMPORTHTML or IMPORTXML functions are able to get data from a resource hosted on a website?

If the content is added dynamically (by using Javascript), it can’t be imported by using Google Sheets built-in functions. Also if the website webmaster have taken certain measures, this functions will not able to import the data.


To check if the content is added dynamically, using Chrome,

  1. Open the URL of the source data.
  2. Press F12 to open Chrome Developer Tools
  3. Press Control+Shift+P to open the Command Menu.
  4. Start typing javascript, select Disable JavaScript, and then press Enter to run the command. JavaScript is now disabled.

JavaScript will remain disabled in this tab so long as you have DevTools open.

Reload the page to see if the content that you want to import is shown, if it’s shown it could be imported by using Google Sheets built-in functions, otherwise it’s not possible but might be possible by using other means for doing web scraping.

According to Wikipedia,

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites.

The webmasters could use robots.txt file to block access to website. In such case the result will be #N/A Could not fetch url.

The webpage could be designed to return a special a custom message instead of the data.


IMPORTDATA, IMPORTFEED, IMPORTHTML and IMPORTXML are able to get content from resources hosted on websites that are:

  • Publicly available. This means that the resource doesn’t require authorization / to be logged in into any service to access it.
  • The content is “static”. This mean that if you open the resource using the view source code option of modern web browsers it will be displayed as plain text.
    • NOTE: The Chrome’s Inspect tool shows the parsed DOM; in other works the actual structure/content of the web page which could be dynamically modified by JavaScript code or browser extensions/plugins.
  • The content has the appropriated structure.
    • IMPORTDATA works with structured content as csv or tsv doesn’t matter of the file extension of the resource.
    • IMPORTFEED works with marked up content as ATOM/RSS
    • IMPORTHTML works with marked up content as HTML that includes properly markedup list or tables.
    • IMPORTXML works with marked up content as XML or any of its variants like XHTML.
  • Google servers are not blocked by means of robots.txt or the user agent.

On W3C Markup Validator there are several tools to checkout is the resources had been properly marked up.

Regarding CSV check out Are there known services to validate CSV files

It’s worth to note that the spreadsheet

  • should have enough room for the imported content; Google Sheets has a 5 million cell limit by spreadsheet, according to this post a columns limit of 18278, and a 50 thousand characters as cell content even as a value or formula.
  • it doesn’t handle well large in-cell content; the “limit” depends on the user screen size and resolution as now it’s possible to zoom in/out.

References

Related

The following question is about a different result, #N/A Could not fetch url