TL;DR: What do you know about how our information is harvested when we browse the internet? What methods do content providers use to gather and sell our data? Any insight is appreciated!
When I browse the internet, I occasionally catch a bug that gets me really interested in a very specific topic. For example, one day I took an interest in airplanes: I read about pilot’s licenses, looked at flight schools, and even searched for airplanes on eBay, where I clicked on one listing. Next thing I knew, I was seeing ads for airplanes for sale on Facebook. Even in Gmail, pilot-related ads appeared at the top of my inbox!
My question is this: what methods are used to collect this information and advertise to a user? Is it primarily Google that is selling my search history? If I land on a site and start clicking links that take me across multiple websites without a single web search, how much am I being tracked, and by what means?
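One common mechanism behind the eBay-to-Facebook effect is the third-party tracker. When a page embeds an ad, analytics script, or 1x1 "pixel" served from another company's domain, the browser requests that resource, sends along any cookie that domain set earlier, and usually includes a Referer header naming the site you were on. The toy model below shows what such a server could do with those two headers. The function name `handleTrackerRequest`, the cookie name `uid`, and the in-memory `Map` are hypothetical, not any real ad network's code, and the sketch ignores defenses many current browsers apply, such as blocking or partitioning third-party cookies and trimming the Referer down to just the origin.

```javascript
// Toy model of a third-party tracking endpoint (the server behind an embedded
// ad or "pixel"). Hypothetical: not any real ad network's code.

let nextId = 1;

// Pull a single cookie value out of a raw "Cookie" request header.
function readCookie(cookieHeader, name) {
  if (!cookieHeader) return null;
  for (const part of cookieHeader.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return null;
}

// profiles: Map from visitor ID to the list of site hostnames seen so far.
// headers: lower-cased request headers, as Node's http module provides them.
function handleTrackerRequest(profiles, headers) {
  let uid = readCookie(headers.cookie, "uid");
  if (!uid) {
    uid = `visitor-${nextId++}`; // first sighting: mint a new identifier
  }
  const history = profiles.get(uid) || [];
  if (headers.referer) {
    // The Referer header tells the tracker which site embedded it.
    history.push(new URL(headers.referer).hostname);
  }
  profiles.set(uid, history);
  // Returning the ID as a cookie means the browser presents it on every
  // future request to this tracker, from whichever site embeds it.
  return {
    setCookie: `uid=${uid}; Max-Age=31536000; Path=/; Secure; SameSite=None`,
    uid,
  };
}

// Usage: two visits on unrelated sites end up in one profile.
const profiles = new Map();
const first = handleTrackerRequest(profiles, {
  referer: "https://flightschool.example/pricing",
});
handleTrackerRequest(profiles, {
  cookie: `uid=${first.uid}`,
  referer: "https://www.ebay.com/itm/123",
});
console.log(profiles.get(first.uid)); // [ 'flightschool.example', 'www.ebay.com' ]
```

No web search is involved: simply loading two pages that embed the same tracker is enough for it to link the visits.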
I’m asking because I’m working on a project to develop a small team of automated bots that will visibly browse the internet on a wide variety of topics while I go about my own business. The idea is to intentionally “pollute” the data feeds of whoever is monitoring my activity so that they cannot accurately assess who I really am and what my interests are.
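A minimal sketch of how the decoy topics might be divided among the bots, assuming each bot works through its own share of a shuffled topic list and lingers a random amount of time on each topic so the activity does not look clock-regular. The topic list, the function names `shuffle` and `planSessions`, and the 30-second to 5-minute dwell range are hypothetical placeholders; the actual browsing (for example, driving Chrome with Puppeteer) would consume these plans and is not shown.

```javascript
// Hypothetical decoy-topic scheduler for the "smoke screen" bots.
// The topic list and timing bounds are illustrative assumptions.

const DECOY_TOPICS = [
  "sourdough baking", "vintage motorcycles", "orchid care", "chess openings",
  "beekeeping", "kayak fishing", "medieval history", "home brewing",
];

// Unbiased in-place Fisher-Yates shuffle.
function shuffle(items, random = Math.random) {
  for (let i = items.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [items[i], items[j]] = [items[j], items[i]];
  }
  return items;
}

// Deal shuffled topics round-robin to the bots, giving each topic a random
// dwell time between minDwellMs and maxDwellMs (inclusive).
function planSessions(topics, botCount, minDwellMs, maxDwellMs, random = Math.random) {
  const order = shuffle([...topics], random); // copy so the input is untouched
  const plans = Array.from({ length: botCount }, () => []);
  order.forEach((topic, i) => {
    const dwellMs = minDwellMs + Math.floor(random() * (maxDwellMs - minDwellMs + 1));
    plans[i % botCount].push({ topic, dwellMs });
  });
  return plans;
}

const plans = planSessions(DECOY_TOPICS, 3, 30_000, 300_000);
plans.forEach((plan, bot) => console.log(`bot ${bot}:`, plan.map((p) => p.topic)));
```

Passing a custom `random` function (for example, a seeded generator) makes a run reproducible, which helps when testing the bots.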
I know that I could avoid all of this by not using Google, staying off social media, using VPNs, etc. This is for academic purposes: I would like to explore the idea of internet privacy by “hiding in plain sight,” surrounded by a smoke screen of conflicting interests.
Lastly, I’m looking at running Google Chrome in headless mode and controlling it with the Node library Puppeteer to keep CPU consumption down while the bots do their work. Since headless Chrome is primarily a development and automation tool, will searches conducted in headless mode still be harvested by the same interested parties? This is the real reason I want to know how data is harvested. Does harvested data come primarily from the browser itself, or from something more granular, such as the URLs in my browsing history?
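One way to investigate this empirically is to record every request a headless page makes and see which outside domains it contacts. In Puppeteer, the request URLs can be collected by registering a listener such as `page.on("request", (req) => urls.push(req.url()))` before calling `page.goto(...)`. The helper below is a sketch that classifies those URLs; it is plain JavaScript so it runs without Puppeteer installed. The function names are hypothetical, and treating "same site" as "same hostname or a subdomain of it" is a simplification (a faithful check compares registrable domains using the Public Suffix List). The second helper reflects the fact that headless Chrome has historically put "HeadlessChrome" in its user-agent string, which sites can check.

```javascript
// Given the URL of the page a bot visited and the URLs of every request that
// page triggered, list the third-party hosts that were contacted.
// Simplification: "same site" means the same hostname or a subdomain of it;
// a faithful check would compare registrable domains (eTLD+1) instead.
function thirdPartyHosts(pageUrl, requestUrls) {
  const pageHost = new URL(pageUrl).hostname;
  const hosts = new Set();
  for (const raw of requestUrls) {
    let url;
    try {
      url = new URL(raw);
    } catch {
      continue; // skip anything that is not a parsable URL
    }
    // Ignore data:, blob:, and other non-network schemes.
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    const host = url.hostname;
    const sameSite = host === pageHost || host.endsWith("." + pageHost);
    if (!sameSite) hosts.add(host);
  }
  return [...hosts].sort();
}

// Headless Chrome has historically announced "HeadlessChrome" in its
// user-agent string, which lets sites flag it as automated.
function selfIdentifiesAsHeadless(userAgent) {
  return userAgent.includes("HeadlessChrome");
}

console.log(
  thirdPartyHosts("https://news.example/story", [
    "https://news.example/app.js",
    "https://cdn.news.example/logo.png",
    "https://tracker.example/pixel.gif?id=1",
    "https://ads.example/ad.js",
    "data:image/png;base64,AAAA",
  ])
); // [ 'ads.example', 'tracker.example' ]
```

If the list of third-party hosts looks the same in headed and headless runs, that suggests the embedded trackers fire either way; what may differ is whether the site recognizes the visitor as a bot.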
Lots of questions here, I know. I invite anyone with insight into how our privacy is compromised or our data harvested to share it here. Hopefully this thread will be useful to others developing tools for maintaining a private identity online. Thanks!