Web scraping with python on sites with authentication

I am trying to automate a web data collection process using Python. I need to extract the information from the page https://app.ixml.com.br/documentos/nfe. However, before visiting this page, I must log in at https://app.ixml.com.br/login. The following code should, in theory, log in to the site:

from robobrowser import RoboBrowser


username = 'meu email'
password = 'minha senha'

br = RoboBrowser(parser='html.parser')

br.open('https://app.ixml.com.br/login')

form = br.get_form()

# Form fields are set through their .value attribute, not by calling the form
form['email'].value = username
form['senha'].value = password

br.submit_form(form)

src = str(br.parsed)

However, when printing the src variable, I get the source code of the page https://app.ixml.com.br/login, that is, the page before logging in. If I add the following lines at the end of the previous code:

br.open('https://app.ixml.com.br/documentos/nfe')
src2 = str(br.parsed)

the src2 variable contains the source code of the page https://app.ixml.com.br/. I tried some variations, such as creating a new br object, but I got the same result. How can I access the information at https://app.ixml.com.br/documentos/nfe?
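One common cause of this behaviour is a hidden field in the login form (for example a CSRF token) that must be submitted along with the credentials, with the cookies kept across requests. Below is a minimal sketch of that approach using requests and BeautifulSoup; the field names `email` and `senha` are taken from the question, but the presence and name of any hidden token field is an assumption you would verify in the browser's developer tools.

```python
import requests
from bs4 import BeautifulSoup

LOGIN_URL = 'https://app.ixml.com.br/login'
TARGET_URL = 'https://app.ixml.com.br/documentos/nfe'

def form_payload(html, overrides):
    """Collect every named <input> of the first form on the page
    (including hidden CSRF-style fields) and overlay our values."""
    form = BeautifulSoup(html, 'html.parser').find('form')
    payload = {i['name']: i.get('value', '')
               for i in form.find_all('input') if i.get('name')}
    payload.update(overrides)
    return payload

def fetch_nfe(email, senha):
    """Log in with a persistent session, then request the protected page."""
    with requests.Session() as s:   # the Session keeps the auth cookies
        page = s.get(LOGIN_URL)
        s.post(LOGIN_URL, data=form_payload(page.text,
                                            {'email': email, 'senha': senha}))
        return s.get(TARGET_URL).text
```

If the page after `fetch_nfe` is still the login screen, inspect the POST the browser actually sends when you log in manually; the form may use different field names or a JavaScript-driven login that plain HTTP requests cannot reproduce.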

architecture – Suggestions on system design for data scraping

I will need to scrape data from different property listing sites and be able to search it by name and location.

I don't have much experience designing a whole system like this. My initial thought is to create, for example, 5 separate databases for the 5 property sites, since each of them will have different fields.

Then I would create an intermediate step that consolidates the data into one large database. For example, I would convert the units and normalize the descriptions so that the data is consistent (all records share the same fields).

Then I synchronize the data with Elasticsearch so I can easily search for property names. The result will return a list of properties that match my search.

I am not sure whether this is a good design or what its disadvantages are.

I am also not sure how I can combine information from different sites. For example, property site A only contains name, year of construction and price, while property site B contains other information, such as location. How can I combine the two data sets into one when the same property may be named differently on different sites?

[UDEMY] – Introduction to scraping web data with Python doing 20 real projects

Starting from beginner level, getting the web data you want is a piece of CAKE! It is easy.

REMEMBER: coupons are LIMITED, so be QUICK!
https://www.udemy.com/python-master-web-scraping-course-doing-20-real-projects

api: scraping data from a form submission in another domain

I am currently investigating a possible security vulnerability that I may have found on a well-known website. Before approaching the company, I would like to build a proof of concept.

I am looking at a site (Site A) that has a form. The user chooses a location from a list and then enters a number between 1 and 10. The form then returns information and a price.

I would like to create a form on a different site (Site B) that replicates the steps and extracts data from Site A.

I am fairly sure this is not possible without API access or a CORS configuration on Site A. Is that correct, or could I build a bot that scrapes the data from Site A for Site B?
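Worth noting: CORS and the same-origin policy are enforced by browsers, not by servers, so while Site B's front-end JavaScript cannot read Site A's responses, Site B's *server* can submit the form and parse the result freely. A sketch of that server-side approach follows; the URL, form field names and the `price` CSS class are all made up for illustration and would have to be read off the real form in the browser's network tab.

```python
import requests
from bs4 import BeautifulSoup

def build_query(location, quantity):
    """Form fields Site A expects -- the names here are hypothetical."""
    if not 1 <= quantity <= 10:
        raise ValueError('quantity must be between 1 and 10')
    return {'location': location, 'quantity': quantity}

def parse_price(html):
    """Pull the price out of Site A's response page
    (the CSS class is likewise an assumption)."""
    tag = BeautifulSoup(html, 'html.parser').find(class_='price')
    return tag.get_text(strip=True) if tag else None

def fetch_price(location, quantity):
    # This runs on Site B's server, so the browser's same-origin
    # policy and CORS never come into play.
    r = requests.post('https://site-a.example/form',
                      data=build_query(location, quantity))
    return parse_price(r.text)
```

Whether doing this is acceptable is a separate question from whether it is technically possible: Site A's terms of service, rate limits and any session or CSRF protection still apply.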

import – web scraping images

I am trying to load player profile images from the following page:

https://www.transfermarkt.com/manchester-united/startseite/verein/985/saison_id/2006#

I use the following code:

Import["https://www.transfermarkt.com/manchester-united/startseite/verein/985/saison_id/2006#", "Images"]

Unfortunately, all the profile images load incorrectly (see the attached screenshot).

All the images I don't need are in the correct format …

The Pause function (for example, pausing for a second)

Pause[1]

only works between evaluations, so I don't think I can use it.

Any idea how to solve this problem?

Problem with the results of Python scraping with BeautifulSoup

I am starting with Python.

I want to get some data from a website.

I have been reading the documentation, but I am struggling. I do retrieve the information I need, because I can see it in the loop that runs through the array, but afterwards I am not able to work with an individual position of that variable.

The code is:

# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
import requests

URL = "http://www.morningstar.es/es/funds/snapshot/snapshot.aspx?id=F0GBR04AR1"

# Make the request to the site
req = requests.get(URL)

# Check that the request returns Status Code = 200
status_code = req.status_code
if status_code == 200:

    # Load the page's HTML content into a BeautifulSoup() object
    html = BeautifulSoup(req.text, "html.parser")

    # Get all the cells that hold the entries
    entradas = html.find_all('td', {'class': 'line text'})

    print(entradas[2])
    for i, entrada in enumerate(entradas):
        print(i, entrada)

    print('---')
    print(entradas[0])

    # the following line does not work
    type(entradas)

    valor_antes = entradas[0]
    print(valor_antes)

    # the following lines do not work
    type(valor_antes)
    len(valor_antes)

else:
    print("Error: Status Code %d" % status_code)

And the result is:

    LU0011850392
0 EUR 115,83
1 -0,32%
                      
2 LU0011850392
3 EUR 808,94
4 EUR 685,22
5 5,00%
          
6 2,25%
---
EUR 115,83
EUR 115,83
(Finished in 2.2s)

As you can see, the type and the len calls show nothing. I tried a couple of other things but I still don't get it.

I think entradas is not an array.

Any help?

Many thanks
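For context on what is happening here: `find_all` returns a list-like `ResultSet`, and the reason `type(...)` and `len(...)` "show nothing" in a script is that a bare expression prints nothing outside the interactive interpreter; they need to be wrapped in `print()`. A small self-contained sketch (the HTML below is an invented stand-in for the Morningstar page, which the real code fetches with requests):

```python
from bs4 import BeautifulSoup

# Stand-in for the scraped page
sample = """<table>
  <tr><td class="line text">EUR 115,83</td></tr>
  <tr><td class="line text">-0,32%</td></tr>
</table>"""

html = BeautifulSoup(sample, "html.parser")
entradas = html.find_all('td', {'class': 'line text'})

# In a script, a bare type(entradas) prints nothing;
# wrap it in print() to see the result
print(type(entradas))   # a list-like bs4 ResultSet
print(len(entradas))    # 2

# Index with square brackets, and use .text to get the cell's value
valor = entradas[0].text.strip()
print(valor)            # EUR 115,83
```

So `entradas` can be indexed and iterated like a list, and each element is a `Tag` whose text is reached through `.text` or `.get_text()`.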

Scraping prices from a search bar on a website with Python

I have a list of part numbers that can be found in the search bar at: https://www.partsfinder.com/catalog/preview?q=0119000230

I want to collect the prices of the results.

This is what I put together but I'm not sure where to go from here:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.partsfinder.com/catalog/preview?q=0119000230')

soup = BeautifulSoup(r.text, 'html.parser')

resultsRow = soup.find_all('a', {'class': 'search_result_row'})

results = []

Any help appreciated, thanks!
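A possible next step is to loop over the matched rows and pull out the price text. The markup below is an invented stand-in for the results page, and the `price` class name is an assumption; check the real element names in your browser's inspector. Also note that if `find_all` comes back empty on the live page, the results are probably rendered by JavaScript, in which case you would need the site's underlying JSON endpoint (visible in the network tab) or a browser-driving tool such as Selenium.

```python
from bs4 import BeautifulSoup

# Stand-in for the search-results page
sample = """
<a class="search_result_row" href="/p/1">
  <span class="part">0119000230</span><span class="price">$12.50</span>
</a>
<a class="search_result_row" href="/p/2">
  <span class="part">0119000231</span><span class="price">$8.75</span>
</a>"""

soup = BeautifulSoup(sample, 'html.parser')

results = []
for row in soup.find_all('a', {'class': 'search_result_row'}):
    # Each row is a Tag, so it can be searched again for child elements
    price = row.find('span', {'class': 'price'})
    if price:
        results.append(price.get_text(strip=True))

print(results)   # ['$12.50', '$8.75']
```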

Design – What is the best way to do web scraping in Flutter / Dart?

I would like to automate some tasks on a website from a Flutter application. Basically, I have to log in, navigate to a specific page, fill out a form and submit it. I'm trying to achieve that with a WebView in Flutter, driving the navigation with JavaScript: I go to a page, run a bit of JavaScript (to complete the login step and submit the form), go to the next page, and so on. But this way it is very difficult to synchronize the execution of the JS with the correct page.

I wonder if there is a better approach. Before this, I tried something with an HTTP client, issuing some HTTP POST requests, but I gave up because I thought using a WebView would be easier. It was not easy at all.

Is there any other alternative to do what I want?
