python – "errorMessage": "local variable & # 39; action & # 39; referenced before assignment", "errorType": "UnboundLocalError"

I tried making the variable action global, but it didn't work. It seems that any variable assigned inside the else branch is isolated from the rest of the code, even though it is in the same block of the for loop.

for group in auto_scaling_groups:
    if servers_need_to_be_started(group):
        pass
    else:
        action = "Stopping"
        min_size = 0
        max_size = 0
        desired_capacity = 0

    print("Version is {}".format(botocore.__version__))

    print (action + ": " + group)  #Error in this line 
    response = client.update_auto_scaling_group(
        AutoScalingGroupName=group,
        MinSize=min_size,
        MaxSize=max_size,
        DesiredCapacity=desired_capacity,
    )

    print (response)
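
The immediate cause of the error: action, min_size, max_size and desired_capacity are only assigned in the else branch, so when servers_need_to_be_started(group) is true the later lines reference names that were never bound. A minimal sketch of one way around it (the "starting" values below are hypothetical placeholders, not taken from the original post):

for group in auto_scaling_groups:
    if servers_need_to_be_started(group):
        # hypothetical defaults for the "starting" case; substitute whatever
        # your scaling policy actually requires
        action = "Starting"
        min_size = 1
        max_size = 1
        desired_capacity = 1
    else:
        action = "Stopping"
        min_size = 0
        max_size = 0
        desired_capacity = 0

    # every name used below is now bound on both branches
    print(action + ": " + group)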

python: why does my web scraper only work half the time?

My goal is to get the product name and price from every Amazon page detected on any website you feed to my program.

My input is a text file containing five websites. On each of these websites, a total of five to fifteen Amazon links can be found.

My code is this:

from simplified_scrapy.request import req
from simplified_scrapy.simplified_doc import SimplifiedDoc
import requests
import re
from bs4 import BeautifulSoup
from collections import OrderedDict
from time import sleep
import time
from lxml import html
import json
from urllib2 import Request, urlopen, HTTPError, URLError

def isdead(url):
    user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
    headers = { 'User-Agent':user_agent }
    req = Request(url, headers = headers)
    sleep(10)
    try:
        page_open = urlopen(req)
    except HTTPError, e:
        return e.code #404 if link is broken
    except URLError, e:
        return e.reason
    else:
        return False

def check(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    page = requests.get(url, headers = headers)

    doc = html.fromstring(page.content)
    XPATH_AVAILABILITY = '//div[@id="availability"]//text()'
    RAw_AVAILABILITY = doc.xpath(XPATH_AVAILABILITY)
    AVAILABILITY = ''.join(RAw_AVAILABILITY).strip()
    #re.... is a list. if empty, available. if not, unavailable.
    #return re.findall(r'Available from', AVAILABILITY[:30], re.IGNORECASE)

    if len(re.findall(r'unavailable', AVAILABILITY[:30], re.IGNORECASE)) == 1:
        return "unavailable"
    else:
        return "available"


file_name = raw_input("Enter file name: ")
filepath = "%s"%(file_name)

with open(filepath) as f:
    listoflinks = [line.rstrip('\n') for line in f]

all_links = []
for i in listoflinks:
    htmls = req.get(i)
    doc = SimplifiedDoc(htmls)
    amazon_links = doc.getElements('a')
    amazon_links = amazon_links.containsOr(['https://www.amazon.com/','https://amzn.to/'],attr='href')
    for a in amazon_links:
        if a.href not in all_links:
            all_links.append(a.href)

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

all_links = [x for x in all_links if "amazon.com/gp/prime" not in x]
all_links = [y for y in all_links if "amazon.com/product-reviews" not in y]
for i in all_links:
    print "LINK:"
    print i
    response = requests.get(i, headers=headers)
    soup = BeautifulSoup(response.content, features="lxml")

    if isdead(i) == 404:
        print "DOES NOT EXIST"
        print "/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/"
        pass
    else:
        title = soup.select("#productTitle")[0].get_text().strip()
        if check(i) == "unavailable":
            price = "UNAVAILABLE"
        else:
            if (len(soup.select("#priceblock_ourprice")) == 0) and (len(soup.select("#priceblock_saleprice")) == 0):
                price = soup.select("#a-offscreen")
            elif len(soup.select("#priceblock_ourprice")) == 0:
                price = soup.select("#priceblock_saleprice")
            else:
                price = soup.select("#priceblock_ourprice")

        print "TITLE:%s"%(title)
        print "PRICE:%s"%(price)
        print "/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/"

print "..............................................."
print "FINALLY..."
print "# OF LINKS RETRIEVED:"
print len(all_links)

Whenever it works, the output looks more or less like this (please don't judge the PRICE output; I've spent a lot of time trying to fix it, but nothing works because I can't convert it to a string and get_text() doesn't work. This project is for personal use only, so it isn't that important, but if you have suggestions I'm very receptive to them):

LINK:
https://www.amazon.com/dp/B007Y6LLTM/ref=as_li_ss_tl?ie=UTF8&linkCode=ll1&tag=lunagtkf1-20&linkId=ee8c5299508af57c815ea6577ede4244
TITLE:Moen 7594ESRS Arbor Motionsense Two-Sensor Touchless One-Handle Pulldown Kitchen Faucet Featuring Power Clean, Spot Resist Stainless
PRICE:($359.99)
/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/

… and so on.
The error looks like this:

Traceback (most recent call last):
  File "name.py", line 75, in <module>
    title = soup.select("#productTitle")[0].get_text().strip()
IndexError: list index out of range

It's very strange because the same text file is fed in many times, and sometimes every site is scraped fine, but sometimes the error appears on the tenth Amazon product, and sometimes on the very first one…

I suspect it is a bot-detection problem, but I have a header set. What is the problem?
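
For what it's worth, one way to make the intermittent IndexError visible instead of fatal is to guard the #productTitle lookup: when Amazon serves a robot-check page, the selector comes back empty. A minimal sketch, reusing the soup, i, headers, sleep, requests and BeautifulSoup names from the script above:

selected = soup.select("#productTitle")
if not selected:
    # probably a robot-check page rather than a product page; wait and retry
    # once, or write response.content to a file to inspect what came back
    sleep(30)
    response = requests.get(i, headers=headers)
    soup = BeautifulSoup(response.content, features="lxml")
    selected = soup.select("#productTitle")
if selected:
    title = selected[0].get_text().strip()
else:
    title = "BLOCKED OR NOT A PRODUCT PAGE"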

python – Termination character in suffix trees

I am trying to build a generalized suffix tree from a large number of strings using Ukkonen's algorithm; however, I run out of unique symbols to use as string-termination characters.

I was wondering if there is any way around this problem. For example, would it be possible to mark the string ends with integers and, even when an integer grows to a multi-digit value, still interpret that value as a single character?

Any input would be appreciated. Thank you!
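
One common way around this (a sketch, not a full Ukkonen implementation): treat each string as a sequence of arbitrary symbols rather than characters, and give each string its own terminator object that can never collide with a real input symbol, e.g. a tuple carrying the string's index:

def make_generalized_input(strings):
    sequences = []
    for i, s in enumerate(strings):
        # ('$', i) is unique per string, hashable, and compares unequal to
        # every character, so the alphabet never runs out of terminators
        terminator = ('$', i)
        sequences.append(list(s) + [terminator])
    return sequences

seqs = make_generalized_input(["banana", "bandana"])
# Edge labels and child lookups in the tree then operate on symbols
# (characters or tuples), so a multi-digit integer is still one symbol.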

php – shell_exec() skipping the Python script

Working on an Apache 2 web server on Ubuntu 12.04 LTS.

I normally use shell_exec() to run a shell script.

Everything in the script runs, except the parts where I call Python scripts, e.g. python pythonscript.py.

When I run the shell script from the terminal, it works fine.

Any idea how I could fix this? Could it be something related to permissions?

Regards

python – Get the value given to a variable in my dataframe when using LabelEncoder

I have a dataframe with a set of categorical variables. What I have done is apply LabelEncoder from sklearn's preprocessing package.

That went well and generated my new dataframe with its corresponding new values. The thing is, there is one of those values that I would still like to be able to identify: before processing it was a "#", but now it is a numeric value that I don't know.

Reading the sklearn documentation I see that inverse_transform exists, which allows me to go back to the previous state. Now, how do I know what value was assigned to "#"?

I've tried the following, but I think I'm doing it wrong:

list(le.inverse_transform(columnas_categoricas))

where my LabelEncoder process was:

le = LabelEncoder()

cat_df[columnas_categoricas] = cat_df[columnas_categoricas].apply(lambda col: le.fit_transform(col))

Can anyone guide me on how to get the value?

Thank you
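
One thing worth checking (a sketch, assuming the cat_df and columnas_categoricas names from above): apply(lambda col: le.fit_transform(col)) refits the single le on every column, so after the loop le only remembers the last column, and inverse_transform on the others gives wrong answers. Keeping one encoder per column also makes it easy to look up what '#' was mapped to:

from sklearn.preprocessing import LabelEncoder

encoders = {}
for col in columnas_categoricas:
    le_col = LabelEncoder()
    cat_df[col] = le_col.fit_transform(cat_df[col])
    encoders[col] = le_col

# label -> code mapping for one column (transform assigns codes in the
# sorted order of classes_)
col = columnas_categoricas[0]          # whichever column contained '#'
mapping = dict(zip(encoders[col].classes_, range(len(encoders[col].classes_))))
print(mapping.get('#'))                # the integer that '#' was given

# or go the other way:
# encoders[col].inverse_transform(sorted(cat_df[col].unique()))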

Proper configuration of a Python project (Code Review Stack Exchange)

They asked me to migrate this question from SO here.

I know there are already several questions about folder structures, relative imports, modules vs. scripts, and __init__.py files, but even after reading about those topics, proper project configuration remains the thing that confuses me most about Python.

For this purpose, I have created a sample project that should cover all the imports and the different ways of invoking it, and how to handle them, so that I can use it as a template for all future projects. This is the project structure:

Sample
├── __init__.py
├── __main__.py
├── package1
│   ├── foo.py
│   └── __init__.py
├── package2
│   ├── bar.py
│   ├── __init__.py
│   └── subpackage
│       ├── baz.py
│       └── __init__.py
├── test.py
└── utils.py

The __init__.py files are currently left (intentionally) empty and (again) I avoided explicit relative imports. To avoid cluttering this post with the content of each file, I have uploaded the entire project to GitHub.

I would like to fix the project configuration, if possible only by making the following changes:

  • Change imports from absolute to relative or vice versa
  • Add things to the different __init__.py files

so that the following calls are resolved without ImportErrors and ModuleNotFoundErrors:

  • user@pc:/path/Sample$ python .
  • user@pc:/path$ python Sample
  • user@pc:/$ python path/Sample
  • user@pc:/path$ python Sample/__main__.py
  • user@pc:/path$ python Sample/test.py
  • user@pc:/path$ python Sample/package2/subpackage/baz.py

Bonus points if the following calls also run without errors (as long as the sample project is in the Python search path):

  • python -m Sample
  • python -m Sample.test
  • python -m Sample.package2.subpackage.baz
  • python -m Sample.package2.baz

The critical parts in my code that must be handled are:

  • the various kinds of imports on lines 1-4 of __main__.py
  • the import on line 2 of test.py, which requires baz to be available from package2 instead of via its full path package2.subpackage (this is what currently causes an error when executing test.py); this is similar to how, for example, one can do from keras import Model instead of from keras.models import Model (see the sketch at the end of this question)
  • the import on line 1 of baz.py, which imports a top-level module from inside a subpackage (this is what currently causes an error when executing __main__.py)

Can I fix the project with only the changes listed above? Are there any other changes I have to make to my project configuration to comply with Python standards?
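
Regarding the second bullet above, a minimal sketch of the mechanism (assuming the GitHub layout shown earlier): for from package2 import baz to resolve, package2/__init__.py has to re-export the module, just as keras/__init__.py re-exports Model.

# Sample/package2/__init__.py  (illustrative contents, not from the repository)
from .subpackage import baz            # explicit relative import, or:
# from package2.subpackage import baz  # absolute, if the Sample directory is on sys.path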

Python: how to reduce the size of cross-compiled shared libraries?

I am working on installing python3.6 together with zmq on an ARM-based processor that has about 32 MB of free space.

I built python3.6, removed the unwanted libraries, and created a 15 MB Python installation package that works fine for sample programs.

I need to install zmq to run my application, so I have cross-compiled pyzmq for ARM following this link:
https://github.com/zeromq/pyzmq/wiki/Cross-compiling-PyZMQ-for-Android
(This link is for Android, but I made modifications to suit my configuration.)

As expected, I obtained the following libraries compiled for ARM:

2.6M    constants.cpython-36m-x86_64-linux-gnu.so
3.0M    context.cpython-36m-x86_64-linux-gnu.so
3.0M    _device.cpython-36m-x86_64-linux-gnu.so
3.0M    error.cpython-36m-x86_64-linux-gnu.so
3.1M    message.cpython-36m-x86_64-linux-gnu.so
3.1M    _poll.cpython-36m-x86_64-linux-gnu.so
3.1M    socket.cpython-36m-x86_64-linux-gnu.so
3.0M    utils.cpython-36m-x86_64-linux-gnu.so
3.0M    _version.cpython-36m-x86_64-linux-gnu.so

I need help with two problems here.

  1. The size of each library was about 20 MB before stripping. I was able to reduce them to 3 MB each, but I need to reduce them further to fit into the flash. I have seen these libraries at around 50 KB each on other boards, so I think there is a way to shrink each library further. Can anyone tell me how I can do this?

  2. The file names do not indicate ARM. This is not a major problem for me, since I can rename them manually, but I would like to know whether I can change them during the build process.
    When I run the file command on these libraries, I can see that they are built for ARM.

constants.cpython-36m-x86_64-linux-gnu.so: ELF 32-bit LSB shared object,
ARM, EABI5 version 1 (SYSV), dynamically linked, stripped

Below is the setup.cfg file that I used to build pyzmq:

[global]
# the prefix with which libzmq was configured / installed
zmq_prefix = /home/sagar/zmq/_install
have_sys_un_h = False

[build_ext]
libraries = python3.6
library_dirs = /home/sagar/python_source/arm_install_with_zmq/lib
include_dirs = /usr/include/python3.6m/
plat-name = linux-armv

[bdist_egg]
plat-name = linux-armv

Thanks in advance.

cpython – read the value at a memory address with Python

I need my script to read an external variable and store its value in a variable inside the script. This external variable is a 4-byte integer. I have searched for solutions but found nothing concrete, and I have nowhere else left to look. The best I managed is a script whose output changes according to the value of the variable, but the output is in bytes and I can't decode it. I'm out of ideas; can anyone help me?
The code I am using follows.

#-*- coding: cp1252 -*-

import ctypes, win32ui, win32process ,win32api

PROCESS_ALL_ACCESS = 0x1F0FFF
HWND = win32ui.FindWindow(None,"3D Pinball for Windows - Space Cadet").GetSafeHwnd()
print(HWND)
PID = win32process.GetWindowThreadProcessId(HWND)[1]
print(PID)
PROCESS = win32api.OpenProcess(PROCESS_ALL_ACCESS,0,PID)

rPM = ctypes.windll.kernel32.ReadProcessMemory
wPM = ctypes.windll.kernel32.WriteProcessMemory

ADDRESS1 = 0x0475ECC4
ADDRESS2 = ctypes.create_string_buffer(64)
pi = ctypes.pointer(ADDRESS2)
rPM(PROCESS.handle,ADDRESS1,ADDRESS2,64,None)
valor = ADDRESS2.value
print(valor)
x=ctypes.windll.kernel32.GetLastError()
print(x)
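
A minimal sketch of the decoding step, reusing rPM, PROCESS, ADDRESS1 and ADDRESS2 from above and assuming the target really is a 4-byte signed integer: read into a c_int (or unpack the raw bytes) instead of treating the buffer as a NUL-terminated string via .value.

import ctypes, struct

valor_int = ctypes.c_int(0)
bytes_read = ctypes.c_size_t(0)
rPM(PROCESS.handle, ADDRESS1, ctypes.byref(valor_int),
    ctypes.sizeof(valor_int), ctypes.byref(bytes_read))
print(valor_int.value)

# equivalent, using the 64-byte buffer that was already read above:
print(struct.unpack('<i', ADDRESS2.raw[:4])[0])   # little-endian signed 32-bit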

python: how to parse this strange INI-like file with the .dat extension

I came across this data source that I want to include in my current NodeJS project, and I was completely discouraged by the fact that it seems impossible to parse with any script I try to write.

Apparently, this file is not a normal INI file: it has regular sections, but the lines inside the sections have no keys; instead, each line has fields separated by a '|' character. Some lines explain the meaning of these fields and are duly commented. I would like to convert this file into much more usable JSON, but for the life of me I can't find a suitable way to handle it (mainly because of duplicate lines).

How would you do it (possibly using Python or Node)?

Example

[FIRs]
;ICAO|NAME|PREFIX POSITION|
;dummy codes for Adria CTR
ADR|Adria Radar||ADR
ADR-W|Adria Radar (West)|ADR_W|ADR-W
ADR-E|Adria Radar (East)|ADR_E|ADR-E
AGGG|Honiara||AGGG
ANAU|Nauru||ANAU
AYPM|Port Moresby||AYPM
BGGL|Sondrestrom|SFJ|BGGL
BGGL|Sondrestrom|GREN|BGGL
BIRD|Reykjavik||BIRD
BIRD-E|Reykjavik ACC (East) - Reykjavik|BIRD_E|BIRD-E
BIRD-N|Reykjavik ACC (North) - Reykjavik|BIRD_N|BIRD-N
BIRD-S|Reykjavik ACC (South) - Reykjavik|BIRD_S|BIRD-S

If you want to see the file yourself, here it is (github.com).
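
A minimal parsing sketch in Python (the same idea ports to Node): track the current section header, skip ';' comment lines, split data lines on '|', and keep duplicate entries (such as the two BGGL rows) by collecting every row in a list per section. The field names below are assumptions read off the ';ICAO|NAME|PREFIX POSITION|' comment, not something the file defines formally:

import json
from collections import defaultdict

FIELDS = ["icao", "name", "prefix", "position"]   # assumed column names

def parse_dat(path):
    data = defaultdict(list)
    section = None
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(";"):
                continue                          # blank lines and comments
            if line.startswith("[") and line.endswith("]"):
                section = line[1:-1]              # e.g. "FIRs"
                continue
            fields = line.split("|")
            data[section].append(dict(zip(FIELDS, fields)))
    return data

# print(json.dumps(parse_dat("yourfile.dat"), indent=2))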

python: txt file cannot be opened with `readlines()`

When I use

with open('books_txt/Horror/7894.txt', "r") as fout:
    text = fout.readlines()

An error occurs:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
 in 
      1 with open('books_txt/Horror/7894.txt', "r") as fout:
----> 2     text = fout.readlines()

~/anaconda3/lib/python3.7/codecs.py in decode(self, input, final)
    320         # decode input (taking the buffer into account)
    321         data = self.buffer + input
--> 322         (result, consumed) = self._buffer_decode(data, self.errors, final)
    323         # keep undecoded input until the next call
    324         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 74: invalid start byte

But I can open the txt file in the terminal with vim. How can I open it in Python?

Update:
I found that "rb" instead of "r" works. What is the mechanism behind this?