network – Initial Block Download in an alternative multithreaded client

I'm implementing a full Bitcoin node from scratch. I know it's a titanic effort, but I'm fine with that because the goal is to learn all the ins and outs of the current Bitcoin Core implementation while working on a totally different approach and codebase.

The project will be open source (I already have a GitHub repo, but it's far from usable).

I've already done a lot of work, but now I'd like to have a kind of brainstorm about the IBD stage.
I'm following the headers-first approach, but I want my implementation to be multithreaded, validating headers from multiple peers rather than from a single preferred one.

My validation code still takes a lock so that only a single batch of headers (from the same source) is validated at a time, so I don't have concurrency problems.

Now I'd like to implement block download the same way (from multiple sources, even during IBD), but I have to consider all the problems this approach leads to (code complexity aside).
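
To make the idea concrete, the scheduling I have in mind looks roughly like this (an illustrative C++ sketch, not code from my repo, and all the names are made up): keep a moving window of heights over the already-validated header chain and hand each missing block to whichever peer currently has the fewest requests in flight, with a per-peer cap so that no peer becomes a de facto preferred source.

#include <algorithm>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

struct PeerState {
    std::set<uint32_t> in_flight;               // block heights requested from this peer
};

struct BlockScheduler {
    uint32_t window_start = 0;                  // first height not yet stored (advanced elsewhere)
    uint32_t window_size = 1024;                // moving window over validated headers
    uint32_t per_peer_limit = 16;               // cap on parallel requests per peer
    std::set<uint32_t> requested;               // heights already requested from someone
    std::map<int, PeerState> peers;             // peer id -> per-peer state

    // Decide which heights to request next and from which peer.
    std::vector<std::pair<int, uint32_t>> next_requests(uint32_t best_header_height) {
        std::vector<std::pair<int, uint32_t>> out;
        uint32_t end = std::min(best_header_height, window_start + window_size);
        for (uint32_t h = window_start; h < end; ++h) {
            if (requested.count(h)) continue;
            int best_peer = -1;                 // least-loaded peer with spare capacity
            for (auto& [id, st] : peers)
                if (st.in_flight.size() < per_peer_limit &&
                    (best_peer < 0 || st.in_flight.size() < peers[best_peer].in_flight.size()))
                    best_peer = id;
            if (best_peer < 0) break;           // every peer is saturated
            peers[best_peer].in_flight.insert(h);
            requested.insert(h);
            out.push_back({best_peer, h});
        }
        return out;
    }
};

The parts I expect to be tricky are stalled-peer detection/timeouts and storing blocks that arrive out of order before they can be connected, which is exactly what I'd like to compare against other implementations.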

So, before writing a wall of text with my thoughts, can you point me to IBD approaches in other Bitcoin full-node implementations that differ from the Bitcoin Core one?

Thanks

multi-thread: a simple thread-safe deque in C++

I am trying to implement a thread-safe deque in C++.
ThreadSafeDeque will be used by a FileLogger class.
When threads call the log() function of FileLogger, the messages will be push_back()ed to the ThreadSafeDeque and the call returns almost immediately. In a separate thread, the FileLogger pop_front()s the messages and writes them to a file at its own pace.
Am I doing things correctly below?

#pragma once
#include <condition_variable>
#include <deque>
#include <mutex>
template <typename T>
class ThreadSafeDeque {
public:
    void pop_front_waiting(T &t) {
        // unique_lock can be unlocked, lock_guard can not
        std::unique_lock<std::mutex> lock{ mutex }; // locks
        while(deque.empty()) {
            condition.wait(lock); // unlocks, sleeps and relocks when woken up  
        }
        t = deque.front();
        deque.pop_front();
    } // unlocks as goes out of scope

    void push_back(const T &t) {
        std::unique_lock<std::mutex> lock{ mutex };
        deque.push_back(t);
        lock.unlock();
        condition.notify_one(); // wakes up pop_front_waiting  
    }
private:
    std::deque<T>               deque;
    std::mutex                  mutex;
    std::condition_variable condition;
};  
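
For context, this is roughly how FileLogger is meant to drive the deque (a minimal sketch only; the logger itself is not part of what I am asking to be reviewed):

#include <fstream>
#include <string>
#include <thread>

class FileLogger {
public:
    explicit FileLogger(const std::string &path)
        : file(path), writer([this] { write_loop(); }) {}

    // Called by many threads; just enqueues and returns almost immediately.
    void log(const std::string &message) { messages.push_back(message); }

private:
    // Runs on the single writer thread and drains the deque at its own pace.
    void write_loop() {
        std::string message;
        for (;;) {
            messages.pop_front_waiting(message); // blocks until something is queued
            file << message << '\n';
        }
    }

    ThreadSafeDeque<std::string> messages;
    std::ofstream file;
    std::thread writer; // clean shutdown/joining omitted for brevity
};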

multi-thread – Parallel MergeSort in C++

I tried to implement a parallel MergeSort in C++ which also tracks how many comparisons are made and how many threads it uses:

#include <algorithm>
#include <ctime>
#include <iostream>
#include <mutex>
#include <system_error>
#include <thread>

int *original_array,*auxiliary_array;
std::mutex protector_of_the_global_counter;
int global_counter=0;
std::mutex protector_of_the_thread_counter;
int number_of_threads=0;


template <typename T>
class Counting_Comparator {
    private:
    bool was_allocated;
    int *local_counter;
    public:
    Counting_Comparator() {
        was_allocated=true;
        local_counter=new int(0);
    }
    Counting_Comparator(int *init) {
        was_allocated=false;
        local_counter=init;
    }
    int get_count() {return *local_counter;}
    bool operator() (T first, T second) {
        (*local_counter)++;
        return first<second;
    }
    Counting_Comparator(const Counting_Comparator<T> &x) {
        was_allocated=x.was_allocated;
        local_counter=x.local_counter;
    }
    ~Counting_Comparator() {
        if (was_allocated) delete local_counter;
    }
};

struct limits {
    int lower_limit,upper_limit,reccursion_depth;
};

void parallel_merge_sort(limits argument) {
    int lower_limit=argument.lower_limit;
    int upper_limit=argument.upper_limit;
    if (upper_limit-lower_limit<2) return; //An array of length less than 2 is already sorted.
    int reccursion_depth=argument.reccursion_depth;
    int middle_of_the_array=(upper_limit+lower_limit)/2;
    limits left_part={lower_limit,middle_of_the_array,reccursion_depth+1},
            right_part={middle_of_the_array,upper_limit,reccursion_depth+1};
    if (reccursion_depth<3) { //Spawn a new thread for the left half while the recursion is still shallow.
        protector_of_the_thread_counter.lock();
        number_of_threads++;
        protector_of_the_thread_counter.unlock();
        std::thread left_thread(parallel_merge_sort,left_part);
        parallel_merge_sort(right_part);
        left_thread.join();
    } else {
        parallel_merge_sort(left_part);
        parallel_merge_sort(right_part);
    }
    int local_counter=0;
    Counting_Comparator<int> comparator_functor(&local_counter);
    std::merge(original_array+lower_limit,
            original_array+middle_of_the_array,
            original_array+middle_of_the_array,
            original_array+upper_limit,
            auxiliary_array+lower_limit,
            comparator_functor);
    protector_of_the_global_counter.lock();
    global_counter+=comparator_functor.get_count();
    protector_of_the_global_counter.unlock();
    std::copy(auxiliary_array+lower_limit,
            auxiliary_array+upper_limit,
            original_array+lower_limit);
}

int main(void) {
    using std::cout;
    using std::cin;
    using std::endl;
    cout <<"Enter how many numbers you will input." <<endl;
    int n;
    cin >>n;
    try {
        original_array=new int[n];
        auxiliary_array=new int[n];
    }
    catch (...) {
        std::cerr <<"Not enough memory!?" <<endl;
        return 1;
    }
    for (int i=0;i<n;i++) cin >>original_array[i];
    limits entire_array={0,n,0};
    number_of_threads=1;
    clock_t processor_time=clock();
    try {
    std::thread root_of_the_reccursion(parallel_merge_sort,entire_array);
    root_of_the_reccursion.join();
    }
    catch (std::system_error error) {
        std::cerr <<"Can't create a new thread, error " <<error.what() <<endl;
        return 1;
    }
    processor_time=clock()-processor_time;
    cout <<"Comparisons made: " <<global_counter <<endl;
    cout <<"Threads used: " <<number_of_threads <<endl;
    cout <<"Processor time: " <<(double)processor_time/CLOCKS_PER_SEC <<" seconds" <<endl;
    delete[] original_array;
    delete[] auxiliary_array;
    return 0;
}

So what do you think about it?

multi-thread – Run Bash scripts in parallel

I would like to run a script several times, in more than 10 folders, in parallel. What I need to know is how to structure the arguments by folder number. My non-parallel script is:

for i in {1..10};
    do python myscript.py "folder_"$i;
done;

I have heard of mpirun but I am not sure how to structure the arguments by folder number or something similar.
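
For example, I assume the loop could simply put each run in the background and then wait for all of them, but I don't know whether that is the right approach or whether something like mpirun or GNU parallel would be better:

for i in {1..10};
    do python myscript.py "folder_"$i &   # each run goes to the background
done;
wait                                      # returns only after all 10 runs have finished

GNU parallel apparently makes this sort of thing even shorter, but I have not tried it yet.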

multi-thread: Java Concurrency in Practice, "Listing 7.9. Interrupting a task in a dedicated thread." What is the purpose of the scheduled taskThread.interrupt()?

I am reading Java Concurrency in Practice and I found the following code snippet.

public static void timedRun(final Runnable r,
                            long timeout, TimeUnit unit)
        throws InterruptedException {
    class RethrowableTask implements Runnable {
        private volatile Throwable t;
        public void run() {
            try { r.run(); }
            catch (Throwable t) { this.t = t; }
        }
        void rethrow() {
            if (t != null)
                throw launderThrowable(t);
        }
    }
    RethrowableTask task = new RethrowableTask();
    final Thread taskThread = new Thread(task);
    taskThread.start();
    cancelExec.schedule(new Runnable() {
        public void run() { taskThread.interrupt(); }
    }, timeout, unit);
    taskThread.join(unit.toMillis(timeout));
    task.rethrow();
}

The timedRun method is used to execute the task r within a time limit. That already seems to be achieved by taskThread.join(unit.toMillis(timeout));, so why do we need the scheduled taskThread.interrupt();?

multi-thread: I don't know what is wrong with this code (Java thread synchronization question)

package test1;

class SyncWait extends Thread {
    @Override
    public synchronized void run() {
        for (int i = 0; i < 2; i++) {
            System.out.println(Thread.currentThread().getName());
            try {
                Thread.sleep(100);
            } catch (Exception e) {
            }
        }
    }

    public static void main(String[] args) {
        SyncWait a = new SyncWait(); // Runnable object
        Thread t1 = new Thread(a);
        t1.setName("Thread 1");
        Thread t2 = new Thread(a);
        t2.setName("Thread 2");
        Thread t3 = new Thread(a);
        t3.setName("Thread 3");
        Thread t4 = new Thread(a);
        t4.setName("Thread 4");

        t1.start();
        t2.start();
        t3.start();
        t4.start();
    }
}

First, when I don't use the synchronized method, the thread names come out in random order. But when I use the synchronized method, the output is the same. My question is why the output does not come out like this:

Thread 1 (2 times)
Thread 2 (2 times)
Thread 3 (2 times)
Thread 4 (2 times)

multi-thread: thread pool in C for a web server

Please review the following thread pool code in C (I wrote it to implement a threaded web server for a personal project). I have been programming in C++ and Java, but I have never done any serious programming in C.

threadpool.h

#ifndef THREAD_POOL_H
#define THREAD_POOL_H

#include <pthread.h>

#define MAX_THREAD 10

typedef struct {
    void *(*function) (void *);
} runnable_task;


typedef struct safe_queue_node {
    runnable_task *task;
    struct safe_queue_node *next;
} safe_queue_node;


typedef struct safe_queue{
    struct safe_queue_node *head;
    struct safe_queue_node *tail;
    pthread_mutex_t mutex;
} safe_queue;


typedef struct {
    pthread_t threads[MAX_THREAD];
    safe_queue *queue;
    int count;
    volatile int should_close;
} thread_pool;


runnable_task *
new_runnable_task(
        void *(*function) (void *));


thread_pool *
new_thread_pool(
        int count);


void
free_thread_pool(
        thread_pool *pool);


void
add_task_to_pool(
        thread_pool *pool,
        runnable_task *task);


void
shutdown_thread_pool(
        thread_pool *pool);

#endif

threadpool.c

#include "threadpool.h"
#include <stdlib.h>

runnable_task *
new_runnable_task(
        void *(*function) (void *)) {
    runnable_task *task = malloc(sizeof(runnable_task));
    task->function = function;
    return task;
}


safe_queue_node *
_new_safe_queue_node(
        runnable_task *task) {
    safe_queue_node *node = malloc(sizeof(safe_queue_node));
    node->task = task;
    node->next = NULL;
    return node;
}


safe_queue *
_new_safe_queue() {
    safe_queue *queue = malloc(sizeof(safe_queue));
    pthread_mutex_init(&queue->mutex, NULL);
    queue->head = NULL;
    queue->tail = NULL;
    return queue;
}


void
_free_safe_queue(
        safe_queue *queue) {
    free(queue);
}


runnable_task *
_get_from_queue(
        safe_queue *queue) {
    runnable_task *task = NULL;

    pthread_mutex_lock(&queue->mutex);
    if (queue->head != NULL) {
        task = queue->head->task;
        queue->head = queue->head->next;
        if (queue->head == NULL)
            queue->tail = NULL;
    }
    pthread_mutex_unlock(&queue->mutex);

    return task;
}


void *
_run_task(
        void *arg) {
    thread_pool *pool = (thread_pool *) arg;
    while (!pool->should_close) {
        runnable_task *task = _get_from_queue(pool->queue);
        if (task == NULL)
            continue;
        task->function(NULL);
    }

    return NULL;
}


thread_pool *
new_thread_pool(
        int count) {
    thread_pool *pool = malloc(sizeof(thread_pool));
    pool->queue = _new_safe_queue();
    pool->count = count <= MAX_THREAD ? count : MAX_THREAD;
    pool->should_close = 0;
    for (int i = 0; i < pool->count; i++) {
        pthread_create(pool->threads + i, NULL, _run_task, pool);
    }

    return pool;
}


void
free_thread_pool(
        thread_pool *pool) {
    _free_safe_queue(pool->queue);
    free(pool);
}


void
_add_to_queue(
        safe_queue *queue,
        runnable_task *task) {
    pthread_mutex_lock(&queue->mutex);
    if (queue->head == NULL) {
        queue->head = queue->tail = _new_safe_queue_node(task);
    } else {
        queue->tail->next = _new_safe_queue_node(task);
        queue->tail = queue->tail->next;
    }
    pthread_mutex_unlock(&queue->mutex);
}


void
add_task_to_pool(
        thread_pool *pool,
        runnable_task *task) {
    _add_to_queue(pool->queue, task);
}


void
shutdown_thread_pool(
        thread_pool *pool) {
    while (pool->queue->head != NULL) ;

    pool->should_close = 1;

    for (int i = 0; i < pool->count; i++)
        pthread_join(pool->threads[i], NULL);
}
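
For reference, this is roughly how I intend to use the pool from the server code (a minimal sketch compiled together with threadpool.c and linked with -pthread; handle_request just stands in for the real connection handler):

#include <stdio.h>
#include "threadpool.h"

static void *handle_request(void *arg) {
    (void) arg;                       /* the pool currently invokes tasks with NULL */
    printf("handling one request\n");
    return NULL;
}

int main(void) {
    thread_pool *pool = new_thread_pool(4);            /* 4 worker threads */
    for (int i = 0; i < 8; i++)
        add_task_to_pool(pool, new_runnable_task(handle_request));
    shutdown_thread_pool(pool);                        /* waits for the queue to drain, then joins */
    free_thread_pool(pool);
    return 0;
}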

multi-thread: Java thread safety and concurrency, a concept from the book I did not understand

The book states that a critical region has 4 key points.

1. Progress: guarantees that all threads eventually enter and leave the critical region, avoiding deadlocks.

2. Mutual exclusion: only one thread can be executing in the critical region; the others must wait their turn.

3. Bounded Waiting: guarantees that each thread is admitted to the critical region within a bounded time, preventing a thread from going too long without progress simply because it is "unlucky" in the selection.

and what made no sense to me:

4. No Assumptions on Timing: What does that mean?

Also, if any of the concepts before the fourth one is incorrect or incomplete, please correct me.

And one last concept that I did not fully understand, regarding other concurrency primitives:

When using semaphores there is no spinning, hence no waste of resources due to busy waiting.

Don't semaphores use the same concept as while (not true) {…}?

And finally, could you explain the concept of the synchronized monitor? As I understand it, monitors are synchronization constructs that let you use locks, but I think there is something more to it.

Thank you

multi-thread: simple producer-consumer implementation in Python

I implemented a simple solution to the producer-consumer problem and I would love for you to take a look.

The producer simply adds random numbers to a queue and the consumer (on a separate thread) takes the numbers out of the queue and prints them. Specifically, I would like to receive comments on the concurrency aspects of this implementation. Thank you!

from collections import deque
import random
import threading


TIMES = 100


class SafeQueue:
  def __init__(self, capacity):
    self.capacity = capacity
    self.queue = deque(())
    self.remaining_space = threading.Semaphore(capacity)
    self.fill_count = threading.Semaphore(0)

  def append(self, item):
    self.remaining_space.acquire()
    self.queue.append(item)
    self.fill_count.release()

  def consume(self):
    self.fill_count.acquire()
    item = self.queue.popleft()
    self.remaining_space.release()
    return item


class Producer:
  def __init__(self, queue, times=TIMES):
    self.queue = queue
    self.times = times

  def run(self):
    for _ in range(self.times):
      self.queue.append(random.randint(0, 100))


class Consumer:
  def __init__(self, queue, times=TIMES):
    self.queue = queue
    self.times = times

  def run(self):
    for _ in range(self.times):
      print(self.queue.consume())


def main():
  queue = SafeQueue(10)

  producer = Producer(queue)
  producer_thread = threading.Thread(target=producer.run)
  consumer = Consumer(queue)
  consumer_thread = threading.Thread(target=consumer.run)

  producer_thread.start()
  consumer_thread.start()

  producer_thread.join()
  consumer_thread.join()


if __name__ == "__main__":
  main()

python – Amazon multi-thread ultra-fast scraper

This is a follow-up to: Web scraper that extracts URLs from Amazon and eBay
It is a multithreaded modification of the previous version that supports Amazon; most of the necessary documentation can be found in the docstrings.

You will find a copy of the source code as well as the necessary files here, including (proxies.txt, amazon_log.txt, user_agents.txt), which must be placed in the same folder as the code.

features:

  • Multithreaded scraping of contents.
  • Saves URLs to .txt files.
  • Scrapes Amazon sections including: Best Sellers, New Releases, Most Wished For …
  • Saves names to .txt files.
  • Maps names to URLs.
  • Content caching for later reuse.
  • Extraction of product details including (name, title, URL, features, technical details …)

I will implement another class that manages this one through public methods, organizes the output into CSV/JSON files, and performs data analysis and optimizations to reduce/eliminate failures. I will post follow-ups when I finish.

For reviewers:

  • Modifications: I made many modifications to this version and it is completely different from the previous one. It is focused only on Amazon, and many previously unnecessary method parameters (print_progress, cleanup_empty) are now class attributes. Sequential extraction is now optional, as is multithreaded extraction, which is 500 times faster. The docstrings are updated and completely changed in terms of style and content. The code is much more organized in this version and much more readable.
  • Shorter-code suggestions: I want to shorten the code and eliminate repetition (if it exists); most of the code is free of repetition, but the tasks are repetitive in generally different ways.
  • Proxies and user agents: regarding the responses collected using the _get_response() method, are the proxies and headers parameters doing the necessary work? Are the proxies being used correctly this way? Is there any improvement that can be made?
  • Occasional random failures: there are occasional, random failures to extract features in sections other than Best Sellers and Most Wished For. Why do these failures sometimes occur and sometimes not? And how can I control this and get the lowest possible failure rate?
  • Private methods: the methods defined here are private (_private()) because this class will be used by another class that manages the extraction, and that class will mainly contain the public methods.
  • Suggestions: general suggestions to improve the code are welcome, and do not hesitate to ask questions if you need anything clarified.

Code

#!/usr/bin/env python3
from requests.exceptions import HTTPError, ConnectionError, ConnectTimeout
from concurrent.futures import ThreadPoolExecutor, as_completed
from bs4 import BeautifulSoup
from time import perf_counter
from random import choice
import requests
import bs4
import os


class AmazonScraper:
    """
    A tool to scrape Amazon different sections.

    Sections:
    Best Sellers - New Releases - Gift Ideas - Movers and Shakers - Most Wished For.

    Features:
    Category/Subcategory Urls and names.
    Product Urls and details(title, features, technical details, price, review count)
    """

    def __init__(
            self, path=None, print_progress=False, cache_contents=True, cleanup_empty=True, threads=1, log=None):
        """
        Args:
            path: Folder path to save scraped and cached contents.
            print_progress: If True then the progress will be displayed.
            cache_contents: If True then the scraped contents will be cached for further re-use.
            cleanup_empty: If True, empty .txt files that might result will be deleted.
            threads: If number of threads(1 by default) is increased, multiple threads will be used.
            log: If print_progress is True, content will be saved to the log (a file name + .txt).
        """
        if not path:
            self.path = '/Users/user_name/Desktop/Amazon Scraper/'
        if path:
            self.path = path
        self.headers = [{'User-Agent': item.rstrip()} for item in open('user_agents.txt').readlines()]
        self.print_progress = print_progress
        self.cache_contents = cache_contents
        self.cleanup_empty = cleanup_empty
        self.session = requests.session()
        self.threads = threads
        if log:
            if log in os.listdir(self.path):
                os.remove(log)
            self.log_file = log
            self.log = open(log, 'w')
        self.proxies = [{'https': 'https://' + item.rstrip(), 'http':
                        'http://' + item.rstrip()} for item in open('proxies.txt').readlines()]
        self.modes = {'bs': 'Best Sellers', 'nr': 'New Releases', 'gi': 'Gift Ideas',
                      'ms': 'Movers and Shakers', 'mw': 'Most Wished For'}
        self.starting_target_urls = \
            {'bs': ('https://www.amazon.com/gp/bestsellers/', 'https://www.amazon.com/Best-Sellers'),
             'nr': ('https://www.amazon.com/gp/new-releases/', 'https://www.amazon.com/gp/new-releases/'),
             'ms': ('https://www.amazon.com/gp/movers-and-shakers/', 'https://www.amazon.com/gp/movers-and-shakers/'),
             'gi': ('https://www.amazon.com/gp/most-gifted/', 'https://www.amazon.com/gp/most-gifted'),
             'mw': ('https://www.amazon.com/gp/most-wished-for/', 'https://www.amazon.com/gp/most-wished-for/')}

    def _cache_main_category_urls(self, text_file_names: dict, section: str, category_class: str,
                                  content_path: str, categories: list):
        """
        Cache the main category/subcategory URLs to .txt files.
        Args:
            text_file_names: Section string indications mapped to their corresponding .txt filenames.
            section: Keyword indication of target section.
                'bs': Best Sellers
                'nr': New Releases
                'ms': Movers & Shakers
                'gi': Gift Ideas
                'mw': Most Wished For
            category_class: Category level indication 'categories' or 'subcategories'.
            content_path: Path to folder to save cached files.
            categories: The list of category/subcategory urls to be saved.
        Return:
             None
        """
        os.chdir(content_path + 'Amazon/')
        with open(text_file_names[section][category_class], 'w') as cats:
            for category in categories:
                cats.write(category + '\n')
                if self.print_progress:
                    if not open(text_file_names[section][category_class]).read().isspace():
                        print(f'Saving {category} ... done.')
                        if self.log:
                            print(f'Saving {category} ... done.', file=self.log, end='\n')
                    if open(text_file_names[section][category_class]).read().isspace():
                        print(f'Saving {category} ... failure.')
                        if self.log:
                            print(f'Saving {category} ... failure.', file=self.log, end='\n')
        if self.cleanup_empty:
            self._cleanup_empty_files(self.path)

    def _read_main_category_urls(self, text_file_names: dict, section: str, category_class: str, content_path: str):
        """
        Read the main category/subcategory cached urls from their respective .txt files.
        Args:
            text_file_names: Section string indications mapped to their corresponding .txt filenames.
            section: Keyword indication of target section.
                'bs': Best Sellers
                'nr': New Releases
                'ms': Movers & Shakers
                'gi': Gift Ideas
                'mw': Most Wished For
            category_class: Category level indication 'categories' or 'subcategories'.
            content_path: Path to folder to save cached files.
        Return:
             A list of the main category/subcategory urls specified.
        """
        os.chdir(content_path + 'Amazon')
        if text_file_names[section][category_class] in os.listdir(content_path + 'Amazon/'):
            with open(text_file_names[section][category_class]) as cats:
                if self.cleanup_empty:
                    self._cleanup_empty_files(self.path)
                return [link.rstrip() for link in cats.readlines()]

    def _get_response(self, url):
        """
        Send a get request to target url.
        Args:
            url: Target Url.
        Return:
             Response object.
        """
        return self.session.get(url, headers=choice(self.headers), proxies=choice(self.proxies))

    def _scrape_main_category_urls(self, section: str, category_class: str, prev_categories=None):
        """
        Scrape links of all main category/subcategory Urls of the specified section.
        Args:
            section: Keyword indication of target section.
                'bs': Best Sellers
                'nr': New Releases
                'ms': Movers & Shakers
                'gi': Gift Ideas
                'mw': Most Wished For
            category_class: Category level indication 'categories' or 'subcategories'.
            prev_categories: A list containing parent category Urls.
        Return:
             A sorted list of scraped category/subcategory Urls.
        """
        target_url = self.starting_target_urls[section][1]
        if category_class == 'categories':
            starting_url = self._get_response(self.starting_target_urls[section][0])
            html_content = BeautifulSoup(starting_url.text, features='lxml')
            target_url_part = self.starting_target_urls[section][1]
            if not self.print_progress:
                return sorted({str(link.get('href')) for link in html_content.findAll('a')
                               if target_url_part in str(link)})
            if self.print_progress:
                categories = set()
                for link in html_content.findAll('a'):
                    if target_url_part in str(link):
                        link_to_add = str(link.get('href'))
                        categories.add(link_to_add)
                        print(f'Fetched {self.modes[section]}-{category_class[:-3]}y: {link_to_add}')
                        if self.log:
                            print(f'Fetched {self.modes[section]}-{category_class[:-3]}y: '
                                  f'{link_to_add}', file=self.log, end='\n')
                return categories
        if category_class == 'subcategories':
            if not self.print_progress:
                if self.threads == 1:
                    responses = (self._get_response(category)
                                 for category in prev_categories)
                    category_soups = (BeautifulSoup(response.text, features='lxml') for response in responses)
                    pre_sub_category_links = (str(link.get('href')) for category in category_soups
                                              for link in category.findAll('a') if target_url in str(link))
                    return sorted({link for link in pre_sub_category_links if link not in prev_categories})
                if self.threads > 1:
                    with ThreadPoolExecutor(max_workers=self.threads) as executor:
                        future_html = {
                            executor.submit(self._get_response, category): category for category in prev_categories}
                        responses = (future.result() for future in as_completed(future_html))
                        category_soups = (BeautifulSoup(response.text) for response in responses)
                        pre_sub_category_links = (str(link.get('href')) for category in category_soups
                                                  for link in category.findAll('a') if target_url in str(link))
                        return sorted({link for link in pre_sub_category_links if link not in prev_categories})
            if self.print_progress:
                if self.threads == 1:
                    responses, pre, subcategories = [], [], set()
                    for category in prev_categories:
                        response = self._get_response(category)
                        responses.append(response)
                        print(f'Got response {response} for {self.modes[section]}-{category}')
                        if self.log:
                            print(f'Got response {response} for {self.modes[section]}-{category}',
                                  file=self.log, end='\n')

                    category_soups = (BeautifulSoup(response.text, features='lxml') for response in responses)
                    for soup in category_soups:
                        for link in soup.findAll('a'):
                            if target_url in str(link):
                                fetched_link = str(link.get('href'))
                                pre.append(fetched_link)
                                print(f'Fetched {self.modes[section]}-{fetched_link}')
                                if self.log:
                                    print(f'Fetched {self.modes[section]}-{fetched_link}', file=self.log,
                                          end='\n')
                    return sorted({link for link in pre if link not in prev_categories})
                if self.threads > 1:
                    with ThreadPoolExecutor(max_workers=self.threads) as executor:
                        category_soups = []
                        future_responses = {
                            executor.submit(self._get_response, category): category for category in prev_categories}
                        for future in as_completed(future_responses):
                            url = future_responses[future]
                            try:
                                response = future.result()
                                print(f'Got response {response} for {self.modes[section]}-{url}')
                                if self.log:
                                    print(f'Got response {response} for {self.modes[section]}-{url}',
                                          file=self.log, end='\n')
                            except(HTTPError, ConnectTimeout, ConnectionError):
                                print(f'Failed to get response from {url}')
                                if self.log:
                                    print(f'Failed to get response from {url}', file=self.log, end='\n')
                            else:
                                category_soups.append(BeautifulSoup(response.text, features='lxml'))
                        pre_sub_category_links = (str(link.get('href')) for category in category_soups
                                                  for link in category.findAll('a') if target_url in str(link))
                        return sorted({link for link in pre_sub_category_links if link not in prev_categories})

    def _get_main_category_urls(self, section: str, subs=True):
        """
        Manage the scrape/read from previous session cache operations and return section Urls.
        If the program found previously cached files, will read and return existing data, else
        new content will be scraped and returned.
        Args:
            section: Keyword indication of target section.
                'bs': Best Sellers
                'nr': New Releases
                'ms': Movers & Shakers
                'gi': Gift Ideas
                'mw': Most Wished For
            subs: If False, only categories will be returned.
        Return:
            2 sorted lists: categories and subcategories.
        """
        text_file_names = \
            {section_short: {'categories': self.modes[section_short] + ' Category Urls.txt',
                             'subcategories': self.modes[section_short] + ' Subcategory Urls.txt'}
             for section_short in self.modes}
        if 'Amazon' not in os.listdir(self.path):
            os.mkdir('Amazon')
            os.chdir(self.path + 'Amazon')
        if 'Amazon' in os.listdir(self.path):
            categories = self._read_main_category_urls(text_file_names, section, 'categories', self.path)
            if not subs:
                if self.cleanup_empty:
                    self._cleanup_empty_files(self.path)
                return sorted(categories)
            subcategories = self._read_main_category_urls(text_file_names, section, 'subcategories', self.path)
            try:
                if categories and subcategories:
                    if self.cleanup_empty:
                        self._cleanup_empty_files(self.path)
                    return sorted(categories), sorted(subcategories)
            except UnboundLocalError:
                pass
        if not subs:
            categories = self._scrape_main_category_urls(section, 'categories')
            if self.cache_contents:
                self._cache_main_category_urls(text_file_names, section, 'categories', self.path, categories)
            if self.cleanup_empty:
                self._cleanup_empty_files(self.path)
            return sorted(categories)
        if subs:
            categories = self._scrape_main_category_urls(section, 'categories')
            if self.cache_contents:
                self._cache_main_category_urls(text_file_names, section, 'categories', self.path, categories)
            subcategories = self._scrape_main_category_urls(section, 'subcategories', categories)
            if self.cache_contents:
                self._cache_main_category_urls(text_file_names, section, 'subcategories', self.path, subcategories)
            if self.cleanup_empty:
                self._cleanup_empty_files(self.path)
            return sorted(categories), sorted(subcategories)

    def _extract_page_product_urls(self, page_url: str):
        """
        Extract product Urls from an Amazon page and the page title.
        Args:
            page_url: Target page.
        Return:
             The page category title(string) and a sorted list of product Urls.
        """
        prefix = 'https://www.amazon.com'
        response = self._get_response(page_url)
        soup = BeautifulSoup(response.text, features='lxml')
        try:
            title = soup.h1.text.strip()
        except AttributeError:
            title = 'N/A'
        product_links = {prefix + link.get('href') for link in soup.findAll('a') if 'psc=' in str(link)}
        return title, sorted(product_links)

    @staticmethod
    def _cleanup_empty_files(dir_path: str):
        """
        Cleanup a given folder from empty .txt files.
        Args:
            dir_path: Path to the target folder to be cleaned up.
        Return:
             None
        """
        for file_name in (file for file in os.listdir(dir_path)):
            if not os.path.isdir(file_name):
                try:
                    contents = open(file_name).read().strip()
                    if not contents:
                        os.remove(file_name)
                except(UnicodeDecodeError, FileNotFoundError):
                    pass

    def _category_page_title_to_url(self, section: str, category_class: str, delimiter='&&&'):
        """
        Map category/subcategory names to their respective Urls.
        Args:
        section:
            'bs': Best Sellers
            'nr': New Releases
            'ms': Movers & Shakers
            'gi': Gift Ideas
            'mw': Most Wished For
        category_class: Category level indication 'categories' or 'subcategories'.
        delimiter: Delimits category/subcategory names and their respective Urls in the .txt files.
        Return:
             A list of lists(pairs): ((category/subcategory name, Url), ...)
        """
        file_names = {'categories': self.modes[section] + ' Category Names.txt',
                      'subcategories': self.modes[section] + ' Subcategory Names.txt'}
        names_urls = []
        os.chdir(self.path)
        if 'Amazon' in os.listdir(self.path):
            os.chdir('Amazon')
            file_name = file_names[category_class]
            if file_name in os.listdir(self.path + 'Amazon'):
                with open(file_name) as names:
                    if self.cleanup_empty:
                        self._cleanup_empty_files(self.path)
                    return [line.rstrip().split(delimiter) for line in names.readlines()]
        if 'Amazon' not in os.listdir(self.path):
            os.mkdir('Amazon')
            os.chdir('Amazon')
        categories, subcategories = self._get_main_category_urls(section)
        if not self.print_progress:
            if self.threads == 1:
                responses_urls = ((self._get_response(url), url)
                                  for url in eval('eval(category_class)'))
                soups_urls = ((BeautifulSoup(item[0].text, features='lxml'), item[1]) for item in responses_urls)
                for soup, url in soups_urls:
                    try:
                        title = soup.h1.text.strip()
                        names_urls.append((title, url))
                    except AttributeError:
                        pass
            if self.threads > 1:
                with ThreadPoolExecutor(max_workers=self.threads) as executor:
                    future_responses = {
                        executor.submit(self._get_response, category): category
                        for category in eval('eval(category_class)')}
                    responses = (future.result() for future in as_completed(future_responses))
                    responses_urls = (
                        (response, url) for response, url in zip(responses, eval('eval(category_class)')))
                    soups_urls = (
                        (BeautifulSoup(item[0].text, features='lxml'), item[1]) for item in responses_urls)
                    for soup, url in soups_urls:
                        try:
                            title = soup.h1.text.strip()
                            names_urls.append((title, url))
                        except AttributeError:
                            pass
        if self.print_progress:
            if self.threads == 1:
                for url in eval('eval(category_class)'):
                    response = self._get_response(url)
                    print(f'Got response {response} for {url}')
                    print(f'Fetching name of {url} ...')
                    if self.log:
                        print(f'Got response {response} for {url}', file=self.log, end='\n')
                        print(f'Fetching name of {url} ...', file=self.log, end='\n')

                    soup = BeautifulSoup(response.text, features='lxml')
                    try:
                        title = soup.h1.text.strip()
                        names_urls.append((title, url))
                        print(f'Fetching name {title} ... done')
                        if self.log:
                            print(f'Fetching name {title} ... done', file=self.log, end='\n')
                    except AttributeError:
                        print(f'Fetching name failure for {url}')
                        if self.log:
                            print(f'Fetching name failure for {url}', file=self.log, end='\n')
            if self.threads > 1:
                with ThreadPoolExecutor(max_workers=self.threads) as executor:
                    future_responses = {
                        executor.submit(self._get_response, category): category
                        for category in eval('eval(category_class)')}
                    for future_response in as_completed(future_responses):
                        response = future_response.result()
                        url = future_responses[future_response]
                        print(f'Got response {response} for {url}')
                        if self.log:
                            print(f'Got response {response} for {url}', file=self.log, end='\n')
                        soup = BeautifulSoup(response.text, features='lxml')
                        try:
                            title = soup.h1.text.strip()
                            names_urls.append((title, url))
                            print(f'Fetching name {title} ... done')
                            if self.log:
                                print(f'Fetching name {title} ... done', file=self.log, end='\n')
                        except AttributeError:
                            print(f'Fetching name failure for {url}')
                            if self.log:
                                print(f'Fetching name failure for {url}', file=self.log, end='\n')

            if self.cache_contents:
                with open(file_names[category_class], 'w') as names:
                    for name, url in names_urls:
                        names.write(name + delimiter + url + '\n')
            if self.cleanup_empty:
                self._cleanup_empty_files(self.path + 'Amazon')
        return names_urls

    def _extract_section_products(self, section: str, category_class: str):
        """
        For every category/subcategory successfully scraped from the given section, product urls will be extracted.
        Args:
            section:
                'bs': Best Sellers
                'nr': New Releases
                'ms': Movers & Shakers
                'gi': Gift Ideas
                'mw': Most Wished For
            category_class: Category level indication 'categories' or 'subcategories'.
        Return:
             List of tuples(category name, product urls) containing product Urls for each scraped category/subcategory.
        """
        products = []
        names_urls = self._category_page_title_to_url(section, category_class)
        urls = [item[1] for item in names_urls]
        folder_name = ' '.join((self.modes[section], category_class[:-3].title() + 'y', 'Product Urls'))
        if not self.print_progress:
            if self.threads == 1:
                products = [
                    (category_name, [product_url for product_url in self._extract_page_product_urls(category_url)[1]])
                    for category_name, category_url in names_urls]
                products = [item for item in products if item[1]]
            if self.threads > 1:
                with ThreadPoolExecutor(max_workers=self.threads) as executor:
                    future_products = {executor.submit(self._extract_page_product_urls, category_url): category_url
                                       for category_url in urls}
                    products = [future.result() for future in as_completed(future_products)]
                    products = [item for item in products if item[1]]
        if self.print_progress:
            products = []
            if self.threads == 1:
                for category_name, category_url in names_urls:
                    product_urls = self._extract_page_product_urls(category_url)
                    if product_urls[1]:
                        print(f'Extraction of {category_name} products ... done')
                        if self.log:
                            print(f'Extraction of {category_name} products ... done', file=self.log, end='\n')
                        products.append(product_urls)
                    else:
                        print(f'Extraction of {category_name} products ... failure')
                        if self.log:
                            print(f'Extraction of {category_name} products ... failure', file=self.log, end='\n')
            if self.threads > 1:
                with ThreadPoolExecutor(max_workers=self.threads) as executor:
                    future_products = {executor.submit(self._extract_page_product_urls, category_url): category_url
                                       for category_url in urls}
                    for future in as_completed(future_products):
                        category_name, category_urls = future.result()
                        if category_urls:
                            print(f'Extraction of {category_name} products ... done')
                            if self.log:
                                print(f'Extraction of {category_name} products ... done', file=self.log, end='\n')
                            products.append((category_name, category_urls))
                        else:
                            print(f'Extraction of {category_name} products ... failure')
                            if self.log:
                                print(f'Extraction of {category_name} products ... failure', file=self.log, end='\n')
        if self.cache_contents:
            if folder_name not in os.listdir(self.path + 'Amazon'):
                os.mkdir(folder_name)
            os.chdir(folder_name)
            for category_name, category_product_urls in products:
                with open(category_name + '.txt', 'w') as links:
                    for url in category_product_urls:
                        links.write(url + '\n')
        if self.cleanup_empty:
            self._cleanup_empty_files(self.path + 'Amazon/' + folder_name)
        return products

    def _get_amazon_product_details(self, product_url: str):
        """
        Extract product details including:
            (Price, Title, URL, Rating, Number of reviews, Sold by, Features, Technical table)
        Args:
            product_url: Target product.
        Return:
            A dictionary with the scraped details.
        """
        product_html_details, text_details = {}, {}
        response = self._get_response(product_url).text
        html_content = BeautifulSoup(response, features='lxml')
        product_html_details['Price'] = html_content.find('span', {'id': 'price_inside_buybox'})
        product_html_details['Url'] = product_url
        product_html_details['Title'] = html_content.title
        product_html_details['Rating'] = html_content.find('span',
                                                           {'class': 'reviewCountTextLinkedHistogram noUnderline'})
        product_html_details['Number of reviews'] = html_content.find('span', {'id': 'acrCustomerReviewText'})
        product_html_details['Sold by'] = html_content.find('a', {'id': 'bylineInfo'})
        product_html_details['Features'] = html_content.find('div', {'id': 'feature-bullets'})
        if product_html_details['Features']:
            product_html_details['Features'] = product_html_details['Features'].findAll('li')
        technical_table = html_content.find('table', {'class': 'a-keyvalue prodDetTable'})
        if technical_table:
            product_html_details['Technical details'] = list(
                zip((item.text.strip() for item in technical_table.findAll('th')),
                    (item.text.strip() for item in technical_table.findAll('td'))))
        for item in product_html_details:
            if isinstance(product_html_details[item], bs4.element.Tag):
                text_details[item] = product_html_details[item].text.strip()
            if isinstance(product_html_details[item], bs4.element.ResultSet):
                text_details[item] = ' • '.join((tag.text.strip() for tag in product_html_details[item]))
            if isinstance(product_html_details[item], str):
                text_details[item] = product_html_details[item]
            if item == 'Technical details':
                text_details[item] = ' • '.join((' : '.join(pair) for pair in product_html_details[item]))
        return text_details


if __name__ == '__main__':
    start_time = perf_counter()
    path = input('Enter path to save files: ')
    session = AmazonScraper(print_progress=True, threads=20, log='amazon_log.txt', path=path)
    print(session._extract_section_products('bs', 'categories'))
    print(session._extract_section_products('bs', 'subcategories'))
    end_time = perf_counter()
    print(f'Time: {end_time - start_time} seconds.')