python: download images (or videos) from Instagram with Selenium

A Python script that downloads images and videos (including photo galleries) from public and private Instagram profiles and saves them to a folder.

How does it work:

  • Log in to Instagram with Selenium and navigate to the profile

  • Check whether the profile exists and whether it is private

  • Create a folder with the name of your choice

  • Collect the URLs of images and videos

  • Use threads and multiprocessing to improve execution speed

My code:

from pathlib import Path
import requests
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from multiprocessing.dummy import Pool
import urllib.parse
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Tuple
import argparse


class PrivateException(Exception):
    pass


class InstagramPV:

    def __init__(self, username: str, password: str, folder: Path, profile_name: str):
        """

        :param username: Username or e-mail for logging in to Instagram
        :param password: Password for logging in to Instagram
        :param folder: Folder in which the posts will be saved
        :param profile_name: The profile name to search for
        """
        self.username = username
        self.password = password
        self.folder = folder
        self.http_base = requests.Session()
        self.profile_name = profile_name
        self.links: List[str] = []
        self.pictures: List[str] = []
        self.videos: List[str] = []
        self.url: str = 'https://www.instagram.com/{name}/'
        self.posts: int = 0
        self.MAX_WORKERS: int = 8
        self.N_PROCESSES: int = 8
        self.driver = webdriver.Chrome()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.http_base.close()
        self.driver.close()

    def check_availability(self) -> None:
        """
        Check the status code and read the number of posts, the privacy flag, and followed_by_viewer.
        Raise PrivateException if the profile is private and not followed by the viewer.
        :return: None
        """
        search = self.http_base.get(self.url.format(name=self.profile_name), params={'__a': 1})
        search.raise_for_status()

        load_and_check = search.json()
        self.posts = load_and_check.get('graphql').get('user').get('edge_owner_to_timeline_media').get('count')
        privacy = load_and_check.get('graphql').get('user').get('is_private')
        followed_by_viewer = load_and_check.get('graphql').get('user').get('followed_by_viewer')
        if privacy and not followed_by_viewer:
            raise PrivateException('(!) Account is private')

    def create_folder(self) -> None:
        """Create the folder name"""
        self.folder.mkdir(exist_ok=True)

    def login(self) -> None:
        """Login To Instagram"""
        self.driver.get('https://www.instagram.com/accounts/login')
        WebDriverWait(self.driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, 'form')))
        self.driver.find_element_by_name('username').send_keys(self.username)
        self.driver.find_element_by_name('password').send_keys(self.password)
        submit = self.driver.find_element_by_tag_name('form')
        submit.submit()

        """Check For Invalid Credentials"""
        try:
            var_error = WebDriverWait(self.driver, 4).until(EC.presence_of_element_located((By.CLASS_NAME, 'eiCW-')))
            raise ValueError(var_error.text)
        except TimeoutException:
            pass

        try:
            """Close Notifications"""
            notifications = WebDriverWait(self.driver, 20).until(
                EC.presence_of_element_located((By.XPATH, '//button[text()="Not Now"]')))
            notifications.click()
        except (NoSuchElementException, TimeoutException):
            pass

        """Taking cookies"""
        cookies = {
            cookie['name']: cookie['value']
            for cookie in self.driver.get_cookies()
        }

        self.http_base.cookies.update(cookies)

        """Check for availability"""
        self.check_availability()

        self.driver.get(self.url.format(name=self.profile_name))

        self.scroll_down()

    def posts_urls(self) -> None:
        """Taking the URLs from posts and appending in self.links"""
        elements = self.driver.find_elements_by_xpath('//a[@href]')
        for elem in elements:
            urls = elem.get_attribute('href')
            if 'p' in urls.split('/'):
                if urls not in self.links:
                    self.links.append(urls)

    def scroll_down(self) -> None:
        """Scrolling down the page and taking the URLs"""
        last_height = self.driver.execute_script('return document.body.scrollHeight')
        while True:
            self.driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
            time.sleep(1)
            self.posts_urls()
            time.sleep(1)
            new_height = self.driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                break
            last_height = new_height
        self.submit_links()

    def submit_links(self) -> None:
        """Gathering Images and Videos and pass to function  Using ThreadPoolExecutor"""

        self.create_folder()

        print('(!) Ready for video - images'.title())
        print(f'(*) extracting {len(self.links)} posts , please wait...'.title())

        new_links = (urllib.parse.urljoin(link, '?__a=1') for link in self.links)
        with ThreadPoolExecutor(max_workers=self.MAX_WORKERS) as executor:
            for link in new_links:
                executor.submit(self.fetch_url, link)

    def get_fields(self, nodes: Dict, *keys) -> Any:
        """
        :param nodes: The JSON data of a post; only the first two keys, 'graphql' and 'shortcode_media', are used
        :param keys: Keys to follow into the nodes to reach the 'type' or 'URL' value
        :return: The value of the final key
        """
        fields = nodes['graphql']['shortcode_media']
        for key in keys:
            fields = fields[key]
        return fields

    def fetch_url(self, url: str) -> None:
        """
        This function extracts images and videos
        :param url: The post URL (with '?__a=1' appended)
        :return None
        """
        logging_page_id = self.http_base.get(url.split()[0]).json()
        if self.get_fields(logging_page_id, '__typename') == 'GraphImage':
            image_url = self.get_fields(logging_page_id, 'display_url')
            self.pictures.append(image_url)

        elif self.get_fields(logging_page_id, '__typename') == 'GraphVideo':
            video_url = self.get_fields(logging_page_id, 'video_url')
            self.videos.append(video_url)

        elif self.get_fields(logging_page_id, '__typename') == 'GraphSidecar':
            for sidecar in self.get_fields(logging_page_id, 'edge_sidecar_to_children', 'edges'):
                if sidecar['node']['__typename'] == 'GraphImage':
                    image_url = sidecar['node']['display_url']
                    self.pictures.append(image_url)
                else:
                    video_url = sidecar['node']['video_url']
                    self.videos.append(video_url)
        else:
            print(f'Warning {url}: has unknown type of {self.get_fields(logging_page_id,"__typename")}')

    def download_video(self, new_videos: Tuple[int, str]) -> None:
        """
        Saving the video content
        :param new_videos: Tuple[int, str]
        :return: None
        """
        number, link = new_videos

        with open(self.folder / f'Video{number}.mp4', 'wb') as f:
            content_of_video = self.http_base.get(link).content
            f.write(content_of_video)

    def images_download(self, new_pictures: Tuple[int, str]) -> None:
        """
        Saving the picture content
        :param new_pictures: Tuple[int, str]
        :return: None
        """

        number, link = new_pictures
        with open(self.folder / f'Image{number}.jpg', 'wb') as f:
            content_of_picture = self.http_base.get(link).content
            f.write(content_of_picture)

    def downloading_video_images(self) -> None:
        """Using multiprocessing for Saving Images and Videos"""
        print('(*) ready for saving images and videos!'.title())
        picture_data = enumerate(self.pictures)
        video_data = enumerate(self.videos)
        pool = Pool(self.N_PROCESSES)
        pool.map(self.images_download, picture_data)
        pool.map(self.download_video, video_data)
        print('(+) Done')


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-U', '--username', help='Username or email of your account', action='store',
                        required=True)
    parser.add_argument('-P', '--password', help='Password of your account', action='store', required=True)
    parser.add_argument('-F', '--filename', help='Folder name for storing the downloads', action='store', required=True)
    parser.add_argument('-T', '--target', help='Profile name to search', action='store', required=True)
    args = parser.parse_args()
    with InstagramPV(args.username, args.password, Path(args.filename), args.target) as pv:
        pv.login()
        pv.downloading_video_images()

if __name__ == '__main__':
    main()

Use:
myfile.py -U myemail@hotmail.com -P mypassword -F Mynamefile -T stackoverjoke

Changes:

1) Rewrote the scroll_down function

2) Added get_fields

My previous review, for comparison: Instagram scraping posts with Selenium

python: concatenate multiple images at once, vertically divided equally

I have a folder with 100 images that I want to join vertically and then split into equal parts.
Example with 6 images: 800 x 1600 + 800 x 7000 + 800 x 12000 + 800 x 15000 + 800 x 25000 + 800 x 8000 = 800 x 68600

Now I want to divide the combined image evenly along its height,
so that I end up with 57 images of 800 x 1200 resolution and 1 image of 800 x 200,
numbered from 1 to 58.

Thanks for taking the time to help me.

My code:

from PIL import Image
import cv2
import os
import glob


im1 = Image.open('1.jpg')
im2 = Image.open('2.jpg')
im3 = Image.open('3.jpg')
im4 = Image.open('4.jpg')
im5 = Image.open('5.jpg')

def get_concat_v_multi_resize(im_list, resample=Image.BICUBIC):
    min_width = min(im.width for im in im_list)
    im_list_resize = [im.resize((min_width, int(im.height * min_width / im.width)), resample=resample)
                      for im in im_list]
    total_height = sum(im.height for im in im_list_resize)
    dst = Image.new('RGB', (min_width, total_height))
    pos_y = 0
    for im in im_list_resize:
        dst.paste(im, (0, pos_y))
        pos_y += im.height
    return dst

get_concat_v_multi_resize([im1, im2, im3, im4, im5]).save('concat.jpg')
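
For the splitting step, here is a minimal sketch under these assumptions: the combined image has already been saved as 'concat.jpg', each slice should be 1200 px tall, and the last slice keeps whatever height remains (e.g. 800 x 200). The file names are placeholders.

from PIL import Image

# Split the tall concatenated image into equal-height slices.
# Slices are saved as 1.jpg, 2.jpg, ... in the current folder,
# so use a separate output folder if the source images share those names.
combined = Image.open('concat.jpg')
slice_height = 1200

for index, top in enumerate(range(0, combined.height, slice_height), start=1):
    bottom = min(top + slice_height, combined.height)
    piece = combined.crop((0, top, combined.width, bottom))
    piece.save(f'{index}.jpg')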

mysql – Uploading multiple images with PHP – Security issue

I tried to build a multiple-image upload with PHP (a description is associated with the images), but I am almost certain that my code is open to SQL injection.
Here is my code:

query("INSERT INTO uploads (timestamp, file, description) VALUES ('$timestamp', '$finalFileName', '$description')");
                    $id = $mysqli->insert_id;
                    $success_images[$id] = $finalFileName;
                    $uploadOk = 1;

                } else {

                    echo "Error";
                    $uploadOk = 0;
                    break; // break loop !!!
                }
            } else {

                echo "Just JPG and PNG files are allowed";
                $uploadOk = 0;
                break;
            }
        }
    }

    if ($uploadOk === 0) {

        foreach ($success_images as $id => $filename) {
            // `id` - primary key of table uploads?
            $mysqli->query(sprintf('DELETE FROM uploads WHERE `id`=%d', $id));
            if (file_exists($filename)) {
                unlink($filename);
            }
        }
    }
}
if ($uploadOk === 1) {
    echo "Upload successful";
}

?>

Could anyone point out the risks I am exposed to and how to fix them? Thank you!

Why do images appear small in the upper left corner in Photoshop CS6 in Windows 10?

In Photoshop CS6, when I open an image it appears in the upper left corner and is very small. I can't figure out how to center it, and nothing I try moves the image from its place. Can anybody help me, please?

html: search for an online tool that displays linked images from a website

I am looking for an online tool that displays the images linked from an HTML directory. More specifically, I have numerous links to HTML directories, each listing hundreds of links to images. These directories do not show the images, only link to them. I need an online tool that displays all of these images in gallery format, so that I can quickly identify the images that are relevant to me.

I'm sure there are online tools that do that kind of thing.

Best
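
If no ready-made tool turns up, a minimal Python sketch along these lines could build a local gallery page from such a directory listing. It assumes the directory page is a plain index of <a href> links to image files; the URL below is a placeholder.

import urllib.parse
from html.parser import HTMLParser

import requests

DIRECTORY_URL = 'https://example.com/images/'  # placeholder


class LinkCollector(HTMLParser):
    """Collect absolute URLs of image links found in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value and value.lower().endswith(('.jpg', '.jpeg', '.png', '.gif')):
                    self.links.append(urllib.parse.urljoin(DIRECTORY_URL, value))


parser = LinkCollector()
parser.feed(requests.get(DIRECTORY_URL).text)

# Write a simple local gallery page that embeds every linked image.
with open('gallery.html', 'w', encoding='utf-8') as f:
    f.write('<html><body>\n')
    for link in parser.links:
        f.write(f'<img src="{link}" style="max-width:200px; margin:4px;">\n')
    f.write('</body></html>\n')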

Magento2, how to stop generating cache for images?

I don't want Magento to cache images. How can I prevent Magento from generating the image cache?

Nginx rule: images don't work

My .htaccess rule:

RewriteRule ^img/([A-Za-z0-9]+)/([A-Za-z0-9]+)/(.*)  /img.php?width=$1&height=$2&src=$3&crop-to-fit [NC]

My Nginx rule:

rewrite ^/img/([A-Za-z0-9]+)/([A-Za-z0-9]+)/(.*)$ /img.php?width=$1&height=$2&src=$3&crop-to-fit;

URL that works with Nginx:
example.com/200/200/IMAGESFOLDER/test.jpg

URL that does not work with Nginx:
example.com/200/200/IMAGESFOLDER/en/test.jpg

photo editing: is there software to automatically crop a scan of multiple images? (Windows 10)

I want to digitize my old family albums with a scanner, and there are a lot of photos in them. So far, my method has been to scan 4 images at once and then manually crop them in a simple editor like Paint 3D. This takes a long time, as each scan is followed by 4 crop operations. Scanning each image individually would possibly be even more time consuming.
I'm on Windows 10. Is there any software (other than Photoshop) or a simple plugin that does this job?
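
If a scripted approach is also acceptable, here is a minimal Python/OpenCV sketch of the idea. It assumes the scanner background is bright (near white) and the photos are noticeably darker rectangles; 'scan.jpg' and the area threshold are placeholders to tune.

import cv2

# Load the scan and build a mask where the photos appear as white blobs.
img = cv2.imread('scan.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 230, 255, cv2.THRESH_BINARY_INV)

# Find the outer contours of the blobs (OpenCV 4.x return signature).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

count = 0
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    if w * h > 100_000:  # skip small specks; tune for your scan resolution
        count += 1
        cv2.imwrite(f'photo_{count}.jpg', img[y:y + h, x:x + w])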