Scraping Python books data from Amazon using Scrapy Framework

We learned how to scrape Twitter data using BeautifulSoup. But BeautifulSoup is slow, and we need to take care of multiple things ourselves.

Here we will see how to scrape data from websites using Scrapy.

I tried scraping details of Python books from Amazon.com using Scrapy and found it extremely fast and easy. We will see how to start working with Scrapy, create a scraper, scrape data and save it to a database.

The scraper code is available on GitHub. I dumped the data into a MySQL database and developed a mini Django app on top of it, which is available here.


Python Script 7: Scraping tweets using BeautifulSoup

Twitter is one of the most popular social networking services and is used by many of the world's most prominent people. Tweets can be used to perform sentiment analysis.

In this article we will see how to scrape tweets using BeautifulSoup. We are not using the Twitter API, as most APIs have rate limits.


py_instagram_dl – The Python Package to Download All pictures of an Instagram User

I created a small script to download all pictures of an Instagram user without using the APIs, as the APIs impose a few limitations such as rate limits.

After a few rounds of tweaking, optimising and beautifying the code, I thought of creating a Python package out of it. If you want to know how to create a distributable Python package, this article will be extremely helpful, as the steps are discussed in great detail.

You can find the py_instagram_dl package listed on PyPI.
The link is https://pypi.python.org/pypi/py-instagram-dl.

How to download all pictures of an Instagram user:
  • Create a virtual environment. This is optional but strongly recommended. You may follow this simple, step-by-step pocket guide on Python virtual environments.
  • Install the dependencies. This package needs a few other Python packages to work.
  • Now install this package.
  • Use the installed package in your code.

Parameter Options:
The Download method has one mandatory and two optional parameters as of now.

Mandatory Parameter:
Parameter 1: Valid username of Instagram user.

Optional Parameters:
verbose : default value – True (boolean) : Decides whether information should be printed on screen. It is recommended to keep it set to True so that, during a large number of downloads, you can confirm the script is still working and has not frozen.

wait_between_requests : default value – 0 (integer) : The time in seconds the script waits before sending the next request to Instagram to download a picture. It is recommended to pass a positive value for this parameter. If you get rate limit exceptions after downloading a few pictures, pass 1 for this parameter, i.e. wait 1 second between requests.
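
To make the effect of the two optional parameters concrete, here is a minimal sketch of a download loop that honours them. It is not the package's actual implementation: fetch_all, fetch and sleep are hypothetical names used only for illustration, and only verbose and wait_between_requests come from the description above.

```python
import time


def fetch_all(urls, fetch, verbose=True, wait_between_requests=0, sleep=time.sleep):
    """Fetch every URL in order, pausing between consecutive requests.

    `fetch` is any callable that downloads one URL (a hypothetical stand-in
    for the real download step). `sleep` is injectable so the pauses can be
    observed without actually waiting.
    """
    results = []
    for index, url in enumerate(urls):
        # Pause before every request except the first one.
        if index > 0 and wait_between_requests > 0:
            sleep(wait_between_requests)
        results.append(fetch(url))
        if verbose:
            # Progress output shows that the script is still working.
            print("downloaded {} of {}: {}".format(index + 1, len(urls), url))
    return results
```

With wait_between_requests set to 1 and three pictures to fetch, this loop pauses twice, once before each request after the first.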

Exceptions:

InvalidUsernameException: Raised when a nonexistent username is provided.
RateLimitException: Raised when the rate limit is reached. Use the wait_between_requests parameter to avoid this.
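
One way to react to these exceptions is to retry with a longer wait whenever the rate limit is hit. The sketch below assumes only what is described above: a download callable that accepts a username plus the verbose and wait_between_requests keyword arguments. The helper download_with_retry and its waits argument are hypothetical, and the stand-in exception class only mirrors the name listed above so that the example runs without the package installed. In real use you would pass in the package's own function and exception class.

```python
class RateLimitException(Exception):
    """Stand-in mirroring the exception name listed above (assumption)."""


def download_with_retry(download, username, waits=(0, 1, 2),
                        rate_limit_error=RateLimitException, verbose=True):
    """Try `download` with increasing waits and return the wait that worked."""
    for wait in waits:
        try:
            download(username, verbose=verbose, wait_between_requests=wait)
            return wait
        except rate_limit_error:
            if verbose:
                print("rate limited with wait={}s, retrying".format(wait))
    # Every configured wait was rate limited, so give up.
    raise rate_limit_error(
        "still rate limited after waits {}".format(tuple(waits)))
```

An invalid username is deliberately not caught here, so that error reaches the caller straight away. Each retry calls download again from the start; whether pictures that were already downloaded get skipped depends on the package itself.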

 

Source code.