We learned how to scrape Twitter data using BeautifulSoup. But BeautifulSoup is slow, and we need to take care of multiple things ourselves.
Here we will see how to scrape data from websites using Scrapy.
I tried scraping Python book details from Amazon.com using Scrapy and found it extremely fast and easy. We will see how to start working with Scrapy, create a scraper, scrape data and save the data to a database.
The scraper code is available on GitHub. I dumped the data into a MySQL database and developed a mini Django app over it, which is available here.
Continue reading “Scraping Python books data from Amazon using Scrapy Framework”
Twitter is one of the most popular social networking services and is used by many of the world's most prominent people. Tweets can be used to perform sentiment analysis.
In this article we will see how to scrape tweets using BeautifulSoup. We are not using the Twitter API because most APIs have rate limits.
Continue reading “Python Script 7: Scraping tweets using BeautifulSoup”
I created a small script to download all pictures of an Instagram user without using the APIs, because the APIs impose limitations such as rate limits.
After a few rounds of tweaking, optimising and beautifying the code, I thought of creating a Python package out of it. If you want to know how to create a distributable Python package, this article will be extremely helpful, as the steps are discussed in great detail.
You can find the py_instagram_dl package listed on PyPI: https://pypi.python.org/pypi/py-instagram-dl.
How to download all pictures of an Instagram user:
- Create a virtual environment. Optional but strongly recommended. You may follow this simple, step-by-step pocket guide on Python Virtual Environment.
- Install dependencies. This package needs a few other Python packages to work.
pip install beautifulsoup4 bs4 lxml requests urllib3
- Now install this package.
pip install py_instagram_dl
- Use the installed package in your code.
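import sys
import py_instagram_dl as pyigdl
# run script by providing username as command line argument
# usage : python script_name.py username
try:
    # assumption: the package's entry point is named download(); confirm against the package docs
    pyigdl.download(sys.argv[1], wait_between_requests=1)
except Exception as e:
    print(e)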
The method has one mandatory and two optional parameters as of now.
Parameter 1: A valid Instagram username (mandatory).
verbose : default value: True (boolean) : Decides whether information should be printed on screen. It is recommended to keep it set to True so that, with a large number of downloads, you can make sure the script is still working and has not frozen.
wait_between_requests : default value: 0 (integer) : The time in seconds for which the script waits before sending the next picture download request to Instagram. It is recommended to pass a positive value for this parameter. If you are getting rate limit exceptions after downloading a few pictures, pass 1 in this parameter, i.e. wait for 1 second between requests.
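To make the effect of these two optional parameters concrete, here is a minimal sketch of a throttled download loop. It is not the package's actual code: download_all, fetch and sleep are hypothetical names introduced only for illustration, and fetch stands in for whatever performs a single picture download.

```python
import time


def download_all(urls, fetch, wait_between_requests=0, verbose=True, sleep=time.sleep):
    """Call fetch(url) for every URL, pausing between consecutive requests.

    Hypothetical helper: it only illustrates what a wait_between_requests
    style parameter does and is not taken from py_instagram_dl.
    """
    results = []
    for index, url in enumerate(urls):
        # Pause before every request except the first one.
        if index > 0 and wait_between_requests > 0:
            sleep(wait_between_requests)
        # Progress output shows that the script is still working.
        if verbose:
            print(f"Downloading {index + 1}/{len(urls)}: {url}")
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Stub fetch so the sketch runs without any network access.
    pictures = ["pic1.jpg", "pic2.jpg", "pic3.jpg"]
    download_all(pictures, fetch=lambda url: url, wait_between_requests=1)
```

With wait_between_requests=1 and three pictures, the loop sleeps twice, once before the second and once before the third request.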
The package can raise two exceptions:
InvalidUsernameException: When a non-existent username is provided.
RateLimitException: When the rate limit is reached. Use the parameter wait_between_requests to avoid this.
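One possible way to react to these exceptions in calling code is sketched below: retry with a longer wait when the rate limit is hit, and give up immediately on an invalid username. This is only a sketch under assumptions. The two exception classes are local stand-ins with the same names as the package's exceptions, and download_with_backoff and its download argument are hypothetical. In real code you would pass in the package's own download function and catch its own exception classes.

```python
class InvalidUsernameException(Exception):
    """Local stand-in for the package's exception of the same name."""


class RateLimitException(Exception):
    """Local stand-in for the package's exception of the same name."""


def download_with_backoff(download, username, max_attempts=3):
    """Retry download(username, wait_between_requests=...) on rate limits.

    Hypothetical helper: each retry waits one second longer between
    requests. InvalidUsernameException is not caught, because retrying
    cannot fix a username that does not exist.
    """
    wait = 0
    for attempt in range(1, max_attempts + 1):
        try:
            return download(username, wait_between_requests=wait)
        except RateLimitException:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller decide
            wait += 1  # slow down before trying again


if __name__ == "__main__":
    # Fake downloader that is rate limited until it waits at least 1 second.
    def fake_download(username, wait_between_requests=0):
        if wait_between_requests < 1:
            raise RateLimitException("slow down")
        return f"downloaded pictures of {username}"

    print(download_with_backoff(fake_download, "some_user"))
```

Increasing the wait by one second per retry is an arbitrary choice for the sketch. A fixed wait of 1 second, as recommended above, may already be enough.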