Encryption-Decryption in Python Django

Sometimes we need to encrypt critical information in out Django App. For example client might ask you to store the user information in encrypted format for extra security. Or you might be required to pass some data in URL in encrypted format.

Here we will see how can we encrypt and decrypt the information in Django.

Once you are done with initial setup of project and added the first app, create a new directory or add a new python package with the name utility  in your app.

Create _init__.py  file in the utility directory.

Add a new file, name it encryption_util.py  in utility directory.

Encryption:

Add a new function to encrypt the provided content. Use below code. Each line is explained by the accompanying comment.

Please note that:
– ENCRYPT_KEY should be kept safe. Keep it in settings_production.py  file and do not commit it to git.
– We are also converting the encoded string to urlsafe base64 format because we might required to pass the encoded data to URL.
– If there is any error, log it and return null.

Decryption:

To decrypt any encrypted text, we will just reverse the process.

Dependencies:

You need to install below python module.

– cryptography

Key:

You need to generate the ENCRYPT_KEY using below process.

– Open terminal in your virtual environment where cryptography python module is installed.
– Import Fernet.

– Generate key.

– Keep the key in settings file.

 

encryption decryption in django

Usage:

In the utility package we created in first step, we created the __init__.py  file. Add the below statement in this file.

Now use the encryption and decryption methods in your views.

Complete code:

Complete code for encryption_util.py  is below.

 

Host your Django App for Free.

How to upload and process the Excel file in Django

In this article we will discuss how to upload an Excel file and then process the content without storing file on server. One approach could be uploading the file, storing it in upload directory and then reading the file. Another approach could be uploading file and reading it directly from post data without storing it in memory and displaying the data.

We will work with the later approach here.

You may create a new project or work on existing code.

If you are setting up a new project then create a new virtual environment and install Django 2.0 and openpyxl modules in virtual environment using pip.

 

Continue reading “How to upload and process the Excel file in Django”

Creating sitemap of Dynamic URLs in your Django Application

A site map is a list of a website’s content designed to help both users and search engines navigate the site. A site map can be a hierarchical list of pages, an organization chart, or an XML document that provides instructions to search engine crawl bots.

Why sitemaps is required:

XML Sitemaps are important for SEO because they make it easier for Google to find your site’s pages—this is important because Google ranks web PAGES not just websites. There is no downside of having an XML Sitemap and having one can improve your SEO, so we highly recommend them.

Example:

Sitemap for this blog can be found at http://thepythondjango.com/sitemap_index.xml .

example sitemap
Example sitemap

Steps to add Sitemaps to your Django Application:

Create a file sitemap.py  in your app.

Create two different classes in sitemap.py file, one for static pages and another for Dynamic URLs.

Lets assume your website sell some product where product details are stored in database. Once a new product is added to database, you want that product page to be searchable by search engines. We need to add all such product pages/urls to sitemaps.

Static Sitemap:

Define a class StaticSitemap  in your sitemap.py  file. Define the mandatory function  items  in it which will return the list of objects. These objects will be passed to location method which will create URL from these objects.

Here in items function, we are returning appname:url_name  which will be used by location method to convert into absolute URL. Refer you app’s urls.py file for url names.

Dynamic Sitemap:

Similarly we will create Dynamic sitemap by fetching values from DB.

Here we are getting all products from database and generating URLs like  http:example.com/product/12 .

Adding sitemaps in URLconf:

Now add these sitemap class in URLconf. Edit the project’s urls.py  file and add below code in it.

 

Now reload your server and go to localhost:8000/sitemap.xml  and you will be able to see your sitemap there.

 

Reference : https://docs.djangoproject.com/en/2.0/ref/contrib/sitemaps/

 

Host your Django App for Free.

Adding Robots.txt file to Django Application

Robots.txt is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned.

Why robots.txt is important:

Before a search engine crawls your site, it will look at your robots.txt file as instructions on where they are allowed to crawl/visit and index on the search engine results. If you want search engines to ignore any  pages on your website, you mention it in your robots.txt file.

Basic Format:
Example:
Steps to add robots.txt in Your Django Project:

Lets say your project’s name is myproject.

Create a directory templates in root location of your project. Create another directory with the same name as your project inside templates directory.

Place a text file robots.txt in it.

Your project structure should look something like this.

Add user-agent and disallow URL in it.

 

Now go to your project’s urls.py file and add below import statement

Add below URL pattern.

Now restart the server and go to localhost:8000/robots.txt in your browser and you will be able to see the robots.txt file.

Serving robots.txt from web server:

You can serve robots.txt directly from your web server. Below is the sample configuration for apache.

Quick Tips:
  1. robots.txt is case sensitive. The file must be named robots.txt, not Robots.txt or robots.TXT.
  2. robots.txt file must be placed in a website’s top-level directory.
  3. Make sure you’re not blocking any content or sections of your website you want crawled as this will not be good for SEO.

 

Host your Django App for Free.

Comparing celery-rabbitmq docker cluster, multi-threading and scrapy framework for 1000 requests

I recently tried scraping the tweets quickly using Celery RabbitMQ Docker cluster. Since I was hitting same servers I was using rotating proxies via Tor network. Turned out it is not very fast and using rotating proxy via Tor is not a nice thing to do.

I was able to scrape approx 10000 tweets in 60 seconds i.e. 166 tweets per second. Not an impressive number. (But I was able to make Celery, RabbitMQ, rotating proxy via Tor network and Postgres, work in docker cluster.)

Above approach was not very fast, hence I tried to compare below three approaches to send multiple request and parse the response.
– Celery-RabbitMQ docker cluster
– Multi-Threading
– Scrapy framework

I planned to send requests to 1 million websites, but once I started, I figured out that it will take one whole day to finish this hence I settled for 1000 URLs.

Continue reading “Comparing celery-rabbitmq docker cluster, multi-threading and scrapy framework for 1000 requests”

Using Docker instead of Virtual Environment for Django app development

Docker have all the good featured of virtual machine. It helps developer to set up an environment on development machine which is similar to production environment. Please go through official docker site if you want to know more about Docker.

In this article we will see how to develop a hello world Django project and will run it docker container instead of virtual environment.

Installing Docker:

Please follow this guide to install docker on your machine.

We are using Docker version 17.12.1-ce  for this article.

Starting docker container of application:

Continue reading “Using Docker instead of Virtual Environment for Django app development”

Scraping 10000 tweets in 60 seconds using celery, RabbitMQ and Docker cluster with rotating proxy

In previous articles we used requests and BeautifulSoup to scrape the data. Scraping data this way is slow (Using selenium is even slower). Sometimes we need data quickly. But if we try to speed up the process of scraping data using multi-threading or any other technique, we will start getting http status 429  i.e. too may requests. We might get banned from the site as well.

Purpose of this article is to scrape lots of data quickly without getting banned and we will do this by using docker cluster of celery and RabbitMQ along with Tor.

For this to achieve we will follow below steps:

  1. Install docker and docker-compose
  2. Download the boilerplate code to setup docker cluster in 5 minutes
  3. Understanding the code
  4. Experiment with docker cluster
  5. Update the code to download tweets
  6. Using Tor to avoid getting banned.
  7. Speeding up the process by increasing workers and concurrency

Note: This article is for educational purpose only. Do not send too many requests to any server. Respect the robot.txt file. Use API if possible.

Let’s start.

Continue reading “Scraping 10000 tweets in 60 seconds using celery, RabbitMQ and Docker cluster with rotating proxy”

Generating and Returning PDF as response in Django

We might need to generate a receipt or a report in PDF format in Django app. In this article, we will see how to generate a dynamic PDF from html content and return it as a response.

Create a Django project. If you are not using virtual environment, we strongly recommend to do so.

Installing Dependencies:

Once virtual environment is ready and activated, install the below dependencies.

For pdfkit  to work, we need wkhtmltopdf  installed in our Linux system.

Continue reading “Generating and Returning PDF as response in Django”

Using signals in Django to log changes in models

Sometimes we need to know who made what changes to which table. This might be required for legal audit purpose or for simple organisational level logging.

There are multiple Django apps available online which can help you log the model changes but there is no fun in doing that. We will see how to do it without using ready-made app and hence will learn something in the process.

Signals:

Signals lets a sender notify another receiver that some event have occurred and some action needs to be performed.

For example, we have some data in cache as well in DB. We read data from cache and if not found then goes to DB as fallback. Now whenever a DB is updated, we need to update the cache as well. But we might update the model from multiple views. Hence it is tough and not clean to write cache update logic in every such view. Signals comes into picture now.

Continue reading “Using signals in Django to log changes in models”

py_instagram_dl – The Python Package to Download All pictures of an Instagram User

I created a small script to download all pictures of an Instagram user without using APIs as APIs poses few limitations like rate limit.

After few rounds of tweaking, optimisation and beautifying code, I though of creating a python package out of it. If you want to know how to create a distributable python package, this article will be extremely helpful as steps are discussed in great detail.

You can find the  py_instagram_dl  package listed on pypi.
link is –  https://pypi.python.org/pypi/py-instagram-dl.

How to download all pictures of an Instagram user:
  • Create a virtual environment. Optional but strongly recommended. You may follow this simple and step by step pocket guide on Python Virtual Environment.
  • Install dependencies. This package instead few other python packages to work.
  • Now install this package.
  • Use the installed package in your code.
    Parameter Options:
Download  method have one mandatory and two optional parameters as of now.

Mandatory Parameter:
Parameter 1: Valid username of Instagram user.

Optional Parameter:
verbose
: default value – True (boolean) : Decides whether information should be printed on screen. Recommended to have it set to True so that in case of large number of downloads you can make sure script is working and is not just freezed.

wait_between_requests : default value – 0 (integer) : This is the time in seconds for which scripts waits to send new hit to download the picture to Instagram. It is recommended to pass a positive value for this parameter. If you are getting rate limit exceptions after downloading few pictures, pass 1 in this parameter, i.e. wait for 1 second between each request.

Exceptions:

InvalidUsernameException: When a non existent username is provided.
RateLimitException: When rate limit is reached. Use parameter wait_between_requests  to avoid this.

 

Source code.