How to upload and process the Excel file in Django

In this article we will discuss how to upload an Excel file and then process its content without storing the file on the server. One approach is to upload the file, store it in the upload directory, and then read it. Another approach is to upload the file and read it directly from the POST data, without ever writing it to disk, and then display the data.

We will work with the latter approach here.

You may create a new project or work on existing code.

If you are setting up a new project, create a new virtual environment and install the Django 2.0 and openpyxl packages into it using pip.
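
As a preview of the approach, here is a minimal sketch of such a view (the view, field, and template names are placeholders, not the article's final code):

```python
# views.py -- minimal sketch; view, field and template names are placeholders
from django.shortcuts import render
from openpyxl import load_workbook

def upload_excel(request):
    if request.method == "POST" and request.FILES.get("excel_file"):
        # request.FILES yields a file-like object; openpyxl can read it
        # directly, so nothing is ever written to disk.
        workbook = load_workbook(request.FILES["excel_file"])
        sheet = workbook.active
        rows = [[cell.value for cell in row] for row in sheet.iter_rows()]
        return render(request, "excel_data.html", {"rows": rows})
    return render(request, "excel_upload.html")
```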


Continue reading “How to upload and process the Excel file in Django”

Creating sitemap of Dynamic URLs in your Django Application

A site map is a list of a website’s content designed to help both users and search engines navigate the site. A site map can be a hierarchical list of pages, an organization chart, or an XML document that provides instructions to search engine crawl bots.

Why sitemaps are required:

XML sitemaps are important for SEO because they make it easier for Google to find your site's pages. This matters because Google ranks web pages, not just websites. There is no downside to having an XML sitemap, and having one can improve your SEO, so we highly recommend them.

Example:

The sitemap for this blog can be found at http://thepythondjango.com/sitemap_index.xml.

[Image: example sitemap]

Steps to add Sitemaps to your Django Application:

Create a file sitemap.py in your app.

Create two different classes in the sitemap.py file, one for static pages and another for dynamic URLs.

Let's assume your website sells products whose details are stored in the database. Once a new product is added to the database, you want that product's page to be discoverable by search engines. We need to add all such product pages/URLs to the sitemap.

Static Sitemap:

Define a class StaticSitemap in your sitemap.py file. Define the mandatory items method in it, which returns a list of objects. These objects are passed to the location method, which builds a URL from each of them.

Here the items method returns names of the form appname:url_name, which the location method resolves into absolute URLs. Refer to your app's urls.py file for the URL names.
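
A minimal sketch of such a class (the app and URL names are placeholders; use the names from your own urls.py):

```python
# sitemap.py -- sketch; app name and URL names are placeholders
from django.contrib.sitemaps import Sitemap
from django.urls import reverse

class StaticSitemap(Sitemap):
    changefreq = "monthly"
    priority = 0.5

    def items(self):
        # Named URLs defined in your app's urls.py
        return ["myapp:home", "myapp:about", "myapp:contact"]

    def location(self, item):
        # reverse() turns "appname:url_name" into a URL path
        return reverse(item)
```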

Dynamic Sitemap:

Similarly, we will create the dynamic sitemap by fetching values from the database.

Here we fetch all products from the database and generate URLs like http://example.com/product/12.
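
A sketch along those lines, assuming a Product model with an integer primary key (the model and URL pattern are illustrative):

```python
# sitemap.py (continued) -- Product model and URL pattern are illustrative
from myapp.models import Product

class ProductSitemap(Sitemap):
    changefreq = "daily"
    priority = 0.8

    def items(self):
        return Product.objects.all()

    def location(self, obj):
        # Produces paths like /product/12
        return "/product/%s" % obj.id
```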

Adding sitemaps to the URLconf:

Now register these sitemap classes in the URLconf. Edit the project's urls.py file and add the code below (this assumes django.contrib.sitemaps is listed in INSTALLED_APPS).
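
A sketch (the import path depends on your app name):

```python
# urls.py -- sketch; import path depends on your app name
from django.contrib.sitemaps.views import sitemap
from django.urls import path

from myapp.sitemap import ProductSitemap, StaticSitemap

sitemaps = {
    "static": StaticSitemap,
    "products": ProductSitemap,
}

urlpatterns = [
    # ... your existing URL patterns ...
    path("sitemap.xml", sitemap, {"sitemaps": sitemaps},
         name="django.contrib.sitemaps.views.sitemap"),
]
```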


Now reload your server and go to localhost:8000/sitemap.xml, and you will be able to see your sitemap there.

 

Reference : https://docs.djangoproject.com/en/2.0/ref/contrib/sitemaps/

 

Host your Django App for Free.

Adding Robots.txt file to Django Application

Robots.txt is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned.

Why robots.txt is important:

Before a search engine crawls your site, it looks at your robots.txt file for instructions on which pages it is allowed to crawl and index in search results. If you want search engines to ignore any pages on your website, you list them in your robots.txt file.

Basic Format:
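
A robots.txt file consists of one or more rule groups of the following shape:

```
User-agent: [the web robot the rule applies to]
Disallow: [URL path that should not be crawled]
```
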
Example:
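
For instance, to keep all robots out of an admin area:

```
User-agent: *
Disallow: /admin/
```
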
Steps to add robots.txt to your Django project:

Let's say your project's name is myproject.

Create a directory templates in the root location of your project. Inside the templates directory, create another directory with the same name as your project.

Place a text file robots.txt in it.

Your project structure should look something like this.
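
Assuming the project is named myproject as above:

```
myproject/
├── manage.py
├── myproject/
│   ├── settings.py
│   └── urls.py
└── templates/
    └── myproject/
        └── robots.txt
```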

Add User-agent and Disallow rules to it, as in the example above.


Now go to your project's urls.py file and add the import statement below.
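
One common way to serve a static template like this is Django's TemplateView; a sketch of the imports (not necessarily the article's exact choice):

```python
from django.urls import path
from django.views.generic import TemplateView
```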

Add the URL pattern below.
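
This assumes the TEMPLATES setting's DIRS includes the templates directory created above:

```python
urlpatterns = [
    # ... your existing URL patterns ...
    path("robots.txt", TemplateView.as_view(
        template_name="myproject/robots.txt",
        content_type="text/plain")),
]
```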

Now restart the server and go to localhost:8000/robots.txt in your browser and you will be able to see the robots.txt file.

Serving robots.txt from web server:

You can serve robots.txt directly from your web server instead. Below is a sample configuration for Apache.
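
A minimal sketch using mod_alias (the file path is illustrative):

```apache
# Serve robots.txt directly, without hitting Django (path is illustrative)
Alias /robots.txt /var/www/myproject/static/robots.txt
```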

Quick Tips:
  1. robots.txt is case sensitive. The file must be named robots.txt, not Robots.txt or robots.TXT.
  2. robots.txt file must be placed in a website’s top-level directory.
  3. Make sure you're not blocking any content or sections of your website that you want crawled, as that will hurt your SEO.

 

Host your Django App for Free.

Python Script 3: Validate, format and Beautify JSON string Using Python

As per the official JSON website, JSON is a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate. It is based on a subset of the JavaScript programming language, Standard ECMA-262 3rd Edition, December 1999.

In this small article we will see how to validate and format a JSON string using Python.

Format JSON string:
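
A minimal sketch using the standard json module (the sample string is illustrative):

```python
import json

raw = '{"name": "python", "topics": ["json", "scripts"]}'  # sample input

try:
    parsed = json.loads(raw)  # raises ValueError if the string is invalid
except ValueError as err:
    print("Invalid JSON:", err)
else:
    # indent and sort_keys produce a pretty, stable layout
    print(json.dumps(parsed, indent=4, sort_keys=True))
```
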
Output:
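
For the sample string above, this prints:

```
{
    "name": "python",
    "topics": [
        "json",
        "scripts"
    ]
}
```
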
Continue reading “Python Script 3: Validate, format and Beautify JSON string Using Python”

Displaying custom 404 error (page not found) page in Django 2.0

It happens very frequently that a visitor to your website types a wrong URL or looks for a page that no longer exists. What do you do to handle such cases?

You have three options.

  • Redirect the visitor to home page, silently.
  • Show a boring 404 error page and then ask them to click on a link to go to home page.
  • Create your own funny/awesome/informative custom 404 error page.

In this article we will discuss the third option, i.e. how to show your own error page in a Django 2.0 project when a URL is not found.
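
As a preview, the heart of the approach is a custom handler plus a 404.html template; a minimal sketch (the module path is a placeholder):

```python
# views.py -- minimal sketch of a custom 404 handler
from django.shortcuts import render

def handler404(request, exception):
    # Render your custom template with the proper 404 status code
    return render(request, "404.html", status=404)
```

Point Django at it from the root urls.py with handler404 = "myapp.views.handler404"; note that custom error pages only kick in when DEBUG = False.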

Code is available on Github.

Featured Image source:  https://www.pinterest.com/pin/101612535313085400/

Continue reading “Displaying custom 404 error (page not found) page in Django 2.0”

Comparing celery-rabbitmq docker cluster, multi-threading and scrapy framework for 1000 requests

I recently tried scraping tweets quickly using a Celery-RabbitMQ Docker cluster. Since I was hitting the same servers repeatedly, I used rotating proxies via the Tor network. It turned out this is not very fast, and routing rotating proxies through Tor is not a nice thing to do anyway.

I was able to scrape approximately 10,000 tweets in 60 seconds, i.e. about 166 tweets per second. Not an impressive number. (But I did get Celery, RabbitMQ, a rotating proxy via the Tor network, and Postgres working together in a Docker cluster.)

The approach above was not very fast, hence I compared the following three approaches for sending multiple requests and parsing the responses:
– Celery-RabbitMQ docker cluster
– Multi-Threading
– Scrapy framework

I planned to send requests to 1 million websites, but once I started I figured out it would take a whole day to finish, hence I settled for 1000 URLs.
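
For reference, the multi-threading contender boils down to something like this (a sketch; the URL list and worker count are illustrative):

```python
# Sketch of the multi-threading approach using a thread pool
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch(url):
    try:
        response = requests.get(url, timeout=10)
        return url, response.status_code
    except requests.RequestException as err:
        return url, str(err)

urls = ["http://example.com"] * 1000  # placeholder target list

with ThreadPoolExecutor(max_workers=20) as pool:
    for url, status in pool.map(fetch, urls):
        print(url, status)
```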

Continue reading “Comparing celery-rabbitmq docker cluster, multi-threading and scrapy framework for 1000 requests”

Python Script 10: Collecting one million website links

I needed a collection of different website links to experiment with a Docker cluster, so I created this small script to collect one million website URLs.

Code is available on Github too.

Running script:

Either create a new virtual environment using python3 or use an existing one on your system.

Install the dependencies.

Activate the virtual environment and run the code.
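
Something like the following (the script and dependency names are assumptions; see the repository for the exact ones):

```
python3 -m venv venv
source venv/bin/activate
pip install requests beautifulsoup4
python collect_links.py
```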

Code:
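
The full script is on Github; here is a condensed sketch of what it does (the paging URL pattern, page count, and output filename are assumptions, not the exact script):

```python
# Condensed sketch of the link collector; paging URL pattern, page count
# and output filename are assumptions, not the exact published script.
import time

import requests
from bs4 import BeautifulSoup

BASE_URL = "http://www.websitelists.in/website-list-{}.html"  # assumed pattern

links = []
for page in range(1, 6):  # a handful of pages for illustration
    response = requests.get(BASE_URL.format(page))
    soup = BeautifulSoup(response.text, "html.parser")
    # Anchor tags sit inside <td class="web_width"> cells
    for td in soup.find_all("td", {"class": "web_width"}):
        anchor = td.find("a")
        if anchor and anchor.get("href"):
            links.append(anchor["href"])
    time.sleep(1)  # one-second delay to avoid HTTP 429 responses

with open("website_links.txt", "w") as output:
    output.write("\n".join(links))
```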


We are scraping links from the site http://www.websitelists.in/. If you inspect the webpage, you can see an anchor tag inside each td tag with class web_width. We convert the page response into a BeautifulSoup object, collect all such elements, and extract their href values.

[Image: one million site URLs]

There is a natural delay of more than one second between consecutive requests (fetching plus parsing), which is slow but easy on the server. I still introduced an explicit one-second delay to avoid HTTP 429 (Too Many Requests) responses.

Scraped links are dumped into a text file in the same directory.

 

Hosting Django App for free on PythonAnyWhere Server.

Featured Image Source : http://ehacking.net/

Python Script 9: Getting System Information in Linux using python script

Finding system information in Ubuntu, like the number and type of processors, memory usage, uptime, etc., is extremely easy. You can use Linux commands like free -m, uname -a, and uptime to find these details.

But there is no fun in doing that. If you love coding in Python, you want to do everything in Python. So we will see how to find this information using a Python program, and in the process learn something about the Linux system in addition to Python.

To find a few details we will use the Python module platform. We will run this script using the python3 interpreter; it was tested on Ubuntu 16.04.

General Info:

The platform module is used to access the underlying platform's identifying data. We will be using some of the methods available in this module.

To get the architecture, call the architecture method: platform.architecture(). It returns a tuple (bits, linkage).

To get the Linux distribution, call the dist() or linux_distribution() method. It also returns a tuple.
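
Putting those together (note that dist() and linux_distribution() were removed in Python 3.8; on the Python 3.5 era this article targets they merely emit deprecation warnings):

```python
# General info via the platform module
import platform

print(platform.architecture())        # e.g. ('64bit', 'ELF')
print(platform.linux_distribution())  # e.g. ('Ubuntu', '16.04', 'xenial')
print(platform.uname())               # kernel, hostname, machine, ...
```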

Now to get other information we need to go into  /proc/  directory of your system. If you look at files you will get an idea where system stores this type of information.

Processor Info:

Processor information is stored in the /proc/cpuinfo file. Read the file, count the processors, and extract the model name.
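
A sketch (in /proc/cpuinfo, each logical CPU contributes a "processor" line and a "model name" line):

```python
# Count logical processors and read the model name from /proc/cpuinfo
with open("/proc/cpuinfo") as f:
    lines = f.readlines()

count = sum(1 for line in lines if line.startswith("processor"))
model = next(line.split(":", 1)[1].strip()
             for line in lines if line.startswith("model name"))

print("Processors:", count)
print("Model:", model)
```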

Memory Usage:

Memory details are stored in the /proc/meminfo file. The first line is the total memory in the system and the second line is the memory currently free.
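
A sketch that reads just those two lines:

```python
# Read total and free memory from the first two lines of /proc/meminfo
with open("/proc/meminfo") as f:
    total = f.readline()  # e.g. "MemTotal:        8048392 kB"
    free = f.readline()   # e.g. "MemFree:         1223412 kB"

print(total.strip())
print(free.strip())
```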

Uptime:

How long your system has been up.
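
/proc/uptime holds the uptime in seconds as its first field; a sketch:

```python
# Uptime in seconds is the first field of /proc/uptime
with open("/proc/uptime") as f:
    uptime_seconds = float(f.read().split()[0])

hours, remainder = divmod(uptime_seconds, 3600)
minutes = remainder // 60
print("Uptime: %d hours, %d minutes" % (hours, minutes))
```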

Average Load:
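
The load averages over the last 1, 5, and 15 minutes live in /proc/loadavg; os.getloadavg() reads them for you:

```python
# Load averages over 1, 5 and 15 minutes (first three fields of /proc/loadavg)
import os

print("Load average: %.2f %.2f %.2f" % os.getloadavg())
```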

Code:

Complete code is available on Github.

Run the script: python3 system_information.py

Output:

[Image: Linux system information using python script]

Host your Django App for free on PythonAnyWhere Server.

Using Docker instead of Virtual Environment for Django app development

Docker has all the good features of a virtual machine. It helps developers set up an environment on the development machine that is similar to the production environment. Please go through the official Docker site if you want to know more about Docker.

In this article we will see how to develop a hello-world Django project and run it in a Docker container instead of a virtual environment.

Installing Docker:

Please follow this guide to install docker on your machine.

We are using Docker version 17.12.1-ce for this article.

Starting a Docker container for the application:
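
The full walkthrough is behind the link; as a taste, a minimal Dockerfile for such a project might look like this (base image and versions are assumptions):

```dockerfile
# Minimal sketch of a Dockerfile for a hello-world Django project
FROM python:3.6
ENV PYTHONUNBUFFERED 1
WORKDIR /app
RUN pip install Django==2.0
COPY . /app
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
```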

Continue reading “Using Docker instead of Virtual Environment for Django app development”

Python Script 7: Scraping tweets using BeautifulSoup

Twitter is one of the most popular social networking services, used by many of the most prominent people in the world. Tweets can be used to perform sentiment analysis.

In this article we will see how to scrape tweets using BeautifulSoup. We are not using the Twitter API because most APIs have rate limits.

Continue reading “Python Script 7: Scraping tweets using BeautifulSoup”