Python Tip 1: Accessing localhost Django webserver over the Internet

We use Django's development web server to test our application on localhost.

We can start the development web server with the command python manage.py runserver.

One problem with this approach is that, by default, the server listens only on 127.0.0.1, so the project can be browsed only from our own machine (or, if we bind it to 0.0.0.0, from the local network to which our system is connected).

What if you want to show your project to someone at another location, in another city over the Internet?

How would you generate a public URL for your localhost webserver?

 

For such situations there is a tool available: ngrok.

How to use it:

– Download ngrok from its download page.

– Follow the documentation to unzip the downloaded archive and add your authentication token.

– Go to your project directory and activate your virtual environment.

– Now start your Django development server on a port, say 8080: python manage.py runserver 8080.

– You can access your project on localhost:8080. We will generate a public URL for this localhost URL by creating a tunnel using ngrok.

– Go to the directory where you unzipped ngrok and run the command ./ngrok http 8080.

– You will see something like this:

[Screenshot: ngrok tunnel]

– Now you can access your project over the public address shown in the ngrok output, for example http://9f134cf4.ngrok.io.

– You can open the web interface at http://127.0.0.1:4040 to inspect the requests received on each URL.

[Screenshot: public URL for the localhost Django project]

 

You can host your Django App for free on PythonAnyWhere server.

5 common mistakes made by beginner Python programmers

During our first days as Python programmers, all of us hit some weird bug that, after several painful hours on Stack Overflow, turns out to be not a bug but a Python feature. That's just how things work in Python. Below are five of the most common mistakes beginners make. Let's get to know them and save a few hours of asking questions on Facebook pages and groups.

 

1. Creating a copy of a dictionary or list.

Whenever you need to make a copy of a dictionary or list, do not simply use the assignment operator.

Wrong way: dict_b = dict_a

Now if you update dict_b, dict_a changes too, because the assignment operator does not copy anything: it makes dict_b point to the same object that dict_a points to.

Right way:

Use the copy() method for a shallow copy, or copy.deepcopy() from the copy module when nested objects must be copied as well.

See the difference between copy and deepcopy.
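A minimal sketch of the three behaviours, using a made-up dictionary (the keys and values here are only for illustration). It shows that assignment shares the object, copy() shares the nested list, and deepcopy() shares nothing.

```python
import copy

dict_a = {"name": "django", "tags": ["web", "orm"]}

# Assignment only binds a second name to the same object.
dict_b = dict_a
dict_b["name"] = "flask"
print(dict_a["name"])        # flask

# copy() builds a new outer dict but still shares the nested list.
shallow = dict_a.copy()
shallow["name"] = "pyramid"
shallow["tags"].append("shared")
print(dict_a["name"])        # flask
print(dict_a["tags"])        # ['web', 'orm', 'shared']

# deepcopy() copies the nested objects as well.
deep = copy.deepcopy(dict_a)
deep["tags"].append("private")
print(dict_a["tags"])        # ['web', 'orm', 'shared']
```

So a shallow copy is enough for flat dictionaries and lists, while nested data usually calls for deepcopy().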

2. Dictionary keys.

Let's say we put values into a dictionary under keys that include 1 and True.

If we print the dictionary, what will the output be? Let's see.

What just happened? Where is the key True?

Remember that bool is a subclass of int: True equals 1 and False equals 0, and equal values hash the same. A dictionary therefore treats True and 1 as the same key, so the value stored under key 1 is overwritten instead of a new entry being added.
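The original snippet is not shown here, so this is a small stand-in with assumed values that reproduces the effect:

```python
d = {}
d[1] = "one"
d[True] = "true"   # same key as 1, so only the value is replaced

print(d)                                  # {1: 'true'}
print(True == 1, hash(True) == hash(1))   # True True
print(isinstance(True, int))              # True
```

Note that the key that survives is the first one inserted (1); only its value changes.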

 

3. Updating lists or dictionaries.

Let's say you want to append an item to a list.

Now try to update a dictionary.

OK, let's try to sort a list.

Why is nothing being printed? What are we doing wrong?

Most mutating methods of the built-in containers, such as sort, update, append and add, work in place and return None, which avoids creating a separate copy unnecessarily.

Do not assign the return value of such methods to a variable.

Right way:
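A short sketch with a hypothetical list and dictionary: call the method for its side effect and keep using the original name. The first lines also show what goes wrong when the return value is kept.

```python
numbers = [3, 1, 2]

result = numbers.append(4)        # mutates numbers and returns None
print(result)                     # None
print(numbers)                    # [3, 1, 2, 4]

settings = {"debug": True}
settings.update({"port": 8080})   # call it only for its side effect
print(settings)                   # {'debug': True, 'port': 8080}

numbers.sort()                    # sorts the list in place
print(numbers)                    # [1, 2, 3, 4]

# sorted() is the alternative when a new list is wanted.
original = [3, 1, 2]
ordered = sorted(original)
print(original, ordered)          # [3, 1, 2] [1, 2, 3]
```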

 

4. Interned strings.

In some cases Python reuses existing immutable objects. String interning is one such case.

Here we tried to create two different string objects with the same value “gmail”. But when we checked whether the two objects are the same, the result was True. This is because Python did not create another object for b but pointed b at the existing “gmail” object.

In CPython, strings of length 1 are shared, and string literals made up only of ASCII letters, digits and underscores are interned; strings containing anything else are generally not. These are implementation details that can differ between versions, so do not rely on them. Let's see.

Also remember that == is different from the is operator: == checks whether the values are equal, while is checks whether both variables point to the same object.

So use == to compare strings, and keep is for identity checks.
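As a rough illustration (not the original snippet), the sketch below builds the two strings at runtime so that the interpreter does not fold them into one constant, and then interns them explicitly with sys.intern(). What the plain is comparison prints is a CPython implementation detail, so the comment on that line is hedged.

```python
import sys

# Built at runtime, so these are normally two separate objects.
a = "".join(["gm", "ail"])
b = "".join(["gm", "ail"])

print(a == b)   # True: same characters
print(a is b)   # typically False in CPython, but not guaranteed either way

# sys.intern() returns the single shared copy for a given value.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)   # True
```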

 

5. Default arguments are evaluated only once.

Consider the example below.

What do you think the output of the two print statements above will be?

Let's try to run it.

Why is the output [1,2] in the second case? Shouldn't it be just [2]?

The catch is that the default arguments of a function are evaluated only once, when the function is defined, not on every call. On the first call, func(1), the default list lst is empty, so 1 is appended to it. The second call reuses that same list, which already holds one element, hence the output is [1,2].
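A reconstruction of the kind of function described above, followed by the usual fix. The names func and lst come from the text; func_fixed is a hypothetical name added for the corrected version.

```python
def func(item, lst=[]):
    # lst is created once, when the def statement runs.
    lst.append(item)
    return lst

print(func(1))   # [1]
print(func(2))   # [1, 2]


def func_fixed(item, lst=None):
    # Use None as the default and build a fresh list on each call.
    if lst is None:
        lst = []
    lst.append(item)
    return lst

print(func_fixed(1))   # [1]
print(func_fixed(2))   # [2]
```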

 

Bonus: Don’t mix spaces and tabs. Just don’t. You will cry.

Please comment if you find something is not correct.


 

How to upload and process the Excel file in Django

In this article we will discuss how to upload an Excel file and process its content without storing the file on the server. One approach is to upload the file, store it in an upload directory and then read it. Another approach is to upload the file, read it directly from the POST data without saving it to disk, and display the data.

We will work with the latter approach here.

You may create a new project or work on existing code.

If you are setting up a new project, create a new virtual environment and install Django 2.0 and openpyxl in it using pip.

 


Creating sitemap of Dynamic URLs in your Django Application

A site map is a list of a website’s content designed to help both users and search engines navigate the site. A site map can be a hierarchical list of pages, an organization chart, or an XML document that provides instructions to search engine crawl bots.

Why sitemaps are required:

XML sitemaps are important for SEO because they make it easier for Google to find your site's pages. This matters because Google ranks individual web pages, not just websites. There is no downside to having an XML sitemap, and having one can improve your SEO, so we highly recommend it.

Example:

The sitemap for this blog can be found at http://thepythondjango.com/sitemap_index.xml.

[Image: example sitemap]

Steps to add Sitemaps to your Django Application:

Create a file sitemap.py in your app.

Create two classes in the sitemap.py file, one for static pages and another for dynamic URLs.

Let's assume your website sells products whose details are stored in a database. Once a new product is added to the database, you want its page to be discoverable by search engines. We need to add all such product pages/URLs to the sitemap.

Static Sitemap:

Define a class StaticSitemap in your sitemap.py file. Define the mandatory method items in it, which returns the list of objects. Each object is passed to the location method, which creates the URL for it.

Here the items method returns appname:url_name strings, which the location method converts into absolute URLs. Refer to your app's urls.py file for the URL names.

Dynamic Sitemap:

Similarly, we will create the dynamic sitemap by fetching values from the database.

Here we are getting all products from the database and generating URLs like http://example.com/product/12.

Adding sitemaps in URLconf:

Now add these sitemap classes to the URLconf. Edit the project's urls.py file and add the code below to it.

 

Now reload your server and go to localhost:8000/sitemap.xml to see your sitemap.

 

Reference : https://docs.djangoproject.com/en/2.0/ref/contrib/sitemaps/

 


Adding Robots.txt file to Django Application

Robots.txt is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned.

Why robots.txt is important:

Before a search engine crawls your site, it reads your robots.txt file for instructions on which parts of the site it is allowed to crawl. If you want search engines to ignore any pages on your website, list them in your robots.txt file.

Basic Format:
Example:
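The original format listing is not shown here, so the rules below are an assumed example (the /admin/ and /accounts/ paths are hypothetical). The sketch feeds them to Python's standard urllib.robotparser to show how a well-behaved crawler would read them: a User-agent line names the robot, and each Disallow line names a path prefix it should stay out of.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the rules apply to every robot ("*").
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /accounts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("*", "/admin/login/"))       # False
print(parser.can_fetch("*", "/blog/first-post/"))   # True
```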
Steps to add robots.txt in Your Django Project:

Let's say your project's name is myproject.

Create a directory named templates in the root of your project. Inside it, create another directory with the same name as your project.

Place a text file robots.txt in it.

Your project structure should look something like this.

Add User-agent and Disallow rules to it.

 

Now go to your project's urls.py file and add the import statement below.

Add the URL pattern below.

Now restart the server and go to localhost:8000/robots.txt in your browser to see the robots.txt file.

Serving robots.txt from web server:

You can also serve robots.txt directly from your web server. Below is a sample configuration for Apache.

Quick Tips:
  1. robots.txt is case sensitive. The file must be named robots.txt, not Robots.txt or robots.TXT.
  2. robots.txt file must be placed in a website’s top-level directory.
  3. Make sure you’re not blocking any content or sections of your website you want crawled as this will not be good for SEO.

 


Python Script 3: Validate, format and Beautify JSON string Using Python

According to the official JSON website, JSON is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition – December 1999.

In this short article we will see how to validate and format a JSON string using Python.
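One possible way to do both with the standard json module is sketched below. The helper name format_json and the sample strings are assumptions for illustration, not the article's own script: json.loads() validates by raising an error on bad input, and json.dumps() with indent produces the formatted text.

```python
import json


def format_json(text):
    """Return (True, pretty_text) for valid JSON, else (False, error message)."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"Invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"
    return True, json.dumps(parsed, indent=4, sort_keys=True)


ok, pretty = format_json('{"name": "ngrok", "ports": [8080, 4040]}')
print(ok)
print(pretty)

# A trailing comma is not allowed in JSON, so this input is rejected.
ok, message = format_json('{"name": "ngrok",}')
print(ok, message)
```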

Format JSON string:
output:

Displaying custom 404 error (page not found) page in Django 2.0

It happens very frequently that a visitor to your website types a wrong URL, or that the page the user is looking for no longer exists. What do you do to handle such cases?

You have three options.

  • Redirect the visitor to home page, silently.
  • Show a boring 404 error page and then ask them to click on a link to go to home page.
  • Create your own funny/awesome/informative custom 404 error page.

In this article we will discuss the third option, i.e. how to show your own error page in a Django 2.0 project when a URL is not found.

Code is available on Github.

Featured Image source:  https://www.pinterest.com/pin/101612535313085400/


Comparing celery-rabbitmq docker cluster, multi-threading and scrapy framework for 1000 requests

I recently tried scraping tweets quickly using a Celery RabbitMQ Docker cluster. Since I was hitting the same servers, I was using rotating proxies via the Tor network. It turned out that this is not very fast, and using rotating proxies via Tor is not a nice thing to do.

I was able to scrape approximately 10000 tweets in 60 seconds, i.e. about 167 tweets per second. Not an impressive number. (But I was able to make Celery, RabbitMQ, rotating proxies via the Tor network and Postgres work in a Docker cluster.)

Since the above approach was not very fast, I decided to compare the three approaches below for sending multiple requests and parsing the responses.
– Celery-RabbitMQ docker cluster
– Multi-Threading
– Scrapy framework

I planned to send requests to 1 million websites, but once I started, I figured out that it would take a whole day to finish, so I settled for 1000 URLs.


Python Script 10: Collecting one million website links

I needed a collection of different website links to experiment with Docker cluster. So I created this small script to collect one million website URLs.

Code is available on Github too.

Running script:

Either create a new virtual environment using python3 or use an existing one on your system.

Install the dependencies.

Activate the virtual environment and run the code.

Code:

 

We are scraping links from the site http://www.websitelists.in/. If you inspect the webpage, you can see an anchor tag inside a td tag with the class web_width. We will convert the page response into a BeautifulSoup object, get all such elements and extract their href values.
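The script itself uses BeautifulSoup; the sketch below is not that script but a standard-library stand-in built on html.parser, so that it runs without installing anything. It applies the same selection rule (anchor inside a td with class web_width) to a made-up HTML fragment. The class name LinkCollector and the sample markup are assumptions.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags nested in <td class="web_width">."""

    def __init__(self):
        super().__init__()
        self.in_target_cell = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            # Remember whether this cell carries the class we care about.
            classes = (attrs.get("class") or "").split()
            self.in_target_cell = "web_width" in classes
        elif tag == "a" and self.in_target_cell and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_target_cell = False


# Made-up fragment that mimics the structure described above.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="web_width"><a href="http://example.com">example.com</a></td>
    <td class="rank"><a href="/rank/1">1</a></td>
  </tr>
  <tr>
    <td class="web_width"><a href="http://example.org">example.org</a></td>
  </tr>
</table>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)   # ['http://example.com', 'http://example.org']
```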

[Image: one million site URLs]

There is already a natural delay of more than 1 second between consecutive requests, which is pretty slow but good for the server. I still introduced a one-second delay to avoid the 429 HTTP status.

Scraped links will be dumped into a text file in the same directory.

 


Featured Image Source : http://ehacking.net/

Python Script 9: Getting System Information in Linux using python script

Finding system information in Ubuntu, such as the number and type of processors, memory usage and uptime, is extremely easy. You can use Linux commands like free -m, uname -a and uptime to find these details.

But there is no fun in doing that. If you love coding in Python, you want to do everything in Python. So we will see how to find this information using a Python program, and in the process learn something about the Linux system in addition to Python.

To find a few of the details we will use the Python module platform. We will run this script with the python3 interpreter; it was tested on Ubuntu 16.04.

General Info:

The platform module is used to access the underlying platform's identifying data. We will use some of the functions available in this module.

To get the architecture, call platform.architecture(). It returns a tuple (bits, linkage).

To get the Linux distribution, call the dist() or linux_distribution() function. It also returns a tuple. Note that both functions were deprecated and then removed in Python 3.8, so this only works on older interpreters such as the Python 3.5 shipped with Ubuntu 16.04.
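A small sketch of these calls that should also run on current interpreters. Because dist() and linux_distribution() no longer exist there, it falls back to platform.freedesktop_os_release(), which is available from Python 3.10; that substitution is mine, not part of the original script.

```python
import platform

bits, linkage = platform.architecture()
print("Architecture:", bits, linkage)
print("System:", platform.system())
print("Release:", platform.release())
print("Machine:", platform.machine())

# freedesktop_os_release() reads the os-release file and exists only on
# Python 3.10+; it raises OSError when no such file is found.
if hasattr(platform, "freedesktop_os_release"):
    try:
        os_release = platform.freedesktop_os_release()
        print("Distribution:", os_release.get("PRETTY_NAME", "unknown"))
    except OSError:
        print("Distribution: os-release file not found")
```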

Now, to get other information we need to look in the /proc/ directory of the system. If you look at the files there, you will get an idea of where the system stores this type of information.

Processor Info:

Processor information is stored in the /proc/cpuinfo file. Read the file to count the processors and get their model names.
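One way this could look, as a sketch rather than the article's own code. The parsing is kept in a separate function (parse_cpuinfo, a hypothetical name) so it can be tried on a sample string; the sample text is made up. It assumes the file has "model name" lines, which is the case on x86 systems but not on every architecture.

```python
from collections import Counter


def parse_cpuinfo(text):
    """Count logical processors per model name in /proc/cpuinfo-style text."""
    models = Counter()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            models[value.strip()] += 1
    return models


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the real file; only works on Linux."""
    with open(path) as handle:
        return parse_cpuinfo(handle.read())


# Made-up sample in the same layout as /proc/cpuinfo.
sample = (
    "processor\t: 0\n"
    "model name\t: Example CPU @ 2.00GHz\n"
    "processor\t: 1\n"
    "model name\t: Example CPU @ 2.00GHz\n"
)
for model, count in parse_cpuinfo(sample).items():
    print(count, "x", model)   # 2 x Example CPU @ 2.00GHz
```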

Memory Usage:

Memory details are stored in the /proc/meminfo file. The first line is the total memory in the system and the second line is the memory that is free at the moment.
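A possible sketch of reading those two values. Instead of relying on line positions it looks the fields up by name (MemTotal and MemFree); the function names and the sample numbers are assumptions for illustration.

```python
def parse_meminfo(text):
    """Map field names in /proc/meminfo-style text to sizes in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def read_meminfo(path="/proc/meminfo"):
    """Read the real file; only works on Linux."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


# Made-up sample in the same layout as /proc/meminfo.
sample = (
    "MemTotal:        8056844 kB\n"
    "MemFree:         1234567 kB\n"
    "MemAvailable:    4567890 kB\n"
)
info = parse_meminfo(sample)
print("Total memory (MB):", info["MemTotal"] // 1024)
print("Free memory (MB):", info["MemFree"] // 1024)
```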

Uptime:

How long your system has been up.
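On Linux the first number in /proc/uptime is the number of seconds since boot. A minimal sketch of turning it into a readable duration follows; the function names and the sample value are assumptions, not the original code.

```python
from datetime import timedelta


def parse_uptime(text):
    """Return uptime as a timedelta from /proc/uptime-style text."""
    seconds = float(text.split()[0])
    return timedelta(seconds=int(seconds))


def read_uptime(path="/proc/uptime"):
    """Read the real file; only works on Linux."""
    with open(path) as handle:
        return parse_uptime(handle.read())


# Made-up sample: 93784 seconds is 1 day, 2 hours, 3 minutes and 4 seconds.
print(parse_uptime("93784.52 351735.47"))   # 1 day, 2:03:04
```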

Average Load:
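The load averages over the last 1, 5 and 15 minutes are the first three numbers in /proc/loadavg. The sketch below parses a made-up sample line and, where available, also asks os.getloadavg() for the live values; parse_loadavg is a hypothetical helper name.

```python
import os


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg-style text."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return one, five, fifteen


# Made-up sample in the same layout as /proc/loadavg.
print(parse_loadavg("0.42 0.36 0.30 1/512 12345"))   # (0.42, 0.36, 0.3)

# os.getloadavg() reports the same three numbers on Unix-like systems.
if hasattr(os, "getloadavg"):
    try:
        print(os.getloadavg())
    except OSError:
        print("Load average is not available on this system")
```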

Code:

Complete code is available on Github.

Run the script :  python3 system_information.py

Output:

[Screenshot: Linux system information using Python script]
