Python Script 3: Validate, Format and Beautify JSON String Using Python

As per the official JSON website, JSON is a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate. It is based on a subset of the JavaScript programming language, Standard ECMA-262 3rd Edition, December 1999.

In this short article we will see how to validate, format and beautify a JSON string using Python.

Format JSON string:
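A minimal sketch of the idea using the standard json module (the sample string below is my own, not from the original post): json.loads() validates the string, raising an error for invalid input, and json.dumps() re-serializes it with indentation.

    import json

    raw = '{"name": "python", "scripts": [3, 10], "valid": true}'

    try:
        parsed = json.loads(raw)   # raises ValueError/JSONDecodeError if the string is not valid JSON
    except ValueError as error:
        print("Invalid JSON:", error)
    else:
        # beautify the validated string with 4-space indentation and sorted keys
        print(json.dumps(parsed, indent=4, sort_keys=True))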
Output:
Continue reading “Python Script 3: Validate, format and Beautify JSON string Using Python”

Python Script 10: Collecting one million website links

I needed a collection of different website links to experiment with a Docker cluster, so I created this small script to collect one million website URLs.

The code is available on GitHub too.

Running the script:

Either create a new virtual environment using python3 or use an existing one on your system.

Install the dependencies.

Activate the virtual environment and run the code.

Code:

 

We are scraping links from the site http://www.websitelists.in/. If you inspect the webpage, you can see an anchor tag inside each td tag with the class web_width. We will convert the page response into a BeautifulSoup object, find all such elements and extract their href values.

one million site urls

 

Although there is a natural delay of more than a second between consecutive requests, which is slow but easy on the server, I still introduced a one-second delay to avoid HTTP 429 (Too Many Requests) responses.

The scraped links are dumped into a text file in the same directory.
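A rough sketch of such a scraper is shown below. The pagination URL pattern, page range and output file name are assumptions of mine; the exact details are in the original code on GitHub.

    import time

    import requests
    from bs4 import BeautifulSoup

    # assumed pagination pattern for http://www.websitelists.in/
    BASE_URL = "http://www.websitelists.in/website-list-{}.html"

    def scrape_page(page_number):
        """Fetch one listing page and return the website URLs found on it."""
        response = requests.get(BASE_URL.format(page_number))
        soup = BeautifulSoup(response.text, "html.parser")
        links = []
        # each listed site is an anchor inside a td with class "web_width"
        for td in soup.find_all("td", {"class": "web_width"}):
            anchor = td.find("a")
            if anchor and anchor.get("href"):
                links.append(anchor["href"])
        return links

    if __name__ == "__main__":
        with open("website_links.txt", "a") as output_file:
            for page in range(1, 11):          # increase the range to collect more links
                for link in scrape_page(page):
                    output_file.write(link + "\n")
                time.sleep(1)                  # be polite: avoid HTTP 429 responses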

 

Hosting Django App for free on PythonAnyWhere Server.

Featured Image Source: http://ehacking.net/

Python Script 9: Getting System Information in Linux Using Python

Finding system information in Ubuntu, such as the number and type of processors, memory usage and uptime, is extremely easy. You can use Linux commands like free -m, uname -a and uptime to find these details.

But there is no fun in doing that. If you love coding in Python, you want to do everything in Python. So we will see how to find this information with a Python program, and in the process learn something about the Linux system in addition to Python.

To find a few of the details we will use the Python module platform. We will run this script with the python3 interpreter, and it has been tested on Ubuntu 16.04.

General Info:

The platform module is used to access the underlying platform's identifying data. We will use some of the methods available in this module.

To get the architecture, call the architecture() method: platform.architecture(). It returns a tuple (bits, linkage).

To get the Linux distribution, call the dist() or linux_distribution() method. It also returns a tuple.
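A small sketch of this general-info part, assuming the Python 3.5 interpreter that ships with Ubuntu 16.04 (linux_distribution() was removed in later Python versions, so it is guarded here):

    import platform

    # architecture() returns a (bits, linkage) tuple, e.g. ('64bit', 'ELF')
    print("Architecture:", platform.architecture())

    # distribution info; linux_distribution() is available up to Python 3.7
    if hasattr(platform, "linux_distribution"):
        print("Distribution:", platform.linux_distribution())

    print("System:   ", platform.system())     # e.g. 'Linux'
    print("Release:  ", platform.release())    # kernel release
    print("Processor:", platform.processor())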

To get other information we need to go into the /proc/ directory of your system. If you look at the files there, you will get an idea of where the system stores this kind of information.

Processor Info:

Processor information is stored in the /proc/cpuinfo file. Read the file, count the processors and pick out the model name.
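A minimal sketch of that idea (the field names are the standard ones in /proc/cpuinfo; the original script may format things differently):

    # count logical processors and grab the model name from /proc/cpuinfo
    model_name = ""
    processor_count = 0

    with open("/proc/cpuinfo") as cpuinfo:
        for line in cpuinfo:
            if line.startswith("processor"):
                processor_count += 1
            elif line.startswith("model name") and not model_name:
                model_name = line.split(":", 1)[1].strip()

    print("Processors:", processor_count)
    print("Model:", model_name)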

Memory Usage:

Memory details are stored in the /proc/meminfo file. The first line is the total memory in the system and the second line is the free memory available at the moment.
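A small sketch based on that layout (MemTotal and MemFree are reported in kB):

    # MemTotal is the first line of /proc/meminfo and MemFree the second
    with open("/proc/meminfo") as meminfo:
        lines = meminfo.readlines()

    total_kb = int(lines[0].split()[1])
    free_kb = int(lines[1].split()[1])

    print("Total memory: %.2f MB" % (total_kb / 1024))
    print("Free memory : %.2f MB" % (free_kb / 1024))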

Uptime:

This is how long your system has been up; it can be read from the /proc/uptime file.
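A minimal sketch of reading it (the first value in /proc/uptime is the uptime in seconds):

    with open("/proc/uptime") as uptime_file:
        uptime_seconds = float(uptime_file.read().split()[0])

    days, rest = divmod(uptime_seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes = rest // 60
    print("Uptime: %d days, %d hours, %d minutes" % (days, hours, minutes))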

Average Load:
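The load averages live in /proc/loadavg, and os.getloadavg() exposes the same three values, as in this small sketch:

    import os

    # 1, 5 and 15 minute load averages
    one, five, fifteen = os.getloadavg()
    print("Load average (1/5/15 min): %.2f %.2f %.2f" % (one, five, fifteen))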

Code:

The complete code is available on GitHub.

Run the script: python3 system_information.py

Output:

Linux system information using python script

Host your Django App for free on PythonAnyWhere Server.

Python Script 7: Scraping tweets using BeautifulSoup

Twitter is one of the most popular social networking services, used by the most prominent people in the world. Tweets can be used to perform sentiment analysis.

In this article we will see how to scrape tweets using BeautifulSoup. We are not using the Twitter API, as most APIs have rate limits.

Continue reading “Python Script 7: Scraping tweets using BeautifulSoup”

Python Script 8: Validating Credit Card Number – Luhn’s Algorithm

The Luhn algorithm, also known as the “modulus 10” algorithm, is a checksum formula used to validate a variety of identification numbers, such as credit card numbers, IMEI numbers, National Provider Identifier numbers in the United States, Canadian Social Insurance Numbers and Israeli ID numbers.

Algorithm:

The formula verifies a number against its included check digit, which is usually appended to a partial account number to generate the full account number.

Generating check digit:

  • Let's assume you have the number given below:
    3 – 7 – 5 – 6 – 2 – 1 – 9 – 8 – 6 – 7 – X
    X is the check digit.
  • Now, starting from the rightmost digit (i.e. the check digit), double every second digit.
    New number will be:
    3 – 14 – 5 – 12 – 2 – 2 – 9 – 16 – 6 – 14 – X
  • Now, if the double of a digit is more than 9, add its digits.
    So the number will become:
    3 – 5 – 5 – 3 – 2 – 2 – 9 – 7 – 6 – 5 – X
  • Now add all digits.
    47 + X
  • Multiply the non-check part by 9.
    47 * 9 = 423
  • The unit digit of the multiplication result is the check digit: X = 3.
  • So the valid number would be 37562198673.
Validating the generated number:

You can use online tools to check whether the generated number is valid as per Luhn's algorithm.

You can validate the number by visiting this site.

Python Script to validate credit card number:
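The full script is on GitHub; below is a minimal sketch of the same check, doubling every second digit from the right and testing that the total is a multiple of 10 (the function name is my own):

    def is_luhn_valid(number):
        """Return True if the number (given as a string of digits) passes the Luhn check."""
        digits = [int(d) for d in str(number)]
        # double every second digit starting from the right, skipping the check digit
        for i in range(len(digits) - 2, -1, -2):
            doubled = digits[i] * 2
            digits[i] = doubled - 9 if doubled > 9 else doubled
        return sum(digits) % 10 == 0

    # the number generated in the worked example above
    print(is_luhn_valid("37562198673"))   # True
    print(is_luhn_valid("37562198674"))   # False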
 

The code is available on GitHub.

 

Host your Django application for free.

 

Scraping Python books data from Amazon using Scrapy Framework

We learned how to scrape Twitter data using BeautifulSoup. But BeautifulSoup is slow, and we need to take care of many things ourselves.

Here we will see how to scrape data from websites using Scrapy.

I tried scraping Python book details from Amazon.com using Scrapy and found it extremely fast and easy. We will see how to start working with Scrapy, create a scraper, scrape data and save it to the database.

The scraper code is available on GitHub. I dumped the data into a MySQL database and developed a mini Django app on top of it, which is available here.

Continue reading “Scraping Python books data from Amazon using Scrapy Framework”

Python Script 6: Wishing Merry Christmas using Python Turtle

Merry Christmas everyone.

Since today is Christmas, I thought of wishing everyone in a different way. I am a Python programmer and I love writing code, so I decided to do something with Python, and after an hour I was ready with the script below to wish all of you Merry Christmas using Python turtle.

The code is available on GitHub as well.

Code:
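The original drawing is on GitHub; the following is only a minimal stand-in sketch that draws a simple tree and writes the greeting with the turtle module:

    import turtle

    screen = turtle.Screen()
    screen.bgcolor("black")

    pen = turtle.Turtle()
    pen.hideturtle()
    pen.speed(0)

    # draw a simple triangular tree
    pen.color("green")
    pen.penup()
    pen.goto(-50, -100)
    pen.pendown()
    pen.begin_fill()
    for _ in range(3):
        pen.forward(100)
        pen.left(120)
    pen.end_fill()

    # write the greeting above the tree
    pen.penup()
    pen.goto(0, 60)
    pen.color("red")
    pen.write("Merry Christmas!", align="center", font=("Arial", 24, "bold"))

    turtle.done()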
 

Output Video:

Happy learning.

 

Reference:
[1] https://coolpythoncodes.com/python-turtle/
[2] https://docs.python.org/3.6/library/turtle.html

How to backup database periodically on PythonAnyWhere server

You can host your Django app effortlessly on the PythonAnyWhere server. If your app uses a database, it is strongly recommended to back it up regularly to avoid losing data.

This PythonAnyWhere article explains the process of taking an SQL dump. We will extend that article to back up the database periodically and delete the old files.

Continue reading “How to backup database periodically on PythonAnyWhere server”

Python Script 5: How to find most popular technologies on Stackoverflow

This script crawls Stack Overflow pages to find the most popular technologies by counting the tags on each question.

Important: please do not send too many requests, and respect the robots.txt file.

The code is also available on GitHub.

You will need to install the beautifulsoup4 and requests Python packages.

Code:
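The full code is on GitHub; below is a rough sketch of the approach. The listing URL, page range and the post-tag CSS class are assumptions about Stack Overflow's markup, not details taken from the original script.

    import time
    from collections import Counter

    import requests
    from bs4 import BeautifulSoup

    # assumed question-listing URL; the real script may paginate differently
    QUESTIONS_URL = "https://stackoverflow.com/questions?tab=newest&page={}"

    tag_counter = Counter()

    for page in range(1, 6):                   # keep the number of requests small
        response = requests.get(QUESTIONS_URL.format(page))
        soup = BeautifulSoup(response.text, "html.parser")
        # each tag on a question is assumed to be an anchor with the "post-tag" class
        for tag in soup.find_all("a", {"class": "post-tag"}):
            tag_counter[tag.get_text()] += 1
        time.sleep(2)                          # be gentle with the server

    for technology, count in tag_counter.most_common(10):
        print(technology, count)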
 


Other Scripts:

Opening top 10 Google search results in one hit.
Formatting and validating JSON.
Crawling all emails from a site.