CAPTCHAs have been used for decades to keep automated scripts (bots) from flooding registration and login pages. Plenty of tools and research have exposed their main weakness, namely that the image can be converted back into plain text, yet insecure CAPTCHA images are still in use on sensitive login pages such as online banking!

Believe it or not, today we will discuss a real-world example of how to crack the login page of one of the leading banks in the Middle East!

Optical Character Recognition (OCR)

In short, OCR is a technology that allows you to convert scanned images of text into plain text. This enables your script to read the text and submit it into a login form just as a human would.

OCR engines have been developed into many kinds of domain-specific OCR applications, such as invoice OCR and legal billing document OCR. Here, however, OCR will be used to defeat CAPTCHA anti-bot systems.

On Linux, Tesseract is one of the most accurate open-source OCR engines, even though it lacks a graphical interface (GUI); the command line (CLI) is all we need for our purpose. Installing Tesseract is straightforward. On Ubuntu, run:

hkhrais@Hkhrais:~$ sudo apt-get install tesseract-ocr

Preparing images for Tesseract

Tesseract is not very flexible about the format of its input images. It will only accept TIFF images. According to user reports, compressed TIFF images are quite problematic, and the same goes for grey-scale and color images. So you’re better off with single-bit uncompressed TIFF images.

The process to prepare them with GIMP is very simple:

  1. Go to the Image→Mode menu and make sure the image is in RGB or Grayscale mode.
  2. Select from the menu Tools→Color Tools→Threshold and choose an adequate threshold value.
  3. Select from the menu Image→Mode→Indexed and from the options choose 1-bit and no dithering.
  4. Save the image in TIFF format with a .tif extension.

Note: Version 3.x includes layout analysis and, if compiled with Leptonica, supports all image formats that Leptonica supports. However, to improve recognition accuracy, we will replicate the above steps automatically with a Python script that cleans the image noise, concentrates the colors, and finally passes the output image to Tesseract.
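
As a rough illustration of steps 2 and 3 above (threshold, then reduce to 1 bit), the sketch below works on a plain list of RGB tuples rather than a real image, so it runs without any imaging library. The function name to_one_bit and the default threshold of 128 are assumptions chosen for the example, not values taken from the bank's images; the luminance weights are the standard ITU-R 601 ones.

```python
def to_one_bit(rows, threshold=128):
    """Reduce rows of (r, g, b) pixels to 1-bit values: 0 = black, 1 = white.

    `threshold` is a hypothetical default; in practice it is tuned per image.
    """
    result = []
    for row in rows:
        out_row = []
        for r, g, b in row:
            # Grayscale step: ITU-R 601 luma, using integer arithmetic
            luma = (299 * r + 587 * g + 114 * b) // 1000
            # Threshold step: everything darker than the cut-off becomes black
            out_row.append(0 if luma < threshold else 1)
        result.append(out_row)
    return result


# A dark digit pixel, a mid-gray noise pixel and a white background pixel
sample = [[(20, 20, 20), (150, 150, 150), (255, 255, 255)]]
bits = to_one_bit(sample)
```

Raising the threshold (for example to 160) also turns the mid-gray pixel black, which is the same trade-off you tune by hand in GIMP's Threshold dialog.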

Parsing CAPTCHA image

**Disclaimer: Below are exact samples taken from the login page of X bank, without any modification.**

We can see that the images share the following properties:

-All images contain only 4 numbers [Written in English]

-Number color is black

-There are no alphabet letters

-There is no rotation for the numbers [One angle only]

-All the numbers are in a single line

-The noise (the line crossing the numbers) can be removed with some image processing techniques.

Almost any image editor, such as GIMP, can remove the noise from these images and sharpen the numbers so they are ready for OCR. With some quick threshold tuning in GIMP to concentrate the colors, we got the following cleaned image:

The cleaned image is now ready to be fed to the OCR engine, which will print out the numbers. Obviously, this cleaning step needs to be done automatically by our script.

Automating the process

Assume our script navigates to the X bank login page and downloads the CAPTCHA images to the directory ‘/home/hkhrais/Desktop/Downloaded_CAPTCHA/’. The image preparation process would then be:

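from PIL import Image
import os
import time

def main():

    # Assumption: the downloaded images are named 1.JPEG, 2.JPEG, ...
    # as implied by the counter loop below
    getlist = os.listdir("/home/hkhrais/Desktop/Downloaded_CAPTCHA/")
    number = len(getlist)
    for cap in range(1, number + 1):
        # Pass the image name (without extension) to crack()
        crack(str(cap))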

The script starts by getting a list of the downloaded CAPTCHA images saved in “/home/hkhrais/Desktop/Downloaded_CAPTCHA/” and passes each image name to a function called crack.
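
Counting directory entries works only while the folder holds nothing but sequentially numbered images. A slightly more defensive variant, sketched below, collects the base names of the .JPEG files directly. The helper name captcha_names is hypothetical and not part of the original script.

```python
import os


def captcha_names(directory, extension=".JPEG"):
    """Return the sorted base names of all files in `directory` ending in `extension`."""
    names = []
    for entry in os.listdir(directory):
        base, ext = os.path.splitext(entry)
        # Skip anything that is not a downloaded CAPTCHA image
        if ext == extension:
            names.append(base)
    return sorted(names)
```

Each returned name could then be passed to crack in place of the loop counter.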

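def crack(cap_name):

    # Load the downloaded CAPTCHA and make sure it is in RGB mode
    img = Image.open('/home/hkhrais/Desktop/Downloaded_CAPTCHA/' + cap_name + '.JPEG')
    img = img.convert("RGB")
    pixdata = img.load()

    # Pass 1: pixels with a low red value become black
    for y in range(img.size[1]):
        for x in range(img.size[0]):
            if pixdata[x, y][0] < 90:
                pixdata[x, y] = (0, 0, 0)
    # Pass 2: pixels with a low green value become black
    for y in range(img.size[1]):
        for x in range(img.size[0]):
            if pixdata[x, y][1] < 136:
                pixdata[x, y] = (0, 0, 0)
    # Pass 3: every pixel that still has some blue becomes white
    for y in range(img.size[1]):
        for x in range(img.size[0]):
            if pixdata[x, y][2] > 0:
                pixdata[x, y] = (255, 255, 255)
    # Save the cleaned image as TIFF for Tesseract
    ext = ".tif"
    img.save("/home/hkhrais/Desktop/Cleaned_CAPTCHA/" + cap_name + ext)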
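
To see what the three loops do, it helps to follow a single pixel through them. The sketch below applies the same three tests, in the same order, to one (r, g, b) tuple; clean_pixel is a hypothetical helper written for illustration, not part of the original script.

```python
def clean_pixel(rgb):
    """Apply the three cleaning passes of crack() to one (r, g, b) pixel."""
    # Pass 1: a low red value marks the pixel as part of a digit
    if rgb[0] < 90:
        rgb = (0, 0, 0)
    # Pass 2: a low green value also marks it as part of a digit
    if rgb[1] < 136:
        rgb = (0, 0, 0)
    # Pass 3: anything that still has some blue in it is turned white
    if rgb[2] > 0:
        rgb = (255, 255, 255)
    return rgb


dark_digit = clean_pixel((30, 30, 30))       # blackened by pass 1
gray_noise = clean_pixel((150, 150, 150))    # untouched by passes 1 and 2, whitened by pass 3
```

Pixels blackened by the first two passes have a blue value of 0, so the third pass leaves them alone and whitens everything else that has any blue in it.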

First, the crack function loads the image into the img object and converts it to RGB mode (remember the process to prepare an image for Tesseract?). The three for loops are pure image processing: they make the numbers much bolder and turn the background noise white. The cleaned .tif images are saved in /home/hkhrais/Desktop/Cleaned_CAPTCHA/. The output would be:

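    # Run Tesseract on the cleaned image; it writes its result to /home/hkhrais/Desktop/text.txt
    command = "tesseract -psm 7 /home/hkhrais/Desktop/Cleaned_CAPTCHA/" + cap_name + ".tif " + "/home/hkhrais/Desktop/text"
    os.system(command)

    # Read the first line of the OCR output and accept it only if it is all digits
    Text = open("/home/hkhrais/Desktop/text.txt", "r")
    decoded = Text.readline().strip('\n')
    Text.close()
    if decoded.isdigit():
        print('[+}CAPTCHA number are ' + decoded)
    else:
        print('[-] Error : Not able to decode')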

One of the weak points mentioned earlier is that all the numbers are on a single line. Tesseract has plenty of options for specifying the page/image segmentation mode; in our case we need segmentation mode 7, which treats the image as a single text line.

pagesegmode values are:

0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.

The command line for Tesseract is simple:

# tesseract [-psm #] <input> <output>
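
When the command is assembled from Python, passing the arguments as a list avoids quoting problems with unusual file names. The sketch below only builds the argument list in the form shown above; build_tesseract_command is a hypothetical helper, the file name 1.tif is an assumed example, and the -psm spelling follows this article (Tesseract 4.x and later expect --psm instead).

```python
def build_tesseract_command(image_path, output_base, psm=7):
    """Return the Tesseract invocation as an argument list.

    Tesseract appends ".txt" to `output_base` when it writes the result.
    """
    return ["tesseract", "-psm", str(psm), image_path, output_base]


# 1.tif is an assumed example name for a cleaned CAPTCHA image
command = build_tesseract_command(
    "/home/hkhrais/Desktop/Cleaned_CAPTCHA/1.tif",
    "/home/hkhrais/Desktop/text",
)
# The list can be handed to subprocess.call(command) to run Tesseract
```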

The second part of the script improves its efficiency. As we saw, all the images contain only numbers, so if Tesseract’s output contains a special character or an alphabetic letter, it is definitely an error! Rather than submitting the wrong value to the login page, we can simply discard it and move on to the next image. The isdigit() function takes care of this.
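
Because every image in this sample holds exactly four digits, the check can be made a little stricter than isdigit() alone. The sketch below is one way to do that; the function name is_plausible_captcha and the expected_length parameter are assumptions for illustration.

```python
def is_plausible_captcha(decoded, expected_length=4):
    """Return True only if the OCR output is exactly `expected_length` ASCII digits."""
    text = decoded.strip()
    # Compare against the ten ASCII digits explicitly, so other Unicode
    # digit characters that str.isdigit() would accept are rejected too
    return len(text) == expected_length and all(c in "0123456789" for c in text)
```

A three-digit result such as 590 passes isdigit() but is rejected here, so it never reaches the login form.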

hkhrais@Hkhrais:~$ sudo python /home/hkhrais/Desktop/
[sudo] password for hkhrais:

[+}CAPTCHA number are 5905

[-] Error : Not able to decode

[+}CAPTCHA number are 7588

[+}CAPTCHA number are 6864

[+}CAPTCHA number are 2939

[+}CAPTCHA number are 9536


-Use a complex CAPTCHA; below are good examples. Mixing in alphabet letters and adding some rotation is always a good idea.

-Don’t rely on CAPTCHA alone; two-step authentication (token) is a perfect option to add as well.

