Use Google Cloud Vision API to process invoices and receipts

Recently Google opened up his beta of the Cloud Vison API to all developers. Cloud Vision allows you to do very powerful image processing. You can recognize objects, landmarks, faces, detect inappropriate content, perform image sentiment analysis and extract text. We’ll focus on the later and test if the OCR capabilities Cloud Vision can be used to process scans if invoices and receipts. Although we will only look at text detection the same logic can be applied for the other detection types the service offers. Using Cloud Vision sounds very attractive as it requires minimal effort to integrate it into any application and pricing starts from $2.50 per thousand request and goes as low as $0.60 per thousand requests. Plus you get a free tier of a thousand requests so anyone can use it.

In the below example we’ll use PHP to create a minimalistic application that allows us to upload an image, process it using the API, draw bounding boxes around the recognized text and list the extracted text.

Requirements

You need to register for a Google Cloud account
Enable the Cloud Vision API
Create an API key for your project to authenticate your application
To draw on the image, we’ll use the GD library for PHP
Make sure curl is enabled

Ok, let’s get started. First we’ll go over the main components and then we’ll wrap everything up into a small sample application.

Start by configuring the settings. Enter your API key to generate the URL and set the detection type to TEXT_DETECTION to apply OCR on the image.

// settings
$api_key = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';
$url = "https://vision.googleapis.com/v1/images:annotate?key=" . $api_key
$detection_type = "TEXT_DETECTION";

Next we’re going to create the JSON request. There are two ways to send an image to the Google Cloud Vision API. Either pass the URL of your image if it I stored in a Cloud Storage bucket or you can base64 encode the image and send the encoded string. We’ll use the second option here. You can always check the JSON reference in the developer docs if you wish to use a Cloud Storage bucket. In the request you define the image and the features you want to detect. In our case we only want to apply text detection.

$image_base64 = base64_encode($image);

$json_request ='{
        "requests": [
            {
              "image": {
                "content":"' . $image_base64. '"
              },
              "features": [
                  {
                    "type": "' .$detection_type. '",
                    "maxResults": 1000
                  }
              ]
            }
        ]
    }';



$curl = curl_init();

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_HTTPHEADER, array("Content-type: application/json"));
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $json_request);

$json_response = curl_exec($curl);
$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);

curl_close($curl);

// verify if we got a correct response
if ( $status != 200 ) {	
die("Something when wrong. Status code: $status" );
}

echo '<pre>';
print_r($json_response);
echo '</pre>';

The JSON response looks something like this:

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "nl",
          "description": "T AMUSEMENT\nEET- EN NIEUWSCAFE\nOUDE HAACHTSESTEENWEG 48 DIEGEM\nTEL/FAX 02/720,30,98\nWWW.DIEGEM.BE/TAMUSEMENT\nRPR 0475.199.436\n9/15 11:54\nTOOG[0003] 000000\nTf1 2\nFREKENING*\n1.80 1.80\nCHAUDF.BLAUW\nCOCA COLA LIGHT\n1.80\n1.80\n9,00 18,00\nHAMBURGER 9\n40\n21.60\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 145,
                "y": 84
              },
              {
                "x": 1461,
                "y": 84
              },
              {
                "x": 1461,
                "y": 2077
              },
              {
                "x": 145,
                "y": 2077
              }
            ]
          }
        },
        {
          "description": "T",
          "boundingPoly": {
            "vertices": [
              {
                "x": 368,
                "y": 84
              },
              {
                "x": 428,
                "y": 84
              },
              {
                "x": 428,
                "y": 152
              },
              {
                "x": 368,
                "y": 152
              }
            ]
          }
        },
        ...
        ...
        ...

The first text annotation element contains all the text recognised in one string. The following elements contain each text part that was detected. Each text annotation hold vertices that define the position of the recognised element on the document.

Below you can find the full sample application. Drawing the boxes around the recognised text is mostly eye candy but it for our demo purposes it gives visual feedback which parts of the document were detected and which parts were not.

<form enctype="multipart/form-data" action="" method="POST">
Choose an invoice to upload: <input name="uploaddocument" type="file" /><br />
<input type="submit" value="Upload Document" />
</form>

<?php

// settings
$api_key = 'AIzaSyAZsAx47zd8juZus-h_KvjTQ4mnC2l0lcA';
$url = "https://vision.googleapis.com/v1/images:annotate?key=" . $api_key;
$detection_type = "TEXT_DETECTION";

// allowed image mime types
// Cloud Vision allows more image types but this application will only support three
$allowed_types = array('image/jpeg','image/png','image/gif');

if($_FILES){

	// check if uploaded image has an allowed mime type
	if(in_array($_FILES['uploaddocument']['type'],$allowed_types)){	    	

			// base64 encode image
	    	$image = file_get_contents($_FILES['uploaddocument']['tmp_name']);
			$image_base64 = base64_encode($image);

			$json_request ='{
				  	"requests": [
						{
						  "image": {
						    "content":"' . $image_base64. '"
						  },
						  "features": [
						      {
						      	"type": "' .$detection_type. '",
								"maxResults": 200
						      }
						  ]
						}
					]
				}';

			$curl = curl_init();

			curl_setopt($curl, CURLOPT_URL, $url);
			curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
			curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
			curl_setopt($curl, CURLOPT_HTTPHEADER, array("Content-type: application/json"));
			curl_setopt($curl, CURLOPT_POST, true);
			curl_setopt($curl, CURLOPT_POSTFIELDS, $json_request);

			$json_response = curl_exec($curl);
			$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);

			curl_close($curl);

			// verify if we got a correct response
			if ( $status != 200 ) {
   				 die("Something when wrong. Status code: $status" );
				}

			//echo '<pre>';
			//print_r($json_response);
			//echo '</pre>';

			// create an image identifier for the uploaded file
			switch($_FILES['uploaddocument']['type']){
				case 'image/jpeg':
					$im = imagecreatefromjpeg($_FILES['uploaddocument']['tmp_name']);
					break;
				case 'image/png':
					$im = imagecreatefrompng($_FILES['uploaddocument']['tmp_name']);
					break;
				case 'image/gif':
					$im = imagecreatefromgif($_FILES['uploaddocument']['tmp_name']);
					break;
				}

			$red = imagecolorallocate($im, 255, 0, 42);

			// transform the json response to an associative array
			$response = json_decode($json_response, true);

			// for each of the detected text fragments we'll draw a box
			// the cloud API returns veticis for each fragment
			foreach($response['responses'][0]['textAnnotations'] as $box){
	
				$points = array();

				foreach($box['boundingPoly']['vertices'] as $vertex){
					array_push($points, $vertex['x'], $vertex['y']);
					}
				
				imagepolygon($im, $points, count($box['boundingPoly']['vertices']), $red);

				}

			// give our image a name and store it
			$image_name = time().'.jpg';
			imagejpeg($im, $image_name);
			imagedestroy($im);

			// output the results
			echo'<div style="width:20%; float:left;"><img src="'.$image_name.'" style="width:100%;"/></div>';

			echo'<div style="width:50%; float:left; padding:20px;">';
					// display the first text annotation
					echo'<pre>';
					print_r($response['responses'][0]['textAnnotations'][0]['description']);
					echo'</pre>';
			echo'</div>';

	    	}
	    else{
	    	echo 'File type not allowed';
	    	}

	}
?>

Example output

ocr text recognistion on expense receipt

As with any OCR application the results largely depend on the quality of the scan. Making sure that the edges are straight and that the image has a good contrast with little noise has a big impact on the end result.

We also did some tests with poor quality scan ad still got some reasonable results in terms of the percentage of text detected and the accuracy but it will be harder to process the result as Google Vision API does not correct for text lines that are skew and will mix text that belong to different lines.

ocr text recognistion on expense receipt

Detecting text in the image is only the first part of an invoice and receipt processing application. In the next posts we will address:

Performing entity detection (addresses, VAT numbers, phone numbers, etc.) by cross referencing the extracted text against databases of know entities both using exact and fuzzy matching.
Using layout analysis define the nature of the detected text fragments (invoice total, VAT total, applicable VAT rates, etc.), combine blocks that actually belong together, and detect the invoice/receipt type (supplier specific layout)
Making our application become smarter over time using AI