
Extracting text from images using OCR on Android

This tutorial was kindly contributed to the community by Paco_Vu.

 

Have you ever thought about building an application that can read text and numbers from a restaurant receipt, a check, a note, or a business card, and then process the data further, run calculations on the numbers, or save the data in a structured database?

 

That sounds like a huge task, right? Well, it is indeed complicated, especially if you have to take care of the character recognition part yourself. Fortunately, with the OCR Document API you can easily add this feature to your app without prior knowledge of OCR (Optical Character Recognition) technology.

 

In this blog, I will walk you through the essential steps from setting up an account with HP IDOL OnDemand to the completion of an Android project that can capture the image of a receipt, and recognize the text and numbers using the OCR Document API.

 

The OCR API from HP IDOL OnDemand is straightforward to use, and it can save you hours of coding.

 

Set up an account and get ready to use the IDOL OnDemand APIs

 

To use any API from HP IDOL OnDemand, you will need to sign up and get an API Key. If you have not done so, click here to sign up to IDOL OnDemand.

 

Once you have the API Key, copy it and add it to your Android class. You will need it to authenticate your API request later:

 

private final String apikey = "34a54d30-ddaa-4294-8e45-ebe07eeXXXXX";

 

Get the content (image) ready for OCR document service

 

Now let’s spend a few minutes going through the basic steps to access the picture gallery or to take a picture with the camera on an Android phone. If you already know this process, you can skip this part and go to the “Calling OCR document API from Android” section.

 

To access the picture gallery, implement or simply copy and paste the code below into your Activity class:

 

// we need a few parameters for this
private static final int SELECT_PICTURE = 1;
private String mImageFullPathAndName = "";

// function called from a load image button click event
public void DoShowImagePicker(View v) {
    Intent intent = new Intent(Intent.ACTION_PICK, android.provider.MediaStore.Images.Media.EXTERNAL_CONTENT_URI);
    startActivityForResult(intent, SELECT_PICTURE);
}

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == SELECT_PICTURE && resultCode == RESULT_OK && null != data) {
        Uri selectedImage = data.getData();
        String[] filePathColumn = {MediaStore.Images.Media.DATA};
        Cursor cursor = getContentResolver().query(selectedImage,
                filePathColumn, null, null, null);
        cursor.moveToFirst();
        int columnIndex = cursor.getColumnIndex(filePathColumn[0]);
        // get the selected image full path and name
        mImageFullPathAndName = cursor.getString(columnIndex);
        cursor.close();
    }
}

 

After the user picks an image from the gallery, its full path and name are kept in the mImageFullPathAndName variable. If you would like to display the image in your app, add an ImageView to the activity and load the image from the file. Otherwise, that is all you need to get ready for the OCR API request.

 

To capture an image with the camera, the most convenient approach is to launch the camera intent and let the user manually take a snapshot and accept the captured image. However, if you wish to provide a better user experience, you can use the Camera API to access the camera settings and take full control of the camera so that your app captures a new image automatically. In this blog we will use the camera intent approach to keep the code as simple as possible.

 

Implement or simply copy/paste the code below to your project:

 

private static final int TAKE_PICTURE = 2;

// function called from a launch camera button click event
public void DoTakePhoto(View view) {
    Intent intent = new Intent("android.media.action.IMAGE_CAPTURE");
    startActivityForResult(intent, TAKE_PICTURE);
}

 

Then modify the function onActivityResult as follows:

 

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    // handle results from both the gallery picker and the camera intent
    if (requestCode == SELECT_PICTURE || requestCode == TAKE_PICTURE) {
        // some camera apps return no Uri; skip the lookup in that case
        if (resultCode == RESULT_OK && null != data && null != data.getData()) {
            Uri selectedImage = data.getData();
            String[] filePathColumn = {MediaStore.Images.Media.DATA};
            Cursor cursor = getContentResolver().query(selectedImage,
                    filePathColumn, null, null, null);
            if (cursor != null) {
                if (cursor.moveToFirst()) {
                    int columnIndex = cursor.getColumnIndex(filePathColumn[0]);
                    // get the selected image full path and name
                    mImageFullPathAndName = cursor.getString(columnIndex);
                }
                cursor.close();
            }
        }
    }
}

 

After the user takes a snapshot and accepts the captured image from the camera, its full path and name are kept in the mImageFullPathAndName variable. If you would like to display the image in your app, add an ImageView to the activity and load the image from the file. Otherwise, that is all you need to get ready for the OCR API request.

 

Optimize input images for OCR document API

 

Optical character recognition accuracy depends heavily on the quality of the input images. For the best possible result, you can implement some basic quality checks and give end users tips and options for achieving the best image quality. You should also include advanced recommendations in your app's user manual.

 

Here are some best practices and recommendations if you use the Camera API to take the image:

 

  1. Camera focus mode: set it to “macro” or “auto” mode if the camera supports it. Avoid fixed-focus mode, as it usually produces images unsuitable for OCR.
  2. Flash off. Using the flash to photograph an object at close distance creates glare on the image, which blurs the text around the reflection spot.
  3. Allow the user to control the ISO and aperture in low-light conditions.
  4. Avoid digital zoom.
  5. Select a reasonable camera resolution. 3MP-5MP images are usually good enough.
  6. Where possible, use the phone’s accelerometer to improve image stabilization.

 

Because the OCR Document API can only scan images in which the text is upright, you should correct the orientation programmatically if you use the Camera API. Alternatively, display the image and let the user rotate it if the orientation is incorrect.


Correct Orientation.png

 

 Incorrect Orientation.png
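
One possible way to handle the orientation programmatically, assuming the captured file carries an EXIF orientation tag (the original post does not say how its demo does this), is to translate that tag into a rotation angle. The sketch below uses the standard EXIF orientation values 3, 6 and 8; the class name ExifRotation is hypothetical. Note that EXIF only records how the camera was held, not how the text lies on the paper, so a manual rotate button is still useful.

```java
// Minimal sketch: map an EXIF orientation value to a clockwise rotation in degrees.
// ExifRotation is an illustrative name, not part of Android or the OCR API.
final class ExifRotation {
    private ExifRotation() {}

    /** Returns the clockwise rotation needed to display the image upright. */
    static int degreesFor(int exifOrientation) {
        switch (exifOrientation) {
            case 3: return 180; // EXIF "rotate 180"
            case 6: return 90;  // EXIF "rotate 90 clockwise"
            case 8: return 270; // EXIF "rotate 270 clockwise"
            default: return 0;  // normal, undefined, or mirrored values (not handled here)
        }
    }
}
```

On Android the orientation value can typically be read with ExifInterface.getAttributeInt(ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_NORMAL), and the resulting angle passed to a rotate function.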

 

It is also good practice to rescale the image to save bandwidth. For a small note, a restaurant receipt, or a business card, a 5MP image is not really needed.

 

Below are the functions to rescale and rotate an image on Android. Check out the demo project for more example code.

 

// Resize an image to the given width and height
public Bitmap rescaleBitmap(Bitmap bm, int newWidth, int newHeight) {
    int w = bm.getWidth();
    int h = bm.getHeight();
    // scale factors relative to the original size
    float scaleWidth = ((float) newWidth) / w;
    float scaleHeight = ((float) newHeight) / h;
    Matrix matrix = new Matrix();
    matrix.postScale(scaleWidth, scaleHeight);
    return Bitmap.createBitmap(bm, 0, 0, w, h, matrix, false);
}

// Rotate an image by the given number of degrees (positive values rotate clockwise)
private Bitmap rotateBitmap(Bitmap pic, int deg) {
    // a single matrix describes the rotation
    Matrix rotation = new Matrix();
    rotation.preRotate(deg);
    // filter = true smooths the rotated pixels
    return Bitmap.createBitmap(pic, 0, 0, pic.getWidth(), pic.getHeight(), rotation, true);
}
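
When rescaling, the new width and height should keep the original aspect ratio, otherwise the text gets stretched and recognition may suffer. The following is a minimal sketch of how target dimensions could be computed before calling a resize function; the class ImageSizing, the method fitWithin and the 1024-pixel limit in the usage note are illustrative assumptions, not values from the OCR Document API.

```java
// Minimal sketch: compute a target size whose longer side does not exceed maxSide.
// ImageSizing and fitWithin are hypothetical names used for illustration only.
final class ImageSizing {
    private ImageSizing() {}

    /**
     * Returns {width, height} scaled down so the longer side is at most maxSide,
     * preserving the aspect ratio. Images that already fit are returned unchanged.
     */
    static int[] fitWithin(int width, int height, int maxSide) {
        if (width <= 0 || height <= 0 || maxSide <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        int longer = Math.max(width, height);
        if (longer <= maxSide) {
            return new int[] {width, height};
        }
        double scale = (double) maxSide / longer;
        int newWidth = Math.max(1, (int) Math.round(width * scale));
        int newHeight = Math.max(1, (int) Math.round(height * scale));
        return new int[] {newWidth, newHeight};
    }
}
```

For example, a 2592x1944 photo limited to 1024 pixels on the longer side comes out as 1024x768, and the two values can then be passed as newWidth and newHeight to rescaleBitmap.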

 

Calling OCR document API from Android

 

Now that you have the full path and name of the image, let’s move on to the main feature: calling the OCR Document API. This API is identified by an application named “ocrdocument”.

 

The API requires an HTTP POST request to a URL that consists of four parts:

 

  1. The domain: https://api.idolondemand.com/1/api
  2. The mode: /async or /sync
  3. The application: /ocrdocument
  4. The API version: /v1
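
As a small illustration, the four parts listed above can be joined in code rather than hard-coded, which makes it easy to switch between the async and sync modes. This is only a sketch; the class IdolEndpoint and its build method are hypothetical helpers, not part of any IDOL OnDemand SDK.

```java
// Minimal sketch: compose the endpoint URL from domain, mode, application and version.
// IdolEndpoint is an illustrative helper name, not an official class.
final class IdolEndpoint {
    private static final String DOMAIN = "https://api.idolondemand.com/1/api";

    private IdolEndpoint() {}

    /** Joins the domain with the mode ("async" or "sync"), application and version. */
    static String build(String mode, String application, String version) {
        if (!mode.equals("async") && !mode.equals("sync")) {
            throw new IllegalArgumentException("mode must be async or sync: " + mode);
        }
        return DOMAIN + "/" + mode + "/" + application + "/" + version;
    }
}
```

Calling IdolEndpoint.build("async", "ocrdocument", "v1") yields the asynchronous OCR endpoint used in this tutorial.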

 

The full URL looks like this:

 

"https://api.idolondemand.com/1/api/async/ocrdocument/v1"

 

The parameters to be passed to the body of the POST request are explained as follows:

 

  • The apikey: your API Key
  • The file: A file containing the image to process.
  • The mode (optional): scene_photo | document_photo | document_scan | subtitle

 

Let’s create a POST request with these parameters in our Android code:

 

String idol_ocr_service = "https://api.idolondemand.com/1/api/async/ocrdocument/v1";
URI uri = new URI(idol_ocr_service);
HttpPost httpPost = new HttpPost();
httpPost.setURI(uri);
// build the multipart body: API key, image file and OCR mode
MultipartEntityBuilder reqEntity = MultipartEntityBuilder.create();
reqEntity.setMode(HttpMultipartMode.BROWSER_COMPATIBLE);
reqEntity.addPart("apikey", new StringBody(apikey, ContentType.TEXT_PLAIN));
reqEntity.addBinaryBody("file", new File(mImageFullPathAndName));
reqEntity.addPart("mode", new StringBody("document_photo", ContentType.TEXT_PLAIN));
httpPost.setEntity(reqEntity.build());
// run network calls off the main (UI) thread on Android
HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(httpPost);
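
If the Apache HttpClient and httpmime classes used above are not available in your project, the same multipart/form-data body can be assembled by hand and sent with any HTTP client. The sketch below only builds the body bytes; the class MultipartBody and the boundary string are assumptions made for illustration, and it is not how the demo project sends the request.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Minimal sketch: build a multipart/form-data body with text fields and one file part.
// MultipartBody is a hypothetical helper; it is not part of any library.
final class MultipartBody {
    private static final String CRLF = "\r\n";

    private MultipartBody() {}

    static byte[] build(String boundary, Map<String, String> fields,
                        String fileField, String fileName, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // one part per text field, e.g. apikey and mode
        for (Map.Entry<String, String> field : fields.entrySet()) {
            write(out, "--" + boundary + CRLF);
            write(out, "Content-Disposition: form-data; name=\"" + field.getKey() + "\"" + CRLF + CRLF);
            write(out, field.getValue() + CRLF);
        }
        // the file part carries the raw image bytes
        write(out, "--" + boundary + CRLF);
        write(out, "Content-Disposition: form-data; name=\"" + fileField
                + "\"; filename=\"" + fileName + "\"" + CRLF);
        write(out, "Content-Type: application/octet-stream" + CRLF + CRLF);
        out.write(fileBytes, 0, fileBytes.length);
        // the closing boundary ends the body
        write(out, CRLF + "--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }
}
```

When sending this body, the request's Content-Type header must be set to multipart/form-data with the same boundary value, for example "multipart/form-data; boundary=" followed by the boundary string.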

 

After sending the async POST request above, we expect to receive a JSON-formatted text response from the server. The response contains an identifier named “jobID”. We will use the job ID value later in a GET request to fetch the actual result.

 

Below is a typical response with a job ID from the server:

 

JobID.png

 

We just need to parse the JSON string to get the job ID value, then use it to fetch the actual result:

 

// example response body, escaped for a Java string literal
String response = "{\"jobID\":\"usw3p_89f2563e-8742-4def-ae25-9162a6a35c9a\"}";
JSONObject mainObject = new JSONObject(response);
String jobID = mainObject.getString("jobID");

 

The URL for fetching the actual result identified by a job ID is defined as follows:

 

https://api.idolondemand.com/1/job/result/[jobID]
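
A small helper can fill in this URL pattern and append the API key as a query parameter. The sketch below is illustrative only: the class JobResultUrl is hypothetical, and it URL-encodes just the API key on the assumption that job IDs contain only URL-safe characters, as in the sample response.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Minimal sketch: build the job result URL for a given job ID and API key.
// JobResultUrl is an illustrative helper name, not part of any SDK.
final class JobResultUrl {
    private static final String BASE = "https://api.idolondemand.com/1/job/result/";

    private JobResultUrl() {}

    static String build(String jobId, String apiKey) {
        if (jobId == null || jobId.isEmpty()) {
            throw new IllegalArgumentException("jobId must not be empty");
        }
        // assumption: the job ID is URL-safe, so only the query value is encoded
        return BASE + jobId + "?apikey=" + encode(apiKey);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```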

 

Let’s create a GET request with a job ID in our Android code:

 

String idol_job_result = "https://api.idolondemand.com/1/job/result/";
// append the job ID and authenticate with the API key
String url = idol_job_result + jobID + "?";
url += "apikey=" + apikey;
URI uri = new URI(url);
HttpGet httpGet = new HttpGet();
httpGet.setURI(uri);
HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(httpGet);

 

If the GET request was successful, the response from the server is a JSON-formatted text string. Here is a typical response:

 

response.png
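
Before the JSON can be parsed, the body of the HTTP response has to be read into a String. With Apache HttpClient the body is exposed as an InputStream through response.getEntity().getContent(); the sketch below shows one way to drain such a stream, assuming the server sends UTF-8 text. The class StreamText is a hypothetical helper.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Minimal sketch: read an entire InputStream into a String, assuming UTF-8 content.
// StreamText is an illustrative helper name.
final class StreamText {
    private StreamText() {}

    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        // copy the stream chunk by chunk until it is exhausted
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return buffer.toString("UTF-8");
    }
}
```

The returned String is what gets passed to the JSONObject constructor.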

 

We will use Android JSONObject again to parse the response and extract the text we want to display.

 

// "response" here is the JSON body of the GET response as a String
String foundText = "";
JSONObject mainObject = new JSONObject(response);
JSONArray textBlockArray = mainObject.getJSONArray("actions");
for (int i = 0; i < textBlockArray.length(); i++) {
    JSONObject actions = textBlockArray.getJSONObject(i);
    JSONObject result = actions.getJSONObject("result");
    JSONArray textArray = result.getJSONArray("text_block");
    // concatenate the text of every recognized block
    for (int n = 0; n < textArray.length(); n++) {
        JSONObject texts = textArray.getJSONObject(n);
        foundText += texts.getString("text");
    }
}

 

That is all you need to implement for the OCR part in your app. The value of the foundText variable is the text recognized from the image. You can download the demo project to see the complete code or click here if you want to learn more about the OCR Document API.
