
Extract text and document statistics with Python

Calculating text statistics for a block of text, a local file, or a publicly accessible URL can be tedious: the number of stop words, paragraphs, and characters, plus the percentage of punctuation, whitespace, numeric, alphabetic, and uppercase characters. Fortunately for developers, doing this in Python is simple using Haven OnDemand’s Text Statistics API. Install the official Python client library, POST a block of raw text, a publicly accessible URL, or a local file to the Text Statistics API, and read the result.

 

Code


 

First, install the official Haven OnDemand Python client library:

 

pip install havenondemand

 

Next, open the file where you will write your code and import the Haven OnDemand client:

 

from havenondemand.hodclient import *
client = HODClient("APIKEY", "v1")

 

Replace “APIKEY” with your API key, which is available from your Haven OnDemand account after signing up.
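
If you would rather not hardcode the key in your script, one option is to read it from an environment variable. The following is a minimal sketch under that assumption; the variable name HOD_API_KEY and the helper load_api_key are hypothetical and are not defined by the Haven OnDemand library.

```python
import os


def load_api_key(env_var="HOD_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this example; use
    whatever name suits your deployment.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError("Set the %s environment variable first" % env_var)
    return key


# Possible usage with the client created above:
# client = HODClient(load_api_key(), "v1")
```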

 

Next, call the Text Statistics API by submitting a block of raw text, a publicly accessible URL, or a local file:

 

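# Choose exactly one input. The raw-text line is active; to use a URL or a local file, comment it out and uncomment one of the others.
# data = {'url': 'https://www.havenondemand.com/sample-content/documents/HP_License_terms_may2012.doc'} # publicly accessible URL
# data = {'file': 'path/to/file'} # local file
data = {'text': 'There are approximately 1,600 giant pandas (which is not a lot) left living in the wild.'} # raw text
# Synchronous request. Note: `async` became a reserved word in Python 3.7, so this keyword argument only parses on Python 2 or on Python 3.6 and earlier.
response = client.post_request(data, 'gettextstatistics', async=False)
print(response)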
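
The three input types differ only in the dictionary key, so the choice can be wrapped in a small helper instead of commenting lines in and out. This is a sketch, not part of the client library: build_request_data is a hypothetical name, and the rule it uses (http/https prefix means URL, existing path means file, anything else is raw text) is an assumption that may not suit every input.

```python
import os


def build_request_data(source):
    """Pick the request key ('url', 'file', or 'text') for a given string.

    Hypothetical helper: the detection rules here are a simple heuristic.
    """
    if source.startswith(('http://', 'https://')):
        return {'url': source}    # publicly accessible URL
    if os.path.isfile(source):
        return {'file': source}   # path to an existing local file
    return {'text': source}       # otherwise treat the string as raw text


data = build_request_data('There are approximately 1,600 giant pandas left.')
```

The resulting dictionary can be passed to post_request in the same way as the hand-written one.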

 

When you run the file, it prints the API response containing the text statistics (the u prefixes mark Python 2 Unicode strings). It will look like this:

 

{u'stop_words': {u'distinct': 8, u'total': 8}, u'paragraphs': 1, u'terms': {u'distinct': 17, u'total': 17}, u'characters': {u'percent_control': 0, u'percent_punctuation': 4.55, u'percent_whitespace': 17.05, u'percent_numeric': 4.55, u'percent_alphabetic': 73.86, u'total': 88, u'percent_uppercase': 1.14}, u'sentences': 1}
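
To make the character percentages in the response concrete, here is a small local sketch that recomputes them for the same sentence using only the standard library. It is an illustration of what the numbers mean, not the API’s actual algorithm: character_stats is a hypothetical helper, it treats only ASCII punctuation as punctuation, and it does not attempt stop words, terms, sentences, or control characters.

```python
import string


def character_stats(text):
    """Recompute character percentages locally (illustrative only)."""
    total = len(text)
    if total == 0:
        return {'total': 0}

    def percent(predicate):
        # Share of characters matching the predicate, rounded to 2 places.
        count = sum(1 for ch in text if predicate(ch))
        return round(100.0 * count / total, 2)

    return {
        'total': total,
        'percent_alphabetic': percent(lambda ch: ch.isalpha()),
        'percent_numeric': percent(lambda ch: ch.isdigit()),
        'percent_whitespace': percent(lambda ch: ch.isspace()),
        'percent_punctuation': percent(lambda ch: ch in string.punctuation),
        'percent_uppercase': percent(lambda ch: ch.isupper()),
    }


sample = ('There are approximately 1,600 giant pandas '
          '(which is not a lot) left living in the wild.')
stats = character_stats(sample)
print(stats)
```

For this sentence the sketch gives 88 characters, of which 65 are alphabetic (73.86%), 15 are whitespace (17.05%), 4 are numeric (4.55%), 4 are punctuation (4.55%), and 1 is uppercase (1.14%), which agrees with the characters section of the response above.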

 

† The opinions expressed above are the personal opinions of the authors, not of HPE. By using this site, you accept the Terms of Use and Rules of Participation