Python Grabs WURFL InFuze And Squeezes Out Analytics

by Filip Sufitch, the Python Handler

WURFL InFuze for Python

Python programmers frequently deal with huge amounts of Web log data. Many of them want device capabilities to be a part of that data so they can analyze how various device types and capabilities impact Web usage.

Customers have asked for a WURFL API for Python. ScientiaMobile already offers Python access to our WURFL Cloud service. The Cloud approach works for some people, but for high-volume users who want more control we offer a different approach: WURFL InFuze for Python.

The beauty of WURFL InFuze is that it provides a high-performance C++ API which Python can import. It also provides command line utilities that filter data for analytics. With these tools, Python programmers can integrate WURFL’s device detection into their Python code base.




WURFL InFuze for Python uses an XML file that contains the device definitions WURFL relies on to identify devices. Only two lines are needed to import the library and load the XML file:

from pywurfl.wurfl import Wurfl
WURFL = Wurfl("/home/wurfl.xml")

This first example shows how you can parse a single user agent. Then, it shows how to request single or multiple device capabilities.

def plain_queries():
    dev = WURFL.parse_useragent("Mozilla/5.0 (Linux; Android 4.4; Nexus 5 Build/KRT16M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/ Mobile Safari/537.36")
    # Get one capability
    print(dev.get_capability("brand_name"))
    # Get many capabilities
    print(dev.get_capabilities(["brand_name", "model_name"]))
    #{'brand_name': 'LG', 'model_name': 'Nexus 5'}
    # Get all capabilities
    # result is a huge dict
    # Release the device
    # Release back-end pointers and such, to prevent memory leaking
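
Once capabilities come back as dictionaries like the one printed above, tallying them is ordinary Python. The following is a minimal sketch, assuming each device has already been reduced to a dict of the shape returned by get_capabilities. The helper name tally_models is made up for illustration and is not part of the WURFL API, and only the first two sample records mirror the output shown above; the third is an invented placeholder.

```python
from collections import Counter


def tally_models(records):
    """Count how often each (brand_name, model_name) pair occurs."""
    counts = Counter()
    for rec in records:
        counts[(rec["brand_name"], rec["model_name"])] += 1
    return counts


# Illustrative input only: the Nexus 5 records mirror the dict shown above,
# the last record is an invented placeholder, not real WURFL output.
sample_records = [
    {"brand_name": "LG", "model_name": "Nexus 5"},
    {"brand_name": "LG", "model_name": "Nexus 5"},
    {"brand_name": "ExampleBrand", "model_name": "ExampleModel"},
]

model_counts = tally_models(sample_records)
for (brand, model), n in model_counts.most_common():
    print("%s %s: %d" % (brand, model, n))
```

Keying the counter on the (brand, model) pair keeps two brands that happen to share a model name from being merged.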

In this second example, we process many user agents from the “sample_uas.txt” file, one user agent per line. Python processes the user agents in this file serially. It uses the value of the “ux_full_desktop” capability to compute what percentage of visitors to the site are using a full desktop Web browser.

def serial():
    """ Sample serial bulk processing. """
    num_desktop = 0
    num_total = 0
    with open("sample_uas.txt") as fh:
        for line in fh:
            ua = line.strip()
            dev = WURFL.parse_useragent(ua)
            if dev.get_capability("ux_full_desktop") == 'true':
                num_desktop += 1
            num_total += 1
    print("%d (%.2f%%) desktop devices" % (num_desktop, float(num_desktop)/num_total*100))
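
The counting logic can also be separated from the detector, which makes it easy to test and safe on an empty file. This is a sketch under stated assumptions: desktop_share and looks_like_desktop are hypothetical names, and looks_like_desktop is a crude stand-in for the WURFL lookup so the snippet runs without the library. It is not how WURFL classifies devices.

```python
def desktop_share(user_agents, is_desktop):
    """Return (num_desktop, num_total, percent) for an iterable of UA lines.

    is_desktop is any callable that maps a user agent string to True/False.
    Blank lines are skipped, and an empty input yields 0.0 percent instead
    of raising ZeroDivisionError.
    """
    num_desktop = 0
    num_total = 0
    for line in user_agents:
        ua = line.strip()
        if not ua:
            continue
        num_total += 1
        if is_desktop(ua):
            num_desktop += 1
    percent = 100.0 * num_desktop / num_total if num_total else 0.0
    return num_desktop, num_total, percent


def looks_like_desktop(ua):
    # Hypothetical stand-in for the real capability lookup.
    return "Mobile" not in ua


lines = [
    "Mozilla/5.0 (X11; Linux x86_64)\n",
    "Mozilla/5.0 (Linux; Android 4.4) Mobile Safari\n",
    "\n",
]
print("%d of %d (%.2f%%) desktop devices" % desktop_share(lines, looks_like_desktop))
```

With the library loaded as shown earlier, the stand-in could presumably be swapped for a callable that wraps WURFL.parse_useragent and compares the “ux_full_desktop” capability to 'true', as the serial example does.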

This third example shows how Python can use multiprocessing to perform the analysis in parallel. Here, “process_ua” runs in a pool of worker processes, parsing user agents and returning the device capability result along with the worker's process ID. Python hands the lines of “sample_uas.txt” to the workers in chunks of 10 user agents each, then collects the results. The totals are the same as in the serial example, but on a multi-core machine the work should finish considerably faster.

#### Sample multiprocessing parallel bulk analysis
import multiprocessing as mp
import os
def process_ua(ua):
    dev = WURFL.parse_useragent(ua.strip())
    ux_desktop = dev.get_capability("ux_full_desktop")
    return os.getpid(), int(ux_desktop == 'true')
def line_iter(fname):
    with open(fname) as fh:
        for line in fh:
            yield line
def parallel_mp():
    num_desktop = 0
    num_total = 0
    pool = mp.Pool()
    results = pool.imap(process_ua, line_iter("sample_uas.txt"), chunksize=10)
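    pids = set()
    for result in results:
        pid, desktop_cnt = result
        pids.add(pid)  # record which worker process handled this user agent
        num_total += 1
        num_desktop += desktop_cnt
    # All results are consumed; shut the worker processes down cleanly
    pool.close()
    pool.join()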
    print("%d (%.2f%%) desktop devices" % (num_desktop, float(num_desktop)/num_total*100))
    print("PIDs involved: %s" % list(pids))
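
To picture what chunksize=10 does: the pool groups items from the input iterable into batches of up to 10 and submits each batch to a worker as a single task, which cuts down on inter-process overhead. The sketch below only mimics that grouping in plain Python. The helper batches_of and the placeholder strings are hypothetical, and this is not how multiprocessing is implemented internally.

```python
from itertools import islice


def batches_of(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# 25 placeholder strings standing in for lines of a user agent file
uas = ["ua-%d" % i for i in range(25)]
chunks = list(batches_of(uas, 10))
print([len(c) for c in chunks])  # [10, 10, 5]
```

A larger chunk size means fewer, bigger tasks; a smaller one spreads work more evenly when some user agents take longer to parse than others.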

If you want to start crunching device capability statistics, WURFL InFuze for Python is a great tool. It lets programmers integrate device detection quickly and run device analysis from the same code base.