Introducing User-Agent Client-Hints support in WURFL (and a Rant)

As we explained in the past, ScientiaMobile has been following the evolution of the Client-Hints specification very closely over the past few years, and we are now updating the WURFL API in preparation for Google’s attempt to redefine HTTP.

Honestly, we have not been very happy about this “evolution”. From our viewpoint, it represents a regression from something that has been working (and still works!) reasonably well for the mobile ecosystem. More specifically, we don’t have an issue with the introduction of Client-Hints per se. Rather, we object to Google’s declared intent to freeze the User-Agent string, a move that makes no logical sense, unless a larger design of exploiting its dominant position is assumed.

Rant: We think that we are witnessing an attempt by Google to redefine internet standards that have enabled the web and the Ad Tech industry since their inception. This will hurt everyone in the industry, with only Google left to benefit. Nothing Google is saying about the need to reduce the usefulness of the UA string is very logical. The UA string is not about identifying users; rather, it is about detecting what device make and model is being used. Replacing the simple UA data point with a complex HTTP dance between browser and server can easily be read as an attempt by Google (which already has a first-party relationship with virtually all internet users) to make the lives of its Ad Tech competitors harder. Also, arguably, the introduction of Client-Hints is likely to ultimately increase the fingerprinting surface (i.e. the ability to identify and track users) of HTTP requests, effectively achieving the opposite of what Google claims. We hope that the EU, the US government and other governments around the planet will recognize this and tell Google to stop. End of rant.

As all WURFL users know, the User-Agent string has been around virtually forever, its use nicely described in the HTTP spec. If things end up going according to Google’s plans, though, a partial freeze of the User-Agent will be rolled out on desktop browsers first during 2022, and the move will extend to Android devices in Q1 2023. The frozen User-Agent string will no longer reveal the device model, and it will brazenly lie about the version of the browser and operating system.

Honestly, we are curious to see whether Samsung, Huawei, Amazon, Motorola, OnePlus, and the long list of other Android device manufacturers will bend to a choice that will be problematic for them too. Having said this, we have decided to prepare for the worst-case scenario and support User-Agent Client-Hints in Q1 2022. This is not a small change.

The User-Agent freeze represents a drastic overhaul that will impact how WURFL works under the hood, and that’s not even the biggest issue. The problem is that thousands of companies in the mobile ecosystem will be forced to change how their current systems work. They’ll need to implement a messier workflow to determine a device model and will have a much harder time identifying and fixing issues when applications break. Also, companies and organizations will find themselves in a situation where Google can, at any point in the future, cut off the device data they receive, and it will be able to do so at its whim through the Privacy Budget.

As we know, the User-Agent allows services to recognize the make and model of a device in order to support a good UX (user experience) and a variety of other use cases (video, analytics, fraud detection, bug analysis and workarounds, just to name a few).

Identifying a browser and the device it runs on (along with its properties) is akin to a basic human right for companies in the mobile ecosystem. For this reason, not even Google wanted to be caught red-handed trying to kill the User-Agent string. Rather, it has opted for the User-Agent freeze: the UA string will no longer change to reflect a device make and model, or even just the browser and OS version. Since freezing the UA is still a way to kill it, Google has opted to call the operation “User-Agent reduction”. In Google’s eyes, this makes it harder to see what is really going on, as the Mountain View giant effectively kills the User-Agent string.

The (Very) Gory Details

Let’s look at the practical implications for WURFL users, assuming that Google’s roadmap pans out according to the published plan.

Today, a Google Pixel 6 Pro sends HTTP requests that, by default, look like this:

User-Agent: Mozilla/5.0 (Linux; Android 12; Pixel 6 Pro) AppleWebKit/537.36 (KHTML, like Gecko) \
Chrome/95.0.4638.74 Mobile Safari/537.36
Accept-Encoding: gzip, deflate, br
Accept: text/html,application/xhtml+xml,application/xml;q=0.9, image/avif,image/webp,image/apng,*/*; \
q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Language: en-US,en;q=0.9
Version: HTTP/1.1

The User-Agent string allows WURFL to reconcile the HTTP request with a profile of the Pixel 6 Pro and access all of its capabilities. According to Google’s plan, here’s what that same device is going to send one year from now:

User-Agent: Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) \
Chrome/93.0.0.0 Mobile Safari/537.36
Sec-CH-UA-Mobile: ?1
Sec-CH-UA-Platform: "Android"
Sec-CH-UA: "Google Chrome";v="95", "Chromium";v="95", ";Not A Brand";v="99"
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,\
image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Version: HTTP/1.1

In addition to a UA string that blatantly lies about the OS and browser versions, this request does not carry enough information to determine what device model a service is dealing with. A different Android device will produce identical headers, and nobody will be able to determine the device model that sent this request – nobody except Google, which can access that information through its first-party relationship with the user.

According to Google, servers that need device information should respond to users with one of two possible HTTP response headers:

Accept-CH: Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version

Critical-CH: Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version

In the case of Accept-CH, the browser is invited to send the requested hints on subsequent requests, at its convenience. In the case of Critical-CH, though, all ongoing operations, including the rendering of the page, should be aborted, and a new request that includes the critical CH headers should be initiated. (The hints named in Critical-CH are expected to be listed in Accept-CH as well, so in practice the two headers travel together.)
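On the server, this decision comes down to which headers are added to the response. The following is a minimal, framework-agnostic Python sketch. It is not part of WURFL. The helper names (missing_hints, hint_response_headers) and the choice of the three hints shown above are our own assumptions for illustration.

```python
# Hypothetical helpers (not part of WURFL or any web framework) showing how a
# server might ask for high-entropy Client-Hints via Accept-CH / Critical-CH.
HIGH_ENTROPY_HINTS = (
    "Sec-CH-UA-Model",
    "Sec-CH-UA-Platform-Version",
    "Sec-CH-UA-Full-Version",
)


def missing_hints(request_headers):
    """Return the hints from HIGH_ENTROPY_HINTS that the request did not carry.

    Header names are compared case-insensitively, as HTTP field names are.
    """
    present = {name.lower() for name in request_headers}
    return [hint for hint in HIGH_ENTROPY_HINTS if hint.lower() not in present]


def hint_response_headers(critical=False):
    """Build the response headers that request the hints from the browser."""
    headers = {"Accept-CH": ", ".join(HIGH_ENTROPY_HINTS)}
    if critical:
        # Critical hints are also listed in Accept-CH; a browser that honors
        # Critical-CH may retry the request with those hints attached.
        headers["Critical-CH"] = headers["Accept-CH"]
    return headers


# First request from a frozen-UA browser: only low-entropy hints are present.
first_request = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/93.0.0.0 Mobile Safari/537.36",
    "Sec-CH-UA-Mobile": "?1",
    "Sec-CH-UA-Platform": '"Android"',
}
if missing_hints(first_request):
    print(hint_response_headers(critical=True))
```

Setting `critical=True` trades an extra round trip for having the hints on the very first page. With Accept-CH alone, the first response has to be built without them.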

Once the demand for model information is relayed to the client, the client may (just may) decide to honor it and send subsequent requests with an augmented set of Client-Hint headers:

User-Agent: Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.0.0 Mobile Safari/537.36
Sec-CH-UA-Mobile: ?1
Sec-CH-UA-Platform: "Android"
Sec-CH-UA: "Google Chrome";v="95", "Chromium";v="95", ";Not A Brand";v="99"
Sec-CH-UA-Model: "Pixel 6 Pro"
Sec-CH-UA-Platform-Version: "12.0.0"
Sec-CH-UA-Full-Version: "95.0.4638.74"
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,\
image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Version: HTTP/1.1
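These values are not plain text. They are Structured Field strings and booleans: quoted strings, and ?1 for true. The Python sketch below turns the augmented headers above into usable values. The helper names are hypothetical. The unquoting covers only the simple cases shown here, not the full Structured Fields grammar (RFC 8941).

```python
import re


def sf_string(value):
    """Unquote a Structured Field string such as '"Pixel 6 Pro"'.

    Only handles a single quoted string with backslash escapes; anything
    else (including an empty value) is returned stripped but otherwise as-is.
    """
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        # Drop the surrounding quotes, then resolve \" and \\ escapes.
        return re.sub(r'\\(.)', r"\1", value[1:-1])
    return value


def sf_boolean(value):
    """Structured Field boolean: "?1" is true, anything else is treated as false."""
    return value.strip() == "?1"


def device_hints(headers):
    """Extract device-related UA-CH values from a dict of request headers."""
    lower = {name.lower(): v for name, v in headers.items()}
    return {
        "model": sf_string(lower.get("sec-ch-ua-model", "")),
        "platform": sf_string(lower.get("sec-ch-ua-platform", "")),
        "platform_version": sf_string(lower.get("sec-ch-ua-platform-version", "")),
        "full_version": sf_string(lower.get("sec-ch-ua-full-version", "")),
        "mobile": sf_boolean(lower.get("sec-ch-ua-mobile", "?0")),
    }


augmented = {
    "Sec-CH-UA-Mobile": "?1",
    "Sec-CH-UA-Platform": '"Android"',
    "Sec-CH-UA-Model": '"Pixel 6 Pro"',
    "Sec-CH-UA-Platform-Version": '"12.0.0"',
    "Sec-CH-UA-Full-Version": '"95.0.4638.74"',
}
print(device_hints(augmented))
```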

The Good News First: WURFL can detect a device through User-Agent Client-Hints.

The version of WURFL being released in Q1 2022 will be able to handle the User-Agent Client-Hint headers and reconcile them with the right WURFL profile.

Note: For the record, the current version of the WURFL API, the one available to ScientiaMobile customers as we write (January 2022), already supports User-Agent Client-Hints. That function is opt-in. We implemented it that way to allow ourselves and certain customers to tinker with the new paradigm before prime time.

This means that, from a WURFL API perspective, not a lot has changed: you give WURFL an HTTP request, and WURFL returns device information. If you have been relying on the User-Agent alone, it is time to start collecting all CH headers and using them. You will also need to “rebuild” HTTP requests with Client-Hints from your logs if you intend to analyze your data and enrich it with WURFL device data (more about this later). You won’t need to worry about whether the User-Agent string is still significant (i.e. not frozen): the WURFL API will figure that out for you and use the Client-Hints if available.

Of course, if a frozen UA is all WURFL has, it will act on it and return generic values that carry little information. There is a new feature, though: the isUaFrozen() utility function lets you know programmatically whether a User-Agent string is “frozen” or not. This allows WURFL users to exclude such traffic from ad targeting or place it in a separate bucket when performing data analysis.
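WURFL's isUaFrozen() does this check for you. Purely to illustrate what a “frozen” UA looks like, here is a rough Python heuristic. It is our own assumption, not how WURFL implements the check. It covers only the Android pattern shown earlier: a hard-coded "Android 10; K" plus a Chrome version zeroed after the major number.

```python
import re

# Hypothetical heuristic, NOT WURFL's isUaFrozen(): Chrome's reduced Android
# UA hard-codes "Android 10; K" and reports the browser as Chrome/<major>.0.0.0.
_REDUCED_ANDROID_TOKEN = re.compile(r"\(Linux; Android 10; K\)")
_ZEROED_CHROME_VERSION = re.compile(r"Chrome/\d+\.0\.0\.0\b")


def looks_frozen(user_agent):
    """Return True if the UA matches the reduced Android Chrome pattern."""
    return bool(
        _REDUCED_ANDROID_TOKEN.search(user_agent)
        and _ZEROED_CHROME_VERSION.search(user_agent)
    )


def traffic_bucket(user_agent):
    """Place requests in a separate analytics bucket when the UA looks frozen."""
    return "frozen-ua" if looks_frozen(user_agent) else "full-ua"
```

A genuine Android 10 device whose model token happens to be "K" would be misclassified. That is one more reason to rely on the API rather than a hand-rolled check.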

The following example will show how the “new” API will work in practice (the word “new” is in quotes to indicate that existing WURFL users will be right at home with an API that is virtually identical to the one they have used all these years.)

<?php
// Create a WURFL Engine
// ($container is assumed to be an already configured WURFL container; its setup is not shown here)
$wurfl_engine = new \ScientiaMobile\WURFL\WURFLEngine($container);

// Request headers in $_SERVER-style (CGI) form, as sent by Chrome 97 on a Pixel 5 with a frozen UA
$headers = [
    "HTTP_USER_AGENT" => "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.0.0 Mobile Safari/537.36",
    "HTTP_SEC_CH_UA" => '" Not A;Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"',
    "HTTP_SEC_CH_UA_PLATFORM" => '"Android"',
    "HTTP_SEC_CH_UA_PLATFORM_VERSION" => '"12.0.0"',
    "HTTP_SEC_CH_UA_MODEL" => '"Pixel 5"',
    "HTTP_SEC_CH_UA_MOBILE" => "?1",
    "HTTP_SEC_CH_UA_FULL_VERSION" => '"97.0.4692.70"',
];

// Wrap the headers in a WURFL request and look up the matching device
$request = new \ScientiaMobile\WURFL\Request\HttpRequest($headers);
$requestingDevice = $wurfl_engine->getDeviceForRequest($request);

$wid = $requestingDevice->getID();
echo "Current headers:\n";
echo "\n";
echo "Is UA Frozen:";
echo $request->isUaFrozen() ? "True" : "False";
echo "\n";
echo "Header Quality:{$request->headerQuality()}\n";
echo "WURFL ID: {$wid}\n";
echo "**Virtual Capabilities**\n";
echo "Complete Device Name: {$requestingDevice->getVirtualCapability("complete_device_name")}\n";
echo "Device OS: {$requestingDevice->getVirtualCapability("advertised_device_os")}\n";
echo "Device OS Version: {$requestingDevice->getVirtualCapability("advertised_device_os_version")}\n";
echo "Device Browser: {$requestingDevice->getVirtualCapability("advertised_browser")}\n";
echo "Device Browser Version: {$requestingDevice->getVirtualCapability("advertised_browser_version")}\n";

Running this code snippet will yield:

$ php php_api_test_ch.php
Current headers:

Is UA Frozen:True
Header Quality:Full
WURFL ID: google_pixel_5_ver1_suban120
**Virtual Capabilities**
Complete Device Name: Google Pixel 5
Device OS: Android
Device OS Version: 12.0.0
Device Browser: Chrome Mobile
Device Browser Version: 97.0.4692.70
$

It goes without saying that the new WURFL API is a virtual drop-in replacement for older versions. If you are using the API with full requests (as opposed to passing the UA string alone), then chances are that you won’t need to modify your application at all.

As the technology around Client-Hints evolves, WURFL OnSite and WURFL InFuze will remain your best option for hiding all that complexity behind a stable API that figures out the details for you. Of course, that assumes your server is configured to receive the high-entropy CH headers, as explained above, which means you will also need to configure your server to ensure that browsers send over those crucial Client-Hints.

Requesting Client-Hints through HTTP Response Headers

Telling you how to configure your servers to request headers is borderline outside the scope of this document. To inspire you, here’s a taste of how Apache users may want to do it, but we won’t go very deep with it here.

Note: Users of NGINX, Varnish Cache, HAProxy and other proxies/load-balancers should refer to the respective documentation on how to add response headers and configure logging.

You’ll first need to enable mod_headers and follow this guide to add custom HTTP headers.

You can enable mod_headers with sudo a2enmod headers and restart Apache with sudo systemctl restart apache2.

Here is the list of UA-CH headers. Pick the ones that you want to add to your site’s Apache configuration file (case matters!). At a minimum, you’ll want to add Sec-Ch-Ua-Model, Sec-Ch-Ua-Full-Version-List and Sec-Ch-Ua-Platform-Version. Here’s what our test web server config looks like:

$ cat mytestserver.conf
# HTTP/2 support must be loaded at server level (on Ubuntu: sudo a2enmod http2);
# LoadModule is not allowed inside <VirtualHost>
<VirtualHost *:443>
   Protocols h2 h2c http/1.1
   ServerName mytestserver.local
   DocumentRoot /var/www/mytestserver
   # Ask browsers to send these UA-CH headers on subsequent requests
   Header set Accept-CH "Sec-Ch-Ua,Sec-Ch-Ua-Arch,Sec-Ch-Ua-Bitness, \
        Sec-Ch-Ua-Full-Version,Sec-Ch-Ua-Full-Version-List,Sec-Ch-Ua-Mobile, \
        Sec-Ch-Ua-Model,Sec-Ch-Ua-Platform,Sec-Ch-Ua-Platform-Version"
   SSLEngine on
   SSLCertificateFile /etc/ssl/certs/apache-selfsigned.crt
   SSLCertificateKeyFile /etc/ssl/private/apache-selfsigned.key
</VirtualHost>
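If your application is served from Python rather than configured in Apache, the same response header can be added in code. Below is a minimal sketch of WSGI middleware (PEP 3333). The class name, the demo app and the three hints chosen are our own illustrative assumptions, not part of WURFL or any framework.

```python
# Hints requested by this sketch (the minimum suggested above).
ACCEPT_CH = ", ".join([
    "Sec-CH-UA-Model",
    "Sec-CH-UA-Full-Version-List",
    "Sec-CH-UA-Platform-Version",
])


class AcceptCHMiddleware:
    """Wrap a WSGI app and add an Accept-CH header to every response."""

    def __init__(self, app, accept_ch=ACCEPT_CH):
        self.app = app
        self.accept_ch = accept_ch

    def __call__(self, environ, start_response):
        def start_response_with_ch(status, headers, exc_info=None):
            # Drop any Accept-CH the app set itself, then add ours.
            headers = [(k, v) for k, v in headers if k.lower() != "accept-ch"]
            headers.append(("Accept-CH", self.accept_ch))
            return start_response(status, headers, exc_info)

        return self.app(environ, start_response_with_ch)


def hello_app(environ, start_response):
    # Hint headers arrive as CGI-style keys, e.g. HTTP_SEC_CH_UA_MODEL.
    model = environ.get("HTTP_SEC_CH_UA_MODEL", "(not sent yet)")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Sec-CH-UA-Model: {model}\n".encode("utf-8")]


application = AcceptCHMiddleware(hello_app)
```

To try it locally, run `wsgiref.simple_server.make_server("127.0.0.1", 8000, application).serve_forever()`. In production, serve it over HTTPS, since browsers only send UA-CH headers to secure origins.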

Once your server is restarted, you can confirm that the Response headers (that request the CH headers) are being sent correctly. You can use curl for this:

$ curl -k -I https://mytestserver.local
HTTP/2 200 
date: Fri, 14 Jan 2022 16:36:12 GMT
server: Apache/2.4.41 (Ubuntu)
last-modified: Fri, 14 Jan 2022 15:06:51 GMT
etag: "f2-5c4bb350661f9"
accept-ranges: bytes
content-length: 242
vary: Accept-Encoding
accept-ch: Sec-Ch-Ua,Sec-Ch-Ua-Arch,Sec-Ch-Ua-Bitness,Sec-Ch-Ua-Full-Version,
     Sec-Ch-Ua-Full-Version-List,Sec-Ch-Ua-Mobile,Sec-Ch-Ua-Model,
     Sec-Ch-Ua-Platform,Sec-Ch-Ua-Platform-Version
content-type: text/html
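If you check many hosts, the same verification can be scripted. The Python sketch below assumes you already have the accept-ch value, from curl or any HTTP client. The helper name and the list of required hints are our own choices; the hints are taken from the minimum suggested above. The case-insensitive comparison is a simplification of this sketch.

```python
# Hints this sketch treats as required (the minimum suggested above).
REQUIRED_HINTS = (
    "Sec-Ch-Ua-Model",
    "Sec-Ch-Ua-Full-Version-List",
    "Sec-Ch-Ua-Platform-Version",
)


def missing_from_accept_ch(accept_ch, required=REQUIRED_HINTS):
    """Return the required hints that an Accept-CH value does not list.

    Tokens are split on commas, stripped of surrounding whitespace (including
    the line breaks in folded output) and compared case-insensitively.
    """
    listed = {token.strip().lower() for token in accept_ch.split(",") if token.strip()}
    return [hint for hint in required if hint.lower() not in listed]


observed = (
    "Sec-Ch-Ua,Sec-Ch-Ua-Arch,Sec-Ch-Ua-Bitness,Sec-Ch-Ua-Full-Version,\n"
    "     Sec-Ch-Ua-Full-Version-List,Sec-Ch-Ua-Mobile,Sec-Ch-Ua-Model,\n"
    "     Sec-Ch-Ua-Platform,Sec-Ch-Ua-Platform-Version"
)
print(missing_from_accept_ch(observed) or "all required hints are requested")
```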

Logging Client-Hint Headers

If you want to do device detection on your logs, then you will now need to log CH headers as well. How to do it across servers is also outside of the scope of this document, but here’s an example of how this is done in Apache.

Apache only logs a few header fields by default. This page is a good reference to understand how logging works in Apache and how you can customize it.

We used a CustomLog directive to log specific UA-CH headers, simply adding a custom log format to the Apache config file:

CustomLog /var/log/apache2/uach.log "{ \"time\":\"%{%Y-%m-%d}tT%{%T}t.%{msec_frac}tZ\", \
 \"process\":\"%D\", \"filename\":\"%f\", \"remoteIP\":\"%a\", \"host\":\"%V\", \
\"request\":\"%U\", \"query\":\"%q\",\"method\":\"%m\", \"status\":\"%>s\", \
\"referer\":\"%{Referer}i\", \"user-agent\":\"%{User-Agent}i\", \
\"sec-ch-ua\":\"%{Sec-Ch-Ua}i\", \"sec-ch-ua-mobile\":\"%{Sec-Ch-Ua-Mobile}i\", \
\"sec-ch-ua-platform\":\"%{Sec-Ch-Ua-Platform}i\", \"sec-ch-ua-arch\":\"%{Sec-Ch-Ua-Arch}i\", \
\"sec-ch-ua-model\":\"%{Sec-Ch-Ua-Model}i\", \
\"sec-ch-ua-platform-version\":\"%{Sec-Ch-Ua-Platform-Version}i\", \
\"sec-ch-ua-full-version\":\"%{Sec-Ch-Ua-Full-Version}i\", \
\"sec-ch-ua-full-version-list\":\"%{Sec-Ch-Ua-Full-Version-List}i\"}"

This gets Apache to log specific header fields, as JSON, to a specific location. You can log more header fields by adding entries of this form to the format string:

\"<field-name>\":\"%{Sec-Ch-Ua-Header-Name}i\"

To test the new logging format, visit your test site from a device or browser that sends UA-CH headers. As explained, the first hit won’t carry the high-entropy headers, so you will need to refresh the page: the Accept-CH header sent in the previous server response has enabled the “high-entropy” CH values, which should now be sent. The log is available at the location specified in the CustomLog directive above. In our case:

$ sudo tail -1 /var/log/apache2/uach.log 
{ "time":"2022-01-24T10:35:35.723Z", "process":"718", "filename":"/var/www/mytestserver/index.html", 
"remoteIP":"127.0.0.1", "host":"127.0.0.1", "request":"/index.html", "query":"", 
"method":"GET", "status":"200", "referer":"https://127.0.0.1/", 
"user-agent":"Mozilla/5.0 (Linux; Android 12; Pixel 6 Pro) AppleWebKit/537.36 (KHTML, like Gecko) \
Chrome/97.0.4692.98 Mobile Safari/537.36", "sec-ch-ua":"\" Not;A Brand\";v=\"99\", 
\"Google Chrome\";v=\"97\", \"Chromium\";v=\"97\"", 
"sec-ch-ua-mobile":"?1", "sec-ch-ua-platform":"\"Android\"", "sec-ch-ua-arch":"", 
"sec-ch-ua-model":"\"Pixel 6 Pro\"", "sec-ch-ua-platform-version":"\"12.0.0\"", 
"sec-ch-ua-full-version":"\"97.0.4692.98\"", "sec-ch-ua-full-version-list":"-"}
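This is where the “rebuilding” of requests mentioned earlier comes in. To run these log entries through device detection, each line has to be turned back into a set of request headers. The Python sketch below assumes the JSON log format above. It produces the CGI-style keys (HTTP_USER_AGENT, HTTP_SEC_CH_UA_MODEL, and so on) used in the PHP example. The function name is hypothetical, and feeding the result to WURFL is left to whichever WURFL API you use.

```python
import json


def headers_from_log_line(line):
    """Rebuild CGI-style request headers from one JSON line of uach.log.

    Only the User-Agent and Sec-CH-UA* fields are kept. Apache logs "-" for
    headers that were not sent; those, and empty values, are skipped.
    """
    record = json.loads(line)
    headers = {}
    for key, value in record.items():
        if key != "user-agent" and not key.startswith("sec-ch-ua"):
            continue
        if value in ("-", ""):
            continue
        # "sec-ch-ua-model" -> "HTTP_SEC_CH_UA_MODEL"
        headers["HTTP_" + key.upper().replace("-", "_")] = value
    return headers


sample = (
    '{"user-agent":"Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/97.0.0.0 Mobile Safari/537.36", '
    '"sec-ch-ua-mobile":"?1", "sec-ch-ua-model":"\\"Pixel 5\\"", '
    '"sec-ch-ua-arch":"", "sec-ch-ua-full-version-list":"-"}'
)
for name, value in headers_from_log_line(sample).items():
    print(f"{name}: {value}")
```

The values keep their Structured Field quoting (for example "Pixel 5" with the quotes), which is how the browser sent them.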

And Now the Bad News

At this point, we would like to say “easy peasy lemon squeezy”, but we can’t. For developers, this is a lot of aggravation that was not supposed to be there. If you are a WURFL user and you have read this far, you have probably realized this already.

The problem here is the modified workflow. Essentially, Google is changing the way HTTP has functioned so far.

Firstly, connections must go through HTTPS as a prerequisite to receive any of the CH headers. This is not a deal breaker in itself, but developers know that this extra level of complexity doesn’t make their lives easier. But that’s not the biggest issue.

Google’s new mechanism forces an additional round trip in situations where such round trips have never been necessary. There is no good reason to hide the make and model of a device from requests: hundreds of thousands, often even millions, of people own the same device, be it an iPhone or a popular Android smartphone. Removing that information to reduce the so-called fingerprinting surface is a lame excuse. An additional HTTP round trip is something developers want to avoid at all costs for very obvious reasons: speed, complexity and, ultimately, user experience. Yet this complex negotiation between browser and server is exactly what Google intends to force on everyone a few months from now. This is ironic given how hard Google has pushed the ecosystem to deliver faster websites (including having Google Search demote slow sites).

But there is more. The Google spec assumes that service providers have a direct relationship with their users, i.e. that they are in the position to demand that a new HTTP Request is initiated and that a fresh set of headers is sent. Yet this is not the case in a myriad of Ad Tech use cases, Real-Time Bidding (RTB) being the most conspicuous example of this.

The Client-Hints spec seems built exactly with the objective of giving companies in the Ad Tech space a hard time: interacting with the browser to request more information is simply not an option. Furthermore, even assuming that complex mechanisms are implemented to receive the new crucial client hints, nothing keeps Google from changing the default settings of some future versions of the Privacy Budget. This move can effectively separate Ad Tech companies from the user information they depend on to run their business (as Google has already stated, Client-Hints are “potentially deniable”).

Of course, Google is totally aware of the madness in the proposal of forcing an additional round trip where one is not necessary today. Instead of backtracking (and leaving the UA string be), it is messing with the lower levels of the HTTP protocol (such as the pre-encryption TLS handshake) to convey information about the critical client hints. If and when these proposals are implemented, building and debugging services will not be an activity for the faint of heart.

More Bad News: Feature Policy / Permissions Policy applies to Client-Hints too

The double name is not a quirk: these specs are still a work in progress, names are still changing (Feature Policy is being renamed Permissions Policy), and the available documentation still uses both names. Yet none of this has stopped Google’s plan to kill the User-Agent string.

Generally speaking, permission policy is a good thing. Iframes are powerful and they are embedded in web pages that sit on a company website (domain and everything). It goes without saying that developers want to be explicit about what is allowed and what is not allowed to happen in those frames and in other embedded resources.

When it comes to Client-Hints, though, developers are now burdened with all this additional complexity just to detect device features that used to be readily available through HTTP. This is going to be an additional encumbrance with no added value and no real added privacy protection. Websites that multiserve their images based on device capabilities will now need to do additional work if those images sit on a subdomain (or a different domain altogether). Because the Permissions Policy applies to Client-Hints (and the UA is frozen), developers will need to tinker with response headers and send things like this to browsers (example courtesy of Jon Arne):

Accept-CH: sec-ch-ua-platform,sec-ch-ua-arch,sec-ch-ua-model,sec-ch-ua-platform-version,
sec-ch-ua-full-version,sec-ch-ua-bitness,sec-ch-ua-full-version-list

permissions-policy: ch-ua-bitness=("https://classic-mountainous-actress.glitch.me"), 
ch-ua-arch=("https://classic-mountainous-actress.glitch.me"), 
ch-ua-model=("https://classic-mountainous-actress.glitch.me"), 
ch-ua-platform=("https://classic-mountainous-actress.glitch.me"), 
ch-ua-platform-version=("https://classic-mountainous-actress.glitch.me"), 
ch-ua-full-version=("https://classic-mountainous-actress.glitch.me"), 
ch-ua-full-version-list=("https://classic-mountainous-actress.glitch.me")

As an important aside, setting Accept-CH (and/or Critical-CH) and Permissions Policy headers will be required for users of Image CDNs (such as ImageEngine) and for users of WURFL.js, in order for those services to keep working optimally. Users of WURFL Cloud will also need to be aware of this: they’ll need to collect certain high-entropy CH headers and send them to the WURFL Cloud to perform device detection.

GREASE-ing of Sec-ch-ua (can’t make this stuff up!)

You may have noticed this CH header in our example above, and you might be wondering what the heck it is:

Sec-ch-ua: "Google Chrome";v="95", "Chromium";v="95", ";Not A Brand";v="99"

This header should carry a client’s “brand and version” (which in practice means the browser and its version), but… Have you ever heard the saying, “a camel is a horse designed by a committee”? This header is one such camel, and it’s hard to understand how such a poorly designed proposal could make it this far.

In an attempt to wrap its unilateral attack on established internet protocols in a shroud of legitimacy, Google deployed an army of engineers in several W3C and IETF groups. When creators of minor browsers raised the point that their browsers might be discriminated against by content providers (a non-issue for which there is only anecdotal evidence and that can’t really be avoided, but let’s not digress), Google jumped on the opportunity to show the world how much it cares about others. The discussion and the quest for “consensus” led to borrowing the GREASE strategy from the TLS world. Adding spurious values to the TLS protocol probably makes sense in that context, but it makes absolutely no sense in the world of browser detection. If a content provider decides to explicitly exclude a browser (or to accept only one or two specific browsers on their site), the “;Not A Brand” substring (placed in a random position!) won’t prevent them from shutting out the unwanted visitors.

On the other hand, if a minor browser works sufficiently well, content providers won’t have a reason to go out of their way to block it. The Sec-ch-ua header is a powerful example of committee design gone bad: the funny header will be sent trillions of times (it belongs to the list of so-called ‘low-entropy’ headers, and as such it will travel with every HTTPS request from Chrome for the foreseeable future!)

BTW, there’s an upside for WURFL users here. The WURFL API will parse the Sec-ch-ua header and make the correct values available (as capability values) the way it always has, effectively insulating WURFL users from the added complexity.
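For readers who handle the header themselves, here is a minimal Python sketch that reads Sec-CH-UA and discards the GREASE entry. It is not WURFL's implementation. The parser handles only the "brand";v="version" list shape seen in this article, not the full Structured Fields grammar of RFC 8941. Recognizing GREASE brands by reducing them to the letters "notabrand" is a heuristic of ours that matches the variants shown here.

```python
import re


def _read_sf_string(value, i):
    """Read a quoted string starting at value[i]; return (text, next_index)."""
    if i >= len(value) or value[i] != '"':
        raise ValueError(f"expected a quoted string at position {i}")
    i += 1
    chars = []
    while i < len(value) and value[i] != '"':
        if value[i] == "\\":  # backslash escape: keep the next character as-is
            i += 1
            if i >= len(value):
                break
        chars.append(value[i])
        i += 1
    if i >= len(value):
        raise ValueError("unterminated quoted string")
    return "".join(chars), i + 1


def parse_sec_ch_ua(value):
    """Parse '"Brand";v="NN", ...' into a list of (brand, version) pairs."""
    pairs, i = [], 0
    while i < len(value):
        if value[i] in " ,":  # skip separators between list members
            i += 1
            continue
        brand, i = _read_sf_string(value, i)
        version = ""
        while i < len(value) and value[i] == ";":  # parameters such as ;v="95"
            eq = value.index("=", i)
            key = value[i + 1:eq].strip()
            param, i = _read_sf_string(value, eq + 1)
            if key == "v":
                version = param
        pairs.append((brand, version))
    return pairs


def is_grease_brand(brand):
    """Heuristic: GREASE brands reduce to the letters 'notabrand'."""
    return re.sub(r"[^a-z]", "", brand.lower()) == "notabrand"


def real_brands(value):
    """Brands from Sec-CH-UA with the GREASE entry removed."""
    return [(b, v) for b, v in parse_sec_ch_ua(value) if not is_grease_brand(b)]


print(real_brands('"Google Chrome";v="95", "Chromium";v="95", ";Not A Brand";v="99"'))
```

The parser honors quotes deliberately. GREASE brands such as " Not A;Brand" contain semicolons, so naively splitting the header on ";" and "," would break.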

A Note about the Mythical Fingerprinting Area

There is one paradoxical side to all this. If Google’s plan of UA string elimination becomes a reality – and requesting CH headers becomes common practice – this might very well result in a much larger fingerprinting area. Client-Hints will collectively offer “bad agents” more ammo for building pseudo user IDs than the UA string alone.

This is exactly the opposite of what Google claims to want to achieve in its self-appointed web police role. At the end of the day, if fingerprinting is bad, then fingerprinting itself should be regulated, as opposed to making a simple protocol fuzzier, which hurts the good actors in the hope that bad actors won’t figure it out.

Conclusions

Google considers the User-Agent string a high-entropy header and, because of this, it is going to freeze it. We think that choice is misguided. The UA string exposes a negligible fingerprinting surface, particularly when compared to users’ IP addresses and, of course, cookies!

At the same time, the proposed alternative, UA freeze + Client-Hints, risks exposing a larger fingerprinting surface and is going to be a lot more complicated to deploy. In certain scenarios it may even be impossible, as services are not always in a position to solicit a new request from browsers, or doing so is expensive.

Assuming that you, as a WURFL user, have access to high-entropy Client-Hint headers, not a lot changes: WURFL will parse both Client-Hints and UA string in its attempt to squeeze the last drop of device information out of those headers. The big problem might be that you’ll have to change the workflow of your application significantly. Please don’t hold a grudge against WURFL for this. Google is to blame.

Jan 24th, 2022
