User-Agent, Client-Hints and the Future of Device Detection

Luca Passani, CTO @ScientiaMobile

TL;DR: We know that W3C and Google are pushing to have the User-Agent string frozen and eventually deprecated, but there will always be ways to identify the device. If and when it happens, there will be other mechanisms that fulfill an essential developer need: relying on device capabilities for the purposes of better UX, analytics and bug fixing. At any rate, using WURFL is your best bet to be on top of the evolution in this space.

We have been following this discussion on User-Agent Client-Hints (CH) and the possible freeze/deprecation of the User-Agent string for a while now, so yesterday’s “announcement” wasn’t a big surprise for us here at ScientiaMobile.

But it obviously was a surprise to many: the news blew up on Twitter, and that’s when people started paying attention, a lot of attention. Some customers even contacted us to ask for our position on the whole thing.

There are many aspects to this, and it is hard to condense a complete picture in a short blog post, but let me try.

User privacy is generally a big concern. Until a few years ago, we assumed that profiling was good for selling goods and services to people. Cambridge Analytica showed that you can do a lot more with that information, and it’s not all kosher. This is a huge topic that we are not going to get into here, but, as a side effect, the discussion around privacy percolated into finer-grained technology discussions and came to involve time-honored internet protocols. I am referring to protocols that were not meant to identify users, but that, in this age of Machine Learning and big data, can be leveraged for “fingerprinting”, i.e. tracking users beyond what cookies already allow.

HTTP headers do not generally correlate with a user profile, but, when used in combination, they can become a proxy for a user ID for certain companies (i.e. those in the business of profiling users). All these bits of information, when analyzed collectively, represent what some call the “fingerprinting surface”: the more data bits there are, the wider the fingerprinting surface, and the more precisely Machine Learning techniques can identify and track users.
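As a rough illustration of how the surface widens, the sketch below adds up the identifying information carried by a few headers. The shares are made-up numbers, not measurements, and summing bits assumes the headers are statistically independent, which real traffic rarely is. Treat the result as an intuition aid rather than an estimate.

```python
import math
from typing import Mapping


def surprisal_bits(share: float) -> float:
    """Bits of identifying information revealed by a header value
    that a fraction `share` of all visitors have in common."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return -math.log2(share)


# Hypothetical shares: fraction of visitors sending the same value.
HYPOTHETICAL_SHARES = {
    "User-Agent": 1 / 1024,
    "Accept-Language": 1 / 8,
    "Accept-Encoding": 1 / 2,
}


def total_bits(shares: Mapping[str, float]) -> float:
    # Adding bits is only valid if the headers are independent;
    # correlated headers reveal less than this sum suggests.
    return sum(surprisal_bits(share) for share in shares.values())


if __name__ == "__main__":
    print(total_bits(HYPOTHETICAL_SHARES))
```

With these invented shares, the User-Agent alone accounts for 10 of the 14 bits, which is the intuition behind singling it out as the widest part of the surface.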

The Role of the User-Agent String

The User-Agent string is not some random HTTP header. It’s been there since the beginning of time, which for us is HTTP/1.0. This magic User-Agent string tells you which browser on which device (mobile phone, tablet, smart TV, console or wristwatch) is requesting your pages, and this has been going on for a few decades. Why is the User-Agent string so relevant? Because there is so much variation, feature diversity and bugginess among the HTTP clients out there that identifying devices is what ultimately allows developers to create good services.

I know that, for some time, there have been “purists” in the web community who advocated against server-side detection (or what others disparagingly called browser-sniffing), and I’ll concede that looking at the raw User-Agent string is not the most elegant of solutions, but…it works. And if you use WURFL, all that complexity is hidden from you. Device Detection has worked for twenty-plus years and developers know how to rely on it.
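To give an idea of what “looking at the raw User-Agent string” means in practice, here is a deliberately naive sketch. The sample string, the pattern and the function name are ours, chosen for illustration only; this is not a description of how WURFL or any real detection library works.

```python
import re
from typing import Dict, Optional

# Sample Chrome-on-Android string, used only for illustration.
SAMPLE_UA = (
    "Mozilla/5.0 (Linux; Android 10; Pixel 3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.99 Mobile Safari/537.36"
)

# Naive pattern: "Android <version>; <model>)" inside the first parenthesis.
ANDROID_RE = re.compile(r"Android (?P<os_version>[\d.]+); (?P<model>[^;)]+)\)")


def naive_android_info(user_agent: str) -> Optional[Dict[str, str]]:
    """Return OS version and model for Android-style strings, else None."""
    match = ANDROID_RE.search(user_agent)
    if match is None:
        return None
    return {
        "os_version": match.group("os_version"),
        "model": match.group("model").strip(),
    }


if __name__ == "__main__":
    print(naive_android_info(SAMPLE_UA))
```

A pattern like this breaks as soon as a vendor adds a build tag, a locale or an extra token, which is exactly the complexity a maintained device database is meant to hide.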

So, why would anyone want to remove the User-Agent? The answer is simple: once Google devised Client-Hints (which honestly are a good idea), it had to come to terms with the privacy concerns expressed by the W3C. Axing the User-Agent string is a way to reduce the fingerprinting surface, and they want to do it even before Client-Hints are fully rolled out, tested in the field and operational. Unsurprisingly, this is causing concern among companies whose business depends on the presence of a User-Agent string.

Are Client-Hints Better than User-Agent Strings?

Promoters of Client-Hints claim that theirs is a better mechanism than User-Agent strings, but is that really the case?

We don’t know for sure, because we will really need to see what headers the actual devices will be sending, but let’s assume that nothing similar to the User-Agent string is available.

The use cases for User-Agent strings and Client-Hints are overlapping, but they are not the same. This means that one can come up with examples where CH are better suited, but there are other cases where User-Agent based Device Detection is the viable option.

If a device with a 640×960 screen is used in landscape mode, Client-Hints can do a better job at telling you that 960×640 is more suited for tailoring your UX.

On the other hand, there are plenty of other cases where Client-Hints may not be quite up to the task (it’s all still fuzzy, so it is difficult to predict things with absolute certainty).

Some examples:

Image/video resampling.
Android OS / hardware capabilities detection for providing the correct APK build for each user.
Content-Security-Policy: dynamic on-the-fly adaptation, so that different browsers get different CSP headers.
Fraud detection (forged User-Agent detection).
Browser bug workarounds.
Content negotiation.
Detecting whether users are browsing within an Android/iOS app WebView.
Analytics.

These are all situations in which companies have come to rely on the User-Agent and it is hard to see how Client-Hints will be an alternative to it in that context.

And then of course there is Ad Tech. That’s an area where user privacy and effective targeting are clashing, but why prevent ad networks from taking advantage of, for example, the release date of users’ devices? Client-Hints won’t provide that information, but a Device Detection solution such as WURFL will.

But there’s a bigger issue with Client-Hints (at least the way they are being proposed). With the traditional approach, you get the User-Agent string with the very first request. The Client-Hints approach will try to reduce the fingerprinting surface as much as possible. You will receive the first HTTP request, realize that there are no Client-Hints headers, enrich your response with a request for the relevant Client-Hints, and finally cross your fingers that the browser will send the information you need.
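The round trip described above can be sketched as follows. The header names (`Accept-CH`, `Sec-CH-UA-Model`, `Sec-CH-UA-Platform-Version`) come from the User-Agent Client Hints draft as we understand it, not from anything settled in this post, and may well change; the function names are hypothetical. Nothing in this code can force a browser to comply.

```python
from typing import Dict, List, Mapping

# Hints the server would like to receive. The names are an assumption
# based on the User-Agent Client Hints draft and may change.
WANTED_HINTS = ("Sec-CH-UA-Model", "Sec-CH-UA-Platform-Version")


def missing_hints(request_headers: Mapping[str, str]) -> List[str]:
    """Return the wanted hints absent from the request (case-insensitive)."""
    present = {name.lower() for name in request_headers}
    return [hint for hint in WANTED_HINTS if hint.lower() not in present]


def response_headers_for(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Build the extra response headers needed to ask for missing hints."""
    headers: Dict[str, str] = {}
    if missing_hints(request_headers):
        # Opt in: ask the browser to send these hints on later requests.
        # Whether it actually does is up to the browser and its settings.
        headers["Accept-CH"] = ", ".join(WANTED_HINTS)
    return headers


if __name__ == "__main__":
    first_request = {"Host": "example.com"}
    print(response_headers_for(first_request))
```

Note that the first response has already been generated without any device information by the time the hints are requested, which is the gap the paragraph above points at.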

Will you get that information or not? Will that depend on the browser? Or the user settings? According to which logic? What are the default settings? None of this is clear at the moment. This is a huge TBD, although Google and W3C seem to have already determined that the User-Agent string has to go no matter what.

The prospect is that Client-Hints will fall short of what the User-Agent string is delivering today. This would be bad, since Client-Hints could be rolled out without the need to remove what works currently.

Fraud detection, analytics, customer support, and bug identification may become a lot trickier. Today, developers can look into their logs and map issues to the specific device make and model that causes them. If and when those logs become a big, indistinct pool of empty User-Agent strings (or generic ones, which would be only marginally better), organizations will no longer be able to use a wide variety of valuable tools.
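As a small illustration of that log-based workflow, the sketch below counts server errors per User-Agent. It assumes access logs in the common “combined” format; the sample lines, addresses and browser names are invented.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the tail of a "combined" format line:
# "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"\s*$'
)


def server_errors_by_user_agent(lines: Iterable[str]) -> Counter:
    """Count 5xx responses per User-Agent string."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match and match.group("status").startswith("5"):
            counts[match.group("ua")] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LINES = [
    '203.0.113.5 - - [21/Feb/2020:10:00:00 +0000] "GET /cart HTTP/1.1" 500 512 "-" "ExampleBrowser/1.0 (DeviceA)"',
    '203.0.113.6 - - [21/Feb/2020:10:00:01 +0000] "GET /cart HTTP/1.1" 200 2048 "-" "ExampleBrowser/1.0 (DeviceB)"',
    '203.0.113.7 - - [21/Feb/2020:10:00:02 +0000] "GET /cart HTTP/1.1" 500 512 "-" "ExampleBrowser/1.0 (DeviceA)"',
]

if __name__ == "__main__":
    for ua, count in server_errors_by_user_agent(SAMPLE_LINES).most_common():
        print(count, ua)
```

If every client sent the same generic string, all errors would collapse into a single bucket and the breakdown would tell you nothing about which device is misbehaving.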

What’s Next?

So is there something that the developer community and our industry in general should do about this?

If you are reading this blog post, chances are that the proposed changes worry you. In that case, it probably makes sense to contact Google and the W3C and let them know. Tell them about your use case, and let them know that what they are doing may negatively affect your business.

If and when the User-Agent string becomes generic, if this prevents you from delivering your service, it may make sense to let your users know about the problem. After all, sites do tell users that their cookies are not enabled or that they are using an Ad-Blocker, right? And those sites are not doing it because they are evil, but because they rely on certain functions to be in place to provide good UX and ultimately support a business model.

If poorly designed changes in technology keep you from doing your job, then it’s a bad situation that you don’t deserve to be in. Telling your users to change their settings if they want to use your service is OK. And if that is not enough, asking them to install a different browser might also be an option.

Anyway, this would be a worst case scenario. And we don’t think we will get to that point in the near future. We do not see the User-Agent string disappearing very soon. A lot of details still have to be figured out and we won’t know what happens for sure until we look at those headers and understand what to do based on real HTTP traffic.

As we mentioned, Device Detection is a basic developer need. Client-Hints proponents are aware that a whole industry ecosystem depends on Device Detection. An attempt to remove the User-Agent string before Client-Hints are deployed (and are functionally equivalent to what we have today) will result in a backlash that will push browser manufacturers to support alternative measures (such as leaving the User-Agent string where it is or providing it in a different header for a while).

One thing is for sure: whether the User-Agent string is here to stay or future HTTP versions evolve in the direction of “depowering” the User-Agent and empowering Client-Hints is, from a certain viewpoint, immaterial to our customers. WURFL will follow suit to make sure that the last drop of device information is obtained through analysis of HTTP requests (augmented with our own proprietary database of device information, of course).

At the end of the day, using WURFL is everyone’s best bet for keeping abreast of the evolution of the mobile device market.

Feb 21st, 2020
