The largest data broker in the USA (very) poorly anonymizes its data

François Manens


February 20, 2020

Anonymizing data is supposed to protect customers against misuse of their data by a third party. That is the theory, at least; in practice it is extremely complicated, if not impossible, to implement.

Is data anonymization really possible? Data brokers resell data provided by one company to other companies specialising in advertising, marketing or analysis. To protect customers, the broker usually anonymizes the data first, so that the buyer should not be able to trace the identity of the people the data describes. Sometimes, however, the precautions taken by brokers are insufficient.

According to Motherboard, this is the case for Yodlee, the largest reseller of banking data in the United States. The U.S. site obtained an internal document from 2019 and found that Yodlee’s method of anonymizing data did not prevent the re-identification of individuals. Its customers could therefore quite easily track a person’s buying behaviour. If these data sets were intercepted by criminals, the consequences would be even more serious: they could be used, for example, to craft tailor-made phishing schemes to steal banking data.

This investigation comes at a bad time for Yodlee. Several US senators were already asking the US regulator, the Federal Trade Commission (FTC), to investigate its parent company, Envestnet. They suspect it of reselling transaction data without users’ consent.

If we find the name of the person in 2 clicks, it means there is no anonymity.

Like other data brokers, part of Yodlee’s business is reselling data related to the financial transactions of tens of millions of Americans. The data is purchased by investment and financial research firms to better understand the buying habits of customers. This practice is legal, although the role of data brokers is regularly decried.

However, reselling customer data is subject to certain precautions. The data must be anonymized before being resold, a process that consists of removing every indicator that would make it possible to recover the surnames and first names of individuals from the data set.

A not-so-thorough data cleansing

In principle, therefore, purchasers of customer data cannot identify exactly whom it belongs to. This cleansing makes the data less interesting for the advertising or marketing companies that buy it, since their targeting will be less accurate as a result. However, the data can still be used for analysis and to identify more general behaviour.

The problem: anonymization, a process whose reliability is itself regularly questioned, does not offer sufficient guarantees in Yodlee’s case. « I’m going to be blunt. This pseudo-anonymization sucks, » says UC Berkeley researcher Nicholas Weaver, to whom Motherboard showed the documents. Yet the company promotes its service as the most comprehensive on the market, thanks to « the strength of its data acquisition capabilities, its extensive data cleansing, and its ability to do it on a very large scale. »

Time and location data to trace back to the customer’s identity

Specifically, several financial companies such as HSBC, Citigroup or Bank of America send their transaction data to Yodlee. Yodlee then lets its customers download the data as a text file. Before that, it runs a fully automated cleansing pass that removes all the first names, surnames and email addresses that appear in the records. It also hides account, telephone and social security numbers, which are replaced by a run of X characters (« XXX »).
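To make the cleansing step concrete, here is a minimal sketch in Python of what such an automated pass might look like. It is an illustration only: the article does not describe Yodlee's actual rules, and the regular expressions, the `scrub` function and the `known_names` parameter are all hypothetical.

```python
import re

# Hypothetical patterns, assumed for illustration; not Yodlee's actual rules.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Runs of digits (optionally split by hyphens or spaces), at least 7 characters:
# a rough stand-in for account, telephone and social security numbers.
LONG_NUMBER = re.compile(r"\b\d[\d\- ]{5,}\d\b")


def scrub(memo: str, known_names: list[str]) -> str:
    """Delete emails and known names from a memo and mask long numbers with XXX."""
    cleaned = EMAIL.sub("", memo)
    for name in known_names:
        cleaned = re.sub(re.escape(name), "", cleaned, flags=re.IGNORECASE)
    cleaned = LONG_NUMBER.sub("XXX", cleaned)
    return " ".join(cleaned.split())  # collapse the whitespace left behind


memo = "JANE DOE jane.doe@example.com ACCT 1234-5678-9012 COFFEE SHOP SEATTLE"
print(scrub(memo, ["Jane Doe"]))
# prints: ACCT XXX COFFEE SHOP SEATTLE
```

Note what survives in the example: the merchant and the city. A pass of this kind strips direct identifiers but leaves the context of the transaction untouched.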

But that’s not enough: the remaining data may suffice to re-identify certain individuals. The data are therefore « pseudonymized » rather than « anonymized », an important difference. It is true that the data sold to the final buyers contain no direct information on the identity of individuals. However, Yodlee leaves in all kinds of spatio-temporal data: the date of the transaction, the name of the seller, the location of the sale… details that can easily be cross-referenced with other data. A buyer who holds additional data will be able to re-identify the bank’s customer.
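The cross-referencing described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any real data set: it assumes each scrubbed record keeps a stable pseudonymous identifier (called `user_token` here), and that the buyer separately knows where and when a named person made a few purchases. The `Transaction` and `Sighting` types, the `reidentify` function and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    user_token: str  # assumed pseudonymous ID left in the scrubbed data
    date: str
    merchant: str
    city: str


@dataclass(frozen=True)
class Sighting:
    name: str  # identity the buyer knows from another source
    date: str
    merchant: str
    city: str


def reidentify(transactions, sightings):
    """Link a token to a name when it is the only token matching all of that person's sightings."""
    # Group each token's (date, merchant, city) triples.
    trail = {}
    for t in transactions:
        trail.setdefault(t.user_token, set()).add((t.date, t.merchant, t.city))

    result = {}
    for name in {s.name for s in sightings}:
        wanted = {(s.date, s.merchant, s.city) for s in sightings if s.name == name}
        candidates = [token for token, seen in trail.items() if wanted <= seen]
        if len(candidates) == 1:  # unique match: the pseudonym is unmasked
            result[candidates[0]] = name
    return result


scrubbed = [
    Transaction("u1", "2019-03-14", "Coffee Shop", "Seattle"),
    Transaction("u1", "2019-03-15", "Bookstore", "Seattle"),
    Transaction("u2", "2019-03-14", "Coffee Shop", "Seattle"),
    Transaction("u2", "2019-03-16", "Gym", "Portland"),
]
known = [
    Sighting("Alice", "2019-03-14", "Coffee Shop", "Seattle"),
    Sighting("Alice", "2019-03-15", "Bookstore", "Seattle"),
]
print(reidentify(scrubbed, known))
# prints: {'u1': 'Alice'}
```

With only the coffee shop purchase, two tokens would match and nobody would be identified. Adding a second known purchase is enough to single out one record, which is why dates, merchants and locations weaken the anonymization.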

In 2015, following an earlier Wall Street Journal investigation, Yodlee maintained that it was doing the « technical and administrative work that regulators have recommended » to keep the data anonymous.

This time, the company said: « We follow industry best practices on data security and privacy issues, and we employ systems that remove all known identifiers from the data that is collected. » It adds that it strictly follows the California Consumer Privacy Act, which emphasizes de-identification processes.

It is possible to question Yodlee’s practices… or the concept of anonymization itself.
