Applications of NLP in Privacy Policies

 

"We Respect Your Privacy"
Photo by Marija Zaric on Unsplash

How many privacy policies have you accepted without knowing what the policy states? We're guessing that number is probably pretty high. And you are not alone. A survey of 2,100 individuals reported that 87% of them accept policies without reading them. Common reasons included the length and general readability of the policies. Among the top 500 websites, the average policy requires roughly a high-school reading level, while the average US adult reads at a 7th-grade level.

Luckily, Natural Language Processing (NLP) can help. For our project, we set out to measure the readability of current privacy policies (spoiler alert: you'd need to have at least completed high school to make sense of what you agree to). We also wanted to see whether we could improve that readability.

So what did we do with the privacy policies?

We used privacy policies from the OPP-115 Corpus and also scraped some more policies from websites, trying to cover a variety of domains such as health, information technology, social media, news, entertainment, and finance. The policies were too long for the language models to handle at once, so we chunked them into smaller fragments and then prompted the models to generate summarized versions. The prompts we used included "TL;DR in a few words" and "explain the policy in plain language that a second grader can understand." The models we used were BART, GPT-3, and Pegasus.
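To give a feel for the chunk-and-summarize step, here is a minimal sketch assuming the HuggingFace transformers library and the publicly available facebook/bart-large-cnn checkpoint; the chunk size, generation lengths, and file name are illustrative, not the exact settings we used (GPT-3 and Pegasus were queried analogously through their own APIs/checkpoints).

```python
# Illustrative sketch: split a long policy into fragments that fit the model's
# context window, summarize each fragment, and stitch the summaries together.
from transformers import pipeline

# Assumption: the public BART summarization checkpoint stands in for the models we tried.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def chunk_text(text, max_words=400):
    """Split a policy into word-bounded fragments small enough for the model."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarize_policy(policy_text):
    """Summarize each fragment and join the pieces into one shorter policy."""
    chunks = chunk_text(policy_text)
    summaries = summarizer(chunks, max_length=80, min_length=15, do_sample=False)
    return " ".join(s["summary_text"] for s in summaries)

# Example usage (hypothetical file name):
# short_policy = summarize_policy(open("ted_privacy_policy.txt").read())
```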

TED privacy policy (original version)

Did this actually improve readability?

To answer this question, we used two approaches:
  • Automatic evaluation: we used metrics such as the Flesch Reading Ease score and the SMOG index. The scores for some of the policies are shown below.
SMOG and Flesch Reading Ease scores for a few privacy policies
 
GPT-3 consistently improved the readability of the privacy policies. In contrast, BART improved the SMOG and FOG scores but sacrificed reading ease. GPT-3 outputs were more open-ended but also more understandable than BART outputs. Pegasus failed to improve readability adequately and was dropped from further tests.
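For reference, such readability scores are easy to compute; here is a small sketch assuming the textstat Python package (the variable names are hypothetical, and our actual evaluation code may have differed).

```python
# Sketch of the automatic readability evaluation using the `textstat` package.
import textstat

def readability_report(text):
    """Return the kinds of metrics we compared before and after summarization."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),  # higher = easier to read
        "smog_index": textstat.smog_index(text),                    # approx. grade level needed
        "gunning_fog": textstat.gunning_fog(text),                  # approx. grade level needed
    }

# Example: compare an original policy with a model-generated summary.
# print(readability_report(original_policy))
# print(readability_report(gpt3_summary))
```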
 
  • Human evaluation: we shared the original privacy policy along with some of the summarized versions and asked participants which one they would prefer to read, if they would read the policies at all. For example, we showed them the summaries of the TED privacy policy generated by GPT-3 and BART:

TED privacy policy (summarized versions)
 
Some of the questions, along with the responses, are as follows:

Survey questions and some responses

We found that GPT-3 was generally preferred over BART. In some cases, people chose neither of the two models. This could be because of language barriers (the first language of most participants is not English) or because the text remained difficult even after simplification. We also found that people generally look for particular aspects of privacy policies, and these preferences vary broadly. For example, one person may be concerned about the security afforded to their data and scan the policy for only that aspect, while another may care much more about whether the website allows them to terminate their account if they wish to. Unfortunately, the summarized policies do not capture these individual preferences, which could be a goal for future work on this topic. Most respondents, however, showed a preference for model-generated outputs.

What did we learn from doing this project?
Users are concerned about what they agree to when they accept privacy policies, and shorter policies with less complicated wording can help. However, language models are not a universal solution for making policies more readable: human guidance is still required. The models are inherently lossy, so key legal terms might be dropped. Moreover, abstractive models have biases that can manipulate information: information found in the policy might be misrepresented (intrinsic bias), and information not found in the policy might be added (extrinsic bias). Model training is also highly sensitive to hyperparameters. Despite these shortcomings, people are still generally willing to read shortened versions of the policies, because they value their privacy and longer policies are much harder to read.

 

This project was done by two MS by Research in CSE students, Pawan Sasanka Ammanamanch and Ankita Maity, as part of the Monsoon 2021 course on Online Privacy taught by Prof. Ponnurangam Kumaraguru at IIIT Hyderabad.
