ERRATA

Factual errors / omissions in the talk:

1. About the legality of what the UM is doing: the cookies are actually legal - the Dutch cookie law carves out an exception for analytics to the permission requirement, see section 3b of the law (in Dutch): https://wetten.overheid.nl/BWBR0009950/2022-03-02/#Hoofdstuk11_Paragraaf11.1_Artikel11.7a . Nevertheless, marking the Google Analytics cookie as "strictly necessary" is misleading and the pop-up misrepresents what the law says. The tracking in the emails is likely (in my evaluation) to be illegal as currently implemented.
2. The five utm_ cookies were actually not on the main UM website but on an affiliated website. They have also since been removed.
3. Cookies are not the only legitimate way to authenticate a user on a website: other options are HTTP basic auth or (rarely seen) client certificates.
4. Compared to IPv4, IPv6 addresses are 4 times as long (128 bits instead of 32) and there are 2^96, or approximately 10^29 (a 1 followed by 29 zeroes), times more possibilities.
5. Copyright status of the alligator: apparently (according to Wikipedia), though USA copyright law says that works produced by their federal government employees as part of their job are in the public domain, technically this might only apply within the USA and the government there claims that copyright still applies in other countries. Thus, whether the alligator is in the public domain for you might depend on where you live. This is not the case with the other image - that is in the public domain simply because it's old. I think my use of the alligator should fall under fair use regardless, so this does not mean that the talk is in violation of copyright.

Additionally, many language mistakes are corrected (additions in [ square brackets ]) in the transcript below.

TRANSCRIPT

05:34 Let's start, it's now 16:00 more or less. Sorry the talk had to be rescheduled, I ran into some issues. So yeah, let's begin.
This talk is about the state of digital privacy at Maastricht University (UM) and tracking methods more broadly. 06:08 A quick outline of the presentation: I'm first going to discuss tracking in emails - I'll give a demo of UM's use of this and an explanation of how it works and some discussion. Second is tracking on the web, I'll discuss cookies and show UM's use of cookies and I'll also cover some other types of web tracking. And then some very brief general discussion: why is privacy even important, some resources... 1. Tracking in emails 07:04 Let's first just look at an email sent by the UM. I have here in the web mail, I have an email open. The question we're going to look at is: where do the links go? That's a question you would normally not even ask yourself, because, obviously - these are, like, announcements [ more like news items ] - so obviously the link will go to the announcement [ or whatever the announcement is about ]. That seems obvious, right? That's what we all assume. Unfortunately, the UM webmail doesn't actually show the links [ I mean URLs ] they go to - that's its own whole issue. But let's look at a similar email in a text... email viewing program, I guess [ an email program with a textual user interface ]. It has these citation numbers [ in the email content ] and at the end you get [ a list of ] all the links. And what we see 08:34 ... is that the links do not go to the places you're expecting them to go to. They all go to mailing.maastrichtuniversity.nl . And what we also see is that all of these links have this same component [ it looks like a password / random noise ] , which is... interesting, we'll see what that is in a minute. I can show one more email... we see in a similar email, also one of these news emails, we can see - you can check this - that this last component is the same as in [ the links in ] the other one that was sent to me. And this other component seems to be specific to this one email, then. 
What we also can see is that this is not only in the UM news emails, but it also is the case in any official announcement emails, basically any automated email sent by the UM email systems. So what are these things [ the unexpected URLs & the weird components ]? This is pretty shocking, right? You're expecting the link to go to the website that it's supposed to take you to, but instead it takes you somewhere else. And there are these weird things in the link. 10:28 So... what? What the hell is happening here? 1.1 Explanation 10:33 Okay, let's first recap a bit: what even is... a link (I guess)? Well, the internet is a big network of computers and every computer has an address - we'll see what that looks like in a minute, it's some numbers with dots between them. Some computers also have names. Your own computer probably doesn't have a name, but if you want to talk to the mailing server of UM, you would address the name mailing.maastrichtuniversity.nl . Then "the web": this is more or less equivalent to the hypertext transfer protocol (HTTP) - it is one thing that can be done on the internet, it is a way (a "protocol" we say) for computers to talk to each other. So when you open a webbrowser, you're basically making a bunch of HTTP requests to get various files from various other computers. 12:20 So, how that more or less works is (this is a simplified example) that if I want the about page of the UMprivacy website: First of all I look up the address of this website - the server has a name in this case, but I still have to translate that into an address so I can talk to it. Then, my computer sends what amounts to a question of "Can I have this page, /about.html ", and then the other computer replies: "Sure" and the requested file follows, and in most cases that requested file is a webpage. 13:19 So what happens in this case, with these creepy links with those weird components in them, what happens is more or less the same: My computer asks, "Can I have ...?" 
and then this whole thing [ this random-looking URL ], this link, and then the server replies "The actual link you want to go to is here:" and redirects me to this other web address. 14:02 So, what is this used for, these, we call them "identifiers", in the URLs? When you make a web request, what information does the server have? Well, first of all, it has your IP address, because that's the way it talks to you, it has the current time, and it has this URL. Now the thing is, these identifiers, you can make them something different for every person you send the email to. So basically - and this is what's happening in this case - because of these identifiers, these sequences of random letters and numbers in the URLs, because of them the server can know that the one making the request is you. So what does this amount to? The server knows which links in the email you clicked on, it knows at what time you clicked on them, and from which IP address. That's not good, right? 1.2 Discussion 15:30 This is a technique frequently used by e-marketers and their goal is obviously to make more money out of you. Because if you can track what links people click on, you can tailor advertisements better to them. Collecting this information on who is clicking on what links &cetera, it gives the e-marketers - or in our case the UM - a form of power, a way to manipulate you maybe. Now I'm pretty sure that UM doesn't do much with this information, or at least I hope so, but we can't check that, right? There are these tracking identifiers in the links in all these emails that are automatically sent by the university. A second question for the ethics of this is: do people even know what's really going on when they click on that link? And I would say, no they don't. I know of myself that I wasn't aware of this and when I found out I was quite shocked. This is not something people knowingly agree to. 17:19 Then also a bit more general discussion about [ email ]. 
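As a rough illustration of the mechanism described in section 1.1, here is a minimal Python sketch of how such tracking links could be generated and resolved. It is not the UM's actual system: the host name mailing.example.org, the /r/<link>/<recipient> URL layout and all function names are assumptions made for this example. It only shows that a per-recipient component in the URL is enough for the server to log who clicked what, when, and from which IP address.

```python
import secrets
from urllib.parse import urlsplit

# Hypothetical host and URL layout, chosen only for this illustration.
TRACKING_HOST = "mailing.example.org"

recipients = {}  # recipient token -> email address
targets = {}     # link token -> real destination URL


def make_tracking_link(email, destination):
    """Return a redirect URL that encodes both the link and the recipient."""
    # One token per recipient, reused in every link sent to that person.
    recipient_token = next(
        (token for token, address in recipients.items() if address == email),
        None,
    )
    if recipient_token is None:
        recipient_token = secrets.token_urlsafe(8)
        recipients[recipient_token] = email
    # A fresh token for every link, mapped to the page it really points to.
    link_token = secrets.token_urlsafe(8)
    targets[link_token] = destination
    return f"https://{TRACKING_HOST}/r/{link_token}/{recipient_token}"


def handle_click(url, ip_address, timestamp):
    """Return what the server can log for one click, plus the redirect target."""
    _, link_token, recipient_token = urlsplit(url).path.strip("/").split("/")
    log_entry = {
        "who": recipients[recipient_token],
        "what": targets[link_token],
        "when": timestamp,
        "from": ip_address,
    }
    return log_entry, targets[link_token]
```

Calling make_tracking_link twice for the same address gives two URLs that share their last component, which matches what was visible in the two emails compared earlier.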
Those emails that you saw, they are all HTML emails. HTML stands for hypertext markup language, and that is the language that webpages are written in, it is the format counterpart to HTTP. The thing is, it's also possible to send emails which are just plain text. Normal text without any way of hiding links behind other text - in a plaintext email you just have the URL as text, instead of a hyperlink where you have some text that takes you to the URL when you click on it, but it doesn't show the URL. Normal text emails have a lot of advantages: First of all it's a lot less expensive in terms of bandwidth - HTML emails are often four [ just a guess ] times bigger than they need to be, than a normal plaintext email would be which conveys the same information. Also, with HTML email what you have is that the sender basically dictates the appearance of the email. With plaintext email, it's more the other way around: your email program can decide how to display the email. With HTML, it is more baked into the email itself how it should be displayed. So plaintext email offers the advantage of a customisable reading experience. There are some downsides, you might say. For example, you can't inline images as part of the text itself in plaintext emails. But of course, email attachments still work, so you can just attach the image if it's important - and if it isn't important then maybe it should be left out, right? And you don't have italics and bold markup, but for emphasis you can just use, *asterisks* or something [ ALLCAPS, _underscores_, ... ]. Also, about images: they are often not loaded by default anyway, because you have exactly the same problem as with links: You can see here [ switching back to the webmail ]: "To help protect your privacy, some content in this message has been blocked". That content is the images, because what can you have? 
These images would otherwise be downloaded from some server - the email itself contains only the link to the image and the email program has to download them. So what can you do in these image links? You can also put these tracking identifiers in them. Not sure if this particular email does that, but that is the reason that most email clients have turned off loading images automatically. Because the links could contain those identifiers, and that would mean that the sender is notified whenever you open the email, as then all the images are downloaded. The thing is, HTML email is nowadays often turned on by default, if you want to write an email it will often make it HTML, but this is really not what you want, or at least what I want. 22:44 So, recommendations for the UM: just switch to plaintext emails for these announcements [ and news emails ]. Because automatically this means you can't hide the links, so you'll have to just give the actual links and not these tracking links that redirect you to the actual links. And secondly it's just better all way round, basically. Of course, the UM email systems don't send plaintext emails. What we can see is that there is actually a plaintext part to the email, and it's empty. They take the effort to add a plaintext part to the email, and then they make it completely empty, or in the other ones it gives you a link to the web version. Kind of a slap in the face, but okay. Alright, next part: 2. Tracking on the web 24:18 Let me start by explaining: what are cookies even? What I showed you earlier was a simplified version of HTTP, but actually, what the request looks like is this:

    GET /about.html HTTP/1.1
    Host: umprivacy.nl.eu.org
    Otherheader: value

So if my computer is requesting some page, about.html for example, then it sends this request, which is "GET" and then the page it wants and the HTTP version. Then beyond that, there are these "headers" as they're called.
They're basically each time some name of the header type and then the value of that header. Similarly, the response format also has headers. We can see here the actual response that the server sends:

    HTTP/1.1 200 OK
    Content-Type: text/html
    Header-Name: value

Again the HTTP version, a status code ("OK") which indicates that it found the page, and then there are also these headers, for example the content-type header which indicates, what kind of content it is (makes sense, right?). And there can basically be an arbitrary number of these headers before you get to the actual content. 26:05 So what are cookies? They are information that your browser stores for the website. When you visit a website, it could be that the website responds with some set-cookie headers, and what that does is, well, it sets a cookie. So what we see here is two examples:

    HTTP/1.1 200 OK
    Content-Type: text/html
    Set-Cookie: lang=en; Expires=Thu, 31 Mar 2022 01:00:00 GMT
    Set-Cookie: session-id=0eLlIXY ... ; Expires=Fri, 25 Mar 2022 01:00:00 GMT

"lang" is "en": that means the language that the page should be displayed [ sent ] in is English. And we see it's given an expiry time. Similarly, we have another cookie called "session-id"; session-id cookies are usually used to keep someone logged in. This "session" is your session on that website and as long as you have that cookie, as long as you send it along with your web requests, then you stay logged in. And then every time you request a page from this website in the future, your browser sends along these cookies. It sends along this language cookie, it sends your session cookie so that the server knows it's you if you have, like, an account on that website. [ repetition elided ] 28:14 So let's take a look at the UM website. What do we have here? It says "This website uses cookies... (bla bla bla)", this is a GDPR statement. And what we see here is that in big bold colours there is "Allow all cookies".
Then below that there is a less nice-looking "Customise", which will probably be work. Then below that there's the option in white, a little less eye-catching, and that is to "Use the necessary cookies only". So already, it's kind of encouraging people to just allow all cookies, to just not bother, basically. If we go to the details, 29:26 ... what we see is that these cookies that the website wants to set are split into various sections. We have 66 whole marketing cookies that it wants to set - the purpose of marketing cookies is to optimise the advertisements to you. And for advertising companies this is a source of money: they're trying to manipulate you the best with their advertisements. That sounds like maybe a mean way to say it but it is what they're doing. Then there are statistics cookies, which are supposed to give UM some statistics about how its website is used. [ Then ] preferences cookies - that seems reasonable, right? Cookies to store your preferences so you don't have [ to ] change the language setting from nl to en every time. And then there are the "necessary" cookies. Now this is very interesting, because we [ can ] see there are some "interesting" cookies in here. There are for example all these "utm_" cookies, and the description given [ for all ~5 of them ] is: "Used by the website operator in order to measure the efficiency of their marketing". That is odd, right? That this is marked as a "necessary cookie". And we also see [ still in the "necessary" section ] ... Google Analytics. "Analytics" is basically: Google tracking all your users and then giving you some statistics on them. Privacy-concerned people, they might not want the website to contact Google every time they visit. I mean, that seems like a bad idea, to contact an advertisement company every time you visit a website. And what we see here is that this cookie actually contains a unique ID.
So, Google stores a unique ID in your webbrowser as a cookie and this is sent along every time such that Google can keep track of you on behalf of the UM, and all these cookies are in the """""necessary""""" part, so they're always enabled, even if you choose "necessary cookies only". And what we see here [ in the "about" tab of the cookie consent pop-up ] is "The law states that we can store cookies on your device if they are strictly necessary for the operation of this site". And they understand "strictly necessary" to include certain cookies that really belong in the "statistics" or "marketing" part. That's pretty bad, right? I hope people can agree that that's not something the UM should be doing. 33:52 If the law, the GDPR says that you can only use the strictly necessary cookies and for the other ones you have to ask, then you shouldn't - and it actually seems illegal to - put these unnecessary cookies into the "necessary" part such that everybody has to agree to them. This seems like a pretty blatant GDPR violation to me [ not quite: see erratum 1, the Dutch cookie law exempts analytics cookies ]. That is just not okay, right? 34:31 Anyway, so let's think about what are the legitimate uses of cookies. The examples I gave are already basically the two types of legitimate uses for cookies. One is to have accounts on the website: without cookies - well, we'll get to some other ways that websites can recognise their users, but to keep track of people [ so they can use ] an account, you need cookies because otherwise the webserver just can't recognise who is who. So for that you need, like, session-id cookies. Because otherwise, after filling in the log-in form, the webserver would immediately forget you, so that is one legitimate use of cookies. Another one is these setting cookies - I gave the example of a language setting cookie. Those are used to store one setting, and that seems like a perfectly honest use of cookies, right, so people don't have to change the settings on the website every time they visit.
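The two legitimate cookie types just described can be sketched with Python's standard http.cookies module. The cookie names lang and session-id come from the example at 26:05; the one-week lifetime (written as Max-Age rather than the Expires date used there), the HttpOnly flag and the function names are assumptions added purely for illustration.

```python
from http.cookies import SimpleCookie


def build_set_cookie_headers(language, session_id):
    """Return the values of the Set-Cookie headers a server could send."""
    cookie = SimpleCookie()
    # Settings cookie: remembers one preference for a week (assumed lifetime).
    cookie["lang"] = language
    cookie["lang"]["max-age"] = 7 * 24 * 3600
    # Session cookie: identifies a logged-in user on later requests.
    cookie["session-id"] = session_id
    cookie["session-id"]["httponly"] = True  # keep it away from JavaScript
    return [morsel.OutputString() for morsel in cookie.values()]


def parse_cookie_header(header_value):
    """Turn the Cookie header sent back by the browser into a dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


for header in build_set_cookie_headers("en", "0eLlIXY"):
    print("Set-Cookie:", header)
print(parse_cookie_header("lang=en; session-id=0eLlIXY"))
```

Neither cookie needs to carry more than the setting itself or an opaque session token, which is the point made above: anything beyond that is hard to justify as "necessary".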
Now for other cookies, it's hard to see how they could be necessary while not falling into one of those categories. The site has to work if those cookies are deleted, because otherwise the first time someone visits the website, they don't have these cookies and the site also has to work. So this whole "necessary cookies" thing, it's hard to see - I guess some of these cookies in their list are probably required for their software, but with other software they could be left out - but still, there should not be that many... there are not that many [ legitimate ] uses for cookies, actually. So a better way to do things would be to do per-cookie consent. If you restrict yourself to these two types of cookies, you don't do marketing or statistics cookies or whatever, then every time a user wants to log in, you can say next to the login button: "Note: logging in requires a cookie" and if they click the login button, they agree to the cookie. And every time a user wants to save their settings, you can display next to the save settings button: "Note: this uses some cookies on your computer". And then you don't have to do this ugly agree-to-all-the-cookies consent pop-up, which is annoying, it causes distrust in people - and rightly so, because in this case it's really trying to trick you into agreeing to things you shouldn't be agreeing to. So yeah, if you just cut out all this part and restrict yourself to legitimate cookies (at least what I consider to be legitimate cookies) then you can do away with the whole "This website uses cookies" thing. 39:22 Some more considerations: first of all, these statistics that they're getting, are these actually useful? First of all, what does not show up in your statistics is the loss of trust that you've incurred by shoving this cookie consent pop-up in people's face. It gives immediately the message that UM doesn't care for you, does not really respect you as a user of their website. 
And all the people who know a bit about this stuff, they know enough to turn these cookies off, so you do not capture basically all the sensible people who have enough knowledge to protect themselves, you don't see statistics on those. You only see statistics on those who just click "Allow all cookies". And really, why is it even a popular idea that feedback for free on your website [ should be expected ]. I mean, you can just pay someone to give you some feedback on the website. And actually, that will be a lot more useful, because that feedback will be contextual, you will be able to ask questions, "Why do you think that?" &cetera This statistical feedback is very limited in what information is even contained in there, it does not paint an accurate picture. If you really want to know how to improve your website, you should just pay someone, pay a random student 20 euros to give you some feedback - I mean, I'll do it for free, I'll happily... I have a lot of feedback to give about the UM website. I'll give it away for free. These statistics... honestly I don't even understand what is the motivation for UM behind having this, and the marketing... people who know a bit about it, they disable those cookies. That's also the second point: it really isn't a long-term viable strategy, because of course the younger people know [ questionable ] how to work around these consent pop-ups, people know about web tracking, this knowledge spreads, eventually everybody is going to be turning off this stuff. So it's also not something you can rely on in the long term to give you any information. So my recommendation to the UM would be to try, for example, per-cookie consent, or just throw all the marketing and statistics cookies out, remove Google Analytics from your website. You don't need it, you can just get real feedback from an actual person instead, if you really want to improve your website (and I don't think you do). 
Then we move on to the next part, which is: 2.2 Fingerprinting 43:43 As the name implies, this is a set of techniques whereby the website somehow takes your "digital fingerprint", as it were, such that they can recognise you even if you don't have any cookies with identifiers in them. The thing is, these settings cookies, they can actually also be dangerous, can allow a website to determine who you are. Why? Because if you have 10 of these cookies and each of them is set to a highly specific setting, all that information combined gives the website more or less... Well, if you are the only person who has all these settings set to those values, then the website knows that it's you. So the danger with even normal settings cookies is that many small pieces combined can reduce the number of people that it can be drastically. 45:30 Another part of the HTTP protocol is that another header that most browsers send with the request is the "User-Agent" header. This contains things like your operating system, your browser with the version. All this information, it might not be unique among all visitors to the website, especially if you have a popular operating system and browser, then there might be many people who have the same user-agent. But especially with those version numbers, it is very easily possible that the number who have the same settings [ I mean the same user-agent ] is very small, or you might be the only one with this exact user-agent. So the headers in the request are another place where fingerprinting can happen, where a website can tell who you are even though you've not told them who you are. 47:04 And then [ we come to ] by far the largest threat in terms of fingerprinting, that's JavaScript (JS). What is JavaScript? JS is code/software that the browser executes on your computer for the website. 
On a webpage, there might be a link to a JavaScript file, and what your browser does is it downloads that JavaScript script, that code, and it runs it - at least, if you have a browser with JS enabled, then it executes that code on behalf of the website. Now the problem is that quite a lot of information is exposed to JS code. The JavaScript code can access some information about your device, for example, and together this is an enormous amount of information. And of course, it is possible that the JS code sends this information back to the website, or that [ it gives ] you a link with the information contained in it and then you click on that link and then the website knows. This is a big danger, it's a way that websites can track you without cookies. And finally there's also one particular thing that is exposed in JavaScript and that is the "Canvas" property - nicely sharing its name with our university's digital learning system, but that's something else. The Canvas property allows very exact, unique fingerprints of your device [ to be taken ]. Basically with the Canvas fingerprinting, it's possible for a website to uniquely pin the request down to one device so it can always recognise you. Now luckily, at least in Firefox, Canvas fingerprinting is now turned off, it gives you a warning if a website tries to use it, but this is not done everywhere yet. 50:05 Finally there's also IP address tracking. That's yet another... danger, I guess, another way for websites to recognise you between web requests even though you don't have an account on that site. That is that your IP address doesn't change much - at least, it can happen that it doesn't change much, it doesn't have to, but for example your IP address at home is often a fixed IP address for the whole building or something. That is again one opportunity for websites to "log you in without logging you in", let's say. 
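To make the "many small pieces combined" idea from this section concrete, here is a small Python sketch. It is only a toy model: the signal names (user_agent, lang, timezone, ip) and the visitor data are invented for the example, and real fingerprinting scripts collect far more than this. The sketch hashes whatever signals are available into one identifier and counts how many known visitors share it.

```python
import hashlib


def fingerprint(signals):
    """Combine individually harmless signals into one stable identifier."""
    # Sort the keys so the same signals always give the same fingerprint.
    canonical = "\n".join(f"{key}={signals[key]}" for key in sorted(signals))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def anonymity_set_size(visitors, target):
    """Count how many known visitors are indistinguishable from the target."""
    wanted = fingerprint(target)
    return sum(1 for visitor in visitors if fingerprint(visitor) == wanted)


# Invented example data: three visitors with the same browser and OS.
visitors = [
    {"user_agent": "ExampleBrowser/98.0 (Linux)", "lang": "en", "timezone": "UTC+1", "ip": "192.0.2.10"},
    {"user_agent": "ExampleBrowser/98.0 (Linux)", "lang": "nl", "timezone": "UTC+1", "ip": "192.0.2.11"},
    {"user_agent": "ExampleBrowser/98.0 (Linux)", "lang": "en", "timezone": "UTC+1", "ip": "192.0.2.12"},
]

# With only the user-agent, all three look alike; with every signal, each is unique.
only_ua = [{"user_agent": v["user_agent"]} for v in visitors]
print(anonymity_set_size(only_ua, only_ua[0]))
print(anonymity_set_size(visitors, visitors[0]))
```

The fewer visitors share a fingerprint, the closer it comes to identifying one person without any cookie or account.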
The upside is that IP version 4 is slowly being replaced with IP version 6, and the thing with version 6 is that it has much bigger addresses, like earlier we saw addresses like this [ going back to previous slides ]: 51:43 it's 4 numbers up to 250, 256 to be exact (well, 255, but okay), and in IP version 6 you have, and now I have to calculate, I think... okay, now I don't know exactly how much address space there is . But it's so much bigger that it's much more likely that you'll get a unique IP address every time you connect to the internet, and then IP tracking becomes less of an option, like almost not at all. So that's positive news, but this transition to version 6 is going very slowly, unfortunately. 52:51 Then quickly about online trackers: these are websites that don't offer content, they just track users around. Many websites will have something in their webpages which is some content [ for example an image that ] is one pixel big, you can't even really see it, and that pseudo-content comes from a tracker website. That means that this tracker gets contacted when you visit [ one of ] many other websites on the internet. They use techniques like fingerprinting and cookies; often the browser allows these trackers to store cookies on your device even though it's not the website you originally went to. And that means they can track you across many different sites and collect information on your behaviour across all of them. So that's a problem. For example Google Analytics is arguably one of them. We saw that the UM website contacted it and that in the cookie pop-up you agreed to let Google Analytics store a cookie on your device. Google Analytics doesn't offer content, the only thing on the actual site is these tracking pixels, so that they then follow you around, collect data on your browsing behaviour across many different sites. 
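How a one-pixel image lets a tracker follow someone across sites can be modelled in a few lines of Python. This is a toy simulation, not the behaviour of Google Analytics or any real tracker: the class PixelTracker, its method names and the site names are all made up for the example. The tracker hands out an identifier as a cookie on first contact and then records every site on which that cookie shows up again.

```python
import secrets


class PixelTracker:
    """Toy model of a third-party tracker embedded on many websites."""

    def __init__(self):
        self.visits = {}  # tracker cookie id -> list of sites seen

    def serve_pixel(self, cookie_id, embedding_site):
        """Handle one request for the invisible image.

        cookie_id is the tracker's own cookie as sent by the browser (None on
        first contact); embedding_site is the page that included the pixel,
        which a real tracker could learn from, for example, the Referer header.
        Returns the cookie id the browser should store.
        """
        if cookie_id is None:
            cookie_id = secrets.token_hex(8)  # new visitor: hand out an ID
        self.visits.setdefault(cookie_id, []).append(embedding_site)
        return cookie_id


tracker = PixelTracker()
cookie = tracker.serve_pixel(None, "news.example")
cookie = tracker.serve_pixel(cookie, "shop.example")
cookie = tracker.serve_pixel(cookie, "university.example")
print(tracker.visits[cookie])
```

After three page views on three unrelated sites, the tracker holds one browsing history under one identifier, even though the user never visited the tracker's own website.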
Those online trackers are really the biggest offenders, the biggest danger also, because they have access to so much information across many sites instead of just one site. So those are a big threat to privacy, and if you can block them, which we can, as we will see, you've avoided a lot of the worst tracking on the web. 56:06 One other point: in the previous section I talked a bit about HTML email versus just plain & simple text email, and again we see drawbacks of complexity. We saw that JavaScript creates privacy risks, but [ it does ] not only [ do ] that: a complicated website is also just bad in general. It's bad for people with slow internet, it's bad for people with a slow computer, or [ people ] who use an old webbrowser program, people [ who use ] a screen reader - that's the same, by the way, for HTML email - that just doesn't work nearly as well when the website is very complex. Also just in general, the more complex a software system is, the more likely it is to have some mistake in it. So all around, a complicated website is bad. 57:22 Now obviously, I say this with a website like this [ showing the UMprivacy website ]... this is maybe a bit extreme, but it's definitely possible to make a website that is good, without [ switching to the UM website ]... not sure we can see it, but the UM website requires, like, 67 JavaScript files to run. That kind of stuff is just insane. And on my device, currently, it's working quite nicely, but on a bit [ of an ] older device, on a device where JavaScript is turned off maybe, which I really should have done here, there it breaks down, that is the problem. 3. General discussion 3.1 Why privacy matters 58:28 Finally a bit about privacy [ in general ] - I don't want to spend too much time on it, but often people say stuff like "Well, I have nothing to hide, Google can have all my data, I don't care, blablabla..."
First of all, I don't think that's true, most of us have things that we don't want everyone to know, and it's important to remember, as I said earlier, that information does give people power over you. When privacy is eliminated across a society, that is really a dangerous situation, because then whoever has all the information on everybody, they have immense power. That is also why privacy is critical to a free society. Also - I think this is an Edward Snowden quote - "Saying that you don't want privacy because you have nothing to hide, that's like saying you don't want there to be freedom of speech because you don't have anything to say". If privacy is not there on a society-wide level, it stifles any sort of independent thinking, really. And that ultimately is detrimental to society. Then, 3.2 Why should UM care? 1:00:28 First of all, the university is a public institution, it should be working for the good of society, and this user-hostile behaviour on their website and in their email systems, that is just completely opposed to any sort of claim of being an institution for the betterment of society. They can do all they want on other fronts, but it just completely clashes with this approach on their website and in their emails. Also, as I said, I think it's pretty counter-productive: you lose trust, you really do lose trust by including this shit in your website and in your emails. And as long as people don't notice the links in the emails, maybe you're fine, but people will find out - I mean, I found out and I was quite furious I can tell you. Ultimately it costs more than it really gains you. 3.3 Resources 1:02:04 Finally some resources. First of all, I want to quickly cover some browser extensions that you can use to hopefully protect yourself. One important one is Privacy Badger. 
Basically it detects websites that seem to be trackers [ switching to UM website ] so here we see [ in the extension menu ] that the UM website is contacting blueconic.net and Google Analytics. It recognises these trackers based on their behaviour and it blocks them. It makes sure your browser doesn't make any requests to those websites. That's a very useful one. A second one is Cookie AutoDelete. This doesn't do much by default, you have to [ enable ] "autoclean". [ Now it automatically deletes cookies after you close a website. ] Then if there is a website where you do want to keep certain cookies, you will want to put that explicitly on the list of allowed sites. Also, I would recommend, in the settings, to also check all of these boxes [ "Enable blablabla cleanup" ] which are different ways that a site can store information on your computer. Finally, there is JShelter, and what this tries to do is restrict the amount of information that is available to the website via JavaScript, so that fingerprinting becomes less effective or maybe, what it also does sometimes is it fakes the values - every time the website tries to take a fingerprint they get a different fingerprint, which means that the website can't recognise you after all. I've picked these 3 extensions because if you leave them in their default settings they will basically not break anything, or they shouldn't, at least. I mean, JShelter is new, it might be a bit experimental, I would be a bit more cautious - if suddenly websites start breaking after you install it then I guess it's not quite ready for use. They are also all by... what was I gonna say... okay, nevermind. Another piece of software that is definitely recommended is Tor - Tor is basically *the* solution if you're concerned about IP tracking, because basically it just hides your IP address. 1:05:38 The second part of the resources: some websites.
First of all there is the Electronic Frontier Foundation, [ eff.org ], they have a lot of information on surveillance and how to avoid it. They also are the makers of the Privacy Badger extension. Secondly, if you want to replace user-hostile web services with better alternatives - like, for example, you want to find an alternative to Google Maps, or Google Anything basically - these two [ websites ] are highly recommended: degooglisons-internet.org/en/alternatives - it's a French site but it has an English version, it's from the organisation Framasoft which makes Peertube among other things, where this talk is being hosted. And also prism-break.org , they have a lot of alternatives to various services and software. 1:06:48 Finally, the image credits. We saw an alligator and a very disappointed man. These are both in the public domain [ for the alligator, see erratum 5 ]. They are available via Wikimedia Commons. 1:07:04 And that was the presentation, thank you so much for listening if you did, and please ask questions. I'm not even sure there are people in the chat - there aren't. Well, that's unfortunate, but you can ask questions afterwards. So yeah, that's it, thank you for listening, goodbye.