When people say that apps are stealing your data, what exactly does that mean?
Apps like Temu or TikTok. Or those cheap electronic devices where you have to download a questionable app and register an account. What exactly is being stolen and what is being done with it? Who is doing it? Why?
Data scientist here! In addition to the data points others have mentioned, there is actually a lot more data available than you would think in the form of metadata. We call the process feature engineering - essentially building a set of inputs that help determine an output, or prediction. How long you spend in the app, how long you stay on a screen before changing, how long you view a TikTok before swiping, which of the default settings you change, into what, all of this is used in machine learning models to help build a more accurate advertiser profile for you. Even if you don't volunteer data about yourself, your behavior in a way informs on you, even if you don't realize it. Through inference, a machine learning model could accurately deduce your age based on your behavior, for example.
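To make "feature engineering" a bit more concrete, here is a minimal sketch of how raw in-app events could be collapsed into per-user model inputs. The event names, fields, and numbers are all made up for illustration; real pipelines will look different.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw event log: (user_id, event_type, seconds_spent).
# Field names and values are invented for this example.
events = [
    ("u1", "video_view", 3.0),
    ("u1", "video_view", 41.0),
    ("u1", "settings_change", 12.0),
    ("u2", "video_view", 2.0),
    ("u2", "video_view", 4.0),
]


def build_features(events):
    """Collapse per-event rows into one feature dict per user."""
    per_user = defaultdict(list)
    for user_id, event_type, seconds in events:
        per_user[user_id].append((event_type, seconds))

    features = {}
    for user_id, rows in per_user.items():
        view_times = [s for kind, s in rows if kind == "video_view"]
        features[user_id] = {
            # Total time spent across all recorded events.
            "total_seconds": sum(s for _, s in rows),
            # Average time on a video before swiping away.
            "mean_view_seconds": mean(view_times) if view_times else 0.0,
            # How many times the user changed a default setting.
            "settings_changes": sum(1 for kind, _ in rows if kind == "settings_change"),
        }
    return features


user_features = build_features(events)
print(user_features["u1"])
```

None of these inputs were volunteered by the user; they all fall out of timestamps and taps.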
Anecdotally, I once got into a data conference for CEOs, where leaders were discussing strategy and tactics. The biggest topic of the day was: why can't I track how many times someone sees my physical store, billboard, or sign and then makes a decision? Geofencing plus your cellphone GPS isn't accurate enough for these guys. They want to know how long you stared at the store, what made you walk in, what demographics you belong to, and how they can maximize your likelihood of purchasing more stuff.
Why does this matter? People are more likely to buy more stuff wandering around a store than on a marketplace, where they just swap tabs to get the same thing from somewhere else.
If I can make my storefront like Temu to get you in and keep you there, then it's likely you'll be interested in buying more stuff you didn't know you wanted.
Yep, I've been at data science conferences where I heard people talking about tracking position in a store using things like Apple AirTags for the same reason.
So, the goal typically is to gather as much information about a user as possible in order to build a profile that advertisers will use to serve ads that are more relevant to the end user? Is there any other end goal, such as to build a better app or inform decisions that will ultimately lead to a better user experience?
To add to the other great explanation here, if you want to research a machine learning model that could do this, I would start with a model called logistic regression.
It's a fancy way of doing statistics and connecting the information they have. Are you interested in Minecraft? That probably says something about you. Do you look up maths homework assignments? We now know exactly which grade you're in. Do you buy gifts for a graduation ceremony? Diapers or baby shower utensils? Either you or your friends are at the age of starting a family. If more data points are connected, you can probably make a very precise prediction.
A machine learning model can learn which things people care for, look up or buy at a certain age and then do the predictions. Giving input data to a model and then letting it compute a corresponding output is called 'inference'.
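For anyone curious what that looks like in practice, here is a minimal from-scratch sketch of logistic regression in plain Python. The two features (Minecraft interest, baby product purchases), the labels, and the training settings are toy assumptions chosen for this example, not anything a real ad platform is known to use.

```python
import math


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Inference: turn one feature vector into a probability."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(rows, labels, lr=0.5, epochs=2000):
    """Fit weights with plain stochastic gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Toy data (invented): [likes_minecraft, bought_baby_products]
rows = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
# Label 1 = "probably an adult of family-starting age", 0 = otherwise.
labels = [0, 0, 1, 1, 1, 0]

weights, bias = train(rows, labels)
print(predict(weights, bias, [1, 0]))  # Minecraft only: low probability
print(predict(weights, bias, [0, 1]))  # baby products only: high probability
```

The training loop is the "learning" part; calling `predict` on a new user is the "inference" part described above.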
Phone number, email, anything else you put in, plus device and connection data. Also, depending on the app, it could steal passwords, cookies, banking info, etc.
Apps are also interested in how long you stay on a particular page, whether you tap on any ads, and how often you visit particular parts of the application.
The theft is not generally that they're collecting the data, the theft comes from them not paying you for it, and also usually not telling you they are collecting it. Taking something of value from someone without compensation and permission.
In terms of what they do with it, it isn't really important since the theft has already happened. But usually the data is sold to advertising agencies or other application developers, sometimes it is used for research, and it can often make its way to illegal black markets as well, depending on the source of the data.
In a report released last month, privacy commissioners said people who downloaded the Tim Hortons app had their movements tracked and recorded every few minutes — even when the app was not open on their phones.
Would they not have had to give access to location services for this to happen though? Google is very good at giving me an "only while using this app" option for this kind of stuff now.
They surely agreed to it. The mix-up is that people in general don't realize how much data Tim Hortons wants to collect and how often.
Tim Hortons should probably just know which Tim Hortons you're closest to when you go to place an order, and that's about it. There's no reason they should even be allowed to ask to track you all day every day, even if you agree.
The biggest problem I have with my data being collected, analyzed and used is that it will almost certainly be used to teach an ML model how to better manipulate people like me: people who are privacy conscious and are trying as much as possible to reduce their fingerprint.
That data is invaluable, and if there is a way to target even people like that, which there probably is since we're only human after all, the ML model will eventually figure it out. And they have literally billions of people to experiment and learn on.
Now, we already know from a few leaked studies made by Facebook that they can already manipulate people pretty well into mostly whatever they choose. Take a hypothetical situation where you get a crazy out-of-touch billionaire, who decides to buy a large social network company, and then decides "Hey, I really want this candidate to win. Tune up the algorithms!".
And the ML models will get a clear goal, using an approach that has already been proven to work pretty well at influencing user behavior. And any data you give them helps the model fine-tune itself to influence people like you. That would also be really hard to prove, because ML models are effectively black boxes that are really hard to reverse engineer, and proving that one was trained to do this is AFAIK almost impossible.
I don't want no part in that. Thankfully, all the large social networks have CEOs that are reasonable and would never try something like that, right?
And one more thing: you may not think that data about your behavior is of interest to anyone right now. But look at China and its Social Credit system. And imagine how, e.g., the Holocaust would have turned out if the government had had access to all the data, opinions and profiles of people that are being collected now.
Oh, you mentioned you sympathize with the Jews three years ago in a private message? Well, let's hope the country you live in never ends up in a situation where that could be a huge problem for you or your family.
So, every time a site offers a "personalized, curated list" for you (e.g. Google search results or YouTube recommended videos), assume you are potentially being manipulated, and avoid the site altogether, because there's no other way to prevent it. The ML model knows that you know, and is already trying to figure out how to manipulate people who are taking care not to be. And if there is a way, it will figure it out with some success.
The potential future authoritarian government has been my primary concern when it comes to data collection and profiling by corporations like Google and Meta for years. The governments don't even have to build their information gathering networks, although they still will, but so much of the surveillance has been done for them, goes back years (literally an entire lifetime for many people now), and is just a request away. I can't judge how the climate will be in two years, let alone a decade or two from now, but that information isn't going anywhere.
One of the projects I have in mind is to explore some kind of "offensive privacy", where the focus would not be on being untrackable, but on your computer spewing random bullshit and behavior into the algorithm to confuse it, so that it learns from behavior that's not real, only generated. This would let you kind of fight back and, if done by enough users, even reduce the effectiveness of ML algorithms, since they would be learning bullshit. Unfortunately, the scale required to meaningfully affect the learning process of ML models would be enormous, so it's not really feasible, but I think it's still better than just "staying hidden".
With the advances in AI, creating a tool like that, that would simulate several random user behaviors on your IP/fingerprint, shouldn't really be that hard.
And as an added bonus - if it clicks on adverts, it's costing someone money. Fuck corporations.
I find the motion sensing and GPS tracking to be the creepiest. Using motion sensing, they can know when you put your phone down and pick it up, whether it was screen down or face up, and when you are walking, running, driving, etc. Combined with GPS, it can be used to pretty accurately judge when you wake up, where you go, and how you get there. Lots of apps also don't "close" when you swipe them away; they continue running in the background, so even if you have the setting "only collect data when using the app", they will still collect data until you close them in the background or force stop them.
It is even worse than that. Given the list of data you have provided it is actually possible to discern general activity. You can determine if you are playing video games, working out, watching TV, out on a date, hanging with friends. As long as your phone is in your possession, the patterns for every behavior have a distinct fingerprint for each person. With enough collection, they can be filtered and categorized.
I also read about how they can correlate data between users and devices, too; maybe you don't have location on, but your app can correlate accelerometer data from your device with matching data from the same time from another device on the bus that does have location on. Boom, now they know you ride that bus. Or: everyone connecting from a particular IP address visits a particular restaurant's menu site from a QR code. Pretty good chance, then, that that IP address is the restaurant's wifi. Now they can correlate all that data and find out who your friend group is. Even something as simple as knowing that you were near your friends for an extended period of time while they were in an Uber to a venue before a show can help them build a profile about you and your cohort's interests and behaviors.
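The bus example boils down to comparing two time series recorded at the same moments. As a rough sketch (not how any particular company does it), Pearson correlation between accelerometer magnitudes is one simple way to score a match. The traces, device names, and the idea of picking the single best score are all assumptions for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def best_match(unlocated_trace, located_traces):
    """Return the located device whose motion best matches the unlocated one."""
    scores = {
        name: pearson(unlocated_trace, trace)
        for name, trace in located_traces.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Invented accelerometer magnitudes sampled at the same timestamps.
no_location_device = [0.1, 0.9, 0.2, 1.1, 0.3, 0.8]
devices_with_location = {
    "bus_rider": [0.2, 1.0, 0.1, 1.2, 0.4, 0.7],   # same bumps, same bus
    "pedestrian": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],  # unrelated motion
}

match, score = best_match(no_location_device, devices_with_location)
print(match, round(score, 3))
```

A high score against a device that does share its location is enough to place the device that doesn't.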
That's true, although I believe you still have to give permission to an app to use this (at least on Android). Not to say that people won't accept things way too fast.
To your last point, yes. The average user doesn't even glance at the permissions before blindly accepting them. It is also true that an alarmingly high number of users/consumers /don't care/ about basic privacy concerns that affect things like targeted ads, PII, and information that could be used to affect things like credit score.
Along with what others said, things you are interested in, demographic data, etc. The content you choose to watch on TikTok or the products you click on in Temu reveal a lot of valuable information about which ads might be most effective on you, so they can target ads to you.
The more worrisome of these would be all your contacts and your location. Even with location permissions denied, location can still be extrapolated up to a point if the app is allowed to access information on "WiFi networks nearby", and it can be used to derive your workplace, where you live, your hobbies and, when crossed with other people's data, even who you regularly meet with. Then there's call history, files on your phone (such as personal photos and stuff you downloaded), and sites visited. Even more seriously, an app can actually record what's being said around your phone, and even images, as well as track something as intimate as how and when your phone (and hence you, if it's in a pocket) moves.
All of this is beyond the whole tracking of app usage (what do you do, see and for how long in it) which at least makes some sense to track for quality improvement.
That said, what makes it a problem is not that the app can get that information from the phone's systems but that it can, without your authorization, send it all to a central server. If it couldn't do the latter, all that data capture for processing inside your phone would be absolutely fine.
simple, valid personal information can be valuable in aggregate. it is accumulated and sold to ad companies.
these apps are often given permission to look through your phone and report back other data, more than 'simple'. browsing/shopping history at best, account creds at worst. it's mostly for the same reason: advertising.
Everything. Basically, if it's not nailed down, they want to take it.
The short list of most common data taken would be app usage stats, not necessarily just for the app in question (e.g., TikTok may pull data on how many hours of screen time other apps get, like YouTube or Instagram or literally anything else), GPS info, data about how often you handle your phone (from accelerometer readings), wifi networks including the BSSID (MAC address) of your router, which cannot be easily changed or masked, and sometimes even data from your mic when you're not using the phone at all.
They know when you're sleeping, they know when you're awake, they know when you've been bad or good.... Oh wait, that last bit is Santa... Isn't it?
Anyways, I wouldn't be surprised if a few are bold enough to upload your pictures regardless of if you are posting the images, your browser history, security, device make/model, storage of your device, the list of files in storage, text messages...
Basically, anything that might help them identify you, what you do, where you work, when you work, how you travel, whether you're in a relationship, how happy you are in that relationship and how long it has been going on... Anything that might lead them to provide more targeted ads. Been in a relationship for a while and you seem happy? Check out these engagement rings. Already married? Here's some ads about parent stuff. Even something as simple as, hey, you're single and it's February, why not try Tinder or Grindr, or (insert app for your preference here).
They want to know everything there is to know so they can get you to buy more crap you probably don't need, for more than it's worth, and keep that economic gravy train rolling.
Also, to add to the BSSID point: it is possible (in the majority of cases) to get the exact (and I do mean exact) geolocation of the router whose BSSID you have. See geomac by drygdryg on GitHub.
Fun story: I purchased several wireless access points from an eBay seller years back, and when I brought them online, the geolocation services on all our phones thought we were several hundred miles away from where we lived for many months. I assume the BSSID data was feeding the incongruency.
After a few months, however, whatever database was feeding our devices bad geolocation data was updated, and we were once again "located" in the correct spot.
The accuracy of these systems is incredible. They will use not only your own BSSID but also those of complete strangers to try to figure out where you are without turning on GPS. If your personal BSSID is weak but your neighbor's BSSID is stronger, the system will adjust your position based on the relative signal strength of each BSSID that is detected, in much the same way triangulation works with most radio signals.
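A very simplified way to picture the signal-strength adjustment is a weighted centroid: average the known positions of the access points you can hear, weighting stronger signals more. This is only a sketch under assumptions. The BSSIDs, coordinates, and the choice of linear power as the weight are invented here, and real location services are surely more sophisticated.

```python
def rssi_to_weight(rssi_dbm):
    """Convert a dBm reading to linear power (mW) so stronger signals count more."""
    return 10 ** (rssi_dbm / 10.0)


def estimate_position(observations, ap_database):
    """Weighted centroid of known access point positions.

    observations: {bssid: rssi_dbm} as seen by the device.
    ap_database:  {bssid: (x, y)} known access point positions in meters.
    Returns (x, y), or None if no observed BSSID is in the database.
    """
    total_weight = 0.0
    x_sum = 0.0
    y_sum = 0.0
    for bssid, rssi in observations.items():
        if bssid not in ap_database:
            continue  # unknown router, nothing to learn from it
        weight = rssi_to_weight(rssi)
        ap_x, ap_y = ap_database[bssid]
        x_sum += weight * ap_x
        y_sum += weight * ap_y
        total_weight += weight
    if total_weight == 0.0:
        return None
    return x_sum / total_weight, y_sum / total_weight


# Hypothetical database of router positions on a local grid (meters).
ap_database = {
    "aa:bb:cc:00:00:01": (0.0, 0.0),   # your router
    "aa:bb:cc:00:00:02": (0.0, 10.0),  # the neighbor's router
}

# Your router is loud (-40 dBm), the neighbor's is faint (-70 dBm).
near_own_router = estimate_position(
    {"aa:bb:cc:00:00:01": -40, "aa:bb:cc:00:00:02": -70}, ap_database
)
print(near_own_router)
```

If both routers are heard equally loudly, the estimate lands halfway between them; as one gets stronger, the estimate slides toward it.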
I've seen such systems estimate, with a fair amount of accuracy, client location data on a floorplan where there are a few dozen access points in the space.... So it works both ways. In that case I was part of a team at a job where the client had a couple thousand square feet of floor space, and about 12-15 access points to blanket the space in coverage. We could, with some degree of accuracy, follow the location of someone as they moved through the space; knowing where they spent most of their time, and what services in the space were utilized by the guest.
The basic idea is that you build a dossier on everyone. You discover what kind of food they eat, where they live, the size and makeup of their family, their sexual preferences, pregnancies, what kind of porn they watch, where they shop for groceries, where they shop for electronics. You tie together purchases on their credit card with purchases in other apps or even brick and mortar stores. You figure out where they owe money and what their education looks like. You might look at all of these things together at some point in your life and go, why the hell do I care?
Then 20 years down the road, when Chinese companies start pushing out American banks, all of a sudden you can't get a loan for a house or a car. Or maybe you're going for a job at some point and this data is leaked back out; now it's part of your indelible history.
Perhaps somebody takes it all and throws it into a large language model. All of a sudden they've got clarity into your post history on all social media, even stuff you thought was private, because they know your phone serial number or your home IP address.
Corporations and governments don't have any business knowing about your private life. They shouldn't get to make decisions based on your private choices and preferences.
When most people talk about companies 'stealing' their data, it's just companies doing what they explicitly stated in the terms and conditions and these people agreed to.
The whole Google incognito mode drama right now is a great example of this. It literally always said 'incognito will not prevent employers, websites you visit, or your ISP from collecting data' when opening an incognito tab. So yeah, obviously Google also knows what you are looking up and they never implied otherwise at all.
Edit: A lot of down votes, but no one ever clarified how and when exactly it was that Google was misleading. And if there actually is anyone who was legit surprised by this whole thing, can you please explain to me what you thought incognito mode did exactly?... And if there isn't anyone who was surprised, as seems the case so far....that's sort of my point.
"they explicitly stated in the terms and conditions and these people agreed to"
Unwieldy terms of service have already been found to not be enough, because no reasonable person reads all of them. It also doesn't answer OP's questions.
"The whole Google incognito mode drama right now is a great example of this. It literally always said 'incognito will not prevent employers, websites you visit, or your ISP from collecting data' when opening an incognito tab. So yeah, obviously Google also knows what you are looking up and they never implied otherwise at all."
That's not what the lawsuit is about, and even if that was the point, which one of "employers, websites you visit, or your ISP" is Google/the browser?
And yet I somehow knew Google was collecting my personal info because it was obvious. That's the entire point of the company lol.
When someone searches 'big donkey dicks' in the url bar .. where exactly did they think the browser was pulling those results from? Could it be a website... called Google?
It did exactly what it was described as doing, which is basically no cookies and no user history (for the user or other users of their computer to see). The TV commercials about buying presents for loved ones never implied anything more.