As a data scientist, the number one question I hear from friends is “How did the algorithm get that so wrong?” People don’t know it, but that’s a data science question.
For example, Facebook apparently thinks I’m trans, so they keep on advertising HRT to me. How did they get that one wrong? Surely Facebook knows I haven’t changed pronouns in my entire time on the platform.
I don’t know why the algorithm got it wrong in any particular case, but it’s not remotely surprising. For my job, I build algorithms like that (not for social media specifically, but it’s the general idea), and as part of the process I directly measure how often the algorithm is wrong. Some of the algorithms I have created are wrong 99.8% of the time, and I sure put a lot of work into making that number a tiny bit lower. It’s fantastically rare to be able to build an algorithm that’s right all the time.
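For what it’s worth, “measuring how often the algorithm is wrong” is nothing exotic. A toy version of that check, with made-up numbers rather than anything from a real product, looks something like this:

```python
import numpy as np

# Minimal sketch of measuring an algorithm's error rate during evaluation.
# These arrays are made-up illustrations, not real data.
y_true = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])   # what actually happened
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # what the algorithm guessed

# Of the cases the algorithm flagged, what fraction were wrong?
flags = y_pred == 1
wrong_rate = 1 - y_true[flags].mean()
print(f"Wrong on {wrong_rate:.0%} of flagged cases")
```

On this toy data the algorithm is wrong on 80% of the cases it flags, and the daily work is mostly about nudging that number down.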
If you think about it from Facebook’s perspective, their goal probably isn’t to show ads that understand you on some personal level, but to show ads that you’ll actually click on. Think about how many ads the typical person sees versus how many they click on. Suppose I never click on any ads. Then the HRT ads are a miss, but so is every other ad Facebook shows me, and the algorithm hasn’t actually lost much by giving it a shot.
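A quick back-of-the-envelope calculation makes that concrete. The click rates here are assumptions for illustration, not Facebook’s actual figures:

```python
# Back-of-the-envelope sketch with made-up numbers: compare a "safe" ad to a
# speculative guess when click rates are tiny either way.
ads_shown = 1000

safe_ad_click_rate = 0.010        # assumed ~1% click rate for a well-targeted ad
speculative_click_rate = 0.002    # assumed click rate for a long-shot guess

expected_clicks_safe = ads_shown * safe_ad_click_rate
expected_clicks_speculative = ads_shown * speculative_click_rate

print(f"Safe ad:        ~{expected_clicks_safe:.0f} clicks per {ads_shown} impressions")
print(f"Speculative ad: ~{expected_clicks_speculative:.0f} clicks per {ads_shown} impressions")
# Either way, 99%+ of impressions get no click, so a wrong guess barely
# changes the outcome.
```

Under these assumptions, the difference between a careful guess and a wild one is a handful of clicks per thousand impressions, which is why taking the occasional long shot is cheap.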
So data science algorithms are quite frequently wrong simply as a matter of course. But why? Why can’t the algorithm see something that would be so obvious to any human reviewer?