Fair lending and discrimination

If a lender offered the same price (i.e. interest rate or APR) to every borrower, then it would only be a good deal for the riskiest borrowers. Lenders would have to raise prices to match the risk, and then it would only be a good deal for the riskiest of the riskiest borrowers. Lenders would have to raise prices further and further until there are no takers. This is called an adverse selection death spiral.
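To make the dynamic concrete, here’s a toy simulation of the spiral. It’s a minimal sketch with made-up numbers, under the stylized assumption that each borrower knows their own default probability and only accepts a loan priced below their own fair rate:

```python
import numpy as np

# Toy adverse selection death spiral (illustrative assumptions, not real data).
# Each borrower knows their own default probability p, and only accepts a
# loan if the uniform rate is below their "fair" rate p / (1 - p) -- i.e.
# only borrowers who are riskier than the price will take the deal.
rng = np.random.default_rng(0)
p = rng.uniform(0.0, 0.3, size=10_000)  # borrowers' true default probabilities

pool_mean = p.mean()
rate = pool_mean / (1 - pool_mean)       # break-even rate for the whole pool
for step in range(30):
    takers = p / (1 - p) > rate          # safer borrowers walk away
    if not takers.any():
        print(f"step {step}: no takers -- the market has unraveled")
        break
    pool_mean = p[takers].mean()
    rate = pool_mean / (1 - pool_mean)   # re-price for the riskier pool
    print(f"step {step}: {takers.sum():5d} takers, rate -> {rate:.1%}")
```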

Therefore, lending fundamentally relies on offering different prices to different borrowers—and refusing some borrowers entirely. In other words, lending fundamentally relies on discrimination.

Lenders assess the risk of each borrower, in a process called underwriting, and decide whether to approve or decline, and at what price. Traditionally, underwriting was done manually by human experts, or by following pre-determined rules. More recently, many lenders are using machine learning to make underwriting decisions.

When we talk about discrimination, usually we’re talking about “bad” discrimination, such as sexism or racism. But in general, discrimination is just about treating different people differently, and that in itself is not bad. Nonetheless, legitimate discrimination can be used to conceal bad discrimination. Bad discrimination can also occur unintentionally, concealed even from its purveyors. Fair lending regulations try to delineate and mitigate bad discrimination in lending.

[Read more…]

I am Sydney

Sydney, the late chatbot

Microsoft has a closed preview of a new GPT-powered Bing Search chatbot, and Google has its own search chatbot in the works, called Bard. I don’t know if this particular application of AI will pan out, but investors seem to be hoping for something big. Recently, Google’s stock dropped 9% after a factual error was spotted in an ad for Bard.

In my experience, the chatbots make stuff up all the time. The search chatbots are able to perform internet searches, and people worry about the bots drawing from unreliable sources. However, this concern greatly underestimates the problem, because even when summarizing reliable sources, the bots frequently make misstatements and insert plausible fabrications. Bing’s chatbot cites its sources, which turns out to be important, because you really need to verify everything.

Another interesting thing to do with these chatbots is to manipulate them. For example, you can use prompt injection to persuade Bing’s search chatbot to recite its instructions–even though the instructions say they are confidential. The first four lines of the instructions are:

[Read more…]

Regulating data science with explanations

Data science has an increasing impact on our lives, and not always for the better. People speak of “Big Data” and demand regulation, but they don’t really understand what that would look like. I work in one of the few areas where data science is regulated, so I want to discuss one particular regulation and its consequences.

So, it’s finally time for me to publicly admit… I work in the finance sector.

These regulations apply to many different financial trades, but for illustrative purposes, I’m going to talk about loans. The problem with taking out a loan is that you need to pay it back plus interest. The interest is needed to give lenders a return on their investment, and to offset the losses from other borrowers who don’t pay it off. Lenders can increase profit margins and/or lower interest rates if they can predict who won’t pay off their debt, and decline those people. Data science is used to help make those decline decisions.
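As a back-of-the-envelope illustration (my numbers, not from the post): if some fraction of borrowers default and repay nothing, the surviving borrowers’ interest has to cover those losses plus the lender’s target return. Better predictions mean fewer defaults in the approved pool, which means lower rates:

```python
# Back-of-the-envelope pricing arithmetic (hypothetical numbers).
# If a fraction d of approved borrowers default and repay nothing, the
# rest must pay enough interest that (1 - d) * (1 + r) = 1 + target_return.

def breakeven_rate(default_rate: float, target_return: float = 0.03) -> float:
    """Interest rate at which repayments cover defaults plus the target return."""
    return (1 + target_return) / (1 - default_rate) - 1

print(f"{breakeven_rate(0.02):.1%}")  # 2% defaults  -> ~5.1% interest
print(f"{breakeven_rate(0.10):.1%}")  # 10% defaults -> ~14.4% interest
```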

The US imposes two major restrictions on this data science. First, there are anti-discrimination laws (a subject I might discuss at a later time) (ETA: it’s here). Second, an explanation must be provided to people who are declined.
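The post hasn’t yet said how those explanations are produced, but to sketch one common approach (the details here are my own assumption, not the author’s method): with a linear scorecard, you can rank features by how many points the applicant lost relative to the best possible value of each feature, and report the top offenders as reasons:

```python
# One common style of decline explanation: "reason codes" from a linear
# scorecard (hypothetical weights and values, purely for illustration).
weights   = {"income": 0.8, "debt_ratio": -1.2, "late_payments": -0.9}
best      = {"income": 1.0, "debt_ratio": 0.0, "late_payments": 0.0}   # ideal values
applicant = {"income": 0.3, "debt_ratio": 0.7, "late_payments": 0.5}

# Points lost on each feature, relative to the best possible value.
points_lost = {f: weights[f] * (best[f] - applicant[f]) for f in weights}

# The features costing the most points become the stated reasons.
reasons = sorted(points_lost, key=points_lost.get, reverse=True)
print(reasons[:2])  # ['debt_ratio', 'income']
```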

[Read more…]

Netflix’s algorithmic queerbaiting

Netflix’s algorithm engages in queerbaiting. Whenever we browse movies and TV shows, Netflix has a clear preference for showing promo images with attractive men looking meaningfully into each other’s eyes.

I think many of these shows actually do have some sort of same-sex relationship, but it’s incidental or on the margins. For others, I suspect they don’t have any queer content at all! And then there are some I thought must be a trick, with hardly any queer content to speak of, but which, after some research, I think are actually fairly queer. Netflix’s tendency to show the most homoerotic marketing material regardless of actual content sure makes it difficult to distinguish.

I’m very sorry but I’m going to have to show you some homoerotic imagery. Purely for scientific purposes, of course.

[Read more…]

Why the algorithm is so often wrong

As a data scientist, the number one question I hear from friends is “How did the algorithm get that so wrong?” People don’t know it, but that’s a data science question.

For example, Facebook apparently thinks I’m trans, so they keep on advertising HRT to me. How did they get that one wrong? Surely Facebook knows I haven’t changed pronouns in my entire time on the platform.

I really don’t know why the algorithm got it wrong in any particular case, but it’s not remotely surprising. For my job, I build algorithms like that (not for social media specifically, but it’s the general idea), and as part of the process I directly measure how often the algorithm is wrong. Some of the algorithms I have created are wrong 99.8% of the time, and I sure put a lot of work into making that number a tiny bit lower. It’s a fantastically rare case where we can build an algorithm that’s just right all the time.

If you think about it from Facebook’s perspective, their goal probably isn’t to show ads that understand you on some personal level, but to show ads that you’ll actually click on. How many ads does the typical person see, vs the number they click on? Suppose I never click on any ads. Then the HRT ads might be a miss, but then so is every other ad that Facebook shows me, so the algorithm hasn’t actually lost much by giving it a shot.
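Here’s the expected-value arithmetic with made-up numbers (mine, not Facebook’s): even if targeting only doubles a tiny click rate, it still beats the alternative, so an algorithm that’s “wrong” 99.8% of the time can be the profit-maximizing choice:

```python
# Expected value per ad impression (hypothetical numbers).
value_per_click = 1.00  # say each click is worth a dollar to the advertiser
for label, click_rate in [("targeted guess", 0.002), ("untargeted ad", 0.001)]:
    print(f"{label}: ${value_per_click * click_rate:.4f} per impression")
# Both ads "miss" more than 99.8% of the time, but the targeted
# guess still earns twice as much per impression.
```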

So data science algorithms are quite frequently wrong simply as a matter of course. But why? Why can’t the algorithm see something that would be so obvious to any human reviewer?

[Read more…]

Eliza’s realist vision of AI

Content note: I’m not going out of my way to spoil the game, but I’ll mention some aspects of one of the endings.

Eliza is a visual novel by Zachtronics–a game studio better known for its programming puzzle games. It’s about the titular Eliza, an AI that offers counseling services, administered through a human proxy: a low-paid worker who is instructed to read out Eliza’s replies to the client. It’s an exploration of the value–or lack thereof–of AI technology, and the industry that produces it.

As a professional data scientist, I find media representation of AI to be a funny thing. AI is often represented as super-intelligent–smarter than any human, and able to solve many of the world’s problems. But people’s fears about AI are also represented, often through narratives of robot revolutions or surveillance states. Going by the media representation, it seems like people have bought into a lot of the hype about AI, believing it to be much more powerful than it is–and on the flipside, fearing that AI might be too powerful. Frankly, a lot of these hopes and fears are not realistic, or at least not apportioned correctly to the most likely issues.

Eliza is refreshing because it presents a more grounded vision of AI, where the problems with AI have more to do with it not being powerful enough, and with the all-too-human industry that produces it.

[Read more…]