In this day and age, leaked emails have become one of the means by which information is released about the machinations of government. ProPublica recently received emails that revealed unflattering information about Donald Trump’s personal lawyer Marc Kasowitz. The publication of these emails set in motion a sequence of events in which Kasowitz let loose a tirade against an ordinary citizen. Since then, Kasowitz has become one of the many people who have been cast out through the revolving door that characterizes the Trump administration.
But if a news organization gets a set of emails from a source, how does it go about making sure that the emails are genuine? Jeremy B. Merrill explains how the ProPublica team validates the emails it receives and how it applied those techniques in the Kasowitz case. The techniques rely on two kinds of cryptographic signature carried in the message headers, known by the acronyms DKIM (DomainKeys Identified Mail) and ARC (Authenticated Received Chain).
You can use them to authenticate emails that come in over the transom. It takes a tiny bit of command-line work and maybe a little coaxing of your source, but it can offer you a mathematical guarantee that the email you have on your screen is identical to the one that the source received, with no possibility of intermediate tampering.
…The obscure header we’re interested in is called the DKIM Signature. It’s kind of like the shipper’s packing list. The DKIM Signature field contains two things: First, a set of instructions for making a summary of the email, mushing up some of the headers and the message itself, and, second, a version of that summary — technically, a “hash” — that’s cryptographically signed by the sending server.
It’s meant to give the receiving server the ability to see if the contents of the email changed in transit, the digital equivalent of detecting whether the mailman steamed open the envelope and modified the contents of a letter. We can put it to good use as journalists by creating our own version of the hash and then decrypting the one made by the sending server. If the hash we create from those instructions matches the decrypted one from the message exactly, we have mathematical proof that our email is the same as the one that was sent/received.
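To make the "create our own version of the hash" step concrete, here is a minimal sketch in Python that covers only the body half of the check: it reads the tags of a DKIM-Signature header, canonicalizes the message body the way the `c=` tag asks, hashes it with the algorithm named in `a=`, and compares the result with the `bh=` value. The function names are my own, the sample header is made up, and the sketch ignores the optional `l=` body-length tag. A full verification would also have to check the `b=` signature against the public key that the sending domain publishes in DNS, which is better left to a maintained DKIM library.

```python
import base64
import hashlib
import re


def parse_dkim_tags(header_value: str) -> dict:
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for part in header_value.split(";"):
        if "=" in part:
            name, _, value = part.partition("=")
            # Folding whitespace inside a value (common in bh= and b=) is not significant.
            tags[name.strip()] = re.sub(r"\s+", "", value)
    return tags


def canonicalize_body_simple(body: bytes) -> bytes:
    """'simple' body rule: normalize line endings, keep exactly one trailing CRLF."""
    body = body.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"


def canonicalize_body_relaxed(body: bytes) -> bytes:
    """'relaxed' body rule: collapse runs of spaces/tabs, drop trailing
    whitespace on each line, and drop empty lines at the end of the body."""
    lines = body.replace(b"\r\n", b"\n").split(b"\n")
    lines = [re.sub(rb"[ \t]+", b" ", line).rstrip(b" ") for line in lines]
    while lines and lines[-1] == b"":
        lines.pop()
    return b"".join(line + b"\r\n" for line in lines)


def body_hash_matches(dkim_header_value: str, raw_body: bytes) -> bool:
    """Recompute the body hash and compare it with the bh= tag."""
    tags = parse_dkim_tags(dkim_header_value)
    # c= is "header/body"; a missing body part means "simple".
    _, _, body_algo = tags.get("c", "simple/simple").partition("/")
    if body_algo == "relaxed":
        canonical = canonicalize_body_relaxed(raw_body)
    else:
        canonical = canonicalize_body_simple(raw_body)
    hash_name = tags["a"].split("-")[1]  # e.g. "rsa-sha256" -> "sha256"
    digest = hashlib.new(hash_name, canonical).digest()
    return base64.b64encode(digest).decode("ascii") == tags["bh"]


if __name__ == "__main__":
    # Made-up example: build a header whose bh= matches a known body.
    bh = base64.b64encode(hashlib.sha256(b"Lunch at noon.\r\n").digest()).decode()
    header = (
        "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=sel; "
        f"h=from:to:subject; bh={bh}; b=..."
    )
    print(body_hash_matches(header, b"Lunch at noon.  \r\n\r\n"))  # same text, stray whitespace
    print(body_hash_matches(header, b"Lunch at one.\r\n"))         # altered text
```

A matching body hash on its own proves little, since anyone can recompute `bh=` after editing the body. What makes tampering detectable is that `bh=` is itself covered by the server's signature in `b=`.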
…ARC is similar to DKIM, but instead of being used by the sending server, it’s used by intermediaries in the email process, like listservs or servers that receive email. Many emails that arrive into Gmail are signed by Google, but this is a new development — the ARC protocol isn’t even formally approved yet.
I am nowhere close to being an expert on online cryptography and just pass on this information to those who are interested and have the expertise.
Marcus Ranum says
In various places where I was referring to missing/“lost” emails, I’ve talked about ‘other information’ that has to line up with the message to validate it’s real: this is the sort of thing I was talking about. Also, some mail servers log things like character-count and line count, message size (always), delivery status, queue-IDs, etc. Some of that winds up in message headers and some of it doesn’t. Some of it winds up on different servers. To validate a message it all has to line up perfectly.
There have been times in the past when I’ve had to call someone who runs a server and ask them “please ‘grep’ your syslogs for the following string -- you don’t need to tell me anything other than if you did see it on such-and-such date.” That string being the message queue-ID which gets stamped into the headers as a message passes through a system. One old-school indicator of fake mail is when a message that should have taken 1 or 2 hops shows 3 or 4 -- if the servers don’t have that transaction queue-ID in their logs it means that someone injected the mail, with some fake headers, into one of the unnecessary hops.
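As a rough illustration of the hop-and-queue-ID check described above, the following Python sketch lists the Received: lines of a message in the order the hops happened, along with the server that stamped each one and the queue-ID it recorded. The sample message, the hostnames, and the function name are invented, and the sketch assumes each server writes its queue-ID after the word "id", which is common but not universal. The output is only a list of strings to ask a server operator to look for in their logs; it proves nothing by itself.

```python
import re
from email import message_from_string

# Assumption: the stamping server appears after "by" and the queue-ID after "id".
BY_HOST = re.compile(r"(?:^|\s)by\s+(\S+)", re.IGNORECASE)
QUEUE_ID = re.compile(r"(?:^|\s)id\s+([^\s;]+)", re.IGNORECASE)


def received_hops(raw_message: str) -> list:
    """Return one dict per Received: header, oldest hop first."""
    msg = message_from_string(raw_message)
    hops = []
    for value in msg.get_all("Received", []):
        flat = " ".join(value.split())  # unfold the multi-line header
        by = BY_HOST.search(flat)
        qid = QUEUE_ID.search(flat)
        hops.append({
            "by": by.group(1) if by else None,
            "queue_id": qid.group(1) if qid else None,
            "line": flat,
        })
    # Each server prepends its own line, so the newest hop is on top.
    hops.reverse()
    return hops


# Invented two-hop message for demonstration only.
SAMPLE = """\
Received: from mx.example.net (mx.example.net [192.0.2.10])
 by inbox.example.org (Postfix) with ESMTP id 4F2A1C0DE
 for <reporter@example.org>; Mon, 1 Jan 2018 10:00:02 +0000
Received: from laptop.example.net (unknown [192.0.2.55])
 by mx.example.net with ESMTP id QID0001;
 Mon, 1 Jan 2018 10:00:00 +0000
From: source@example.net
To: reporter@example.org
Subject: Lunch

See you at noon.
"""

if __name__ == "__main__":
    for number, hop in enumerate(received_hops(SAMPLE), start=1):
        print(number, hop["by"], hop["queue_id"])
```

If the number of hops is higher than the route should need, or an operator cannot find one of the queue-IDs in the logs for that date, that is the kind of mismatch worth chasing.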
It’s super fun when you catch someone who’s trying tradecraft -- they faked up a message and even injected it into a credible ingestion-point -- but they forgot something, like that the ingestion-point is an Exchange server and uses Microsoft-style Received: lines and the message shows UNIX-style log lines at that stage of the transaction. Then you have an absolutely unexplainable anomaly: there’s no way that “just happened.” There’s no way someone can say “I dunno how that got there!”
DKIM and ARC are newer stuff and not every server does them. Of course, that a server doesn’t is also a signature: if you got an email from me and it claimed to be from a server that did DKIM, it’s not from me (unless my hosting service has upgraded my mail server without telling me). Back in the day, some of us used to deliberately inject signatures into our mail, but nowadays that stuff is mostly a lost art. Back when I was setting the email server up for whitehouse.gov there were a couple times I got calls from Washington asking me to explain to (certain people) that they had not gotten an invitation to lunch with Bill Clinton; it was very useful to be able to tell them “all legitimate outbound emails from that system -- and there are very very few of them -- have queue-IDs in the form sma-XXXXXXXX, and if the queue-ID is anything else, it’s a fake.”