This is the final part of my series reading “Attention is all you need”, the foundational paper that introduced the Transformer model used in large language models (LLMs). In the first part, we covered some background, and in the second part we reviewed the architecture of the Transformer model. In this part, we’ll discuss the authors’ arguments in favor of Transformer models.
Why Transformer models?
The authors argue in favor of Transformers in section 4 by comparing them to the existing alternatives, namely recurrent neural networks (RNNs) and convolutional neural networks (CNNs).