Attention is All You Need

Abstract

In today’s episode we are going to talk about Transformers. I’m pretty sure you’ve heard of them at least once recently, since they have become extremely popular in the world of artificial intelligence. They are among the most powerful classes of deep learning models developed to date. But I’m not so confident that everyone knows the intuition behind them and how they work; that’s what this episode is for.

Basically, a transformer is a deep learning model capable of learning context and thus meaning by tracking relationships in sequential data. These models use a collection of mathematical techniques, called attention or self-attention, to detect subtle correlations between data elements in a series.
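For the curious, the core of that mechanism is scaled dot-product self-attention: every element of the sequence is compared against every other, and the resulting similarity scores weight a sum over the sequence. Here is a minimal NumPy sketch (the function name, dimensions, and random weights are purely illustrative, not from the episode):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project inputs to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # pairwise similarities, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                         # each output is a weighted mix of all values

# tiny example: a "sequence" of 3 tokens, model dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (3, 4): one context-aware vector per token
```

Because every token attends to every other token in a single step, the model can pick up long-range relationships that recurrent architectures struggle with.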

Transformers were first introduced in 2017 by a team at Google Brain, in the famous paper ‘Attention Is All You Need’, from which we also took inspiration for the title of this episode.

Initially, the architecture was proposed to solve problems in the field of natural language processing. However, since its debut, the transformer model has evolved and branched out into many different variants, expanding beyond language tasks into other areas (including, for instance, computer vision). Researchers are still exploring ways to improve transformers and use them in new applications.

Date
Jul 28, 2022 3:00 PM — 4:00 PM
Location
Twitch, Sparkd.AI
Federica Baldi
Computer Engineer with a major in Artificial Intelligence and Data Engineering

My research interests include Computer Vision, NLP, and Artificial Intelligence for Healthcare and Society.