Entries by Yasuto Tamura

Stop saying “trial and error” for now: seeing reinforcement learning through some spectrums

Some of you might have gotten away with explaining reinforcement learning (RL) only by saying something vague like “RL enables computers to learn through trial and error.” But if you have patiently read my articles so far, you might have come to say “RL is a family of algorithms that simulate procedures similar to dynamic programming (DP).”
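To make that claim concrete, here is a minimal sketch of value iteration, the DP procedure that tabular RL methods approximate. It assumes a hypothetical toy MDP with made-up transition and reward tables, not anything from the articles themselves.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions.
# P[s, a, s'] = probability of landing in s' after taking action a in state s.
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],
    [[0.0, 0.6, 0.4], [0.0, 0.1, 0.9]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # absorbing last state
])
R = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # R[s, a] = expected immediate reward
gamma = 0.9

# Value iteration: repeatedly apply the Bellman optimality backup until convergence.
V = np.zeros(3)
for _ in range(1000):
    # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    Q = R + gamma * np.einsum("sap,p->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

greedy_policy = Q.argmax(axis=1)  # act greedily with respect to the converged values
print(V, greedy_policy)
```

RL algorithms such as Q-learning perform essentially the same backup, but estimate it from sampled experience instead of a known model.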

Four essential ideas for making reinforcement learning and dynamic programming more effective

This is the third article of the series My elaborate study notes on reinforcement learning. 1. Some excuses for writing another article on the same topic: In the last article I explained policy iteration and value iteration of dynamic programming (DP), because DP is the foundation of reinforcement learning (RL). And in fact this article […]

Graphical understanding of dynamic programming and the Bellman equation: taking a typical approach at first

This is the second article of the series My elaborate study notes on reinforcement learning. *I must admit I could not fully explain how I tried visualizing the ideas of Bellman equations in this article. I highly recommend you also take a brief look at the second section of the third article. (A comment added on 13/3/2022) […]
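As a quick reminder of what the visualizations are about: the Bellman equation for the state-value function $v_\pi$, written in the notation of Sutton and Barto that this series follows, reads

$$ v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\bigl[\, r + \gamma\, v_\pi(s') \,\bigr]. $$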

Understanding the “simplicity” of reinforcement learning: comprehensive tips to take the trouble out of RL

This is the first article of my article series “My elaborate study notes on reinforcement learning.” *I adjusted the mathematical notation in this article to be as close as possible to that of “Reinforcement Learning: An Introduction.” This book by Sutton and Barto is said to be almost mandatory for those studying reinforcement learning. Also I tried to avoid as […]

Rethinking linear algebra part two: ellipsoids in data science

*This is the fourth article of my article series “Illustrative introductions on dimension reduction.” 1. Our expedition of eigenvectors still continues: This article is still going to be about eigenvectors and PCA, and it still will not cover LDA (linear discriminant analysis). Here I would like you to have more organic links of the […]
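The connection between eigenvectors and ellipsoids can be seen in a few lines of NumPy: the eigenvectors of a covariance matrix are the principal axes of the data's ellipse, which is exactly what PCA exploits. This is a generic sketch with made-up Gaussian data, not the article's own example.

```python
import numpy as np

# Hypothetical 2-D data: a correlated Gaussian, whose scatter forms an ellipse.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[3.0, 1.2], [1.2, 1.0]],
                            size=500)

# PCA via eigendecomposition of the sample covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Columns of eigvecs are the principal axes of the data ellipse;
# sqrt(eigvals) is proportional to the half-length of each axis.
scores = X_centered @ eigvecs                 # data expressed in the principal-axis basis
print(eigvals, eigvecs)
```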

How to make a toy English-German translator with multi-head attention heat maps: the overall architecture of Transformer

If you have been patient enough to read the former articles of this article series, Instructions on Transformer for people outside NLP field, but with examples of NLP, you should have already learned a great deal about the Transformer model, and I hope you have gained a solid foundation for studying the theoretical side of this algorithm. This […]

Positional encoding, residual connections, padding masks: covering the rest of Transformer components

This is the fourth article of my article series named “Instructions on Transformer for people outside NLP field, but with examples of NLP.” 1. Wrapping points up so far: This article series has already covered a great deal of the Transformer mechanism. Whether you have read my former articles or not, I bet you are […]

Multi-head attention mechanism: “queries”, “keys”, and “values,” over and over again

*A comment added on 04/05/2022: Thanks to a comment by Mr. Maier, I found a major mistake in my visualization. To be concrete, there is a mistake in how I expressed the way each colored group of tokens is obtained by applying linear transformations. That corresponds to section 3.2.2 of the paper “Attention Is All You […]
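For readers who want the projections of section 3.2.2 spelled out in code rather than figures, here is a minimal NumPy sketch of multi-head attention. It is not the article's corrected visualization; the function name, shapes, and weight matrices are illustrative assumptions.

```python
import numpy as np

def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads):
    """Scaled dot-product attention with n_heads heads over a sequence X."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads

    # One linear transformation per role (query/key/value), then split into heads:
    # each head works on its own d_head-dimensional slice.
    def split_heads(M):
        return M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split_heads(X @ W_q), split_heads(X @ W_k), split_heads(X @ W_v)

    # Scaled dot-product attention, computed independently for every head.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)          # (heads, seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)                # softmax over the keys
    heads = weights @ V                                           # (heads, seq, d_head)

    # Concatenate the heads and mix them with the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o, weights                                  # weights are the heat maps

# Tiny random example just to show the shapes.
d_model, n_heads, seq_len = 8, 2, 5
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out, attn = multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads)
print(out.shape, attn.shape)   # (5, 8) (2, 5, 5)
```

The returned attention weights, one seq-by-seq matrix per head, are what the heat maps in the translator article visualize.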