Teacher forcing
- Maximum Likelihood Training (i.e., teacher forcing)
- Trained to generate the next word $\color{blue}{y_t^{\ast}}$ given the preceding ground-truth words $\color{red}{y_{<t}^{\ast}}$.
- At each time step, the ground-truth token (rather than the model's own prediction) is fed in as the input for the next time step.
$$L = -\sum_{t=1}^{4} \log P\left({\color{blue}{y_t^{\ast}}} \mid {\color{red}{y_{<t}^{\ast}}}\right)$$
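A minimal Python sketch of the loss above, assuming a hypothetical toy "model" given as a bigram table (`BIGRAM`) that conditions only on the last token of the prefix; a real decoder would condition on the whole prefix $y_{<t}^{\ast}$. The four-token target matches the upper limit of 4 in the sum. The function names and vocabulary are illustrative, not from any library. For contrast, `free_running_decode` feeds the model's own greedy prediction back in instead of the ground truth.

```python
import math

# Hypothetical toy model: P(next token | previous token).
# Simplification: a real decoder conditions on the full prefix y_{<t}.
BIGRAM = {
    "<s>":  {"I": 0.40, "love": 0.10, "NLP": 0.45, "</s>": 0.05},
    "I":    {"I": 0.05, "love": 0.70, "NLP": 0.15, "</s>": 0.10},
    "love": {"I": 0.10, "love": 0.05, "NLP": 0.75, "</s>": 0.10},
    "NLP":  {"I": 0.05, "love": 0.05, "NLP": 0.10, "</s>": 0.80},
}


def next_token_probs(prefix):
    """Toy model's distribution over the next token given a prefix."""
    return BIGRAM[prefix[-1]]


def teacher_forcing_inputs(target, bos="<s>"):
    """Decoder inputs under teacher forcing: the ground truth shifted right by one."""
    return [bos] + target[:-1]


def teacher_forcing_nll(target, bos="<s>"):
    """L = -sum_t log P(y_t* | y_<t*), always conditioning on the ground-truth prefix."""
    inputs = teacher_forcing_inputs(target, bos)
    loss = 0.0
    for t, gold in enumerate(target):
        prefix = inputs[: t + 1]  # <s> followed by the gold tokens y_1*..y_{t-1}*
        loss -= math.log(next_token_probs(prefix)[gold])
    return loss


def free_running_decode(max_steps=10, bos="<s>", eos="</s>"):
    """Contrast: greedy decoding feeds the model's own prediction back in."""
    prefix = [bos]
    for _ in range(max_steps):
        probs = next_token_probs(prefix)
        token = max(probs, key=probs.get)  # model's own most likely token
        prefix.append(token)
        if token == eos:
            break
    return prefix[1:]


target = ["I", "love", "NLP", "</s>"]
print(teacher_forcing_inputs(target))     # ['<s>', 'I', 'love', 'NLP']
print(round(teacher_forcing_nll(target), 4))
print(free_running_decode())              # ['NLP', '</s>']
```

In this toy table the gold first token "I" is not the model's top choice, yet teacher forcing still scores it and then conditions every later step on "I"; free-running decoding instead continues from its own guess "NLP" and never sees the gold prefix.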