Top-k sampling
Last updated
Mar 15, 2023
- Problem: Vanilla sampling makes every token in the vocabulary an option
- Even if most of the probability mass in the distribution is over a limited set of options, the tail of the distribution could be very long
- Many tokens are probably irrelevant in the current context
- Why are we giving them individually a tiny chance to be selected?
- Why are we giving them as a group a high chance to be selected?
- Solution: Top-k sampling
- Only sample from the top k tokens in the probability distribution.
- Increase k for more diverse/risky outputs
- Decrease k for more generic/safe outputs
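The truncate-and-renormalise step described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than any particular library's implementation: the function name `top_k_sample`, its parameters, and the choice to take raw logits as input are all assumptions made for this example.

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample one token id from the k highest-scoring entries of `logits`.

    Hypothetical helper for illustration; `logits` is a 1-D array of
    unnormalised scores over the vocabulary.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64)
    if not 1 <= k <= logits.size:
        raise ValueError("k must be between 1 and the vocabulary size")
    # Indices of the k largest logits (their order within the top k is arbitrary).
    top_ids = np.argpartition(logits, -k)[-k:]
    top_logits = logits[top_ids]
    # Softmax over the kept logits only, which renormalises the truncated distribution.
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    return int(rng.choice(top_ids, p=probs))

# Toy vocabulary of 5 tokens: with k=2 only token ids 2 and 0 can ever be drawn.
example_logits = [2.0, 0.5, 3.0, -1.0, 0.1]
token_id = top_k_sample(example_logits, k=2)
```

With `k=1` this reduces to greedy (argmax) decoding, and with `k` equal to the vocabulary size it reduces to vanilla sampling.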

- Top-k sampling can cut off too quickly!
- When the tokens all have similar probabilities (a flat distribution), a fixed k cuts off quickly, so many plausible tokens are excluded.
- Top-k sampling can also cut off too slowly!
- Conversely, if a small number of tokens have high probability and the rest have very low probability, a fixed k cuts off quite slowly, so unlikely tokens stay in the candidate set.
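The two failure modes can be made concrete by measuring how much probability mass a fixed k keeps. The sketch below uses two made-up distributions over a 100-token vocabulary; the helper name `top_k_mass` and all the numbers are assumptions chosen only for illustration.

```python
import numpy as np

def top_k_mass(probs, k):
    """Return the total probability covered by the k most likely tokens."""
    probs = np.asarray(probs, dtype=np.float64)
    return float(np.sort(probs)[::-1][:k].sum())

# Flat distribution: 100 equally likely tokens.
flat = np.full(100, 0.01)
# Peaked distribution: two tokens hold 0.98 of the mass, the other 98 share 0.02.
peaked = np.concatenate([[0.90, 0.08], np.full(98, 0.02 / 98)])

k = 10
# Flat case: k=10 keeps only about a tenth of the mass, so 90 equally
# plausible tokens are discarded (cut-off too quick).
flat_mass = top_k_mass(flat, k)
# Peaked case: the top 2 tokens already cover about 0.98, yet k=10 still
# admits 8 tail tokens that add almost nothing (cut-off too slow).
peaked_mass_top2 = top_k_mass(peaked, 2)
peaked_mass_topk = top_k_mass(peaked, k)
```

The same value of k is too small for the flat distribution and too large for the peaked one, which is the limitation of a fixed cut-off that the two bullets above describe.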