🪴 Hayul's digital garden

Neural Language Generation

Last updated Mar 14, 2023

What is natural language generation?
Any task involving text production for human consumption requires natural language generation.

# Formalizing NLG: a simple model and training algorithm
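This section is left empty in these notes. For reference, one standard way to write down a simple autoregressive NLG model and its usual training objective is shown below. The notation is chosen here for illustration and is an assumption about what the heading covers. At each step t, a model f with parameters θ maps the prefix to a score vector S over the vocabulary V, and a softmax turns S into a next-token distribution. Training then minimizes the negative log-likelihood of the gold sequence y*, conditioning on the gold prefix at every step (teacher forcing).

$$
S = f(\{y_{<t}\};\theta) \in \mathbb{R}^{|V|}, \qquad
P(y_t = w \mid \{y_{<t}\}) = \frac{\exp(S_w)}{\sum_{w' \in V} \exp(S_{w'})}
$$

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log P(y^*_t \mid \{y^*_{<t}\})
$$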

# Decoding from NLG models

Note

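Softmax temperature is not a decoding algorithm! It’s a technique you can apply at test time, in conjunction with a decoding algorithm (such as beam search or sampling): dividing the scores by a temperature τ before the softmax makes the distribution more peaked (τ < 1) or flatter (τ > 1).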
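A minimal Python sketch of the note's point, assuming toy scores for a 3-token vocabulary. `softmax_with_temperature` only reshapes the distribution. The token is chosen by a separate decoding step, here plain sampling via `sample`. Both function names and the example scores are hypothetical.

```python
import math
import random

def softmax_with_temperature(scores, tau=1.0):
    """Convert raw scores (logits) into probabilities, rescaled by temperature tau > 0.

    tau < 1 sharpens the distribution, tau > 1 flattens it; this only reshapes
    probabilities and does not by itself choose a token.
    """
    if tau <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / tau for s in scores]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng=random):
    """The actual decoding step here: draw one token index from the distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # toy scores for a 3-token vocabulary
for tau in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, tau)
    print(tau, [round(p, 3) for p in probs], sample(probs))
```

Lowering `tau` toward 0 concentrates mass on the top-scoring token, approaching greedy behaviour. Raising it spreads mass onto less likely tokens. In both cases the same decoding algorithm runs on top.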

Decoding: Takeaways

  • Decoding is still a challenging problem in natural language generation
  • Human language distribution is noisy and doesn’t reflect simple properties (i.e., probability maximization)
  • Different decoding algorithms can allow us to inject biases that encourage different properties of coherent natural language generation
  • Some of the most impactful advances in NLG of the last few years have come from simple, but effective, modifications to decoding algorithms
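The third takeaway can be made concrete with a minimal Python sketch of three per-step decoding rules applied to one toy next-token distribution. The rules are greedy (always take the argmax), top-k sampling, and top-p (nucleus) sampling. The two sampling variants are examples chosen here, since the takeaways above do not name specific algorithms. The distribution and all function names are hypothetical. A real decoder would repeat this step token by token, feeding each choice back into the model.

```python
import random

def greedy(probs):
    """Greedy decoding step: pick the single most probable token index."""
    return max(range(len(probs)), key=lambda i: probs[i])

def _renormalize(probs, keep):
    """Zero out tokens outside `keep` and rescale the rest to sum to 1."""
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Top-k: keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, set(order[:k]))

def top_p_filter(probs, p):
    """Top-p (nucleus): keep the smallest set of most probable tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return _renormalize(probs, keep)

def sample(probs, rng=random):
    """Draw one token index in proportion to its (filtered) probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.5, 0.2, 0.15, 0.1, 0.05]  # toy next-token distribution
print("greedy:", greedy(probs))
print("top-k=2:", sample(top_k_filter(probs, 2)))
print("top-p=0.8:", sample(top_p_filter(probs, 0.8)))
```

Greedy always returns token 0 here. Top-k with k = 2 can only return tokens 0 or 1, and top-p with p = 0.8 keeps tokens 0 to 2 (cumulative mass 0.85). Truncating the tail this way injects a bias: it keeps some of sampling's diversity while ruling out low-probability tokens.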

# Training NLG models

Training: Takeaways

  • Teacher forcing is still the premier algorithm for training text generation models
  • Diversity is an issue with sequences generated from teacher forced models
    • New approaches focus on mitigating the effects of common words
  • Exposure bias causes text generation models to lose coherence easily
    • Models must learn to recover from their own bad samples (e.g., scheduled sampling, DAgger)
    • Or not be allowed to generate bad text to begin with (e.g., retrieval + generation)
  • Training with RL can allow models to learn behaviors that are challenging to formalize
    • Learning can be very unstable.
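The exposure-bias point can be illustrated with a minimal Python sketch of how scheduled sampling builds the decoder's input sequence. At each step the model is fed either the gold token (teacher forcing) or its own prediction, and the probability of feeding gold decays over training. The training targets stay the gold tokens throughout. The following are all hypothetical choices for illustration, not a specific published schedule:

* `predict_next`
* the `<s>` start token
* the linear decay in `linear_schedule`

```python
import random

def scheduled_inputs(gold, predict_next, p_gold, bos="<s>", rng=random):
    """Build decoder inputs for one training sequence under scheduled sampling.

    gold: gold tokens y*_1..y*_T; position t is always trained to predict gold[t].
    predict_next: hypothetical model call mapping the inputs so far to a predicted token.
    p_gold: chance of feeding the gold token; 1.0 recovers plain teacher forcing.
    """
    inputs = [bos]
    for t in range(len(gold) - 1):
        if rng.random() < p_gold:
            inputs.append(gold[t])               # teacher forcing: feed the gold token
        else:
            inputs.append(predict_next(inputs))  # feed the model's own guess for gold[t]
    return inputs

def linear_schedule(step, total_steps, p_min=0.0):
    """Hypothetical schedule: decay p_gold linearly from 1.0 to p_min over training."""
    return max(p_min, 1.0 - step / total_steps)

gold = ["a", "cat", "sat", "down"]
always_the = lambda inputs: "the"  # stand-in for a (bad) model
for step in (0, 50, 100):
    p = linear_schedule(step, 100)
    print(step, p, scheduled_inputs(gold, always_the, p))
```

At step 0 the inputs are exactly the gold prefix `['<s>', 'a', 'cat', 'sat']`, which is plain teacher forcing. By step 100 every input is the model's own output, which is the situation it faces at test time.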

# Evaluating NLG systems

Note

Don’t compare human evaluation scores across differently conducted studies, even if they claim to evaluate the same dimensions!

Evaluation: Takeaways

  • Content overlap metrics provide a good starting point for evaluating the quality of generated text, but they’re not good enough on their own.
  • Model-based metrics can be more correlated with human judgment, but their behavior is not interpretable.
  • Human judgments are critical.
    • They are the only ones that can directly evaluate factuality: is the model saying correct things?
    • But humans are inconsistent!
  • In many cases, the best judge of output quality is YOU!
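As a concrete example of a content overlap metric, the following minimal Python sketch computes clipped n-gram precision, the per-n building block of BLEU. Full BLEU also combines several n-gram orders and applies a brevity penalty, both omitted here. The sentences are made up for illustration. The paraphrase scores poorly on bigrams even though it means roughly the same thing as the reference, which is one reason overlap metrics are not good enough on their own.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_ngram_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, each n-gram's count clipped
    to how often it occurs in the reference (so repeating a word earns no extra credit)."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total

reference = "the cat sat on the mat".split()
paraphrase = "a cat was sitting on the mat".split()
degenerate = "the the the the".split()
for name, cand in [("paraphrase", paraphrase), ("degenerate", degenerate)]:
    print(name, clipped_ngram_precision(cand, reference, 1),
          clipped_ngram_precision(cand, reference, 2))
```

Here the paraphrase gets unigram precision 4/7 and bigram precision 2/6. The degenerate output gets 0.5 and 0.0, because clipping stops it from scoring 1.0 just by repeating "the".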