GPT
Proximal Policy Optimization (PPO) - an RL algorithm that performs comparably to or better than state-of-the-art approaches while being much simpler to implement and tune; it is the default reinforcement learning algorithm at OpenAI.
Human in the loop - a method for inferring what humans want by being told which of two proposed behaviors is better.
- language models, trained with human in the loop, that are arguably better at following user intentions than GPT-3 while also being more truthful and less toxic.
Explaining next-word prediction in detail.
? A Comprehensive Study - "PPO is able to surpass other alignment methods in all cases and achieve state-of-the-art results in challenging code competitions."
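To make the PPO entries above more concrete, here is a minimal sketch of the clipped surrogate objective that PPO maximizes. For each sampled action it computes the probability ratio r = pi_new(a|s) / pi_old(a|s) and takes min(r * A, clip(r, 1 - epsilon, 1 + epsilon) * A), where A is the action's advantage. This caps how far a single update can move the policy. The function and parameter names are hypothetical. The sketch assumes advantages are already computed, and epsilon = 0.2 is an assumed default. It is not a complete PPO implementation: it has no value loss, no entropy bonus, and no optimizer.

```python
import math


def ppo_clip_objective(new_logprobs, old_logprobs, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective (higher is better).

    new_logprobs / old_logprobs: log-probabilities of the sampled actions under
    the current policy and under the policy that collected the data.
    advantages: advantage estimates for those actions (assumed precomputed).
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # r = pi_new / pi_old
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, if the new policy makes a good action (advantage 1.0) 1.5 times more likely, the objective credits only 1.2 of that gain. Once the ratio leaves [0.8, 1.2], the extra change earns nothing, which discourages large policy updates.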
GPT-4
Sentence Embeddings
- has many bots and prompts.