GPT

Precursors

  1. Proximal Policy Optimization (PPO) - a reinforcement learning (RL) algorithm that, according to OpenAI, performs comparably to or better than state-of-the-art approaches while being much simpler to implement and tune. It is the default reinforcement learning algorithm at OpenAI.

  2. Learning from human preferences (human in the loop) - a method that infers what humans want from feedback on which of two proposed behaviors is better.

  3. InstructGPT - models that are arguably better than GPT-3 at following user intentions, while also being more truthful and less toxic; trained with humans in the loop.
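
The pairwise comparison idea in item 2 can be sketched in a few lines. This is a minimal illustration, not the method from the linked article: it assumes a Bradley-Terry-style model in which the probability that behavior A is preferred is the sigmoid of the difference between two scalar reward estimates. The function names `preference_probability` and `preference_loss` are hypothetical, chosen only for this example.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """P(A is preferred over B), assumed here to be sigmoid(reward_a - reward_b)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def preference_loss(reward_a: float, reward_b: float, human_prefers_a: bool) -> float:
    """Cross-entropy between the predicted preference and the human's choice."""
    p_a = preference_probability(reward_a, reward_b)
    p_chosen = p_a if human_prefers_a else 1.0 - p_a
    return -math.log(p_chosen)


# The loss is small when the reward estimates agree with the human label
# and large when they disagree.
agree = preference_loss(reward_a=2.0, reward_b=0.0, human_prefers_a=True)
disagree = preference_loss(reward_a=0.0, reward_b=2.0, human_prefers_a=True)
print(agree < disagree)
```

Minimizing a loss of this kind over many human comparisons pushes the reward estimates toward what people prefer; an RL algorithm such as PPO can then optimize a policy against those estimates.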

Articles

  1. What Is ChatGPT Doing and Why Does It Work? - explains next-word prediction in detail.

  2. Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study - "PPO is able to surpass other alignment methods in all cases and achieve state-of-the-art results in challenging code competitions."
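
As a rough illustration of the next-word prediction mentioned in item 1, the toy sketch below turns a list of scores into probabilities and samples one word. Everything in it is an assumption made for illustration: the vocabulary, the scores, and the helper names `softmax` and `sample_next_word` are hypothetical, and a real model would compute the scores from the preceding text.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(vocab, logits, temperature=1.0, rng=random):
    """Pick one word at random, weighted by its softmax probability.

    Lower temperatures concentrate probability on the highest-scoring word.
    """
    scaled = [x / temperature for x in logits]
    probs = softmax(scaled)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical scores for the word following "The cat sat on the".
vocab = ["mat", "sofa", "moon"]
logits = [3.0, 2.0, 0.1]
print(sample_next_word(vocab, logits, temperature=0.8))
```

Generating longer text repeats this step: the sampled word is appended to the prompt and the scores are recomputed.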

Competitions

Tools

Virtual assistants

  1. FlowGPT - hosts many bots and prompts.
