Machine & Deep Learning Compendium

Experimental Design

A/B Testing


  1. Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing (highly recommended - buy) by Ron Kohavi, Diane Tang, and Ya Xu

  2. A comprehensive A/B testing course (really good) - dynamic fields

  3. CUPED - "Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data" by Alex Deng, Ya Xu, Ron Kohavi, and Toby Walker. CUPED (Controlled experiments Using Pre-Experiment Data) is a method that aims to estimate treatment effects in A/B tests more accurately than a simple difference in means: "As reviewed in the previous section, we traditionally use the observed difference between the sample means."

  4. Lukas Vermeer: Democratizing Online Controlled Experiments at Booking.com - ELITE CAMP 2018 (youtube)

  5. Top Challenges from the first Practical Online Controlled Experiments Summit (Paper) - "Online controlled experiments (OCEs), also known as A/B tests, have become ubiquitous in evaluating the impact of changes made to software products and services. While the concept of online controlled experiments is simple, there are many practical challenges in running OCEs at scale. To understand the top practical challenges in running OCEs at scale and encourage further academic and industrial exploration, representatives with experience in large-scale experimentation from thirteen different organizations (Airbnb, Amazon, Booking.com, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, Yandex, and Stanford University) were invited to the first Practical Online Controlled Experiments Summit. All thirteen organizations sent representatives. Together these organizations have tested more than one hundred thousand experiment treatments last year. Thirty-four experts from these organizations participated in the summit in Sunnyvale, CA, USA on December 13-14, 2018. While there are papers from individual organizations on some of the challenges and pitfalls in running OCEs at scale, this is the first paper to provide the top challenges faced across the industry for running OCEs at scale and some common solutions"

  6. Modern Epidemiology (book) - has a/b testing

  7. Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments - A/B testing is the gold standard for estimating the causal relationship between a change in a product and its impact on key outcome measures. It is widely used in the industry to test changes ranging from a simple copy or UI change to more complex changes like using machine learning models to personalize the user experience. The key aspect of A/B testing is evaluation of experiment results. Designing the right set of metrics - correct outcome measures, data quality indicators, guardrails that prevent harm to business, and a comprehensive set of supporting metrics to understand the "why" behind the key movements - is the #1 challenge practitioners face when trying to scale their experimentation program. On the technical side, improving the sensitivity of experiment metrics is a hard problem and an active research area, with large practical implications as more and more small and medium size businesses are trying to adopt A/B testing and suffer from insufficient power. In this tutorial we will discuss challenges, best practices, and pitfalls in evaluating experiment results, focusing on both lessons learned and practical guidelines as well as open research questions.

    Slides: Part I. Introduction, Part II. Best Practices

  8. Increasing experimentation accuracy and speed by using control variates (Etsy) - In this article, we share details about our team's journey to bring the statistical method known as CUPED to Etsy, and how it is now helping other teams make more informed product decisions, as well as shorten the duration of their experiments by up to 20%. We offer some perspectives on what makes such a method possible, what it took us to implement it at scale, and what lessons we have learned along the way.

  9. Why Tenant-Randomized A/B Test is Challenging and Tenant-Pairing May Not Work (Microsoft)

  10. How to Double A/B Testing Speed with CUPED (good) - Microsoft's variance reduction technique that's becoming an industry standard. "Controlled-experiment Using Pre-Experiment Data (CUPED) is a variance reduction technique created by Microsoft in 2013. Since then, it has been implemented at Netflix, Booking.com, BBC, and many others.

    In short, CUPED uses pre-experiment data to control for natural variation in an experiment's north star metric. By removing natural variation, we can run statistical tests that require a smaller sample size. CUPED can be added to virtually any A/B testing framework; it's computationally efficient and fairly straightforward to code."

  11. Improving the Sensitivity of Online Controlled Experiments: Case Studies at Netflix (Netflix)

  12. How Booking.com increases the power of online experiments with CUPED (Booking), plus a comment

  13. Increasing experimental power with variance reduction at the BBC (BBC) - This article discusses how the Experimentation team have been accounting for pre-experiment variance in order to increase the statistical power of their experiments.

  14. Reducing A/B test measurement variance by 30%+ (TripAdvisor) (good)

  15. (medium)

  16. How to Do A/B Testing: 15 Steps for the Perfect Split Test (Hubspot)

  17. Vidhya on A/B Testing for Data Science using Python – A Must-Read Guide for Data Scientists

  18. Online Evaluation for Effective Web Service Development (Yandex)
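
Several of the links above describe CUPED's core idea: use a pre-experiment metric as a control variate to shrink the variance of the difference-in-means estimate. A minimal sketch on simulated (hypothetical) data, assuming each user has a pre-experiment metric X correlated with the experiment metric Y:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical A/B test: 10k users per arm; the post-experiment metric Y
# is strongly correlated with the pre-experiment metric X.
n = 10_000
x = rng.normal(100, 20, size=2 * n)          # pre-experiment metric
treatment = np.repeat([0, 1], n)             # first n control, last n treated
true_effect = 1.0
y = 0.8 * x + rng.normal(0, 10, size=2 * n) + true_effect * treatment

# CUPED adjustment: Y_cv = Y - theta * (X - mean(X)),
# with theta = cov(X, Y) / var(X) estimated from the pooled data.
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cv = y - theta * (x - x.mean())

def diff_in_means(metric, t):
    """Treatment-effect estimate and its variance for a two-sample test."""
    a, b = metric[t == 1], metric[t == 0]
    return a.mean() - b.mean(), a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)

est_raw, var_raw = diff_in_means(y, treatment)
est_cv, var_cv = diff_in_means(y_cv, treatment)

print(f"raw estimate:   {est_raw:.2f}  variance: {var_raw:.4f}")
print(f"CUPED estimate: {est_cv:.2f}  variance: {var_cv:.4f}")
```

Both estimators target the same treatment effect, but because X explains most of Y's natural variation, the CUPED variance is much smaller, which is exactly why the articles above report needing far fewer samples for the same power.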

(unread) a/b testing

Bayesian a/b testing

practitioner guide to

Sample size calculation

statistical tests

A/B Testing Measurement Frameworks - Every Data Scientist Should Know

A/B Testing Tools

  1. R-shiny Tool

  2. Wasabi open source platform

  3. Git

  4. Optimizely
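
Insufficient power and sample size calculation come up repeatedly in the links above. A minimal sketch of the standard two-sided, two-proportion sample-size formula (the function name and the 10% baseline / 1pp lift numbers are illustrative, not taken from any linked article):

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(p_base, mde_abs, alpha=0.05, power=0.8):
    """Approximate per-variant sample size to detect an absolute lift of
    mde_abs over a baseline conversion rate p_base with a two-sided
    two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)            # critical value for power
    p1, p2 = p_base, p_base + mde_abs
    p_bar = (p1 + p2) / 2                           # pooled proportion
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / mde_abs ** 2
    return ceil(n)

# e.g. detecting a 1 percentage-point lift on a 10% baseline
print(sample_size_per_variant(0.10, 0.01))
```

Note how quickly the required n grows as the minimum detectable effect shrinks (it scales roughly with 1/mde²), which is the motivation for variance-reduction methods like CUPED above.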