Machine & Deep Learning Compendium

Distribution


Last updated 3 years ago


TYPES

(What are distributions?) The probabilities in a distribution always add up to 1; for a continuous distribution, the probability density integrates to 1.
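A minimal sketch of the "adds up to 1" property, using only the Python standard library. The helper names (`binomial_pmf`, `normal_pdf`, `integrate`) are hypothetical, and the Binomial(10, 0.3) and standard Normal examples are arbitrary choices: the discrete masses are summed directly, and the continuous density is integrated numerically with the trapezoidal rule over ±10 standard deviations, which holds almost all of the mass.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of Normal(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Discrete distribution: the probability masses sum to 1.
discrete_total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
# Continuous distribution: the density integrates to (approximately) 1.
continuous_total = integrate(normal_pdf, -10.0, 10.0)
print(round(discrete_total, 6), round(continuous_total, 6))
```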

Comparing distributions:

Gaussian / Normal Distribution

  1. It is very common in nature (as distributions go), at least approximately.

  2. An enormous number of statistical relationships become clear and tractable if one assumes normality.

Sure, nothing in real life exactly matches the Normal. But it is uncanny how many things come close.

This is partly due to the Central Limit Theorem, which says that if you average enough independent random variables (with finite variance), the distribution of that average approaches the Normal.
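A small simulation can make the Central Limit Theorem concrete. This sketch assumes Uniform(0, 1) inputs (which are clearly not Normal), averages 30 of them at a time, and repeats that 5,000 times with a fixed seed; the sizes and the helper name `sample_means` are illustrative choices. By the CLT, the averages should be approximately Normal with mean 0.5 and variance (1/12)/30, so roughly 68% of them should land within one standard deviation of 0.5.

```python
import random
import statistics

def sample_means(n_terms: int, n_samples: int, seed: int = 0) -> list[float]:
    """Means of n_terms independent Uniform(0, 1) draws, repeated n_samples times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n_terms)) for _ in range(n_samples)]

means = sample_means(n_terms=30, n_samples=5_000)

# CLT prediction: mean 0.5, standard deviation sqrt(Var(Uniform) / n) = sqrt((1/12) / 30).
expected_sd = (1 / 12 / 30) ** 0.5
print(statistics.fmean(means), statistics.stdev(means), expected_sd)

# About 68% of a Normal's mass lies within one standard deviation of its mean.
within_1sd = sum(abs(m - 0.5) < expected_sd for m in means) / len(means)
print(within_1sd)
```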

  • The Normal distribution in statistics is a special world in which the math is straightforward and all the parts fit together in a way that is easy to understand and interpret.

  • It may not exactly match the real world, but it is close enough that this one simplifying assumption allows you to predict lots of things, and the predictions are often pretty reasonable.

  • It is statistically convenient.

  • It is fully described by two basic statistics:

    • the mean (average)

    • the variance (or standard deviation): the average of the squared deviations from the mean.
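To illustrate that the mean and variance are all you need, this sketch computes both from a small made-up dataset and plugs them into the Normal density. The helper names are hypothetical; the variance is the population variance (divide by n), matching the "average of the squared deviations" definition above, and the standard library's `statistics.pvariance` and `statistics.NormalDist` are used only as a cross-check.

```python
import math
from statistics import NormalDist, pvariance

def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)

def variance(xs: list[float]) -> float:
    """Population variance: the average squared deviation from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def fitted_normal_pdf(xs: list[float]):
    """Return the Normal density whose mean and variance match the data."""
    mu, sigma = mean(xs), math.sqrt(variance(xs))
    def pdf(x: float) -> float:
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return pdf

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(data), variance(data), math.sqrt(variance(data)))  # 5.0 4.0 2.0

pdf = fitted_normal_pdf(data)
# Cross-check against the standard library.
print(pvariance(data), pdf(5.0), NormalDist(5.0, 2.0).pdf(5.0))
```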

In a statistical test that assumes normality (e.g., a t-test), you need the data to be normal to guarantee that your p-values are accurate for your given sample size.

If the data are not normal, your sample size may or may not be adequate, and it may be difficult for you to know which is true.
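The text does not name a way to check normality. One simple option, swapped in here as an illustration, is the Jarque-Bera test: it compares the sample skewness and kurtosis to those of a Normal (0 and 3). Under normality its statistic is asymptotically chi-squared with 2 degrees of freedom, whose survival function is exp(-x/2), so the p-value below is a large-sample approximation. The function name, seed, and sample sizes are assumptions for this sketch; for real work, `scipy.stats` offers normality tests such as `shapiro`.

```python
import math
import random

def jarque_bera(xs: list[float]) -> tuple[float, float]:
    """Jarque-Bera normality test: returns (statistic, approximate p-value).

    JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and K the
    sample kurtosis. Under normality JB ~ chi-squared(2) asymptotically, and the
    chi-squared(2) survival function is exp(-x / 2).
    """
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)

rng = random.Random(42)
normal_data = [rng.gauss(0, 1) for _ in range(5_000)]
skewed_data = [rng.expovariate(1.0) for _ in range(5_000)]
print(jarque_bera(normal_data))  # small statistic; p-value usually well above 0.05
print(jarque_bera(skewed_data))  # very large statistic; p-value essentially 0
```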

COMPARING DISTRIBUTIONS

  1. Also check KL DIVERGENCE in the information theory section.
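As a quick reminder of what KL divergence computes for two discrete distributions, here is a minimal sketch, assuming both are given as probability lists over the same categories (the function name is hypothetical). It uses the natural log, so the result is in nats, and it shows that KL divergence is not symmetric.

```python
import math

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.

    Terms with p_i == 0 contribute 0; if q_i == 0 while p_i > 0, the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same categories")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q), kl_divergence(q, p))  # different values: KL is asymmetric
```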

(figure2)

Categorical data can be transformed into a histogram (i.e., the count of each class / the total count), and then the distance between two such histograms, e.g., train vs. production, can be measured. One option is the earth mover's distance (EMD), which is solved with linear programming, so it is slow.
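The histogram step, plus one special case of EMD, can be sketched as follows. The assumption here is that categories have no natural distance, so moving probability mass between any two different classes costs 1. Under that 0/1 ground distance the optimal transport plan has a closed form: EMD equals the total variation distance, 0.5 * sum(|p_i - q_i|), so no linear program is needed. With any other ground distance, the general (slower) LP formulation still applies. The helper names and the toy data are hypothetical.

```python
from collections import Counter

def categorical_histogram(values: list[str], categories: list[str]) -> list[float]:
    """Normalized histogram: count of each class / total count, over a fixed category list."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]

def emd_unit_cost(p: list[float], q: list[float]) -> float:
    """EMD when moving mass between any two different classes costs 1.

    Mass that stays in its class costs nothing, so only the surplus has to move;
    this equals the total variation distance 0.5 * sum(|p_i - q_i|).
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

train = ["a", "a", "b", "c"]
production = ["a", "b", "b", "b"]
# Use the union of categories so both histograms share the same bins.
categories = sorted(set(train) | set(production))
p = categorical_histogram(train, categories)
q = categorical_histogram(production, categories)
print(categories, p, q, emd_unit_cost(p, q))  # EMD = 0.5
```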


et al., Transfer objective for learning to disentangle causal mechanisms: "We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes."

  • A very wordy explanation
  • Poisson and the Poisson process
  • Kolmogorov-Smirnov is not good for categoricals
  • Comparing two
  • Khan Academy
  • Visually
  • When they are not normal
  • Using the train / test trick
  • Code for identifying the distribution type and parameters, based on best fit
  • "If you collect data and it is not normal, you need to collect more data."
  • Beautiful graphs
  • The normal distribution is popular for two reasons
  • Python
  • Git wrapper to C
  • Earth mover's distance
  • EMD paper
  • Bengio
  • More distribution explanations
  • A very good explanation
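For comparing two samples of continuous data, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical CDFs. It relies on the values having an order (to build a CDF), which is one reason it is a poor fit for categorical data. This is a minimal stdlib sketch of the statistic only, with a hypothetical function name; `scipy.stats.ks_2samp` computes it together with a p-value.

```python
import bisect

def ks_statistic(sample_a: list[float], sample_b: list[float]) -> float:
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all x.

    Both empirical CDFs are right-continuous step functions that only change at
    observed points, so checking every observed value is enough.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / n  # fraction of a that is <= x
        cdf_b = bisect.bisect_right(b, x) / m  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d

print(ks_statistic([1, 2, 3], [1, 2, 3]))        # 0.0 (identical samples)
print(ks_statistic([1, 2, 3], [4, 5, 6]))        # 1.0 (no overlap)
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```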