Machine & Deep Learning Compendium
Distribution Transformation


Last updated 3 years ago


Log, square root, and Box-Cox transformations are the common ways to handle skewed data.

BOX COX

(What is the Box-Cox Power Transformation?)

  • A procedure to identify an appropriate exponent (Lambda = l) to use to transform data into a “normal shape.”

  • The Lambda value indicates the power to which all data should be raised.

  • Many statistical tests and intervals are based on the assumption of normality.

  • The assumption of normality often leads to tests that are simple, mathematically tractable, and powerful compared to tests that do not make the normality assumption.

  • Unfortunately, many real data sets are in fact not approximately normal.

  • However, an appropriate transformation of a data set can often yield a data set that does follow approximately a normal distribution.

IMPORTANT: after applying a transformation, we need to measure the normality of the resulting data.

  • The correlation is computed between the vertical and horizontal axis variables of the probability plot, and is a convenient measure of the linearity of the plot.

  • In other words: the more linear the probability plot, the better a normal distribution fits the data!
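The probability-plot correlation can be computed directly, e.g. with scipy's `probplot`. A minimal sketch (the skewed sample here is made up for illustration):

```python
# Measure normality via the probability-plot correlation coefficient.
# scipy.stats.probplot returns the ordered sample vs. theoretical
# normal quantiles plus a least-squares fit, where r is the correlation
# of the plot -- the closer to 1, the more linear the plot and the
# better a normal distribution fits the data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=500)   # clearly non-normal data
transformed, lam = stats.boxcox(skewed)         # Box-Cox with optimal lambda

(_, _), (_, _, r_raw) = stats.probplot(skewed)
(_, _), (_, _, r_tr) = stats.probplot(transformed)

print(f"r before: {r_raw:.4f}, r after: {r_tr:.4f}")
```

The correlation after the transformation should be noticeably closer to 1 than before.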

GUARANTEED NORMALITY?

  • NO!

  • This is because Box-Cox does not actually check for normality;

  • the method searches for the Lambda that gives the smallest standard deviation of the transformed data.

  • The assumption is that among all transformations with Lambda values between -5 and +5, the transformed data has the highest likelihood – but not a guarantee – of being normally distributed when the standard deviation is the smallest.

  • It is therefore absolutely necessary to always check the transformed data for normality using a probability plot.

+ Additionally, the Box-Cox transformation only works if all the data is positive (greater than 0).

+ This can be achieved easily by adding a constant ‘c’ to all data such that it all becomes positive before it is transformed. The transformation equation is then:

T(y) = ((y + c)^Lambda − 1) / Lambda for Lambda ≠ 0, and T(y) = ln(y + c) for Lambda = 0.
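As a sketch, the shifted transformation above can be written directly (the sample values and the choice of shift constant c are made up):

```python
# Minimal sketch of the (shifted) Box-Cox transformation:
#   T(y) = ((y + c)**lam - 1) / lam   for lam != 0
#   T(y) = log(y + c)                 for lam == 0
# c is any constant chosen so that all shifted values are > 0.
import numpy as np

def boxcox_shifted(y, lam, c=0.0):
    y = np.asarray(y, dtype=float) + c
    if np.any(y <= 0):
        raise ValueError("all shifted values must be > 0")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

data = np.array([-3.0, 0.0, 2.0, 7.0])   # contains non-positive values
c = 1.0 - data.min()                     # shift so the minimum becomes 1
print(boxcox_shifted(data, lam=0.5, c=c))
```

In practice a library routine would also pick the optimal lambda; this only shows the formula itself.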

MANN-WHITNEY U TEST

This test can be used to determine whether two independent samples were selected from populations having the same distribution.
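A minimal sketch of running the test with scipy (the two samples here are made up for illustration):

```python
# Mann-Whitney U test on two independent samples. A small p-value
# suggests the two samples come from different distributions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=1.0, scale=1.0, size=200)   # shifted distribution

u_stat, p_value = stats.mannwhitneyu(a, b, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.2e}")
```

Note that the test uses ranks only, which is why no normality assumption is needed.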

NULL HYPOTHESIS

  1. Analytics Vidhya:

      1. ANOVA tests whether the means of two or more groups are significantly different from each other. It checks the impact of one or more factors by comparing the means of different samples.

      2. A one-way ANOVA tells us that at least two groups are different from each other, but it won’t tell us which groups are different.

      3. When the outcome or dependent variable (in our case the test scores) is affected by two independent variables/factors, we use a slightly modified technique called two-way ANOVA.

  2. The multivariate case is handled by a technique known as MANOVA.
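A one-way ANOVA can be sketched with scipy (the three groups below are made up for illustration):

```python
# One-way ANOVA across three groups. f_oneway tests whether at least
# two group means differ; it does not say which ones (that requires a
# post-hoc test such as Tukey's HSD).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
g1 = rng.normal(10.0, 2.0, size=50)
g2 = rng.normal(10.5, 2.0, size=50)
g3 = rng.normal(14.0, 2.0, size=50)   # clearly different mean

f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")
```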

Such transformations increase the applicability and usefulness of statistical techniques based on the normality assumption.

One measure of normality is to compute the correlation coefficient of a normal probability plot.

Finally: there is an awesome tutorial in Python with code examples using scipy.stats.boxcox, and there is also another code example. “Simply pass a 1-D array into the function and it will return the Box-Cox transformed array and the optimal value for lambda. You can also specify a number, alpha, which calculates the confidence interval for that value. (For example, alpha = 0.05 gives the 95% confidence interval.)”
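Following that description, a minimal sketch with scipy.stats.boxcox (the positive, skewed sample is made up):

```python
# scipy.stats.boxcox: pass a 1-D positive array; it returns the
# transformed array and the optimal lambda. With alpha set, it also
# returns a confidence interval for lambda (alpha=0.05 -> 95% CI).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.7, size=1000)   # positive, skewed

transformed, lam, (ci_low, ci_high) = stats.boxcox(data, alpha=0.05)
print(f"optimal lambda = {lam:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Remember that the input must be strictly positive; shift the data first if it is not.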

* Note: there may be a slight discrepancy between the Python and R implementations, but this needs investigating.

The Mann–Whitney U test is a nonparametric test of the null hypothesis that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample.

Unlike the t-test, it does not require the assumption of normal distributions. It is nearly as efficient as the t-test on normal distributions.

Hypothesis testing resources – always good: one-way ANOVA, two-way ANOVA, MANOVA.

The Box-Cox transformation is a useful family of transformations; check the result with a normal probability plot.

*NOTE: another useful link explains it with figures, but I did not read it.
COMMON TRANSFORMATION FORMULAS (based on the actual formula): the original tutorial is dead, but there is a newer one with code examples and further details.
  • What is chi-square and what is a null hypothesis, and how do we calculate observed vs. expected and check if we can reject the null and get a significant difference.

  • What is hypothesis testing

  • Intro to t-tests (Analytics Vidhya)

  • ANOVA (analysis of variance)

  • Top 3 methods for handling skewed data

  • Power transformations