Machine & Deep Learning Compendium
Label Algorithms

Last updated 3 years ago

Unbalanced labels

  1. Imbalance learn (imbalanced-learn) - is an open-source, MIT-licensed library that provides tools for dealing with classification with imbalanced classes.

  2. This article, Classifying Job Titles With Noisy Labels Using REINFORCE, has a very nice trick: it adds a reward component to the loss function in order to mitigate the unbalanced class label problem, instead of the usual balancing.
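
The "usual balancing" mentioned above can be as simple as random oversampling of the minority class. The following is a minimal standard-library sketch of that idea, not the method from the article; the function name `random_oversample` and the toy data are made up for illustration. In practice a maintained implementation such as `RandomOverSampler` from imbalanced-learn is the more likely choice.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the feature rows by their label.
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    # Every class is grown to the size of the majority class.
    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Toy data: four "neg" samples and a single "pos" sample.
X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
y = ["neg", "neg", "neg", "neg", "pos"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling only repeats existing rows, so it should be applied to the training split only; otherwise duplicates leak into validation data.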

Label Propagation / Spreading

Note: this is closely related to weak and semi supervision, i.e., we have a small number of labels and we want to generalize them to other samples. See also weak supervision methods.

    1. Harmonic Function (HMN) [Zhu+, ICML03]

    2. Local and Global Consistency (LGC) [Zhou+, NIPS04]

    3. Partially Absorbing Random Walk (PARW) [Wu+, NIPS12]

    4. OMNI-Prop (OMNIProp) [Yamaguchi+, AAAI15]

    5. Confidence-Aware Modulated Label Propagation (CAMLP) [Yamaguchi+, SDM16]
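
As an illustration of one entry in the list above, here is a rough NumPy sketch in the style of Local and Global Consistency: build a KNN graph with Euclidean distance, normalize it symmetrically, and iterate F ← αSF + (1 − α)Y. The function name `spread_labels`, the default parameters, and the toy data are assumptions for this example; it is a sketch under those assumptions rather than a reference implementation. scikit-learn's `LabelSpreading` offers a maintained version of the same family.

```python
import numpy as np


def spread_labels(X, y, n_neighbors=2, alpha=0.8, n_iter=100):
    """LGC-style label spreading; y uses -1 for unlabeled samples.
    Assumes no duplicate points, so each node is its own nearest neighbor."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)

    # Pairwise Euclidean distances (Minkowski with p=2).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # KNN graph: column 0 of the argsort is the node itself, so skip it.
    neighbors = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), neighbors.ravel()] = 1.0
    W = np.maximum(W, W.T)  # make the graph undirected

    # Symmetrically normalized affinity S = D^-1/2 W D^-1/2.
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))

    # One-hot matrix of the known labels; unlabeled rows stay all-zero.
    classes = np.unique(y[y != -1])
    Y = (y[:, None] == classes[None, :]).astype(float)

    # Each node repeatedly absorbs its neighbors' label scores while
    # the labeled nodes keep re-injecting their original labels.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return classes[F.argmax(axis=1)]


# Two well-separated groups of points with one labeled sample each.
X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [10, 10], [10, 11], [11, 10], [11, 11]]
y = [0, -1, -1, -1, 1, -1, -1, -1]

print(spread_labels(X, y))  # -> [0 0 0 0 1 1 1 1]
```

Ties are resolved here by `argmax`, which picks the first class, rather than by choosing a random label.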

Step 1: build a Laplacian graph using KNN; the distance metric is Minkowski with p=2, i.e., Euclidean distance.

Step by step tutorial, part 2.

Spreading (propagation upgrade): essentially a community graph algorithm, although it resembles KNN in nature. It uses a semi-supervised data set (i.e., labeled and unlabeled data) to spread, or propagate, labels to the unlabeled data, with small increments in the algorithm. Using a KNN-like methodology, each unlabeled sample is given a label based on its first-order friends; if there is a tie, a random label is chosen. Nodes are connected using Euclidean distance.

The difference between propagation and spreading is that propagation uses the Laplacian matrix, whereas spreading uses the normalized Laplacian matrix.

Further material: Youtube videos, a presentation, and Neo4j resources.

Label Noise

    1. can be used for positive unlabeled learning

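cleanlab - "cleanlab is the data-centric ML ops package for machine learning with noisy labels. cleanlab cleans labels and supports finding, quantifying, and learning with label errors in datasets. See datasets cleaned with cleanlab at labelerrors.com. Check out the: cleanlab code documentation. cleanlab is powered by confident learning, published in this paper | blog."

Reference 1: Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks, by Curtis G Northcutt, Anish Athalye, Jonas Mueller

Reference 2: Confident Learning: Estimating Uncertainty in Dataset Labels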
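
To give a feel for the confident learning idea behind cleanlab, the sketch below applies its core thresholding step to out-of-sample predicted probabilities: each class gets a threshold equal to the mean predicted probability of that class among the examples carrying that label, and an example is flagged when it confidently belongs to a different class than the one it was given. This is a simplified illustration, not cleanlab's implementation; the function name `flag_label_issues` and the toy numbers are invented here, and it assumes integer labels 0..K-1 with at least one example per class.

```python
import numpy as np


def flag_label_issues(labels, pred_probs):
    """Return indices of examples whose given label disagrees with their
    'confident' class (simplified confident-learning thresholding)."""
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    n_classes = pred_probs.shape[1]

    # Per-class threshold: average self-confidence of examples labeled k.
    thresholds = np.array(
        [pred_probs[labels == k, k].mean() for k in range(n_classes)]
    )

    # A class is a candidate for an example if its probability
    # reaches that class's threshold.
    above = pred_probs >= thresholds

    # Confident class: the most probable candidate, or -1 if there is none.
    masked = np.where(above, pred_probs, -np.inf)
    confident = np.where(above.any(axis=1), masked.argmax(axis=1), -1)

    # Flag examples that confidently belong to another class.
    return np.flatnonzero((confident != -1) & (confident != labels))


# Toy example: the third sample is labeled 0 but looks like class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.20, 0.80],
    [0.10, 0.90],
    [0.05, 0.95],
]

print(flag_label_issues(labels, pred_probs))  # -> [2]
```

The predicted probabilities are assumed to come from cross-validation, so that no example is scored by a model that was trained on its own (possibly wrong) label.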

PULearn - "Positive-unlabeled learning (aka PU-learning) is a machine learning scenario for binary classification where the training set consists of a set of positively-labeled examples and an additional unlabeled set that contains positive and negative examples in unknown proportions (so no training example is explicitly labeled as negative). Positive-unlabeled learning methods aim to incorporate the unique structure of this scenario into the learning process, in a way that improves generalization of the learned notion of the positive class, when compared to simply treating all unlabeled examples as negative examples, or alternatively discarding them and training a one-class classifier over only the positive samples."

PUMML - "Positive and Unlabeled Materials Machine Learning (pumml) is a code that uses semi-supervised machine learning to classify materials from only positive and unlabeled examples."
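
One well-known way to exploit the PU structure described above is the Elkan and Noto (2008) correction, which is brought in here purely as an illustration and is not described in the text. A classifier g(x) is trained to separate labeled from unlabeled examples, the label frequency c = P(labeled | positive) is estimated as the mean of g over held-out labeled positives, and g(x) / c is used as the probability of being positive. The sketch below covers only that correction step and assumes the scores already exist; `elkan_noto_posterior` and the numbers are hypothetical.

```python
def elkan_noto_posterior(scores, heldout_positive_scores):
    """Rescale labeled-vs-unlabeled classifier scores g(x) into estimates
    of P(y=1 | x), assuming positives were labeled completely at random."""
    # c estimates P(labeled | positive): the average score that the
    # classifier assigns to known positives it was not trained on.
    c = sum(heldout_positive_scores) / len(heldout_positive_scores)

    # Divide by c and clip at 1.0, since the estimate can overshoot.
    return [min(score / c, 1.0) for score in scores]


# Hypothetical scores g(x) for three unlabeled examples, plus scores
# for two held-out labeled positives.
unlabeled_scores = [0.1, 0.4, 0.8]
heldout_positive_scores = [0.5, 0.3]

posteriors = elkan_noto_posterior(unlabeled_scores, heldout_positive_scores)
print(posteriors)
```

Because c is below 1 whenever some positives are unlabeled, the corrected probabilities are higher than the raw scores, which is what separates this from treating all unlabeled examples as negative.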

build a laplacian
Step by step tutorial
part 2
Spreading
Difference
Laplacian matrix on youtube, videos 30-33
Really good example notebook
Spreading vs propagation
https://en.wikipedia.org/wiki/Label_Propagation_Algorithm
Medium
Sklearn
Git
incremental LP
Git2
clean lab
labelerrors.com
cleanlab code documentation
paper
blog
Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks, by Curtis G Northcutt, Anish Athalye, Jonas Mueller
Confident Learning: Estimating Uncertainty in Dataset Labels
PULearn
PUMML
Medium
imbalance learn
Classifying Job Titles With Noisy Labels Using REINFORCE
Imbalance Learn comparison