Ensembles


  1. A review of voting, bagging, boosting, stacking and cascading methodologies (good).

  2. Machine Learning Mastery on ensembles:

    1. Stacking neural nets (really good):

      1. Stacked Generalization Ensemble

      2. Multi-Class Classification Problem

      3. Multilayer Perceptron Model

      4. Train and Save Sub-Models

      5. Separate Stacking Model

      6. Integrated Stacking Model

    2. How to Combine Predictions for Ensemble Learning:

      1. Plurality Voting

      2. Majority Voting

      3. Unanimous Voting

      4. Weighted Voting

    3. Essence of Stacking Ensembles for Machine Learning:

      1. Voting Ensembles

      2. Weighted Average

      3. Blending Ensemble

      4. Super Learner Ensemble

    4. Dynamic Ensemble Selection (DES) for Classification in Python - Dynamic Ensemble Selection algorithms operate much like DCS algorithms, except predictions are made using votes from multiple classifier models instead of a single best model. In effect, each region of the input feature space is owned by a subset of models that perform best in that region (see the first code sketch after this list).

      1. k-Nearest Neighbor Oracle (KNORA) With Scikit-Learn

        1. KNORA-Eliminate (KNORA-E)

        2. KNORA-Union (KNORA-U)

      2. Hyperparameter Tuning for KNORA

        1. Explore k in k-Nearest Neighbor

        2. Explore Algorithms for Classifier Pool

    5. A Gentle Introduction to Mixture of Experts Ensembles (see the second code sketch after this list):

      1. Mixture of Experts

        1. Subtasks

        2. Expert Models

        3. Gating Model

        4. Pooling Method

      2. Relationship With Other Techniques

        1. Mixture of Experts and Decision Trees

        2. Mixture of Experts and Stacking

    6. Strong Learners vs. Weak Learners in Ensemble Learning - weak learners are models that perform slightly better than random guessing, while strong learners are models that have arbitrarily good accuracy. Weak and strong learners are tools from computational learning theory and provide the basis for the development of the boosting class of ensemble methods.

  3. Vidhya on trees, bagging, boosting, GBM, XGB (samuel jefroykin):

    1. Basic Ensemble Techniques

    2. 2.1 Max Voting

    3. 2.2 Averaging

    4. 2.3 Weighted Average

    5. Advanced Ensemble Techniques

    6. 3.1 Stacking

    7. 3.2 Blending

    8. 3.3 Bagging

    9. 3.4 Boosting

    10. Algorithms based on Bagging and Boosting

    11. 4.1 Bagging meta-estimator

    12. 4.2 Random Forest

    13. 4.3 AdaBoost

    14. 4.4 GBM

    15. 4.5 XGB

    16. 4.6 Light GBM

    17. 4.7 CatBoost
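To make the DES idea above concrete, here is a small hand-rolled sketch in the spirit of KNORA-Union: for every test point, each classifier in the pool gets one vote per nearest validation neighbour it classifies correctly, and the weighted vote decides the label. The pool, dataset and k below are illustrative assumptions, not the tutorial's code; the DESlib Python package provides ready-made KNORA-E / KNORA-U implementations.

```python
# Hand-rolled KNORA-Union-style dynamic ensemble selection (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_dsel, X_test, y_dsel, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# A small, diverse pool of base classifiers, all trained on the same data.
pool = [DecisionTreeClassifier(max_depth=5, random_state=0),
        LogisticRegression(max_iter=1000),
        GaussianNB()]
for clf in pool:
    clf.fit(X_train, y_train)

# For each test point, look at its k nearest neighbours in the held-out selection
# set; each classifier gets one vote per neighbour it classifies correctly.
k = 7
nn = NearestNeighbors(n_neighbors=k).fit(X_dsel)
_, neigh_idx = nn.kneighbors(X_test)

preds = []
for i, idx in enumerate(neigh_idx):
    votes = {}
    for clf in pool:
        competence = np.sum(clf.predict(X_dsel[idx]) == y_dsel[idx])  # local accuracy
        label = clf.predict(X_test[i:i + 1])[0]
        votes[label] = votes.get(label, 0) + competence
    preds.append(max(votes, key=votes.get))

print("dynamic-selection accuracy:", np.mean(np.array(preds) == y_test))
```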
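The Mixture of Experts structure (subtasks, expert models, gating model, pooling) can also be sketched with plain scikit-learn pieces. Everything here - KMeans for the subtasks, trees as experts, a logistic-regression gate - is an illustrative assumption rather than the article's recipe.

```python
# A toy mixture-of-experts: split the input space into subtasks, train one
# expert per subtask, and let a soft gating model pool their predictions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
classes = np.unique(y_train)

# Subtasks: a simple unsupervised partition of the feature space.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
z = km.labels_

# Expert models: one classifier per subtask.
experts = [DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train[z == c], y_train[z == c])
           for c in range(2)]

# Gating model: predicts how much to trust each expert for a given input.
gate = LogisticRegression(max_iter=1000).fit(X_train, z)

def expert_proba(expert, X):
    """Expand an expert's predict_proba to cover all global classes."""
    p = np.zeros((len(X), len(classes)))
    proba = expert.predict_proba(X)
    for j, c in enumerate(expert.classes_):
        p[:, np.searchsorted(classes, c)] = proba[:, j]
    return p

# Pooling: weighted sum of expert outputs, weights given by the gate.
weights = gate.predict_proba(X_test)                      # shape (n_samples, 2)
pooled = sum(weights[:, [i]] * expert_proba(e, X_test) for i, e in enumerate(experts))
y_pred = classes[np.argmax(pooled, axis=1)]
print("mixture-of-experts accuracy:", np.mean(y_pred == y_test))
```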

Ensembles in WEKA

  • Bagging - random sample selection, training multiple classifiers.

  • Random forest - random feature selection for each tree, training multiple trees.

  • Boosting - creates stumps; each new stump tries to fix the previous one's errors, and the results are finally combined on new data, with each model assigned a skill weight that is accounted for at the end.

  • Voting - majority vote over any set of algorithms within WEKA, with results combined via the mean or some other rule.

  • Stacking - same as voting, but the predictions are combined using a meta-model.
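The voting and stacking options map directly onto scikit-learn's VotingClassifier and StackingClassifier; a minimal sketch of the difference (the base models and dataset are arbitrary assumptions):

```python
# Voting vs. stacking in scikit-learn: same base models, different combination rule.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("svm", SVC(probability=True, random_state=0))]

voting = VotingClassifier(estimators=base, voting="soft")            # average predicted probabilities
stacking = StackingClassifier(estimators=base,
                              final_estimator=LogisticRegression())  # combine with a meta-model

for name, model in [("voting", voting), ("stacking", stacking)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```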

BAGGING - bootstrap aggregating

Bagging - the best example so far: create m bags and put n' < n samples (about 60% of n) in each bag, drawn with replacement, which means the same sample can be selected twice or more; train a model on each bag, query each of the m models with a test sample x, and calculate the mean - this is the classification.

Overfitting is not an issue with bagging, as the mean of the models averages or smoothes the “curves”, even if all of them are overfitted.
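A minimal scikit-learn sketch of exactly this recipe - many high-variance trees, each fit on ~60% of the samples drawn with replacement, averaged against a single tree (the dataset and the choice of deep trees are assumptions):

```python
# Bagging deep (overfit-prone) trees vs. a single deep tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)   # fully grown, high variance
bagged = BaggingClassifier(
    estimator=DecisionTreeClassifier(random_state=0),  # older scikit-learn versions call this base_estimator
    n_estimators=100,
    max_samples=0.6,                                   # ~60% of n per bag
    bootstrap=True,                                    # drawn with replacement
    random_state=0)

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())
```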

BOOSTING

Adaboost: similar to bagging, but the system is biased toward choosing samples that were modelled poorly before.

  1. Create bag_1 with n' < n samples drawn with replacement, train model_1, and test it on the whole training set.

  2. Create bag_2 with n' samples drawn with replacement, but bias the selection toward the samples that model_1 classified wrongly. Train model_2 and average the results of model_1 and model_2, i.e., who was classified correctly or not.

  3. Create bag_3 with n' samples drawn with replacement, again biasing the selection toward the samples that models 1+2 classified wrongly. Train model_3, average the results of models 1, 2 & 3, and iterate onward.

  4. Create bag_m with n' samples drawn with replacement, biasing the selection toward the samples that were wrongly classified in the previous steps.
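scikit-learn's AdaBoostClassifier implements this idea by reweighting the training samples after every round rather than drawing biased bags, but the effect is the same: each new weak model concentrates on what was previously misclassified. A minimal sketch with decision stumps as the weak learners (all parameters are illustrative assumptions):

```python
# AdaBoost: sequentially fit stumps, upweighting previously misclassified samples.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)   # a weak learner: slightly better than chance
ada = AdaBoostClassifier(estimator=stump,     # older scikit-learn versions use base_estimator=
                         n_estimators=200,
                         learning_rate=0.5,
                         random_state=0)

print("single stump:", cross_val_score(stump, X, y, cv=5).mean())
print("adaboost    :", cross_val_score(ada, X, y, cv=5).mean())
```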

XGBOOST

XGBoost is an optimized distributed gradient boosting system designed to be highly efficient, flexible and portable. The main parameters to set:

  • Threads

  • Rounds

  • Tree height

  • Loss function

  • Error

  • Cross-validation folds

  • Example configurations for the Weka MLRClassifier and the mlr classif.xgboost learner (the second one is for multi-class classification):

    weka.classifiers.mlr.MLRClassifier -learner "nrounds = 10, max_depth = 2, eta = 0.5, nthread = 2"

    classif.xgboost -params "nrounds = 1000, max_depth = 4, eta = 0.05, nthread = 5, objective = \"multi:softprob\""

# Random Forest™ - 1000 trees
bst <- xgboost(data = train$data, label = train$label, max_depth = 4,
               num_parallel_tree = 1000, subsample = 0.5, colsample_bytree = 0.5,
               nrounds = 1, objective = "binary:logistic")

# Boosting - 3 rounds
bst <- xgboost(data = train$data, label = train$label, max_depth = 4, nrounds = 3,
               objective = "binary:logistic")

RF1000: max_depth = 4, num_parallel_tree = 1000, subsample = 0.5, colsample_bytree = 0.5, nrounds = 1, nthread = 2

XG: nrounds = 10, max_depth = 4, eta = 0.5, nthread = 2

More XGBoost pointers:

  • Machine Learning Mastery - short and makes sense, with info about the parameters.

  • Mostly practical, in Jupyter, but with some insight about the theory.

  • Slides, video, lots of info.

  • R installation in Weka, then XGBOOST in Weka through R.

  • Parameters for the Weka mlr classif.xgboost learner.
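For reference, a rough Python equivalent of the two R snippets above, using the xgboost scikit-learn wrapper (nrounds → n_estimators, eta → learning_rate, nthread → n_jobs; the dataset and split are assumptions):

```python
# Boosted trees vs. the "random forest via XGBoost" trick, Python API.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain boosting: nrounds=10, max_depth=2, eta=0.5, nthread=2.
boosted = XGBClassifier(n_estimators=10, max_depth=2, learning_rate=0.5, n_jobs=2)

# Random-forest-like: one boosting round, many parallel trees, row/column subsampling.
# (For a truer random-forest analogue you would also set learning_rate=1.0.)
rf_like = XGBClassifier(n_estimators=1, num_parallel_tree=1000, max_depth=4,
                        subsample=0.5, colsample_bytree=0.5, n_jobs=2)

for name, model in [("boosted", boosted), ("rf-like", rf_like)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```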

Gradient Boosting Classifier

CatBoost

Machine Learning Mastery on using Gradient Boosting with Scikit-Learn, XGBoost, LightGBM, and CatBoost.


More links:

  • Review on voting, bagging, boosting, stacking and cascading methodologies
  • How to combine several sklearn algorithms into a voting ensemble
  • Stacking API, MLXTEND
  • Stacking neural nets - really good
  • How to Combine Predictions for Ensemble Learning
  • Essence of Stacking Ensembles for Machine Learning
  • Dynamic Ensemble Selection (DES) for Classification in Python
  • A Gentle Introduction to Mixture of Experts Ensembles
  • Strong Learners vs. Weak Learners in Ensemble Learning
  • Vidhya on trees, bagging, boosting, GBM, XGB
  • Parallel gradient-boosted trees
  • A comprehensive guide to ensembles - read!
  • Kaggler guide to stacking
  • Blending vs stacking
  • Kaggle ensemble guide
  • Ensembles in WEKA
  • Bagging
  • All the boosting algorithms
  • What is XGBOOST?
  • #2nd link
  • Does it cause overfitting?
  • Author's Youtube lecture
  • GIT here
  • How to use XGB - tutorial on Medium (comparison to GBC)
  • How to code - tutorial
  • Beautiful video class about XGBOOST
  • Machine Learning Mastery
  • R installation in Weka
  • Parameters
  • https://cran.r-project.org/web/packages/xgboost/xgboost.pdf
  • Special case of random forest using XGBOOST
  • Loss functions and GBC vs XGB
  • Why is XGB faster than SK GBC
  • Good XGB vs GBC
  • XGB vs GBC
  • What is so special?
  • The fastest algo
  • A new game in ML
  • Use it - here is why