Probabilistic, Regression


PROBABILISTIC ALGORITHMS

NAIVE BAYES

  1. Vidhya on NB

  2. Bayesian tree

  3. NB, GNB, multinomial NB
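For orientation, a minimal scikit-learn sketch of Gaussian naive Bayes on the built-in iris data; the dataset choice is ours, not from the links above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# NB assumes features are conditionally independent given the class;
# GaussianNB additionally models each feature as a per-class Gaussian.
clf = GaussianNB().fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
print("class priors:", clf.class_prior_)
```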

BAYES, BAYESIAN BELIEF NETWORKS

  1. Mastery on bayes theorem - a very good blog post

  2. An Introduction To BBS that shows how to build the network's tables, and a complementing SLIDE presentation

  3. A very nice presentation regarding BBS

  4. Maximum Likelihood (log likelihood) - proofs for the Bernoulli, normal, and Poisson distributions. Another example.

MARKOV MODELS

What is a Markov Model? A Markov Model is a stochastic (random) model which models temporal or sequential data, i.e., data that are ordered. A Markov Chain (sunny cloudy explanation) is a probabilistic process that relies on the current state to predict the next state. See also Part 1, 2, 3, 4.

Random vs. Stochastic (here and here):

  • A variable is 'random'.

  • A process is 'stochastic'.

Apart from this difference, the two words are synonyms.

In other words:

  • A random vector is a generalization of a single random variable to many.

  • A stochastic process is a sequence of random variables, or a sequence of random vectors (and then you have a vector-stochastic process).

  • A Markov model provides a way to model the dependencies of current information (e.g., weather) on previous information.

  • It is composed of states, a transition scheme between states, and the emission of outputs (discrete or continuous).

  • Several goals can be accomplished by using Markov models:

    • Learn statistics of sequential data.

    • Do prediction or estimation.

    • Recognize patterns.

  • To be effective, the current state has to depend on the previous state in some way:

  • if it looks cloudy outside, the next state we expect is rain.

  • If the rain starts to subside into cloudiness, the next state will most likely be sunny.

  • Not every process has the Markov Property. The lottery, for example: this week's winning numbers have no dependence on the previous week's winning numbers.

The walkthrough in the sunny cloudy explanation:

  1. It shows how to build an order-1 Markov table of probabilities, predicting the next state given the current one.

  2. Then it shows the state diagram built from this table.

  3. Then it shows how to build a transition matrix from the 3 states, i.e., from the probabilities in the table.

  4. Then it shows how to calculate the next state using the "current state vector" and vector*matrix multiplication, as in the sketch below.

  5. Finally it discusses how the prediction settles into always predicting rain, and the solution: using the last two states in a bigger table, of order 2. The author does not really explain why the probabilities don't change when more states are added; they stay the same as in order 1, just repeated.
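A minimal numpy sketch of that walkthrough; the three weather states and all probabilities are made up for illustration:

```python
import numpy as np

# Hypothetical order-1 transition matrix over three weather states.
# Row i holds P(next state | current state i); each row sums to 1.
states = ["sunny", "cloudy", "rainy"]
P = np.array([
    [0.7, 0.2, 0.1],   # from sunny
    [0.3, 0.4, 0.3],   # from cloudy
    [0.2, 0.4, 0.4],   # from rainy
])

# "Current state vector": we are certain it is cloudy right now.
v = np.array([0.0, 1.0, 0.0])

# Next-state distribution via vector * matrix multiplication.
v_next = v @ P
print(dict(zip(states, v_next)))   # {'sunny': 0.3, 'cloudy': 0.4, 'rainy': 0.3}

# Applying P repeatedly walks the chain forward in time.
v_two_steps = v @ np.linalg.matrix_power(P, 2)
print(dict(zip(states, v_two_steps)))
```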

MARKOV MODELS / HIDDEN MARKOV MODEL

HMM (what is? And why HIDDEN?) - the idea is that there are things that you CAN OBSERVE and there are things that you CAN'T OBSERVE. From the things you OBSERVE you want to INFER the things you CAN'T OBSERVE (HIDDEN). E.g., you play against someone else in a game: you don't see their choice of action, but you see the result.

HMM tutorials

  1. HMM tutorial

  2. Medium

  3. Intro to HMM / MM

  4. Paper like example

  5. HMM with sklearn and networkx

  6. Stack exchange on hmm

  7. Probably the simplest explanation of Markov Models and HMM as a "game" - link

  8. This video explains the building blocks of the needed knowledge in HMM: starting probabilities P0, transitions and emissions (state probabilities)

  9. This post explains HMM and ties our understanding together

  10. A cute explanation on quora

  11. Machine learning - a probabilistic approach, David Barber

  12. HMM in weka, with github, working on 7.3, not on 9.1

HMM variants

  1. HMM LEARN (sklearn, still being developed)

  2. Pomegranate (this is good), which covers:

    1. General mixture models

    2. HMM

    3. Bayes classifiers and naive Bayes

    4. Markov chains

    5. Bayesian networks

    6. Markov networks

    7. Factor graphs

  3. GHMM with python wrappers

  4. Hmms (old)

  5. Python code, previously part of sklearn

  6. Python seqLearn - supervised multinomial HMM

This youtube video explains the hidden Markov model: it shows the visual representation of the model and how we go from it to the formula (continued in part2 of the video). It breaks the formula down into:

  • the transition probability formula - the probability of going from Zk to Zk+1

  • the emission probability formula - the probability of going from Zk to Xk

  • (Pi) the initial distribution - the probability of Z1 = i, for i = 1..m

The iconic picture of a Hidden Markov Model: there is some state (x) that changes with time (Markov), and you want to estimate or track it. Unfortunately, you cannot directly observe this state; that is the hidden part. But you can observe something correlated with the state (y).

OBSERVED DATA -> INFER -> what you CAN'T OBSERVE (HIDDEN).

Considering this model:

  • where P(X0) is the initial state, for happy or sad;

  • where P(Xt | Xt-1) is the transition model from time t-1 to time t;

  • where P(Yt | Xt) is the observation model for happy and sad (X) under 4 observations (w, sad, crying, facebook).
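To make those three ingredients concrete, here is a minimal numpy sketch of the forward algorithm for a happy/sad chain like the one above. All probabilities, and the observation alphabet order (w, sad, crying, facebook), are invented for illustration:

```python
import numpy as np

pi = np.array([0.6, 0.4])            # P(X0): [happy, sad]
A = np.array([[0.8, 0.2],            # transition model P(Xt | Xt-1)
              [0.4, 0.6]])
B = np.array([[0.5, 0.1, 0.1, 0.3],  # observation model P(Yt | Xt);
              [0.1, 0.3, 0.4, 0.2]]) # columns: w, sad, crying, facebook

obs = [0, 3, 2]  # an observed sequence: w, facebook, crying

# alpha[i] = P(Y1..Yt, Xt = i); the recursion uses exactly the three
# ingredients named above: initial distribution, transition, emission.
alpha = pi * B[:, obs[0]]
for y in obs[1:]:
    alpha = (alpha @ A) * B[:, y]

print("P(observations) =", alpha.sum())
# Filtering: belief over the hidden state given everything observed so far.
print("P(Xt | Y1..Yt) =", alpha / alpha.sum())
```

This is the same forward recursion that HMM libraries such as hmmlearn run when scoring a sequence.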

INPUT OUTPUT HMM (IOHMM)

Incomplete python code for unsupervised / semi-supervised / supervised IOHMM - training is there, prediction is missing.

CONDITIONAL RANDOM FIELDS (CRF)

  1. Make sense intro to CRF, comparison against HMM

  2. HMM, CRF, MEMM

  3. Another crf article

  4. Neural network CRF: NNCRF, and another one

  5. scikit-learn inspired API for CRFsuite

  6. Sklearn wrapper

  7. Python crfsuite wrapper

  8. Pycrf suite vidhya
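A hedged sketch of a CRF sequence tagger using sklearn-crfsuite (the "scikit-learn inspired API for CRFsuite" listed above); the two-sentence toy dataset and the feature function are invented:

```python
import sklearn_crfsuite

def word2features(sent, i):
    # CRFs condition on arbitrary, overlapping features of the observed
    # sequence -- the key advantage over an HMM's fixed emission model.
    word = sent[i][0]
    feats = {
        "word.lower()": word.lower(),
        "word.istitle()": word.istitle(),
        "BOS": i == 0,
        "EOS": i == len(sent) - 1,
    }
    if i > 0:
        feats["-1:word.lower()"] = sent[i - 1][0].lower()
    return feats

train = [
    [("John", "PER"), ("lives", "O"), ("in", "O"), ("Paris", "LOC")],
    [("Mary", "PER"), ("visited", "O"), ("London", "LOC")],
]
X = [[word2features(s, i) for i in range(len(s))] for s in train]
y = [[label for _, label in s] for s in train]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])  # tags for the first toy sentence
```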

REGRESSION ALGORITHMS

  1. Sk-lego monotonic - to fit, with intervals, a linear regressor on top of non-linear data.

  2. Lightning - a library for large-scale linear classification, regression and ranking in Python.

  3. Linear regression TBC

  4. CART - classification and regression tree; basically, the difference from classification trees is that instead of information gain we use the sum of squared errors.

  5. SVR - regression-based SVM, with kernel only.

  6. NNR - regression-based NN, with one output node.

  7. LOGREG - logistic regression is used as a classification algorithm to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. The output is BINARY, i.e., if the likelihood of killing the bug is > 0.5 it is assumed dead, if it is < 0.5 it is assumed alive (see the sketch below). It:

    • assumes a binary outcome

    • assumes no outliers

    • assumes no intercorrelations among predictors (inputs?)
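A minimal scikit-learn sketch of logistic regression as a binary classifier, echoing the > 0.5 / < 0.5 rule above; the data is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic binary outcome

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba([[1.0, 0.5]])[0, 1]  # P(y = 1 | x)
print("P(dead) =", proba, "->", "dead" if proba > 0.5 else "alive")
```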

Regression Measurements:

  1. R^2 - several reasons it can be too high:

    1. Too many variables

    2. Overfitting

    3. Time series - seasonality trends can cause this

  2. RMSE vs MAE

KERNEL REGRESSION

Gaussian Kernel Regression takes a weighted average of the surrounding points:

  • The variance, sigma^2, informally controls the smoothness of the approximated function.

  • Smaller values of sigma will cause the function to overfit the data points, while larger values will cause it to underfit.

  • There is a proposed method to find sigma in the post!

  • Gaussian Kernel Regression is equivalent to creating an RBF Network with the properties described in the post.
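A small numpy sketch of Gaussian kernel (Nadaraya-Watson) regression under these definitions; the sine data and the sigma values are invented to show the overfit/underfit effect:

```python
import numpy as np

def gaussian_kernel_predict(x_query, X, y, sigma=0.5):
    # Gaussian kernel weights centered on the query point...
    w = np.exp(-((x_query - X) ** 2) / (2 * sigma ** 2))
    # ...then the prediction is a weighted average of the targets.
    return (w * y).sum() / w.sum()

rng = np.random.default_rng(1)
X = np.linspace(0, 2 * np.pi, 50)
y = np.sin(X) + rng.normal(scale=0.2, size=X.size)

for sigma in (0.05, 0.5, 3.0):   # overfit -> reasonable -> underfit
    pred = [gaussian_kernel_predict(xq, X, y, sigma) for xq in (1.0, 4.0)]
    print(f"sigma={sigma}: predictions at x=1,4 -> {np.round(pred, 3)}")
```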

DIMENSIONALITY REDUCTION

PRINCIPAL COMPONENT REGRESSION (PCR) / PARTIAL LEAST SQUARES (PLS)

Principal component regression (PCR) and Partial least squares (PLS) (part1) - basically PCA plus linear regression, except that PLS makes use of the response variable in order to identify the new features.

One can describe Principal Components Regression as an approach for deriving a low-dimensional set of features from a large set of variables. The first principal component direction of the data is the one along which the observations vary the most. In other words, the first PC is a line that fits as close as possible to the data. One can fit p distinct principal components. The second PC is a linear combination of the variables that is uncorrelated with the first PC, and has the largest variance subject to this constraint. The idea is that the principal components capture the most variance in the data using linear combinations of the data in subsequently orthogonal directions. In this way, we can also combine the effects of correlated variables to get more information out of the available data, whereas in regular least squares we would have to discard one of the correlated variables.

The PCR method described above involves identifying linear combinations of X that best represent the predictors. These combinations (directions) are identified in an unsupervised way, since the response Y is not used to help determine the principal component directions. That is, the response Y does not supervise the identification of the principal components, so there is no guarantee that the directions that best explain the predictors are also the best for predicting the response (even though that is often assumed). Partial least squares (PLS) is a supervised alternative to PCR. Like PCR, PLS is a dimension reduction method: it first identifies a new, smaller set of features that are linear combinations of the original features, then fits a linear model via least squares to the new M features. Yet, unlike PCR, PLS makes use of the response variable in order to identify the new features.
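A minimal scikit-learn sketch contrasting the two: PCR as PCA followed by least squares (directions chosen without looking at y), and PLS choosing its directions using y. The synthetic data is illustrative only:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# PCR: the M directions are chosen in an unsupervised way (no y).
pcr = make_pipeline(PCA(n_components=3), LinearRegression()).fit(X, y)

# PLS: the M directions are chosen using the response y.
pls = PLSRegression(n_components=3).fit(X, y)

print("PCR R^2:", pcr.score(X, y))
print("PLS R^2:", pls.score(X, y))
```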
